The concept is very simple and can be summarized as swapping the SSE registers around: for instance, movss xmm1, xmm2; sqrtss xmm1, xmm1 will be changed into
movss xmm3, xmm5; sqrtss xmm3, xmm3 without affecting the math of your routine, as long as every register is renamed consistently and the new registers are not already in use. Obviously you have to take into account all the MMX/SSE instructions (or at least the ones you use inside your code).
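To make the renaming step concrete, here is a minimal sketch in Python that works on the textual form of the instructions rather than on raw bytes. The function name rename_xmm and the mapping dictionary are hypothetical, introduced only for illustration. It assumes the mapping is one-to-one and that registers holding the routine's inputs and outputs are handled by the caller.

```python
import re

# Matches xmm0..xmm15 in any letter case.
XMM = re.compile(r"\bxmm(\d+)\b", re.IGNORECASE)

def rename_xmm(lines, mapping):
    """Rename SSE registers in assembly text according to `mapping`.

    `mapping` maps old register numbers to new ones, e.g. {1: 3, 2: 5}.
    Hypothetical helper: a real engine would rewrite the ModRM byte instead.
    """
    # Two source registers must never collapse into the same target.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be one-to-one")

    # A target must not collide with a register that is used but not renamed.
    used = {int(m.group(1)) for line in lines for m in XMM.finditer(line)}
    untouched = used - set(mapping)
    if untouched & set(mapping.values()):
        raise ValueError("target register is already in use")

    def swap(match):
        number = int(match.group(1))
        return "xmm%d" % mapping.get(number, number)

    # All registers in a line are substituted in a single pass,
    # so a renamed register is never renamed a second time.
    return [XMM.sub(swap, line) for line in lines]

routine = ["movss xmm1, xmm2", "sqrtss xmm1, xmm1"]
renamed = rename_xmm(routine, {1: 3, 2: 5})
print(renamed)
```

The two checks are the part that keeps the math intact: without them, two different values could end up in the same register.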
Another interesting thing I've found (only for the movss instruction for now) can be used to change the bytes some more: the same register-to-register move has two valid encodings.
00401000 F30F10C1 MOVSS XMM0,XMM1
00401000 F30F11C8 MOVSS XMM0,XMM1
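The two byte sequences differ because movss has a "load" opcode (F3 0F 10, where the ModRM reg field is the destination) and a "store" opcode (F3 0F 11, where the ModRM r/m field is the destination). The following Python sketch builds both forms for a register-to-register move. The name encode_movss and its form parameter are hypothetical, and only xmm0 to xmm7 are handled, so no REX prefix is needed.

```python
def encode_movss(dst, src, form="load"):
    """Encode `movss xmm<dst>, xmm<src>` in one of its two equivalent forms.

    Hypothetical helper for illustration; handles xmm0..xmm7 only.
    """
    if not (0 <= dst <= 7 and 0 <= src <= 7):
        raise ValueError("only xmm0..xmm7 are handled (no REX prefix)")

    if form == "load":
        # F3 0F 10 /r : ModRM.reg is the destination, ModRM.r/m the source.
        opcode, reg, rm = 0x10, dst, src
    elif form == "store":
        # F3 0F 11 /r : ModRM.r/m is the destination, ModRM.reg the source.
        opcode, reg, rm = 0x11, src, dst
    else:
        raise ValueError("form must be 'load' or 'store'")

    # mod = 11b selects register-direct addressing.
    modrm = 0xC0 | (reg << 3) | rm
    return bytes([0xF3, 0x0F, opcode, modrm])

load_form = encode_movss(0, 1, "load").hex().upper()
store_form = encode_movss(0, 1, "store").hex().upper()
print(load_form, store_form)
```

Picking between the two forms at random gives one extra bit of variation per register-to-register movss without touching the register assignment.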