
CS220

April 25, 2007


AT&T syntax MMX
• Most MMX documents are in Intel Syntax
OPERATION DEST, SRC
• We use AT&T Syntax
OPERATION SRC, DEST
• Always remember:
DEST = DEST OPERATION SRC
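(Note: the reversed operand direction of some FP subtraction and
division instructions is a historical quirk of AT&T-syntax assemblers
that gcc keeps for compatibility; it is not the general rule.)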
Multiplication
• Except for multiplication, conversion, and comparison, the MMX
instructions are straightforward.
• PMADDWD mm/m64, mm
Word x word->doubleword, then add adjacent pairs of products
• PMULHW mm/m64, mm
Doubleword->word, keep high part
• PMULLW mm/m64, mm
Doubleword->word, keep low part
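The three multiply instructions can be modelled in portable C, one lane at a time. This is only a sketch of the arithmetic, not MMX code: the function names are hypothetical, arrays stand in for the four 16-bit fields of a register, and the narrowing casts assume the usual two's-complement behaviour of mainstream compilers.

```c
#include <stdint.h>

/* PMULLW model: keep the low 16 bits of each signed 16x16 product. */
void model_pmullw(const int16_t src[4], int16_t dst[4]) {
    for (int i = 0; i < 4; i++) {
        int32_t p = (int32_t)dst[i] * src[i];
        dst[i] = (int16_t)(uint16_t)((uint32_t)p & 0xFFFFu);
    }
}

/* PMULHW model: keep the high 16 bits of each signed 16x16 product. */
void model_pmulhw(const int16_t src[4], int16_t dst[4]) {
    for (int i = 0; i < 4; i++) {
        int32_t p = (int32_t)dst[i] * src[i];
        dst[i] = (int16_t)(uint16_t)((uint32_t)p >> 16);
    }
}

/* PMADDWD model: multiply four word pairs, then add adjacent products
   so that two 32-bit sums remain. */
void model_pmaddwd(const int16_t src[4], const int16_t dst[4], int32_t out[2]) {
    for (int i = 0; i < 2; i++) {
        int64_t sum = (int64_t)dst[2 * i] * src[2 * i]
                    + (int64_t)dst[2 * i + 1] * src[2 * i + 1];
        out[i] = (int32_t)(uint32_t)sum;   /* wrap to 32 bits */
    }
}
```

For example, 300 x 300 = 90000 does not fit in a word: the low-part model keeps 24464 and the high-part model keeps 1.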


Conversion
• PACKSSDW mm/m64, mm
doubleword->word (signed saturation)

• PACKSSWB mm/m64, mm
• PACKUSWB mm/m64, mm
word->byte (signed / unsigned saturation)
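What makes the pack instructions different from a plain truncation is saturation. The sketch below models that in portable C; the helper names are hypothetical, and arrays stand in for register fields. It shows the per-lane clamp used when packing to signed words or unsigned bytes, and where PACKSSDW places the results (destination values in the low half, source values in the high half).

```c
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range (PACKSSDW lane rule). */
int16_t sat_s16(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Clamp a signed 16-bit value to the unsigned 8-bit range (PACKUSWB lane rule). */
uint8_t sat_u8(int16_t v) {
    if (v > 255) return 255;
    if (v < 0) return 0;
    return (uint8_t)v;
}

/* PACKSSDW model for AT&T "packssdw src, dst":
   the two dst doublewords become the two low words,
   the two src doublewords become the two high words. */
void model_packssdw(const int32_t src[2], const int32_t dst[2], int16_t out[4]) {
    out[0] = sat_s16(dst[0]);
    out[1] = sat_s16(dst[1]);
    out[2] = sat_s16(src[0]);
    out[3] = sat_s16(src[1]);
}
```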
How to do interleave pack?
• PACKSSDW %mm0, %mm0
• PACKSSDW %mm1, %mm1
• PUNPCKLWD %mm1, %mm0
(interleave the low-end 16-bit values of the operands)
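The three-instruction sequence above can be traced with a small scalar model. This is a sketch under assumptions: `a` plays the role of mm0, `b` the role of mm1, and the function names are hypothetical. Each register is first packed against itself, then the low words of the two results are interleaved.

```c
#include <stdint.h>

/* Signed saturation from 32 to 16 bits, as done per lane by PACKSSDW. */
static int16_t clamp16(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Model of: packssdw %mm0,%mm0 ; packssdw %mm1,%mm1 ; punpcklwd %mm1,%mm0 */
void interleave_pack(const int32_t a[2], const int32_t b[2], int16_t out[4]) {
    /* After packing a register with itself, its two packed words
       appear in both the low and the high half. */
    int16_t pa[4] = { clamp16(a[0]), clamp16(a[1]), clamp16(a[0]), clamp16(a[1]) };
    int16_t pb[4] = { clamp16(b[0]), clamp16(b[1]), clamp16(b[0]), clamp16(b[1]) };

    /* Interleave the low words: dst word 0, src word 0, dst word 1, src word 1. */
    out[0] = pa[0];
    out[1] = pb[0];
    out[2] = pa[1];
    out[3] = pb[1];
}
```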
• PUNPCKHBW mm/m64, mm

Low parts of original 64 bits are ignored


byte_src+byte_dst=word_dst

• PUNPCKLBW mm/m64/m32, mm

High parts of original 64 bits are ignored


byte_src+byte_dst=word_dst
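The rule byte_src+byte_dst=word_dst can be read as: each result word has the destination byte in its low half and the source byte in its high half. A minimal portable sketch follows, with hypothetical function names and arrays standing in for registers. With an all-zero source it shows the common use of these instructions, zero-extending bytes to words.

```c
#include <stdint.h>

/* PUNPCKLBW model: interleave the four low bytes of dst and src. */
void model_punpcklbw(const uint8_t src[8], const uint8_t dst[8], uint16_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint16_t)(dst[i] | (src[i] << 8));
    }
}

/* PUNPCKHBW model: interleave the four high bytes of dst and src. */
void model_punpckhbw(const uint8_t src[8], const uint8_t dst[8], uint16_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint16_t)(dst[4 + i] | (src[4 + i] << 8));
    }
}
```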
How to do non-interleaved unpack?
• MOVQ %mm0, %mm2
• PUNPCKLDQ %mm1, %mm0
(replace the two high-end words of mm0 with the two low-end words
of mm1; leave the two low-end words of mm0 in place)
• PUNPCKHDQ %mm1, %mm2
(move the two high-end words of mm2 to the two low-end words of
mm2; place the two high-end words of mm1 in the two high-end
words of mm2)
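The net effect of the sequence is easier to see on arrays. In this sketch (hypothetical function name; `a` is the original mm0, `b` is mm1, word 0 is the low-end word), `lo` is what ends up in mm0 and `hi` is what ends up in mm2.

```c
#include <stdint.h>

/* Model of: movq %mm0,%mm2 ; punpckldq %mm1,%mm0 ; punpckhdq %mm1,%mm2 */
void split_unpack(const uint16_t a[4], const uint16_t b[4],
                  uint16_t lo[4], uint16_t hi[4]) {
    /* punpckldq: low doubleword of a, then low doubleword of b */
    lo[0] = a[0]; lo[1] = a[1];
    lo[2] = b[0]; lo[3] = b[1];

    /* punpckhdq: high doubleword of a, then high doubleword of b */
    hi[0] = a[2]; hi[1] = a[3];
    hi[2] = b[2]; hi[3] = b[3];
}
```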
• PCMPEQW mm/m64, mm

• PCMPGTW mm/m64, mm
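The compare instructions do not set flags; they write a mask into the destination, all 1s where the test holds and all 0s where it does not. A portable sketch of that behaviour, with hypothetical function names and AT&T operand order (dst is compared against src, and PCMPGTW is a signed compare):

```c
#include <stdint.h>

/* PCMPEQW model: dst field becomes 0xFFFF where dst == src, else 0. */
void model_pcmpeqw(const int16_t src[4], int16_t dst[4]) {
    for (int i = 0; i < 4; i++) {
        dst[i] = (dst[i] == src[i]) ? (int16_t)-1 : (int16_t)0;
    }
}

/* PCMPGTW model: dst field becomes 0xFFFF where dst > src (signed), else 0. */
void model_pcmpgtw(const int16_t src[4], int16_t dst[4]) {
    for (int i = 0; i < 4; i++) {
        dst[i] = (dst[i] > src[i]) ? (int16_t)-1 : (int16_t)0;
    }
}
```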
Rule of Thumb
• Only shift instructions can take an immediate operand
• Only the movd instruction can take a 32-bit register
• Punpckl can take a 32-bit memory source
• All other instructions work on 64-bit registers or memory.
No immediate operands!
Constant numbers
• Generate a zero in mm0:
PXOR %mm0, %mm0 (or PANDN %mm0, %mm0)

• Generate all 1's in register mm1, which is -1 in each of the packed data type fields:
PCMPEQ %mm1, %mm1
(PCMPEQ stands for any of PCMPEQB, PCMPEQW, PCMPEQD; the field size does not matter here)

• Generate the constant 1 in every packed-byte [or packed-word] (or packed-dword) field:
PXOR %mm0, %mm0
PCMPEQ %mm1, %mm1
PSUBB %mm1, %mm0 [PSUBW %mm1, %mm0] (PSUBD %mm1, %mm0)

• Generate the signed constant 2^n - 1 in every packed-word (or packed-dword) field:
PCMPEQ %mm1, %mm1
PSRLW $(16-n), %mm1 (PSRLD $(32-n), %mm1)

• Generate the signed constant -2^n in every packed-word (or packed-dword) field:
PCMPEQ %mm1, %mm1
PSLLW $n, %mm1 (PSLLD $n, %mm1)
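The shift-based recipes can be checked on a single 16-bit field in plain C. This is a sketch of the arithmetic only: the function names are hypothetical, n is assumed to lie in 1..15, and the casts assume two's-complement narrowing.

```c
#include <stdint.h>

/* Constant 1 in a byte field: 0 - (all 1s) wraps around to 1 (PXOR, PCMPEQ, PSUBB). */
int8_t byte_one(void) {
    uint8_t zero = 0, ones = 0xFF;
    return (int8_t)(uint8_t)(zero - ones);
}

/* 2^n - 1 in a word field: all 1s shifted right logically by 16 - n (PSRLW). */
int16_t word_2n_minus_1(unsigned n) {
    uint16_t ones = 0xFFFFu;
    return (int16_t)(ones >> (16 - n));
}

/* -2^n in a word field: all 1s shifted left by n, kept to 16 bits (PSLLW). */
int16_t word_neg_2n(unsigned n) {
    uint16_t ones = 0xFFFFu;
    return (int16_t)(uint16_t)((uint32_t)ones << n);
}
```

For n = 3 the two word helpers give 7 and -8.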
Examples
• absolute value of a vector of signed words
movq %mm0, %mm1 # make a copy of the source data
psraw $15, %mm0 # replicate the sign bit
pxor %mm0, %mm1 # one's complement of just the negative fields
psubsw %mm0, %mm1 # add 1 to just the negative fields
PXOR/XOR a number with all 0s: get the number itself
PXOR/XOR a number with all 1s: get NOT(number)
After psraw, each field of %mm0 is all 0s or all 1s
For a positive number, the subtraction removes all 0s (0)
For a negative number, the subtraction removes all 1s (-1)
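The same three steps (sign mask, XOR, saturating subtract) can be followed for one field in portable C. The function name is hypothetical and the code is a sketch of the trick, not MMX code. The saturating subtract is what keeps -32768 from wrapping back to itself.

```c
#include <stdint.h>

/* Branch-free absolute value of one signed word, mirroring the MMX sequence. */
int16_t model_abs_word(int16_t x) {
    int16_t mask = (x < 0) ? (int16_t)-1 : (int16_t)0; /* psraw $15: all 1s or all 0s */
    int16_t flipped = (int16_t)(x ^ mask);             /* pxor: NOT(x) if negative */
    int32_t diff = (int32_t)flipped - mask;            /* subtract -1, i.e. add 1 */

    /* signed saturation to the 16-bit range */
    if (diff > INT16_MAX) diff = INT16_MAX;
    if (diff < INT16_MIN) diff = INT16_MIN;
    return (int16_t)diff;
}
```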
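Dot Product
#include <stdio.h>

int main(void)
{
    int i;
    int result;
    /* pmaddwd treats the words as signed 16-bit values */
    short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    short b[] = {2, 4, 6, 8, 10, 12, 14, 16};

    /* clear the accumulator: mm7 holds two 32-bit partial sums */
    __asm__ volatile("pxor %mm7,%mm7");

    for (i = 0; i < (int)(sizeof(a) / sizeof(a[0])); i += 4) {
        /* multiply four word pairs, add adjacent products, accumulate */
        __asm__ volatile("movq %0,%%mm0\n\t"
                         "movq %1,%%mm1\n\t"
                         "pmaddwd %%mm1,%%mm0\n\t"
                         "paddd %%mm0,%%mm7"
                         :
                         : "m" (a[i]), "m" (b[i])
                         );
    }
    /* add the high and low partial sums, store, then leave MMX mode */
    __asm__ volatile("movq %%mm7,%%mm0\n\t"
                     "psrlq $32,%%mm0\n\t"
                     "paddd %%mm7,%%mm0\n\t"
                     "movd %%mm0,%0\n\t"  /* movd moves the lower 32 bits of mm0 */
                     "emms"
                     : "=m" (result)
                     );
    printf("dot product: %d\n", result);
    return 0;
}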
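For checking the inline-assembly version on a machine without MMX, the same computation can be written as a portable scalar model. This is a sketch under assumptions: the function name is hypothetical, the length is a multiple of 4, and the sums do not overflow 32 bits. The two accumulator lanes play the role of the two doublewords in mm7.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the MMX loop: two 32-bit partial sums, combined at the end. */
int32_t dot_product_model(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc[2] = {0, 0};                 /* the two doublewords of mm7 */

    for (size_t i = 0; i + 4 <= n; i += 4) {
        /* pmaddwd: multiply four word pairs and add adjacent products */
        acc[0] += (int32_t)a[i] * b[i] + (int32_t)a[i + 1] * b[i + 1];
        acc[1] += (int32_t)a[i + 2] * b[i + 2] + (int32_t)a[i + 3] * b[i + 3];
    }
    /* psrlq $32 followed by paddd: add the high lane to the low lane */
    return acc[0] + acc[1];
}
```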
Weathercaster
• PCMPEQ (packed compare for
equality) is performed on the
weathercaster and blue-screen
images, yielding a bitmask that
traces the outline of the
weathercaster.
• This bitmask image is PANDNed
(packed and not) with the
weathercaster image, yielding the
first intermediate image: now the
weathercaster has no background
behind her.
• The same bitmask image is
PANDed (packed and) with the
weather map image, yielding the
second intermediate image.
• The two intermediate images are
PORed (packed or) together,
resulting in final composite of the
weathercaster over weather map
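The four steps above can be sketched per pixel in portable C. The function and parameter names are hypothetical, and each pixel is assumed to be a single byte for simplicity. The mask is all 1s where the weathercaster frame matches the blue screen.

```c
#include <stddef.h>
#include <stdint.h>

/* Blue-screen composite: foreground where the frame differs from the
   blue screen, weather map everywhere else. */
void composite(const uint8_t *caster, const uint8_t *blue,
               const uint8_t *map, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint8_t mask = (caster[i] == blue[i]) ? 0xFF : 0x00; /* PCMPEQ */
        uint8_t fg = (uint8_t)(~mask & caster[i]);           /* PANDN */
        uint8_t bg = (uint8_t)(mask & map[i]);               /* PAND */
        out[i] = (uint8_t)(fg | bg);                         /* POR */
    }
}
```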
Address or Content?
        .section .rodata
mybytes:
        .byte 'a','b','c','d','e','f','g','h'
mystr:
        .ascii "abcdefghijklmnopqrstuvwxyz"
        .text
        .globl main
        .type main, @function
main:
        pushl %ebp
        movl %esp, %ebp
        movl mybytes, %eax          # content
        movl $mybytes, %ebx         # address
        movl (mybytes), %edx        # content
        movl (%ebx), %edx           # content
        xorl %ecx, %ecx
        movl $mystr, %ebx           # address
        movq (%ebx,%ecx,8),%mm0     # content
        leal mystr, %ebx            # address
        movq (%ebx,%ecx,8),%mm1     # content
        leal (mystr), %ebx          # address
        movq (%ebx,%ecx,8),%mm2     # content
        movq mystr(,%ecx,8),%mm3    # content
        movq mystr,%mm4             # content
        movq (mystr),%mm5           # content
        subl $8, %esp
        movq %mm0, (%esp)
        leave
        ret
        .size main, .-main
Content in %eax and %edx: 0x64636261 == "abcd"
Content in %ebx: address
Content in %mm0-%mm5: 0x6867666564636261 == "abcdefgh"
Written as a number: high address on the left, low address on the right
Written as a string: low address on the left, high address on the right
0x61 == 97 == 'a'
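The byte order behind 0x6867666564636261 can be reproduced without assembly. The sketch below (hypothetical function name) assembles eight bytes the way a little-endian load such as movq does: the byte at the lowest address becomes the least significant byte of the value.

```c
#include <stdint.h>

/* Build the 64-bit value that a little-endian 8-byte load would produce. */
uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];   /* highest address ends up in the top byte */
    }
    return v;
}
```

Applied to the string "abcdefgh", the result is 0x6867666564636261, and its low 32 bits are 0x64636261, the value seen in %eax.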
Misc
• Context Switching
– FP mode to MMX mode: 28 cycles
– MMX mode to FP mode: 53 cycles
FP_code:
…...
……
MMX_code:
…...
EMMS (*mark the FP tag word as empty*)
FP_code 1:
…...
…...
• Also FNSAVE and FRSTOR
MMX Instruction Set
Category       Mnemonic            Different Opcodes  Description
Arithmetic     PADD[B,W,D]         3                  Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]          2                  Add signed with saturation on [byte, word]
               PADDUS[B,W]         2                  Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]         3                  Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]          2                  Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]         2                  Subtract unsigned with saturation on [byte, word]
               PMULHW              1                  Packed multiply high on words
               PMULLW              1                  Packed multiply low on words
               PMADDWD             1                  Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]       3                  Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]       3                  Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB            1                  Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]       2                  Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]   3                  Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]   3                  Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                1                  Bitwise AND
               PANDN               1                  Bitwise AND NOT
               POR                 1                  Bitwise OR
               PXOR                1                  Bitwise XOR
Shift          PSLL[W,D,Q]         6                  Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]         6                  Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]           4                  Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]            4                  Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                1                  Empty MMX state
