• PMULLW mm/m64, mm
• PACKUSWB mm/m64, mm
(word->byte: pack signed words into unsigned bytes with saturation)
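To make the per-lane behaviour of PMULLW and PACKUSWB concrete, here is a minimal portable C sketch that models each instruction on arrays of lanes (lane 0 is the least significant). The names pmullw_model, sat_u8 and packuswb_model are hypothetical helpers for illustration; they model the documented semantics in AT&T operand order (source first, destination second) rather than executing the instructions.

```c
#include <stdint.h>

/* Hypothetical model of PMULLW src, dst: multiply the four 16-bit lanes
   pairwise and keep only the low 16 bits of each 32-bit product.
   The low half is the same for signed and unsigned inputs, so unsigned
   lanes are used to keep the C arithmetic well defined. */
void pmullw_model(uint16_t dst[4], const uint16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t prod = (uint32_t)dst[i] * (uint32_t)src[i];
        dst[i] = (uint16_t)(prod & 0xFFFFu);
    }
}

/* Saturate one signed word to an unsigned byte (clamp to 0..255). */
static uint8_t sat_u8(int16_t w)
{
    if (w < 0)
        return 0;
    if (w > 255)
        return 255;
    return (uint8_t)w;
}

/* Hypothetical model of PACKUSWB src, dst: dst's four words become
   bytes 0-3 of the result and src's four words become bytes 4-7. */
void packuswb_model(uint8_t out[8], const int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = sat_u8(dst[i]);
        out[i + 4] = sat_u8(src[i]);
    }
}
```

For example, 300 * 300 = 90000 keeps only its low 16 bits (24464) under PMULLW, while PACKUSWB clamps -5 to 0 and 300 to 255.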
How do we do an interleaved pack?
• PACKSSDW %mm0, %mm0
• PACKSSDW %mm1, %mm1
• PUNPCKLWD %mm1, %mm0
(interleave the low-order 16-bit values of the two operands, so %mm0 ends up holding the saturated dwords of %mm0 and %mm1 alternating as words)
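The three-instruction sequence can be traced with a hypothetical scalar model in C (packssdw_model, punpcklwd_model and interleave_pack are illustrative names, not library functions). It treats %mm0 and %mm1 as two signed dwords each and assumes AT&T operand order, so the second operand is the destination.

```c
#include <stdint.h>

/* Signed saturation of a 32-bit value to 16 bits. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Model of PACKSSDW src, dst: dst's two dwords become words 0-1,
   src's two dwords become words 2-3, each with signed saturation. */
void packssdw_model(int16_t out[4], const int32_t dst[2], const int32_t src[2])
{
    out[0] = sat_s16(dst[0]);
    out[1] = sat_s16(dst[1]);
    out[2] = sat_s16(src[0]);
    out[3] = sat_s16(src[1]);
}

/* Model of PUNPCKLWD src, dst: interleave the low two words, dst first. */
void punpcklwd_model(int16_t out[4], const int16_t dst[4], const int16_t src[4])
{
    out[0] = dst[0];
    out[1] = src[0];
    out[2] = dst[1];
    out[3] = src[1];
}

/* The slide's sequence, applied to the dword pairs held in mm0 and mm1. */
void interleave_pack(int16_t out[4], const int32_t mm0[2], const int32_t mm1[2])
{
    int16_t p0[4], p1[4];
    packssdw_model(p0, mm0, mm0);  /* PACKSSDW %mm0, %mm0 */
    packssdw_model(p1, mm1, mm1);  /* PACKSSDW %mm1, %mm1 */
    punpcklwd_model(out, p0, p1);  /* PUNPCKLWD %mm1, %mm0 */
}
```

With mm0 = {10, 70000} and mm1 = {-20, -70000}, the result is {10, -20, 32767, -32768}: the values alternate between the two registers, and out-of-range dwords saturate.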
• PUNPCKHBW mm/m64, mm
• PUNPCKLBW mm/m64/m32, mm
• PCMPGTW mm/m64, mm
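A minimal portable C sketch of the three instructions above, with hypothetical helper names and AT&T operand order (source first, destination second). Unpacking against a zeroed source is a common way to zero-extend bytes to words; PCMPGTW yields an all-1s or all-0s mask per word.

```c
#include <stdint.h>

/* Model of PUNPCKLBW src, dst: interleave the low four bytes,
   dst byte first: d0 s0 d1 s1 d2 s2 d3 s3. */
void punpcklbw_model(uint8_t out[8], const uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = dst[i];
        out[2 * i + 1] = src[i];
    }
}

/* Model of PUNPCKHBW src, dst: the same, taken from the high four bytes. */
void punpckhbw_model(uint8_t out[8], const uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = dst[i + 4];
        out[2 * i + 1] = src[i + 4];
    }
}

/* Model of PCMPGTW src, dst: each dst word becomes -1 (all 1s) where the
   signed dst word is greater than the src word, and 0 otherwise. */
void pcmpgtw_model(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int16_t)((dst[i] > src[i]) ? -1 : 0);
}
```

On a little-endian machine, each (byte, 0) pair produced by unpacking against zero reads back as the original byte value widened to a 16-bit word.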
Rule of Thumb
• Only shift instructions can take an immediate operand
• Only the MOVD instruction can use a 32-bit general-purpose register
• PUNPCKL* (PUNPCKLBW/WD/DQ) can take a 32-bit memory source
• All other instructions operate on 64-bit MMX registers or memory. No immediate operands!
Constant numbers
• Generate a zero in mm0:
PXOR %mm0, %mm0 or PANDN %mm0, %mm0
• Generate all 1s in register mm1, which is -1 in every packed field of any data type:
PCMPEQB %mm1, %mm1 (PCMPEQW or PCMPEQD works equally well)
• Generate the signed constant -2^n in every packed-word (or packed-dword) field:
PCMPEQW %mm1, %mm1 (PCMPEQD %mm1, %mm1)
PSLLW $n, %mm1 (PSLLD $n, %mm1)
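These idioms can be checked lane by lane with a small hypothetical C model (mmx_words and the helper names are assumptions for illustration), treating a 64-bit MMX register as four 16-bit words.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit MMX register as four 16-bit lanes. */
typedef struct { uint16_t w[4]; } mmx_words;

/* PXOR %mmN, %mmN: x XOR x = 0 in every lane. */
mmx_words pxor_self(mmx_words r)
{
    for (int i = 0; i < 4; i++)
        r.w[i] ^= r.w[i];
    return r;
}

/* PANDN %mmN, %mmN: (NOT dst) AND src, which is (NOT x) AND x = 0. */
mmx_words pandn_self(mmx_words r)
{
    for (int i = 0; i < 4; i++)
        r.w[i] = (uint16_t)(~r.w[i] & r.w[i]);
    return r;
}

/* PCMPEQW %mmN, %mmN: a value always equals itself, so every lane is 0xFFFF. */
mmx_words pcmpeqw_self(mmx_words r)
{
    for (int i = 0; i < 4; i++)
        r.w[i] = (r.w[i] == r.w[i]) ? 0xFFFFu : 0u;
    return r;
}

/* PSLLW $n, %mmN: logical left shift per lane; counts above 15 clear the lane. */
mmx_words psllw(mmx_words r, unsigned n)
{
    for (int i = 0; i < 4; i++)
        r.w[i] = (n > 15) ? 0u : (uint16_t)(r.w[i] << n);
    return r;
}
```

Shifting all 1s left by n = 3 gives 0xFFF8 in each lane, which read as a signed word is -8 = -2^3.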
Examples
• absolute value of a vector of signed words
movq %mm0, %mm1 # make a copy of the source data
psraw $15, %mm0 # replicate the sign bit: each word becomes all 0s or all 1s
pxor %mm0, %mm1 # x XOR all 0s = x; x XOR all 1s = NOT(x), so only negative fields are complemented
psubsw %mm0, %mm1 # add 1 to just the negative fields; the result is in %mm1
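Each word of %mm0 is either all 0s or all 1s:
For a non-negative word, psubsw subtracts 0, leaving it unchanged
For a negative word, it subtracts all 1s (-1), i.e. adds 1, completing the two's-complement negation NOT(x) + 1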
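The same computation, |x| = (x XOR s) - s where s is 0 or -1, can be sketched in portable C one 16-bit lane at a time. abs_words and sat_s16 are hypothetical names; the sign mask is built with a comparison instead of an arithmetic right shift to avoid implementation-defined behaviour in C. Because psubsw saturates, -32768 maps to 32767 rather than wrapping back to -32768.

```c
#include <stdint.h>

/* Signed saturation to 16 bits, as PSUBSW applies to each result. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of movq / psraw $15 / pxor / psubsw on four signed words. */
void abs_words(int16_t x[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t sign = (x[i] < 0) ? -1 : 0;    /* what psraw $15 produces */
        int32_t flipped = x[i] ^ sign;         /* pxor: x or NOT(x) */
        x[i] = sat_s16(flipped - sign);        /* psubsw: subtract 0 or -1 */
    }
}
```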
Dot Product
#include <stdio.h>
int main(void)
{
int i;
int result;
unsigned short a[] = {1, 2, 3, 4, 5, 6, 7, 8};
unsigned short b[] = {2, 4, 6, 8, 10, 12, 14, 16};
__asm__("pxor %mm7,%mm7");