[Replace MOVSX/MOVZX]                               [Assembler][/][Pentium]

On the Pentium the byte extension:

        movzx   eax,byte ptr [addr]

is slow. The recommended way is:

        xor     eax,eax
        mov     al,[addr]

This also performs well on a PPro. The word extension:

        movzx   eax,word ptr [addr]

can be replaced with a sequence that is faster only on the Pentium:

        xor     eax,eax
        mov     ax,[addr]

or:

        xor     eax,eax
        mov     al,[addr]
        mov     ah,[addr+1]

in which case you had better have enough instructions to pair with the
operation, so that you can hide all the inherent stalls. Another way:

        mov     eax,[addr]
        and     eax,0ffffh

This is a win if [addr] is dword aligned, and can be OK even without
alignment, but make sure that the two extra bytes loaded cannot cause an
exception. Sign extension can be sped up in a similar way. The word sign
extension:

        movsx   eax,word ptr [addr]

can be replaced with the faster sequence:

        mov     eax,[addr-2]
        sar     eax,16

if addr - 2 is divisible by 4, that is, if the dword load is aligned. The
two bytes below addr must also be safe to read.
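
To see why the load-and-shift sequence gives the same result as MOVSX, the
arithmetic can be modelled in C. This is only a sketch of the idea, not a
timing test: the function names are made up for illustration, memory is a
plain byte array, and a little-endian load is assumed, as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian dword load, modelling `mov eax,[addr-2]`. */
uint32_t load_dword_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model of `sar eax,16`, written so that it does not depend on how the
   C compiler shifts negative values. */
int32_t sar16(uint32_t x)
{
    uint32_t hi = x >> 16;              /* upper word, 0..65535 */
    return (hi & 0x8000u) ? (int32_t)hi - 65536 : (int32_t)hi;
}

/* Sign-extend the word at mem[off] by loading the dword at off - 2. */
int32_t movsx_via_sar(const uint8_t *mem, size_t off)
{
    assert(off >= 2);   /* the two bytes below the word must be readable */
    return sar16(load_dword_le(mem + off - 2));
}

/* Hypothetical helper: put `junk` in the two bytes below `word`, then
   sign-extend `word` with the sequence above. */
int32_t sar_trick(uint16_t word, uint16_t junk)
{
    uint8_t mem[4];
    mem[0] = (uint8_t)(junk & 0xFF);
    mem[1] = (uint8_t)(junk >> 8);
    mem[2] = (uint8_t)(word & 0xFF);
    mem[3] = (uint8_t)(word >> 8);
    return movsx_via_sar(mem, 2);
}
```

For example, sar_trick(0x8234, 0xBBAA) yields -32204, whatever the junk
word is: the two extra bytes land in the low half of EAX and are shifted
out.
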
For hand-written code I would suggest you bias the input values instead, so
you can do:

        xor     eax,eax
        mov     al,[bytevar]    ; Biased by 128, so it is positive
        sub     eax,128
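
The bias trick can be checked with a small C model. This is a sketch under
the assumption that the data was stored as value + 128; bias_encode and
bias_decode are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Store a signed value in -128..127 as an unsigned byte biased by 128. */
uint8_t bias_encode(int value)
{
    assert(value >= -128 && value <= 127);
    return (uint8_t)(value + 128);
}

/* Model of: xor eax,eax / mov al,[bytevar] / sub eax,128 */
int32_t bias_decode(uint8_t stored)
{
    uint32_t eax = 0;                       /* xor eax,eax       */
    eax = (eax & 0xFFFFFF00u) | stored;     /* mov al,[bytevar]  */
    return (int32_t)eax - 128;              /* sub eax,128       */
}
```

Decoding an encoded value gives back the original signed value for the
whole range -128..127, so no sign-extending load is needed at run time.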

Some of these variants can also be used on 16-bit machines, where they
can only extend 8 bits to 16 bits.
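
As a rough illustration of the 16-bit case, here is a C model of two
variants that carry over directly when AX is used in place of EAX. The
choice of these two variants is an assumption made for this sketch, and
the function names are hypothetical.

```c
#include <stdint.h>

/* Model of: xor ax,ax / mov al,[addr] */
uint16_t zero_extend_8_to_16(uint8_t b)
{
    uint16_t ax = 0;                        /* xor ax,ax     */
    ax = (uint16_t)((ax & 0xFF00u) | b);    /* mov al,[addr] */
    return ax;
}

/* Model of: xor ax,ax / mov al,[bytevar] / sub ax,128
   The stored byte is assumed to be biased by 128. */
int16_t bias_decode_8_to_16(uint8_t stored)
{
    uint16_t ax = zero_extend_8_to_16(stored);
    return (int16_t)((int)ax - 128);        /* sub ax,128    */
}
```
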
                                                Gem writers: Terje Mathisen
                                                              Vesa Karvonen
                                                   last updated: 1998-06-06
