[Reversing bit order]                                  [Assembler][/][8086]

In some cases it may be usefull to reverse the order of the bits. Consider
01011001 and reversed 10011010. One solution is to use a lookuptable. Andi
Smithers wrote slower but smaller and more elegant solution at the cost of
no table, this version uses 32 bit registers but these can be replaced so
that the gem may run on a 8086:

        xor     eax,eax
        mov     ebx,data2invert ; your input data
        mov     cl,revbits      ; number of bits to reverse
revloop:
        ror     ebx,1
        rcl     eax,1
        dec     cl
        jnz     revloop

This routine is small and general but slow. Vesa Karvonen presents a faster
routine for inverting 32 bit numbers:

        bswap   eax             ; Data in EAX
        mov     edx,eax         ; Trashes EDX
        and     eax,00F0F0F0Fh  ; "AND eax,0F0F0F0Fh" < "AND edx,0F0F0F0Fh"
        xor     edx,eax         ; Smaller than "AND edx,0F0F0F0F0h"!
        rol     eax,8           ; Rotate instead of 2 shifts!
        or      eax,edx
        mov     edx,eax
        and     eax,033333333h
        xor     edx,eax
        rol     eax,4
        or      eax,edx         ; Shift version could use LEAs,
        mov     edx,eax         ; but it may not be faster.
        and     eax,055555555h
        xor     edx,eax
        rol     eax,2
        or      eax,edx
        ror     eax,7           ; Rotate back...

According to my "in memory" 486 info (it has been a while since I last used
one) it should take about 21 clock cycles. It is not the fastest possible,
but it also doesn't use a LUT. The rotates are NP on Pentium so you might
want use shifts on Pentium instead and schedule the instructions
appropriately.
Finally Terje Mathisen gives a LUT solution. The LUT is calculated with
this C-snippet:

  return byterev[a >> 24] + (byterev[(a >> 16) & 255] << 8) +
     (byterev[(a >> 8) & 255] << 16) + (byterev[a & 255] << 24);

The actual bit reverse is done with the following snippet:

;
; input: eax
; output: eax
; destroys: ebx, ecx, edx

        xor     ebx,ebx
        mov     bl,al
        xor     ecx,ecx
        mov     cl,ah
        bswap   eax
        xor     edx,edx
        mov     dl,ah
        and     eax,255
        mov     bl,byterev[ebx]
        mov     bh,byterev[ecx]
        bswap   ebx
        mov     al,byterev[eax]
        mov     ah,byterev[edx]
        or      eax,ebx

                                                 Gem writers: Andi Smithers
                                                              Vesa Karvonen
                                                             Terje Mathisen
                                                   last updated: 1998-03-16
