[Remove shift by one latency]                         [Assembler][/][80486]

On the 486, there is an anomaly in the instruction latencies. Note the
following:

     SHR/SAR/SHL reg,1 (opcodes D0, D1) have latency 3
     SHR/SAR/SHL reg,imm8 (opcodes C0, C1) have latency 2

This is a well-known issue. The solutions I have seen proposed for getting
the faster encoding in the shift-by-1 case are:

     make a macro
     handcode opcode using DB
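
For reference, the two approaches listed above might look roughly like this.
This is only a sketch in MASM/TASM-style syntax for a 32-bit code segment;
the macro name SHR_EAX_1 is made up for illustration, and the ModRM byte is
worked out by hand for EAX:

        ; C1 /5 ib = SHR r/m32,imm8; ModRM 0E8h = 11 101 000 selects EAX
        DB      0C1h, 0E8h, 01h         ; SHR EAX,1 via the imm8 form

        ; the same bytes wrapped in a (hypothetical) macro
SHR_EAX_1 MACRO
        DB      0C1h, 0E8h, 01h
        ENDM

The drawback of both is that the ModRM byte changes with the operand (for
example 0EBh for EBX), so every register needs its own byte sequence, and in
a 16-bit segment the same bytes shift AX instead of EAX.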

Here is a method that I think is even easier. This is based on the fact
that all x86 processors since the 186 mask shift counts modulo 32. Note that
the "imm8" can hold shift counts of 0 through 255. So, we can code a shift
count of 33 to get an effective shift count of 1. To make this a little bit
more readable to the casual code reader, who might not realize right away
that we are really shifting by 1, we might do something like this:

        FASTSHIFT   EQU   32

        SHR     reg, 1+FASTSHIFT
        SHL     reg, 1+FASTSHIFT
        SAR     reg, 1+FASTSHIFT
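
To check that the trick took effect, look at the assembler listing. Assuming
the assembler passes the count through unchanged and picks the short form for
a literal count of 1 (an assumption about your particular tool), the bytes
for EAX in a 32-bit segment should come out like this:

        SHR     EAX, 1              ; D1 E8      shift-by-1 form
        SHR     EAX, 1+FASTSHIFT    ; C1 E8 21   imm8 form, 21h AND 1Fh = 1

The imm8 form is one byte longer, so the saved cycle costs a byte of code.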

Side note: SHL reg,1 should be replaced by the faster ADD reg,reg in all
cases. However, for right shifts this gem is indeed useful.
                                                  Gem writer: Norbert Juffa
                                                   last updated: 1998-03-16
