In most cases the 486 is free from flow-dependence penalties, which means that an instruction that uses the result of the previous instruction does not cause a slowdown:

    add eax,ebx
    add ecx,eax

takes two cycles. On a Pentium it takes two cycles too, but

    add eax,ebx
    add ecx,edx

takes only one cycle, because the second instruction does not use the result of the first, so the two can be paired. These situations are quite well described in the application note "Intel Architecture Optimization Manual" released by Intel. I just want to point out one interesting thing. Generally, the 486 has two types of flow-dependence penalties:

- Using a register right after one of its subregisters was modified (for example (E)AX, (E)BX, (E)CX, (E)DX after AL, BH etc. has been changed).
- Using a register for addressing right after it was modified (LEA is an addressing instruction).

For example, how many cycles does the following code sequence eat (in protected mode, assuming 100% cache hit)?

    add ecx,ebp
    adc bl,dl
    mov al,[ebx]

On the 486 the ADD takes one cycle and the ADC another one, but the MOV takes three cycles even if the operand is already in the cache. Why? There is a double penalty: one clock for using a register for addressing right after it was modified (Address Generation Interlock, AGI), and another for using a register after its subregister was modified (Flow Break). So this innocent MOV instruction costs three cycles.

"I'm a smart coder, I'm gonna put an instruction between the ADC and the MOV, and the problem is solved!" Really? The sequence

    add ecx,ebp
    adc bl,dl
    sub esi,ebp
    mov al,[ebx]

takes 5 clocks: the ADD, ADC and SUB take three, but the MOV takes two, because ONE cycle inserted BETWEEN the ADC and the MOV can save only ONE penalty, not TWO. So for a perfect ratio of one clock per instruction, at least TWO instructions have to be inserted. Or one two-cycle instruction like SHR, or even a prefixed instruction like ADD AX,BX in 32-bit code.
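
As a minimal sketch of the "at least two instructions" fix: the two filler instructions below (SUB ESI,EBP and ADD EDI,EBP) are arbitrary, hypothetical choices that touch neither EBX nor any of its subregisters. Only the first of them appears in the text above. The cycle counts in the comments are what the reasoning above predicts for a 486 (protected mode, 100% cache hit), not measurements.

```asm
; Sketch only: filler instructions are illustrative assumptions.
        add     ecx,ebp         ; 1 clock
        adc     bl,dl           ; 1 clock, modifies BL (a subregister of EBX)
        sub     esi,ebp         ; 1 clock, filler: does not touch EBX
        add     edi,ebp         ; 1 clock, filler: does not touch EBX
        mov     al,[ebx]        ; 1 clock expected: both penalties are hidden
```

By this count the five instructions take 5 clocks, the same as the original three-instruction sequence. The fillers are therefore free only if they do work that was needed anyway; padding with useless instructions gains nothing.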