[Clear / Fill memory fast]                          [Assembler][/][MC68000]

A common problem is how to clear or fill a range of memory quickly. If
there is a lot of memory to clear, the following method is very useful:

;
; fill / clear memory fast
;
; input:
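;   MemEnd = address just past the end of the memory region (must be even)
;   TempSp = longword of scratch memory for the saved stack pointer
;
; output: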
;   none (memory cleared / filled)
;
; destroys:
;   d0-d7/a0-a6
;

        move.l  sp,TempSp       ; 20 cycles - save stack pointer
        lea     MemEnd,sp       ; 12 cycles (absolute long address)
        moveq   #0,d0           ;  4 cycles
        moveq   #0,d1           ;  4 cycles
        moveq   #0,d2           ;  4 cycles
        moveq   #0,d3           ;  4 cycles
        moveq   #0,d4           ;  4 cycles
        moveq   #0,d5           ;  4 cycles
        moveq   #0,d6           ;  4 cycles
        moveq   #0,d7           ;  4 cycles
        move.l  d0,a0           ;  4 cycles
        move.l  d0,a1           ;  4 cycles
        move.l  d0,a2           ;  4 cycles
        move.l  d0,a3           ;  4 cycles
        move.l  d0,a4           ;  4 cycles
        move.l  d0,a5           ;  4 cycles
        move.l  d0,a6           ;  4 cycles => setup time: 15*4+12+20 = 92

; after this, one instruction can clear 60 bytes of memory (15*4):

        movem.l d0-d7/a0-a6,-(sp)

The last instruction takes 8+8*n cycles for n registers, here 8+8*15 = 128
cycles. The naive move.l d0,-(sp) uses 12 cycles per longword, here
12*15 = 180 cycles. Counting 16 cycles of setup for the naive version (one
lea and one moveq), the presented gem wins when 92+128*blocks <
16+180*blocks, that is when blocks > 76/52, or about 1.46. The break-even
point lies between one and two 60-byte blocks, which means that this
solution is preferred in almost every case.
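
For comparison, the naive version mentioned above could look like the
following sketch. It assumes the same MemEnd label as the gem; a0 is an
arbitrary choice of pointer register, so the stack pointer is left alone.

        lea     MemEnd,a0       ; point just past the region
        moveq   #0,d0           ; value to store
        move.l  d0,-(a0)        ; 12 cycles, clears 4 bytes
        move.l  d0,-(a0)        ; 12 cycles, clears 4 more bytes
                                ; 13 more identical lines for 60 bytes
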
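Do not forget the following line when you are finished clearing:

        move.l  TempSp,sp       ; 20 cycles - restore stack pointer

It restores the stack pointer to its previous state. Until then, sp points
into the region being cleared, so anything that pushes onto this stack in
the meantime (a subroutine call, or an interrupt if the code runs in
supervisor mode) writes into that memory.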

There is, however, one problem with this solution: it cannot be turned into
a loop without sacrificing a register for the loop counter. The counter
could be kept in memory instead, but the following solution, which does
sacrifice a register, is faster:

;
; fill / clear memory fast
;
; input:
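;   memtomove = number of 224-byte (4*56) blocks to fill
;   MemEnd = address just past the end of the memory region (must be even)
;   TempSp = longword of scratch memory for the saved stack pointer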
;
; output:
;   none (memory cleared / filled)
;
; destroys:
;   d0-d7/a0-a6
;

        move.l  sp,TempSp       ; 20 cycles - save stack pointer
        lea     MemEnd,sp       ; 12 cycles (absolute long address)
        moveq   #0,d0           ;  4 cycles
        moveq   #0,d1           ;  4 cycles
        moveq   #0,d2           ;  4 cycles
        moveq   #0,d3           ;  4 cycles
        moveq   #0,d4           ;  4 cycles
        moveq   #0,d5           ;  4 cycles
        moveq   #0,d6           ;  4 cycles
        move.l  d0,a0           ;  4 cycles
        move.l  d0,a1           ;  4 cycles
        move.l  d0,a2           ;  4 cycles
        move.l  d0,a3           ;  4 cycles
        move.l  d0,a4           ;  4 cycles
        move.l  d0,a5           ;  4 cycles
        move.l  d0,a6           ;  4 cycles
        move.l  #memtomove-1,d7 ; 12 cycles - dbf runs count+1 times
                                ; => setup time: 14*4+12+12+20 = 100

.localloop
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; move 56 bytes => 224 bytes
        dbf     d7,.localloop

        move.l  TempSp,sp       ; 20 cycles - restore stack pointer

This should run in 4*(8+8*14)+10 = 490 cycles per 224 bytes, which is about
2.2 cycles/byte (not including setup time). Unrolling the loop further will
not give any noticeable speed increase, since the dbf accounts for only 10
of those 490 cycles. Do not forget the last line, since it restores the
stack pointer.
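
The loop only handles whole 224-byte blocks. A region whose size is not a
multiple of 224 needs a few extra movem instructions after the loop. The
following sketch clears a hypothetical 32000-byte buffer; the size and the
label BufEnd are assumptions made for illustration only. It relies on
32000 = 142*224 + 3*56 + 24.

        move.l  sp,TempSp       ; save stack pointer
        lea     BufEnd,sp       ; hypothetical label just past the buffer
        moveq   #0,d0
        moveq   #0,d1
        moveq   #0,d2
        moveq   #0,d3
        moveq   #0,d4
        moveq   #0,d5
        moveq   #0,d6
        move.l  d0,a0
        move.l  d0,a1
        move.l  d0,a2
        move.l  d0,a3
        move.l  d0,a4
        move.l  d0,a5
        move.l  d0,a6
        move.w  #142-1,d7       ; 142 blocks; dbf loops count+1 times

.blockloop
        movem.l d0-d6/a0-a6,-(sp)       ; 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; 56 bytes => 224 bytes
        dbf     d7,.blockloop           ; 142*224 = 31808 bytes in total

        movem.l d0-d6/a0-a6,-(sp)       ; remainder: 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; remainder: 56 bytes
        movem.l d0-d6/a0-a6,-(sp)       ; remainder: 56 bytes => 168 bytes
        movem.l d0-d5,-(sp)             ; last 6 longwords = 24 bytes

        move.l  TempSp,sp       ; restore stack pointer

The counter is loaded with move.w because 141 does not fit the -128..127
range of moveq, and dbf only looks at the low word of d7 anyway.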

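Making the loop fill memory with a constant value is simple: load d0 with
the value instead of zero, for example move.l #$FF00FF00,d0, and replace
the remaining moveq #0,dn instructions with move.l d0,dn. The address
registers are already copied from d0.

Note: The timings are for an MC68000 CPU and may be incorrect.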
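
As a sketch of the fill variant, the register setup of the loop version
could be replaced by the lines below. The pattern is the example value from
the text; d7 is left out because the loop uses it as its counter.

        move.l  #$FF00FF00,d0   ; fill pattern, one longword
        move.l  d0,d1           ; copy the pattern to the other data
        move.l  d0,d2           ; registers used by the movem
        move.l  d0,d3
        move.l  d0,d4
        move.l  d0,d5
        move.l  d0,d6
        move.l  d0,a0           ; and to the address registers
        move.l  d0,a1
        move.l  d0,a2
        move.l  d0,a3
        move.l  d0,a4
        move.l  d0,a5
        move.l  d0,a6
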
                                                  Gem writer: John Eckerdal
                                                   last updated: 1998-03-16
