This section explores how a typical bcopy optimization can be implemented with The Trick. The functioning of The Trick is explained in detail (the details are peculiar to my system, the concepts are not). Bcopy (copy a byte-addressed block of memory) is a simple kind of bitblit. Efficient implementations copy the memory word-at-a-time, but if the blocks' alignments differ then the data must be masked and shuffled in the registers. Roughly (ignoring the boundaries):
(define (bcopy from to nbytes) (let* ((diff (mod (- from to) 4)) (diff (trick-n diff 4))) (bcopy1 from to nbytes diff)))(define (bcopy1 from to nbytes diff) (let ((nwords (truncate (/ nbytes 4)))) (define (shuffle src prev) (let* ((bits (* 8 diff)) (mask (- (shift-left 1 (- 32 bits)) 1))) (bitwise-or (shift-left (bitwise-and mask src) bits) prev))) (let loop ((i 0) (prev 0)) (let* ((s (vector-ref from i)) (d (shuffle s p))) (vector-set! to i s) (loop (+ 1 i) (hibits s diff)))))) (define (trick-n x n) (cond ((< 0 n) x) ((= x n) n) (else (trick-n x (- n 1)))))
Cogen applied to this code with binding time pattern (
does the right thing: even though all the parameters are dynamic the
D
D
D
)trick-n
procedure produces four copies of the main loop, each
specialized to a different alignment. The bit-shuffling is static.
If needed one could also specialize to the length (eg when it's
small).
The trick-n
procedure implements the classic `dynamic choice of
finitely static values' PE trick [JoGoSe93]. This implementation
produces a dynamic, linear chain of if-tests. A fancier version could
produce a log-tree or a jump table. If the alignment is guaranteed,
one may avoid the tests completely by specializing bcopy1
.
Consider how a purely dynamic compiler functions like a macro. This works because of The Trick: converting a dynamic value into a finitely static one via case analysis. The finite (aka terminating) static computation is executed by the compiler (analogously to macro execution).
trick-n
in root
is
(code trick-n (x n k) ((const zero 0) (prim z>n ,> zero n) (if z>n (,@(call 'k '(x)))) (prim nn ,identity n) ; prevent n from being lifted (prim n=x ,= nn x) (if n=x (,@(call 'k '(n)))) (const one 1) (prim n-1 ,- n one) (const trick-n trick-n) (jump trick-n x n-1 k)))
Now instead of bcopy
, consider the simpler trick-n-demo
procedure. It retains the essential features: static computation
depending on a dynamic parameter of statically known range. Note that
in the bcopy example max
is not a parameter, thus this example is
somewhat more general.
(define (trick-n-demo actual max s d) (let ((a (trick actual max))) (* d (* s a))))
Even though actual
is dynamic we can eliminate the inner
multiplication provided that it is less than max
(in bcopy, this
corresponds to the offsets for the masking and shuffling in the loop).
In root
this is
(code trick-n-demo (actual max s d k) (,@(make-closure 'demo-cont '(s d k) 'cl) (const trick-n ,trick-n) (jump trick-n actual max cl)))(code demo-cont (cl arg) (,@(unmake-closure '(s d k) 'cl) (prim r3 ,* arg s) (prim r4 ,* r3 d) ,@(call 'k '(r4))))
We can generate versions of demo-cont
specialized to
values for actual
from 0 to 2 and also to the rest of the
static environment (s
--> 7).
(define tndc (cogen trick-n-demo '(dynamic static static dynamic dynamic))) (define fast-demo (eval-root tndc (list top-cont 2 7)))
The code for fast-demo
includes 3 copies of demo-cont
. In
each, the first multiplication (arg
with s
) has been
performed. See how the loop in the compiler tndc
expands in its
output fast-demo
.
fast-demo
also includes a general version of demo-cont
, in
case the actual value were greater than max
. If this is not
needed, then a version of trick-n
that produces an error instead
of returning x
could be used. Even in cases where the test
against zero will never return true, the test cannot be eliminated
because that produces a non-terminating compiler. Though this is a
good example of how the programmer must understand and account for PE
in their code, note that the accounting is limited to the trick-n
procedure, rather than demo-cont
.