Without circular binding times cogen would often try to generate an
infinite compiler. This section uses lexical environments to
demonstrate how widening and memoization on binding times are used to
replace this kind of non-termination with a loop in the compiler. It
serves as a running example of how and why cogen
works the way it
does.
Consider generating a compiler for alist-eval
, an interpreter that
uses association lists (a list of pairs of names and values) to
represent its environments. This interpreter's apply
procedure
calls zip
(see zip) to build an alist, causing cogen to process
the following non-terminating sequence:
; (code alist-eval (exp env k) ...) ; (code zip (vars vals r k) ...)(zip
S
D
(const nil)D
) (zipS
D
(cons (consS
D
bind) (const nil) spine)D
) (zipS
D
(cons (consS
D
bind) (cons (consS
D
bind) (const nil) spine) spine)D
) ...
Without grammars, cogen would be forced to lift the third argument to
D
, but how could a compiler work without the names of local
variables? The solution is to build circular binding times by
collapsing duplicate nodes. These four diagrams show how the join
nodes (labeled `or') are successively reduced, resulting in a binding
time in normal form.
But how does the loop in this binding time become a loop in the
compiler? The answer: cogen memoizes on binding time patterns. Each
code
structure contains a memo table mapping binding time patterns
to generating extensions. So the above sequence becomes:
(zipS
D
(const nil)D
) (zipS
D
(cons (consS
D
bind) (const nil) spine)D
) (zipS
D
(cons (consS
D
bind) self spine)D
) (zipS
D
(cons (consS
D
bind) self spine)D
)
In the last line, cogen finds the needed extension in the memo table
of zip
. The extension isn't complete yet, but it exists in the
table; a side-effect is used on completion [note self-app-mutation].
Thus the compiler calls itself.
Note that the loop was only found after two iterations were `unrolled'. This is a good example of excessive specialization (though there are cases where splitting off the first iteration of a loop is just what you want). Since the prototype is just a memoized forward-flow abstract interpretation, this is unavoidable. Doing better requires reverse information flow, eg a two pass BTA/generation system.
Now, think how this BT affects the compiler produced by cogen for alist-eval
. In the compiler the environment is a list of variable
names. That means the residual part of an environment is a list of
variable values. Thus the code produced by such a compiler would
contain only procedures called by passing fixed length lists. Each of
these procedures is a residual version of alist-eval
, that list
argument is the env
argument. A variable reference implemented
with a call to assoc
and a statically known variable would compile
to a fixed (at compile time) sequence of cdr
s followed by a car
. This is exactly the inefficiency variable splitting is designed
to alleviate (see splitting).
Note how this interacts with higher-order calls to unknown procedures, and handling of Scheme's varargs in general.