Eg Environments

Without circular binding times cogen would often try to generate an infinite compiler. This section uses lexical environments to demonstrate how widening and memoization on binding times are used to replace this kind of non-termination with a loop in the compiler. It serves as a running example of how and why cogen works the way it does.

Consider generating a compiler for alist-eval, an interpreter that uses association lists (a list of pairs of names and values) to represent its environments. This interpreter's apply procedure calls zip (see zip) to build an alist, causing cogen to process the following non-terminating sequence:

; (code alist-eval (exp env k) ...)
; (code zip (vars vals r k) ...)

(zip S D (const nil) D)
(zip S D (cons (cons S D bind)
		   (const nil) spine) D)
(zip S D (cons (cons S D bind)
		   (cons (cons S D bind)
			 (const nil) spine) spine) D)
...

Without grammars, cogen would be forced to lift the third argument to D, but how could a compiler work without the names of local variables? The solution is to build circular binding times by collapsing duplicate nodes. These four diagrams show how the join nodes (labeled `or') are successively reduced, resulting in a binding time in normal form.

ps ps ps ps.

But how does the loop in this binding time become a loop in the compiler? The answer: cogen memoizes on binding time patterns. Each code structure contains a memo table mapping binding time patterns to generating extensions. So the above sequence becomes:

(zip S D (const nil) D)
(zip S D (cons (cons S D bind) (const nil) spine) D)
(zip S D (cons (cons S D bind) self spine) D)
(zip S D (cons (cons S D bind) self spine) D)

In the last line, cogen finds the needed extension in the memo table of zip. The extension isn't complete yet, but it exists in the table; a side-effect is used on completion [note self-app-mutation]. Thus the compiler calls itself.

Note that the loop was only found after two iterations were `unrolled'. This is a good example of excessive specialization (though there are cases where splitting off the first iteration of a loop is just what you want). Since the prototype is just a memoized forward-flow abstract interpretation, this is unavoidable. Doing better requires reverse information flow, eg a two pass BTA/generation system.

Now, think how this BT affects the compiler produced by cogen for alist-eval. In the compiler the environment is a list of variable names. That means the residual part of an environment is a list of variable values. Thus the code produced by such a compiler would contain only procedures called by passing fixed length lists. Each of these procedures is a residual version of alist-eval, that list argument is the env argument. A variable reference implemented with a call to assoc and a statically known variable would compile to a fixed (at compile time) sequence of cdrs followed by a car. This is exactly the inefficiency variable splitting is designed to alleviate (see splitting).

Note how this interacts with higher-order calls to unknown procedures, and handling of Scheme's varargs in general.