Generation

This section describes how cogen uses the binding time information to produce the compilers. It's surprisingly easy. First we look at what cogen does with a simple procedure. Then we examine partially static instructions, then metastatic values, and finally the how control flow is handled in general.

Simple

Apply cogen to this procedure with binding time pattern of (S D D):

(code simple
      (x y k)
      ((const c 10)
       (prim s1 + c x)
       (prim s2 + s1 y)
       (jump k s2))
      ...)

The core of the resulting generating extension is a code that executes static instructions, conses up a list of residual instructions, and finally reverses and returns the list.

(code (instrs 1 simple)
       (k#-307 x)
       ((const a#-308 ())
        (const c 10)
        (prim s1 + c x)
        (prim i#-309 (lambda (v) `(const s1 ,v)) s1)
        (prim a#-310 cons i#-309 a#-308)
        (const i#-312 (prim s2 + s1 y))
        (prim a#-311 cons i#-312 a#-310)
        (const i#-313 (jump k s2))
        (prim a#-314 cons i#-313 a#-311)
        (prim rev#-315 reverse a#-314)
        (prim p#-317 car k#-307)
        (jump p#-317 k#-307 rev#-315))
       ...)

The whole compiler is diagrammed below:

The entry point is named (1 simple). Before it generates any code it checks its memo table to see if it has specialized to these static values before. If so, it just returns its previous result; this is how loops are generated.

Next (instrs 1 simple) constructs the residual instruction list, as above.

Finally (cont 1 simple) wraps the instructions in a code structure, and stores the code in the memo table. If assembly into machine code were supported, it would happen here.

Pairs

Simple instructions are either executed in the compiler, or generated unmodified. As described in this section, if an instruction has static and dynamic parts, it must appear in both the compiler and the generated code.

Here are examples of how some partially static instructions specialize. Given these bindings in the metastatic environment.

s --> S
d --> D
p --> (cons S D)
q --> (cons (cons S D) (cons S D))

If the first column appears in an interpreter, the second column is the static output (code executed by the compiler), the third column is the dynamic output (residual code produced by the compiler). Note that id is the identity primop.

source                compiler            residual
(prim r cons s d)     (prim r id s)       (prim r id d)
(prim r car p)        (prim r id p)
(prim r cdr p)                            (prim r id p)
(prim r cons p d)     (prim r id p)       (prim r cons p d)
(prim r car q)        (prim r car q)      (prim r car q)

A cons cell is eliminated at compile time if either branch is purely D. A cons cell is eliminated at run time value if either branch is purely S. Eg if the car of a cons is pure D, then that argument will not exist in the compiler, so the cons is replaced with a simple assignment to the cdr.

Note that we are generating stupid code: this id prim is just a renaming. It would be better to avoid it entirely (see splitting).

Jumps

There are two possibilities when cogen reaches a jump instruction: the destination is known or unknown.

If the binding time of the destination is metastatic, then the destination is known. In this case, cogen calls itself recursively to create an extension. The generated compiler calls this extension (passing the static arguments) and returns to a code (labeled done-with) that finishes building the jump, reverses the instruction list, and returns.

Otherwise the destination is unknown, so all the arguments are lifted to D, and a residual jump is generated.

Constants

If a conditional's predicate is metastatic, then the generated compiler only handles one branch.

To simplify the implementation, cogen tracks constants in the binding times, but the same constants are duplicated in the compiler. Since removing the metastatic values from the compiler is the same as removing the static values from the generated code, this is an artifact of the direct implementation of cogen instead of using self application. Thus could a specializer eliminate the code-pointer from closures of `known' procedures?

[note jump-conversion]

Other

In general, for each code block in the input to the compiler generator, the following output code blocks are created: the entry point (which checks the memo table), the exit code (which memoizes and builds the result), one for each linear list of instructions (which does the static instructions and accumulates a list of dynamic instructions), plus another if it ends in a metastatic jump (to generate a jump to the code just generated), plus two more for each dynamic conditional (to recursively build a tree of instructions).

Lifts are either handled as a simple instruction (trivial lifts), with a recursive call (a higher order lift thus produces an additional code block), or by modifying the source to make a call to a lifter (a procedure returned by the lift compiler (see lifting), probably producing lots more code blocks).