My system is called nitrous [note name]. This section describes its design and how the goals influence it. The design is based on the techniques developed by the off-line (aka self-applicable) Partial Evaluation (PE) community [JoGoSe93]. The following subsections provide detailed technical information.
cogen
Currently Nitrous is implemented with scheme48 [scheme48], an efficient bytecoded Scheme. Scheme is very good for prototyping languages.
The rest of this section serves as a summary of and map to the subsections.
The easiest way it to make cogen composable is to make make cogen
accept the same language as it produces. I call this language root
. To facilitate fast compilers root
is just an abstract
machine language (see root for more).
The compiler generator itself is called cogen
. It takes a
procedure and a binding time pattern as arguments. The binding time
pattern specifies which of the interpreter's arguments are the program
(S
) and which are the data (D
) (see interface for more).
How does cogen
work? It makes two passes over the interpreter.
The first pass factors the program into the generating extension and
the residual code, this pass is called binding time analysis
(see bta for more). There are two main ways of implementing BTA: with type
inference [Henglein91] or via abstract interpretation [Consel93]. The prototype currently is oriented towards the latter,
though an inference system may eventually prove better.
The second pass (usually called `specialization', though `generation'
is more accurate here since this is a direct cogen) recursively
traverses the code
structures of the interpreter and converts each
into a compiler (generating extension). Static instructions are
copied into the compiler as they are. Dynamic instructions appear
as templates: they produce the output from the compiler (see gen for more).
There are three environments (envs) to keep straight: the metastatic env, the static env, and the dynamic env. The metastatic env (also called the BT env, or the analysis env, or cogen's env) exists in the abstract interpretation, it maps all the program's variables to binding times. The static env (also called the compiler env) exists in the compiler, it maps the static variables to values. The dynamic env (also called the residual env) exists in the generated code. Note that if cogen is applied to an interpreter with an environment, then there are four.
Both cogen itself and the compilers it produces are memoizing. Memoization is used to avoid infinite programs by producing loops. Cogen memoizes on binding times. Generated compilers memoize on static values. (see envs) (see gen) (see layers).
Variable splitting is a technique for eliminating unnecessary intermediate data structures in programs. While this is a generally useful optimization, it is a necessity in PE if environments are to be treated efficiently (see splitting for more).
Inlining is another generally useful optimization that is required to get decent results from PE.
It is important that variable splitting and inlining not just be handled as general post-pass optimizations; that would be too slow. Instead specialized versions of these optimizations will be integrated into the generated compilers.
Side-effects are handled simply by forcing them all to runtime, thus compilers must be purely functional. This is an important deficiency. It appears likely that analysis can overcome this [Heintze94].
In summary, the design objectives and how they are satisfied are: