This section summarizes the hoped-for contributions and my plan for achieving them. Subsections present the open questions, experiments, preliminary results, and research schedule in greater detail.
There are a number of questions that I hope to answer. I divide them into two categories: what are the analytical properties of the transformed code and its execution, and how practical are the transformations? The former resemble typical compiler performance evaluation questions; the latter are fuzzy software engineering issues (the second Futamura projection has a human factor).
My plan to answer these questions is laid out in the rest of this section. I will start with the experiments listed by keyword in the schedule at the end of this section.
As this is an interdisciplinary thesis, roughly between partial evaluation (PE) and interactive graphics, the contributions fall into three categories:
Is the final code good enough? DCG assembles its somewhat higher-level IR into code locally comparable to gcc's [GCC], using only about 350 instructions per instruction produced. Using a lower-level representation allows more optimizations to be handled by the generated compilers, but the lack of high-level constructs complicates cogen, since it cannot use those same constructs itself. I do not plan on generating machine code, so I will simply examine the generated abstract code and assess how much optimization would be required to produce fast code, and how quickly naive code could be produced. The root language may have to be extended to make efficient code generation easier; e.g., loops may require direct support.
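To make the loop point concrete, here is a hypothetical sketch (sum-to and the label/jump notation are mine, not the actual root-language syntax) of what is lost in lowering:

    ;; Source level: the loop structure is explicit.
    (define (sum-to n)
      (let loop ((i 0) (acc 0))
        (if (= i n)
            acc
            (loop (+ i 1) (+ acc i)))))

    ;; Lowered to a label/jump style, the same code arrives at cogen
    ;; as a control-flow graph, and the loop must be rediscovered:
    ;;   L0: if (= i n) goto L1
    ;;       acc := (+ acc i)
    ;;       i   := (+ i 1)
    ;;       goto L0
    ;;   L1: return acc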
Does the speedup justify the time spent compiling? Of course, it depends on the application. I will measure abstract instruction counts to compare interpretation against compilation-plus-execution, and determine the break-even point. This ignores scheduling, cache, and memory effects. I hope to show that my generated compilers can be significantly faster than typical Lisp compilers and still produce good code.
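A back-of-the-envelope form of the break-even computation, counting abstract instructions rather than seconds (the variable names are mine, not fixed notation): with $t_{int}$ the cost of one interpreted run, $t_{exec}$ the cost of one compiled run, and $t_{comp}$ the one-time cost of compilation, compiling pays off after $n$ runs when

\[ n \, t_{int} \;\ge\; t_{comp} + n \, t_{exec}
   \quad\Longleftrightarrow\quad
   n \;\ge\; \frac{t_{comp}}{t_{int} - t_{exec}} . \]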
Is the root language too low level? Some language features may not be easily handled once they are compiled into the IR, since information is lost by translation into a lower-level language (e.g., short-circuit conjunctions and disjunctions, types, pattern matching, loops, letrec, varargs, higher-orderness, procedure call/return, etc.). I will find ways to handle as many of these as possible. Some of them may require special features, new hints, or some generalization. Some may have to be included directly in the IR. This is a familiar drawback of an IR: it can handle many different languages, but none of them perfectly.
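As a small illustration of the loss (the expansion shown is the standard one, not necessarily the root language's): a short-circuit conjunction is already gone by the time the IR sees it.

    ;; Before lowering: one conjunction, easy to recognize.
    (and (pair? x) (eq? (car x) 'tag))

    ;; After lowering to conditionals: semantically identical, but the
    ;; conjunction -- which a pattern matcher or type inferencer could
    ;; have used directly -- must now be reconstructed, if it can be.
    (if (pair? x)
        (eq? (car x) 'tag)
        #f)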
It may turn out to be helpful to impose a type system on the IR. This would at least obviate the closure-cons hint and any kind of side-effect purity hint, and would probably clean up some dark corners in the soundness of the system.
Can the compilers that one wants be generated from interpreters? In a trivial sense this is always true, since we can always cook up an interpreter that arbitrarily transforms its program argument, but it's not necessarily usefully true: is cogen saving programmer time? The experiments provide a context for what `one wants'.
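The trivial sense can be made concrete with a sketch (run and T are hypothetical names, not part of cogen): wrap any static program transformation T in an "interpreter".

    ;; An "interpreter" that first transforms its program argument and
    ;; then runs the result; T is an arbitrary static transformation.
    (define (interp program input)
      (run (T program) input))

    ;; Specializing interp to a static program p folds (T p) away, so
    ;; the generated "compiler" is exactly T -- true, but no programmer
    ;; time was saved by deriving it this way.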
Successful compiler generation depends on the exact results of complex analyses. An innocuous code change may result in a critical value being lifted, ruining the dynamic code; this is brittleness. Typical examples are currying a procedure and inserting a let binding. Working with code while maintaining a good binding-time division may simply be too difficult.
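A hypothetical example of the currying hazard (whether a given binding-time analysis actually trips over this depends on its details):

    ;; env is static, so the association-list search can be done at
    ;; compile time and only the fetched value is residualized.
    (define (lookup env name)
      (cdr (assq name env)))

    ;; Curried "for style": the inner closure may now flow to contexts
    ;; the analysis must approximate, and if it is ever merged with a
    ;; dynamic value, env's binding time is dragged down with it.
    (define (lookup env)
      (lambda (name)
        (cdr (assq name env))))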
How do the generated compilers compare to hand-written ones? Hand-coded compilers are impossible to beat, as there will always be global invariants that automatic means fail to discover. But cogen should be able to produce compilers that are globally not stupid and locally tight. I will hand-write compilers for some of the languages and compare them to the generated ones.
Will cogen be just as complicated as advanced macro hacking in Lisp? Binding times attempt to alleviate `@,'-ridden code, but introduce their own complications. I will make a cursory comparison to C's macro languages: cpp, bison, make, and the occasional gawk script.
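For reference, the kind of `@,'-ridden code meant here is a hand-staged generator like this sketch (gen-loop is my illustration, not code from the system):

    ;; Explicit staging with quasiquote/unquote/unquote-splicing: the
    ;; generator's own control flow is tangled with the code it emits.
    (define (gen-loop var limit body)
      `(let loop ((,var 0))
         (if (< ,var ,limit)
             (begin ,@body
                    (loop (+ ,var 1))))))

    ;; cogen's binding-time annotations replace this explicit staging,
    ;; at the price of their own learning curve.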
These are contradictory goals. Current techniques do not completely automate compiler generation (and it's hard to imagine how they could), and as one approaches full automation, predictability goes down because more and more sophisticated analyses and inferences are required. Finding a good trade-off between competing goals is the hallmark of engineering.
Estimated tasks and weeks, roughly in order:
    splitting        3
    inlining         3
    lifting          3
    loops            2
    meta-writing     4
    bta              3
    writing i        4
    protocols        3
    graphics         4
    cleaning         2
    tower            1
    scheme           2
    analysis         4
    writing ii       8
    polishing        4
    slack            4
    writing iii      4
    -----------     --
    total           58

resulting in graduation in mid-1996.