Variable Splitting

Variable splitting is a technique for eliminating unnecessary intermediate data structures in programs. It is also known as retyping [ref] and arity raising [ref], and it's related to unboxing[ref]. Eg

(define plus (lambda (p) (+ (car p) (cdr p))))
(define dumb (lambda (x) (plus (cons (sin x) x))))

could be replaced with

(define plus1 (lambda (p1 p2) (+ p1 p2)))
(define dumb (lambda (x) (plus1 (sin x) x)))

Though this is a general-purpose optimization, like inlining it is critically important to the quality of code produced by PE (see envs).

I plan to implement this in cogen by making the following changes:

The binding times are unaffected.
The static environment and static code in the generated compiler are unaffected.
Augment the compiler's static environment with a shape environment. Variables with a dynamic component have a value in the shape environment.
A value in the shape environment is initialized to the name of the variable that will contain its dynamic value.
If any one of the arguments to a dynamic prim is not atomic, then lift it to pure D by generating cons instructions according to the shape.
The dynamic portion of a cons/car/cdr is now nothing, instead the compiler does the dynamic portion (as we already compute it) of the cons/car/cdr in the shape environment. For car/cdr, if the value in the shape env is atomic, then a dynamic instruction is produced.
When the compiler generates a known procedure call, the shapes of the dynamic arguments are known, so they can be split. If the call is inlined then the environment can simply be renamed.

The root of this problem is in the binding time lattice. At cogen time the length of a list is not known; circular binding times forget exactly this. But once the static values have been received, this information can be recovered.

While the above plan looks rather ad hoc, I believe that it could be much more simply implemented with a self-applicable specializer by using a binding time lattice that captures as static information what we put into the shape environment above. That is, D D = (cons D D), etc.

[what is the state of the art in variable splitting for other systems?]