Discussion of alternative loading formats in order to facilitate the
production of an interpretive system, and also hopefully as a side
effect to alleviate the problems of very large live register sets
which cause problems to the register allocator.

The register allocator problems are caused by trying to build tuples
of a large number of values. This requires all the values to be live
at the same time. Particularly bad cases of this occur for instance in
the case of a source file containing a large number of top level
declarations, even if they actually only constitute one topdec.

The problems under consideration for an interpretive system concern
obtaining the values from the running of a compilation unit in order
to update the environment. It would be nice to do this entirely in ML,
without any extra special additions for this phase (obviously there
will be some additions to this system, simply to allow the system to
call some code it has compiled for instance). The trick therefore
seems to be for the run of a compilation unit to return a normal ML
value.

There seem to be two possibilities for the type of this value. Firstly
it could be an array. This looks attractive, as it is very close to
what we already do, and imposes little garbage collection overhead.
The way of producing it would be ALLOCATE, (compute value, update)
loop. Unpicking it would be by means of Array.sub instead of lambda
calculus SELECT. However, this method has a number of drawbacks.
Firstly the lambda optimiser does not deal well with mutable values,
in general it can't. This is not a disaster, but there is a second
problem, in that Array.sub requires a pervasive exception Subscript,
which we can't guarantee to be present, as we may not yet have loaded
it. This is an unbreakable chicken and egg situation, unbreakable that
is without inventing more special mechanisms such as for example an
unsafe array access which didn't check the bounds. Since we're trying
to avoid producing more special features, this method should be
shelved for the moment.

The second approach involves no further mechanism, but probably a bit
more code in the creation and use of compilation units. The suggestion
is to use lists instead of tuples to hold the result of the
compilation unit. Creation would be simply by tupling up all the
required values. Use would be by extracting the elements from the
list. In order to make it safe against match failures (although these
would indicate system or bad user errors) any compilation unit wishing
to obtain the values from another compilation unit could declare an
exception which it could raise if it arrived unexpectedly at the end
of the list. This has the advantages that the lambda optimiser can
deal well creating values as required for them to go into the list,
and thus the register allocator will have an easy time, and it
involves no extra mechanisms being created. A slight disadvantage is
that somewhat more code will be required in order to access values a
long way into a compilation unit. But since most compilation units
only produce one value this should not be a problem, particuarly as
once the code has been executed it will then be discarded.

The second method is therefore proposed as a replacement for the
current method. In most cases it will make no discernable difference,
in the case of files with many topdecs it will make a considerable
improvement to their compilation time.

We now consider the other outstanding problems of an interpreted
system, viz maintaining a current environment and calling code
compiled by the compiler from within the compiler. For this to occur,
the compiler must have a toplevel function, eg:-
basis. Thus:-

fun toplevel basis =
  (let
     val (new_cb, code) = compile basis (* Compile what he types *)
     val fixed_up_code = fix_up code (* Generate runnable code *)
     val result = run fixed_up_code (* And run it *)
     val new_lambda_basis = unpick(new_cb, result) (* Decode the
result *)
   in
     augment_cb(basis, add_lambda(new_cb, new_lambda_basis))
   end) handle any_exception => (Print diagnostics; basis)

fun loop basis =
  let
    val basis = toplevel basis
  in
    loop basis
  end

val _ = loop initial_basis (* The toplevel *)

Comments on the above.

The function compile is expected to take the current environment and
compile within it the contents of the line he types. We may want to
try something like NJ's prompt for more input on a parse failure, but
it probably isn't necessary for a first pass. The result of compile is
an incremental of the parse, typecheck and lambda environments, and
some code structure, where the lambda environment refers to positions
of the items it requires in the result of running the code produced.

The code produced is not runnable as such, it requires back pointers
to be fixed up, pointers into the middles of strings to be produced
(for closures of mutually recursive structures), external values to be
resolved etc.

Having produced some runnable code, it is run (using a System hook) to
produce a list of store_values, in one to one correspondence with the
domain of the compilation environment. This list is unpicked in order
to produce a new compilation environment which overrides the existing
one in new_cb. This then augments the existing compilation
environment.

Areas requiring System assistance.
1) Producing the top level closure. This requires a function
store_value list -> store_value tuple (or maybe store_value)
2) Converting the produced code strings to store_values. This requires
a function string -> store_value which is just the identity.
3) Producing code pointers into the middles of strings, for closures
of mutually recursive functions. This requires a function string * int
-> store_value, of which item 2 above is a special case.
4) Calling the code. This requires a function store_value ->
store_value list, taking the top level closure to the result of
running it.

Other compiler modifications.

The load_string primitive is unnecessary for the incremental system,
but some analogue is. A possibility is to add a lambda calculus
constructor EXT : store_value -> LambdaExp. Mir_Cg would translate
this to EXT_REF store_value, or something similar, and eventually the
store_value concerned would pop out of Mach_Cg as a (position,
store_value) pair to be placed in the top level closure. The
modifications to Mir_Cg and Mach_Cg to achieve this would be minimal,
however the effect on the lambda calculus transforming sections would
be major. It would be nice to find a better way of passing these
values through without causing such disruption.
