High Level Language Programs Run Ten Times Faster in Microstore

Henry G. Baker, Jr. and Clinton Parker

November, 1980

University of Rochester Computer Science Department and Synapse
Computer Services

Trying to wring the most performance out of mature architectures,
computer manufacturers such as Digital Equipment and Data General have
added writeable control stores as options to their microprogrammed
computers.  The DG Eclipse and the DEC LSI-11 are examples.

The writeable control store option promises the ability to run custom
microcode to get factors of 5-10 in performance over assembly language
programs.  However, this promise has not been realized because of the
very high cost of user microprogramming.

These high costs are due to many factors: the lack of good
microprogram assemblers, the complexity of microcode instructions, the
lack of microprogram debugging facilities, the lack of good
documentation, the lack of knowledgeable people, etc.  For example, a
microprogram bug may cause the machine to "go off the deep end" in
such a way that requires re-initializing the machine and possibly
losing the contents of main memory.  As a result, microprogramming a
task may require five to ten times the time and cost of programming
the task in assembly code.

To rectify this situation, we have developed a compiler which
translates programs in a higher level language directly into
executable microcode for a small minicomputer.  This compiler has many
benefits: the user need only be familiar with the higher level
language, not the microcode, in order to take advantage of its
existence; microcode developed in this way is considerably easier to
write and maintain than if it were written in microcode assembler; the
algorithm may be written and debugged using a standard high level
language compiler and recompiled into microcode to gain its speed
advantage; making improvements in the microcode compiler's code
generator means that existing programs can benefit by simply
recompiling them.

The Target Machine--a microprogrammable minicomputer

The machine we chose for our target was designed in 1970 and has 1Kx32
bits of control ROM and 1Kx32 bits of control RAM.  In addition, it
has a main memory of 64Kx16 and a pipelined microprocessor.  The
control ROM contains the emulator microcode while the control RAM
executes user microprograms.

Although there is a very good microprogram assembler for this machine
and some amount of user microprogramming has been done, e.g., floating
point and signal processing applications, the control RAM on this
machine has not lived up to its expectations.  (These RAM chips
account for a significant fraction of the machine's cost.)  Most of
this inattention is probably due to the intimidating complexity of its
arcane microcode.

Most programming on this minicomputer is done in the BCPL high level
language (a predecessor of Bell Labs' popular C language), because of
its vast superiority in productivity over assembly language.  However,
certain tasks such as image enhancement, scan line conversion, feature
extraction, real time animation, and alpha-beta game search require
the utmost speed from a computer.  For these tasks, recoding in
assembler would be expensive and would yield only a factor of 2-4 in
speed.  Recoding in microcode could gain as much as a factor of 20,
but the programming cost would be exorbitant.

For these reasons, we developed the Micro-SPL compiler.  We originally
intended the microcode compiler to translate from the BCPL language to
microcode, giving complete compatibility with the standard BCPL
compiler and system.  This approach would have had the advantage of
allowing algorithms to be debugged using the standard BCPL compiler
and symbolic debugging facilities, and then recompiled with no source
changes using the microcode compiler.

However, due to the existence of a parser for the SPL language (a
derivative of Algol-60) and the non-existence of a parser for BCPL, we
built a Micro-SPL compiler.

The Micro-SPL compiler takes subroutines and functions written in SPL
and compiles them directly into microcode that is run-time compatible
with BCPL.  This means that the programmer calls these subroutines as
if they were BCPL subroutines, and the microcode takes care of the
BCPL linkage conventions.  Thus, the programmer need not make any
provision for loading the control RAM or for linking to the microcode
because the compiler generates dummy BCPL subroutines which load the
RAM and install linkages automatically.

Micro-SPL subroutines run an average of 10 times faster than their
BCPL counterparts, and the "heapsort" routine shown in the figure runs
12 times faster.  This speed increase can be attributed to the lack of
instruction fetches, the use of scratchpad registers for local
variables and temporaries, and the overlapping of certain operations.

Although a carefully hand-coded "heapsort" routine runs almost twice
as fast as the Micro-SPL version, it took much longer to write and
used certain properties intrinsic to the heapsort algorithm to gain
its speed.  In general, one can expect only a 25% or so improvement
over Micro-SPL from careful hand-coding since for many sequences the
Micro-SPL compiler generates optimum code.  In addition, we made a
conscious decision to generate small rather than fast code because of
the limited amount of RAM (1K instructions) available.

The Micro-SPL compiler is not a research toy, but has actually been
used to produce production programs.  One early use was the
recompilation of the storage allocation routines for an in-core text
editor, making the editing functions effectively instantaneous.  (The
editor is now in everyday use).  Another use was the recompilation of
the evaluation function for an Othello game-playing program, yielding
an almost unbeatable program.  Each of these programs was recoded and
recompiled in less than one hour.  More mundane (and more valuable)
use is expected now that the system is debugged and documented.

The Micro-SPL compiler was developed and documented by us in 6 months,
making use of a previously existing parser for SPL.  Both of us were
experts in the microprogramming of this machine, so that writing the
compiler was simply a matter of embedding our expertise into the
program.

We expect that the techniques used in Micro-SPL could be easily
transported for making a Micro-FORTRAN or a Micro-PASCAL for virtually
any micro-coded machine.  The key decisions are what subset of the
language to support and what level of optimization is acceptable.

We think that this compiler re-emphasizes what has always been true;
that a computer is just a pile of copper and sand without proper
software to support it.

ENTRY INT FUNC Delete (REF INT length, INT ARRAY heap)
  INT new, lengthdiv2, root, son, cur, next, last
  IF length=0 THEN RETURN(0) FI
  last := heap(length)
  new := heap(1)
  length := length-1
  lengthdiv2 := length RSH 1
  root := 1
  son := 2
  WHILE root <= lengthdiv2 DO
    cur := heap(son)
    next := heap(son+1)
    IF next < cur THEN
      cur := next
      son := son+1
    FI
    IF cur <= last THEN
      heap(root) := cur
      root := son
      son := root LSH 1
    ELSE
      EXIT
    FI
  OD
  heap(root) := last
  heap(length) := #77777  (* #77777 is plus infinity. *)
RETURN(new)

ENTRY PROC Insert(REF INT length, INT ARRAY heap, INT new)
  INT i, idiv2, cur
  i := length+1
  length := i
  idiv2 := i RSH 1
  cur := heap(idiv2)
  WHILE new < cur DO
    heap(i) := cur
    i := idiv2
    idiv2 := i RSH 1
    cur := heap(idiv2)
  OD
  heap(i) := new
RETURN

Figure 1.  Heapsort Subroutines.  (See "The Design and Analysis of
Computer Algorithms", by Aho, Hopcroft, and Ullman, Addison-Wesley,
1974.)

References

Baker, H., and Parker, Clinton.  "Micro SPL".  Tech. Report 62,
Computer Science Dept., Univ. of Rochester, Rochester, NY, Feb., 1980.