COMFY---A Comfortable Set of Control Primitives for Machine Language Programming Henry G. Baker, Jr. Laboratory for Computer Science Massachusetts Institute of Technology 545 Technology Square Cambridge, MA 02139 Structured programming advocates the elimination of the GOTO from the set of primitives for a computer language. Yet at the machine language level, almost all machines offer only the conditional or unconditional branch as the basic control primitive. V. Prat has suggested using pseudo-non-deterministic programming (as well as the COMFY nae). We present a new set of machine language control primitives based on Pratt's suggestion which are simpler to implement tan if-then-else and do-while and are more flexible at te same time. The machine language implementation of these primitives requires nothing more complicated than a stack; the size of the stack needed (exclusive of programmer-defined subroutine control points) is O(d), where d is the operator depth of the program. This is much less space and machine complexity than would be required for a true backtracking control structure. The user can trivially simulate if-then-else, do-while, repeat-until, etc., with macros using these primitives if he so desires. This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Office of Naval Research under contract number N00014-75-C-0661. A machine language program using our primitives is built up from individual actions and tests which are combined into sequences, alternatives, and loops. Actions are primitive instructions executed for their side-effects such as "add x to accumulator 3", "move a to b", or "increment z". Tests are primitive instructions executed to produce a boolean value; e.g., "x>y", "a=0", or "overflow?". Tests are distinguished from actions in that they can either succeed or fail while actions can only succeed. The convention is that if the predicate value is true then the predicate succeeds, and otherwise fails. The semantics of success and failure depend entirely upon the semantics of sequences, alternatives, and loops. A sequence formed of the two program fragments A and B, written "A;B", is executed by first executing A and if it succeeds then executing B. If A fails, then B is not executed and the whole sequence fails. Likewise, if A succeeds and B fails, the whole sequence fails. COMFY is like a poor man's non-deterministic programming language, because any side effects produced by a program fragment before failing are not undone in the course of failing. Since ";" is obviously an associative operator, long sequences can be written without parentheses and still be unambiguously interpreted; e.g., "A;B;C;D". A long sequence will succeed only if every fragment succeeds; it will fail on the leftmost failing fragment. We will abbreviate long sequences of the form A;A;...;A (m A's) as A^m, where m is a positive integer. We also define A^0 = succeed for completeness. This notation is well-defined by the associativity of sequences. Alternatives An alternative consists of two program fragments A and B linked by the alternative operator "|" as in "A|B". The semantics of the alternative are dual to those of the sequence. First A is executed. If it succeeds then the whole alternative succeeds immediately without executing B. If A fails, then B is executed and its success or failure becomes the success or failure of the whole alternative expression. Like the sequence operator, the alternative operator is associative, allowing long chains of alternatives to be written unambiguously without parentheses. We notice that for a chain of alternatives to succeed, at least one must succeed ad only the leftmost successful alternative will be executed. Furthermore, the chain will fail if and only if all of the alternatives fail. Loops With only sequences and alternatives, only finite (tree-like) programs can be written. The loop expression allows us to write programs whose execution length is not bounded. A loop consists of a program fragment A followed by the operator "oo", as in A^oo. The semantics of this loop expression are equivalent to "(A;A;...)", where "..." means "and so on, to infinity". In other words, A is executed over and over again until it fails; then the whole loop fails. If A never fails, then the loop expression will continue executing forever; it will never succeed. Inversion Inversion is a monadic operator which takes an expression, executes it, and inverts the meaning of its success and/or failure. Inversion of an expression A is written "~A". Inversion can not be simulated by sequences, alternatives, succeed and fail, in an uninterpreted (pure) COMFY because both sequence and alternative are monotonic operations, composition preserves monotonicity, and inversion is not monotone. Subroutine Calling COMFY allows the definition of subroutines with mnemonic names. The existence of a stack for saving return points is assumed; hence subroutines can be recursive. Like most machine languages, COMFY avoids arguments by avoiding parameters. COMFY's stand is that arguments are the programmer's responsibility. Returning values from functional subroutines is a problem because subroutines can fail or succeed like any primitive instruction. Perhaps a policy has to be made which allows subroutines to return values only if they do not fail. In a machine implementation using only one stack, stack (LIFO) discipline must be maintained by both subroutine call and return as well as by the sequence, alternative, loop, and inversion primitives. Therefore, a return from a subroutine cannot be made from otherthan parenthesis level zero; this is tantamount to running off the end of the code of a sequence which implicitly succeeds or running off the end of an alternative which implicitly fails. (In this context,we point out that COMFY would make a good control structure for APL, which currently has none. The issues COMFY addresses are all orthogonal to those that APL addresses, so the two might make a good marriage. The infix notation and shallow dynamic variable binding of APL make a good interactive language. COMFY would enhance that ease-of-use-interaction feature.) Syntax We will assume from now on that the four operators "|", ";", "~", and "oo" have binding power (precedence) which increases from left to right in the preceding list. With these four constructs and the two trivial tests succeed and fail, we can simulate many other useful constructs. Some of the simulations follow. If-then-else Before trying to simulate the construct "if B then A else C" in COMFY, we must define what it means. The intuitive meaning would be: execute B; if it succeeds, execute A and its success or failure becomes the success or failure of the whole expression; if B fails, then execute C and its success or failure becomes that of the whole expression. These semantics are the same as the meaning of the Algol "if B then A else C" in the case that A and C are boolean-valued expressions. When B, A, and C have no side-effects, this is simply "B;A|C". But this construct cannot be simulated exactly in COMFY! We mean by exact simulation one which would place no restrictions on the expressions B, A, and C. However, in the important special case that B has no side-effects and A cannot fail, "if B then A else C" can be simulated exactly by "B;A|C". This can be made a bit more readable by defining "->" to be equivalent to ";": "B->A|C" (which is exactly BCPL syntax for conditionals). Do-while The semantics for the constuct "while B do E" are not defined for the case where E can fail; therefore we are free to define them as we please. We simulate this expression in COMFY by "~(B;E)^oo" which either loops forever or fails. It can fail either by B failing or E failing. (If the programmer does not wish to terminate the loop upon failure by E, he should use "~(B;(E|succeed))^oo".) Using "->" for ";", the while construct becomes "~(B->E)^oo". Repeat-until The construct "repeat E until B" can be written in our language "~(E;~B)^oo". This works because of our convention that E is executed before B, and any side-effects are not nullified if B fails. Extended if-then-else; LISP's COND The extended if-then-else expression, equivalent to LISP's (COND (B1 E1)(B2 E2)...(Bn En)) can be written as B1;E1|B2;E2|...|Bn;En or more readably B1->E1|B2->E2|...|Bn->En. Case Unfortunately, this structured programming primitive resists attempts to simulate it straight-forwardly. Therefore, it must remain a primitive in order to gain efficiency during execution. Backtracking Although COMFY does not support true non-deterministic programming, backtracking can be done explicitly in it. If, for example, in the progam A|B we would like to undo any side-effects caused by A before failing, we could insert another alternative A^-1, which is charged with undoing A, i.e. (A|A^-1;fail|B). Thus, when A fails, A^-1 is executed to undo A's side-effects and then fails so B can be executed. (A^-1 is only a mnemonic device, this superscript is not a COMFY operator.) If A can fail at more than one point, it may be more convenient to undo many side-effects at once, through saving and then restoring some state information. If "S" is a save sequence, "E" is a program which can fail, "I" is an ignore-saved-information sequence, and "R" is a true restore sequence, then S;E;I|R;fail will have no side-effects if E fails and will have E's side-effects if E succeeds. Programming Examples We give a few examples to show how the notation is used in real programming. The first program is trivial; it prints the integers from 1 to 10. In ALGOL-like notation: x:=1; while x<11 do begin print x; x:=x+1 end; In COMFY notation: x:=1; ~(x<11 -> print x; x:=x+1)^oo The second program is a subroutine tat computes the greatest common divisor of X and Y an leaves the result in Y. GCD: while X>0 do if X>Y then X:=:Y else Y:=Y-X; In COMFY notation: GCD: ~(X>0 -> (X>Y -> X:=:Y | Y:=Y-X))^oo The third program scans a floating point number in FORTRAN format from an input stream. "S'string'" tests if 'string' is next in the input stream. If not, it fails. If so, it succeeds and moves the buffer pointer past 'string'. The program is simply an edited form of the BNF syntax for these numbers where epsilon is succeed and empty is fail! We will not attempt to give an equivalent Algol-like program for this one. NUMBER: INTEGER; (S'.';DIGSEQ);(S'E';INTEGER) INTEGER: (S'+'|S'-'|succeed);DIGSEQ DIGSEQ: ~(S'0'|S'1'|S'2'|S'3'|S'4'|S'5'|S'6'|S'7'|S'8'|S'9')^oo Algebraic properties of sequences, alternatives, loops, and inversions Sequences, alternatives, and inversions have a remarkable similarity to the boolean operators and, or, and not. In fact, for any boolean expression used for program control purposes, such as loop or conditional control, the analogy is exact. Thus, "if A^B then C else D" is simulated by "(A;B);C|D" (the parentheses are used only for emphasis; they are not required here). Again, we can make these simulations much more readable by writing "^" for ";" and "V" for "|"; the above program fragments become "A^B->C|D" and "~(AVB->C)^oo" (assuming "V" is the same as "|" except for syntactic binding power). The looping operator "^oo" reminds one of the "forall x" operator of the first order predicate calculus; it is a "while" looping operator because success keeps it looping. We can define a dual operator "*" such that A* = (A|A|...). This is an "until" looping operator because success stops it from looping. We first list properties which are true, followed by some properties which are not true in general, but are true for interesting special cases. 1. (A;B);C = A;(B;C). Also (A|B)|C = A|(B|C). (Associativity). 2. A;succeed = succeed;A = A = A|fail = fail|A. (Identities). 3. succeed|A = succeed. fail;A=fail. (Nil-potence). 4. A^m;A^n = A^(m+n). (A^m)^n =A^(mn). (Laws of exponents). 5. ~~A = A (Two-valued logic). 6. ~(A|B) = ~A;~B. Also ~(A;B) = ~A|~B. (DeMorgan's Laws). 7. (A^oo)^oo = A^oo. A;A^oo = A^oo. A^oo;A^oo = A^oo. In fact, A^oo;B = A^oo, for any B! (Closure). 8. ~A^oo = (~A)*. Also ~A* = (~A)^oo. (Generalized DeMorgan laws). -1. ";" is neither idempotent nor commutative, in general. -2. "|" is neither idempotent nor commutative, in general. However, a true non-determinism would make "|" both. -3. ";" distributes over "|" only sometimes. "|" almost never distributes over ";". -4. A|succeed is not the same as succeed. Also, A;fail is not the same as fail. Redundancy and Completeness DeMorgan's laws tell us that alternatives can be simulated by sequence and inversion; similarly, sequence can be simulated by alternative and inversion. Therefore, at the machine language level, we need only implement either sequences or alternatives, and can simulate the other. However the sets {sequence, inversion, loop} and {alternative, inversion, loop} are both independent, meaning that neither set can be reduced without reducing the power of the language. Sequences, inversion, and loops are not complete because there are at least two constructs which have non-optimum COMFY simulations. One is the "case" statement or "computed GOTO" which allows a quick selection from among a host of alternatives. The other is the multiple-exit loop---a loop with several exit conditions, each going to a different place. The semantics can be simulated, but only at the cost of repeating the exit tests after loop termination to determine what to do next. For example, a loop with body B and termination conditions C and D might want to goto E and F, respectively, upon termination. This could be simulated by (CVD|~B)*;(C->E|D->F). Actually, in this case, repeating the condition for how control got to that place in the program might be a good idea for readability and ease in understanding a program. Sources Although the immediate source for COMFY was V. Pratt [1], there are some striking similarities between it and a series of syntax-directed compilers, all called META, started by Schorre [2] and continued by Schneider and Johnson [3]. These META compilers were recursive parsers for deterministic context-free languages constructed by converting each BNF syntax equation of a suitable grammar into a recursive subroutine. Non-determinism was avoided by proper choice of the equations and clever factoring (property -3). Left recursive syntax equations caused stack overflow at run time. Schorre's META II had sequences (specified by juxtaposition), alternatives (specified by "/") loops (specified by "$E", meaning ~E^oo in COMFY), and recursion without parameters. Tests and actions were extremely rudimentary; the only test besides succeed (called ".EMPTY") was the comparison of a character string with an input buffer for a match and update the buffer pointer if successful. The only action was ".OUTPUT(...)" which placed information into an output buffer. Schneider and Johnson's META-3 added tests which did not reference or affect an input stream and expanded the class of actions to include any imperative operation which did not affect the succeed/fail flag. Thus, it was the first META which could be called a programming language. The only significant difference between META's and COMFY's semantics seems to be in the treatment of sequences. META's sequences were required to start with a test; a failure at any other than the first position caused an error stop (this difference is due to Pratt). Presumably, this restriction was to keep the semantics of the language consistent with that of a true non-deterministic (backtracking) implementation. However, the addition of unrestricted tests and actions in META-3 made that consistency impossible. The duality of sequence and alternative noticed by Pratt does not appear to have been known by the META designers. Furthermore, the inversion operator which makes this duality explicit was not used by either META or Pratt; it is new. Finally, the current definition of loop in COMFY, which for completeness requires an inversion operator in the language, is also new. References 1. Pratt, V. personal communication. 2. Schorre, D.V. "META II---A Syntax-oriented Compiler Writing Language". Proc. of the 19th ACM Natl. Conf., Aug. 1964. 3. Schneider, F.W. and Johnson, G.D. "META 3---A Syntax-Directed Compiler Writing Compiler to Generate Efficient Code". Proc. of the 19th ACM Natl. Conf., Aug. 1964.