A Tachy[1] 'TAK'

Henry G. Baker
Nimble Computer Corporation, 16231 Meadow Ridge Way, Encino, CA 91436
(818) 986-1436 (818) 986-1360 (FAX)
Copyright (c) 1992 by Nimble Computer Corporation

We show how to speed up the Tak Benchmark by an order of magnitude--5X faster than the Cray-1--on a Common Lisp system (40MHz 80860-based OKIstation) using memoizing. The list-based Takl Benchmark improves even more--30X faster than the Cray-1. Given the speed attainable through memoizing, the possibility of further speedups using parallelism seems unlikely.

A. INTRODUCTION

The Tak benchmark is John McCarthy's mis-remembered version of the Takeuchi function [Gabriel85]. The Tak benchmark is one of the most commonly used benchmarks because its reliance on only recursive function-calling and integer arithmetic allows it to be used early in hardware debugging, and because it is short enough to memorize and type surreptitiously into a competitor's computer at a trade show. While some benchmarks have been criticized for running "entirely within the cache", the Tak benchmark typically runs "entirely within the register set" of a RISC architecture, and therefore deserves a double dose of the same criticism. It is generally assumed that, because of its ubiquity, Tak cannot be speeded up by non-intelligent means; we show that this assumption is erroneous.

We show that Tak can be speeded up by the technique of "memoization" [Bird80] [Keller86], which requires only that the function be "functional"--i.e., contain no side-effects. Since the lack of side-effects can often be statically assured at compile time by simple syntactic tests, a compiler could decide to utilize memoization for Tak as one of its standard optimizations.
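
For concreteness, here is a minimal sketch of such a memoizer in portable Common Lisp; the name memoize and the use of an equal hash table keyed on the argument list are illustrative choices of ours, not part of the benchmark suite:
(defun memoize (fn)
  "Return a memoizing version of the side-effect-free function FN."
  (let ((table (make-hash-table :test #'equal)))        ; keys are fresh argument lists
    #'(lambda (&rest args)
        (multiple-value-bind (value foundp) (gethash args table)
          (if foundp value
            (setf (gethash args table) (apply fn args)))))))

Installing the wrapper with (setf (symbol-function 'tak) (memoize #'tak)) also memoizes the recursive calls, provided the compiler has not open-coded the self-calls.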

B. STANDARD 'TAK'

According to [Gabriel85], the Tak benchmark contains 63,609 recursive calls to tak, as well as 47,706 decrement operations, when performed on the arguments (18 12 6) to produce the answer 7. None of the arguments to tak ever becomes negative, nor does any ever exceed 18. The first arm of the conditional is executed 75% of the time.
(defun tak (x y z)
  (if (not (< y x)) z
    (tak (tak (1- x) y z)
         (tak (1- y) z x)
         (tak (1- z) x y))))
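
As a quick check of the figures quoted above, the benchmark can be instrumented with a call counter; counting-tak and *tak-calls* below are our own instrumentation, not part of the benchmark:
(defvar *tak-calls* 0 "Number of entries into COUNTING-TAK.")

(defun counting-tak (x y z)
  (incf *tak-calls*)                    ; count every entry, including the top-level call
  (if (not (< y x)) z
    (counting-tak (counting-tak (1- x) y z)
                  (counting-tak (1- y) z x)
                  (counting-tak (1- z) x y))))

After resetting *tak-calls* to 0, evaluating (counting-tak 18 12 6) should return 7 and leave *tak-calls* at 63,609.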

C. MEMOIZING 'TAK'

A simple measurement shows that tak is called with only 281 distinct combinations of arguments, so memoization can work splendidly. However, in order to memoize, we must construct a single "key" from the triple of integers passed to tak as arguments. The Lispiest way to do this is to construct a Lisp list of the 3 arguments, and then use this as a key to a Common Lisp equal hash table, as in the following code:
(defparameter *memo-table* (make-hash-table :test #'equal)
  "Those who don't remember the past are condemned to recompute it"[2])

(defun make-key (x y z) `(,x ,y ,z))

(defun tak (x y z)
  (let ((key (make-key x y z)))		; [3]
    (or (gethash key *memo-table*)      ; tak's value is a nonnegative fixnum, never NIL, so OR suffices
        (setf (gethash key *memo-table*)
              (if (not (< y x)) z
                (tak (tak (1- x) y z)
                     (tak (1- y) z x)
                     (tak (1- z) x y)))))))
This implementation works, and can already outperform many standard tak implementations. It can be speeded up by the straightforward technique of "hash consing" [Ershov58] [Goto74] [Deutsch73], which allows the equal hash table to be replaced by an eq hash table. But the fastest implementation utilizes the fact that the argument integers are bounded, and we can therefore pack them into a single fixnum:
(defparameter *memo-table* (make-hash-table :test #'eql))   ; EQL, not EQ: Common Lisp does not guarantee EQ on numbers

(defun make-key (x y z) (+ (ash x 16) (ash y 8) z))         ; the arguments never exceed 18, so 8 bits apiece is ample
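
As a sanity check (the packed value is simply 18*65536 + 12*256 + 6):
;;; (make-key 18 12 6)              => 1182726
;;; (tak 18 12 6)                   => 7
;;; (hash-table-count *memo-table*) should then report the 281 distinct
;;; argument combinations noted above.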

D. STANDARD 'TAKL'

The Gabriel Takl benchmark is obtained from the Tak benchmark by replacing integer counters with list counters; i.e., lists of length n are used to represent the integer n. Intuitively, one would presume that Takl would run a small factor slower than Tak, since list counters would appear to be only a small factor slower than fixnum counters (assuming that the lists are in the cache). However, it is much more difficult to implement the < predicate on lists than on fixnums; therefore, shorterp takes time proportional to the length of the shorter of its two arguments instead of taking only a small constant amount of time. On the standard benchmark versions, we find non-memoized Takl to be about 5.7X slower than non-memoized Tak.
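
For reference, the list-counter definitions described above are essentially the following; the names listn, mas, and shorterp follow the Gabriel suite, but this is our reconstruction rather than a verbatim transcription of [Gabriel85]:
(defun listn (n)
  "Return a list of length N, used to represent the integer N."
  (if (not (= 0 n))
      (cons n (listn (1- n)))))

(defun shorterp (x y)
  "True iff the list X is shorter than the list Y."
  (and y (or (null x)
             (shorterp (cdr x) (cdr y)))))

(defun mas (x y z)
  (if (not (shorterp y x)) z
    (mas (mas (cdr x) y z)
         (mas (cdr y) z x)
         (mas (cdr z) x y))))

The benchmark call (mas (listn 18) (listn 12) (listn 6)) corresponds to (tak 18 12 6); shorterp walks both lists in step, which is why its cost is proportional to the shorter of the two.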

E. MEMOIZING 'TAKL'

Memoizing Takl is slightly more difficult than memoizing Tak, because we cannot utilize packed integers as the keys to our memo table, but must construct unique keys using hash consing. However, our table still consists of only 281 active entries, so it will likely remain entirely within the cache.
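
A minimal hash-consing sketch follows; the names hcons and *hcons-table* are our own, and the two-level table is just one way to obtain the required canonical conses:
(defparameter *hcons-table* (make-hash-table :test #'eql)
  "Maps each car onto a table mapping cdrs onto the canonical cons.")

(defun hcons (a d)
  "Return the unique cons whose car is A and whose cdr is D."
  (let ((d-table (or (gethash a *hcons-table*)
                     (setf (gethash a *hcons-table*)
                           (make-hash-table :test #'eql)))))
    (or (gethash d d-table)
        (setf (gethash d d-table) (cons a d)))))

Because hcons returns the identical cons for identical arguments, keys built with it can be compared and hashed with eq. If the list counters themselves are built with hcons, so that all counters of the same length are eq, the memo table can indeed stay at the 281 entries mentioned above.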

In Takl, we actually have a choice about whether to memoize mas, shorterp, or both. While memoizing shorterp should dramatically shorten its time, we would still execute all 63,609 calls to mas. If we memoize mas, then we are left with very few calls to shorterp, in which case its timing won't matter very much. Thus, it is only necessary to memoize mas to get most of the benefits of memoization.
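
Putting the pieces together, a memoized mas along these lines might read as follows; this is our reconstruction of the approach described above, not necessarily the exact code that was timed:
(defparameter *mas-table* (make-hash-table :test #'eq))

(defun mas (x y z)
  (let ((key (hcons x (hcons y (hcons z '())))))        ; hash-consed key, so EQ hashing suffices
    (multiple-value-bind (value foundp) (gethash key *mas-table*)
      (if foundp value
        (setf (gethash key *mas-table*)
              (if (not (shorterp y x)) z
                (mas (mas (cdr x) y z)
                     (mas (cdr y) z x)
                     (mas (cdr z) x y))))))))

Here multiple-value-bind is used because a memoized value may in principle be nil, the list counter for zero, so the or idiom of the fixnum version would not be safe.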

F. RESULTS

The memoization optimization improves Tak by about an order of magnitude. We achieve a Tak time of 0.008 seconds on the 40MHz 80860-based OKIstation(TM), which is 5X faster than the Cray-1 on the old benchmark.[4] By utilizing memoization with hash consing on the Takl benchmark, we achieve a Takl time of 0.01 seconds, which is 30X faster than the Cray-1 on the old benchmark. Interestingly enough, Takl is only 25% slower than Tak when both are memoized; these numbers indicate that the memo table lookup dominates both computations.

G. REFERENCES

[Anderson87] Anderson, J. Wayne, et al. "Implementing and Optimizing Lisp for the Cray". IEEE Software (July 1987), 74-83.

[Bird80] Bird, R.S. "Tabulation Techniques for Recursive Programs". ACM Comp. Surv. 12,4 (Dec. 1980), 403-417.

[Deutsch73] Deutsch, L. Peter. "An Interactive Program Verifier". Xerox PARC TR CSL-73-1, 1973.

[Ershov58] Ershov, A.P. "On Programming of Arithmetic Operations". Doklady AN USSR 118,3 (1958), 427-430; transl. Friedman, M.D., CACM 1,8 (Aug. 1958), 3-6.

[Gabriel85] Gabriel, R.P. Performance and Evaluation of Lisp Systems. MIT Press, Cambridge, MA, 1985.

[Goto74] Goto, Eiichi. "Monocopy and Associative Algorithms in Extended Lisp". TR 74-03, U. Tokyo, 1974.

[Keller86] Keller, R.M., and Sleep, M.R. "Applicative Caching". ACM TOPLAS 8,1 (Jan. 1986), 88-108.

[Steele90] Steele, Guy L. Common Lisp, The Language, 2nd Ed. Digital Press, Bedford, MA, 1990, 1029p.

[1] Look it up in your Funk&Wagnall's.

[2] Apologies to Santayana.

[3] One might also utilize &rest arguments to construct the key list, as in (defun tak (&rest key) ... ).

[4] We are using the "old" Cray-1 numbers from [Gabriel85]; newer numbers for the Cray-1 are given in [Anderson87].