How to Generate Self-Referential Programs

In this post, I am going to show you how to write programs that are self-referential. By self-referential, I mean programs which are able to obtain their own source code without any external input. In other words, they won’t just read from their own files. This post is based on section 6.1 of the book Introduction to the Theory of Computation.

Before we can start generating self-referential programs we are first going to need some techniques for generating programs in general. The first technique we need is a method of taking a given program and writing a second program that outputs the given program. As an example, given (+ 2 2), we would need to write a program that outputs (+ 2 2). In most languages this is easy. One way to do it in Lisp is to put a quote in front of the program:

'(+ 2 2)
=> (+ 2 2)

We are also going to need a function that automates this process. Such a function would take a program as its argument and return a new program that when ran, outputs the program that was originally passed to the function. In most languages doing this is fairly tricky. In Lisp, we can write this function easily through backquote:

(defun code-that-generates (program)
  `',program)

(code-that-generates '(+ 2 2))
=> '(+ 2 2)

If you don’t understand how backquote works, you can read this. Even though it’s for Emacs Lisp, everything there is still applicable to other Lisps. Just make sure that you understand that code-that-generates can be used to generate a program that outputs a given program.

Now that we have these two techniques, we can begin writing programs that are able to refer to themselves. The first self-referential program we will write will be an example of a quine. If you don’t know, a quine is a program that outputs its own source code. The quine we are going to write is made up of two parts, part A and part B, where part A is a function that is applied to part B:

(A B)

To describe how the quine works, it is easiest to start with part B. All that part B needs to do is return the source code of part A:

(A 'A)

Part A’s job is to take its own source code, and use it to obtain the source code of the entire quine. Since B is a program that outputs A, A can use code-that-generates on its own source code in order to obtain the source code of B. Once A has the source code of both A and B, it becomes trivial to combine the two to obtain the source code of the entire quine. Here is the complete quine, with the call to code-that-generates inlined:

((lambda (a)
   (let ((b `',a))
     `(,a ,b)))
 '(lambda (a)
    (let ((b `',a))
      `(,a ,b))))
=>
((lambda (a)
   (let ((b `',a))
     `(,a ,b)))
 '(lambda (a)
    (let ((b `',a))
      `(,a ,b))))

Now this is where things start getting interesting. A quine can be thought of as a program that generates its own source code, and immediately returns it. What if instead of immediately returning its own source code, the quine applied a function to it first, and then returned the result of that. The steps for building such a program are almost exactly the same as the steps we took for building the quine. This time, there is a third part F, for the function we want to call. The structure of the program will look like the following:

(F AB)

Where AB has a similar structure to our quine. After breaking AB into the two parts, A and B, the program looks like the following:

(F (A B))

Part B in the above program has the same responsibilities as B in the quine, it returns the source code for A:

(F (A 'A))

Then once A has the source code for itself, it can use code-that-generates to obtain the source code for B. Now that it has the source of A and B, it is easy for it to construct AB. Once part A has the code for AB, it can easily generate the source of the entire program. Here is what the program becomes after filling in everything except F:

(F
 ((lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `(F ,ab))))
  '(lambda (a)
     (let ((b `',a))
       (let ((ab `(,a ,b)))
         `(F ,ab))))))

What makes this so awesome is that F can be any function we want, and the above program will run F with the source code of the entire program! For example, replacing F with identity causes the program to become a quine:

(identity
 ((lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `(identity ,ab))))
  '(lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `(identity ,ab))))))
=>
(identity
 ((lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `(identity ,ab))))
  '(lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `(identity ,ab))))))

But we can also do some much more impressive things. We can replace F with a function that lists its argument twice, and get a program that returns a list containing its own source code twice:

((lambda (x) (list x x))
 ((lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `((lambda (x) (list x x)) ,ab))))
  '(lambda (a)
     (let ((b `',a))
       (let ((ab `(,a ,b)))
         `((lambda (x) (list x x)) ,ab))))))

=>

(((lambda (x) (list x x))
  ((lambda (a)
     (let ((b `',a))
       (let ((ab `(,a ,b)))
         `((lambda (x) (list x x)) ,ab))))
   '(lambda (a)
      (let ((b `',a))
        (let ((ab `(,a ,b)))
          `((lambda (x) (list x x)) ,ab))))))
 ((lambda (x) (list x x))
  ((lambda (a)
     (let ((b `',a))
       (let ((ab `(,a ,b)))
         `((lambda (x) (list x x)) ,ab))))
   '(lambda (a)
      (let ((b `',a))
        (let ((ab `(,a ,b)))
          `((lambda (x) (list x x)) ,ab)))))))

To make writing these self-referential programs easier, we can define a function that fills in F for us. It just requires a little nested backquote trickery.1

(defun self-referential-version-of (f)
  `(,f
     ((lambda (a)
        (let ((b `',a))
          (let ((ab `(,a ,b)))
            `(,',f ,ab))))
       '(lambda (a)
          (let ((b `',a))
            (let ((ab `(,a ,b)))
              `(,',f ,ab)))))))

(self-referential-version-of '(lambda (x) (list x x))
=>
((lambda (x) (list x x))
 ((lambda (a)
    (let ((b `',a))
      (let ((ab `(,a ,b)))
        `(,'(lambda (x) (list x x)) ,ab))))
  '(lambda (a)
     (let ((b `',a))
       (let ((ab `(,a ,b)))
         `(,'(lambda (x) (list x x)) ,ab))))))

Now that we’ve got a function that can generate self-referential programs for us, I am going to show you how to build something called a quine-relay. A quine-relay is like a normal quine, except it passes through multiple languages. The quine-relay we are going to write is a Lisp program that outputs a C program that outputs the original Lisp program. All we have to do is write a function that takes its argument and writes a C program that prints the argument it was given. Then we can pass that function to self-referential-version-of to get the quine-relay! That’s it! Here is a program that will generate the quine-relay:

(self-referential-version-of
  '(lambda (self)
     (format t

"#include <stdio.h>~%int main(){printf(\"%s\",~(~s~));}"

             (remove #\newline (prin1-to-string self)))))

I’ve omitted the actual quine-relay for brevity, but you can find it here if you are curious. There are a few idiosyncrasies in the above program and in the quine-relay because of the differences in behavior between Lisp and C. For example, in C you can’t have multi-line strings, so it becomes easier to remove all of the newlines from the Lisp program, than it is to keep them.

And that’s all it takes to write self-referential programs. After seeing how easy it is to generate a quine-relay, it shouldn’t be hard to imagine how to write one with many more steps. You may even be able to get up to 100 if you work at it long enough.

Loops in Lisp Part 1: Goto

At its core, Common Lisp provides two primitives for performing iteration. The first of those primitives is recursion. Recursion is an amazing technique, but in this post I am going to focus on the other primitive – goto.

Goto is extremely powerful. It lets you manipulate the control flow of your program in anyway you can think of. This freedom to do whatever you want is also what makes goto so dangerous. In any given piece of code that uses goto, it is difficult to tell what the purpose of the goto is because it could be used for so many different reasons. Because of this, most languages provide various kinds of builtin loops instead of providing raw goto. Even though loops aren’t as general as goto, they express the intention of the code much more clearly.

As an example, let’s say you want to print all of the characters in a file. If your language provided while loops, you could do this by printing characters from the file one at a time while there are more characters left. If Common Lisp had while loops,1 the code for this procedure would look like this:

(while (peek-char file nil nil)
  (write-char (read-char file)))

If your language only had goto, it becomes much more difficult to implement the procedure. In the end, you have to, in some way, simulate a while loop. One way to code the procedure with just goto is the following. First check if there are any characters left in the file. If there aren’t any, goto the end. Otherwise print the next character and go back to the start. Here is Common Lisp code that implements this:2

(tagbody
  start
  (if (not (peek-char file nil nil))
      (go end))
  (write-char (read-char file))
  (go start)
  end)

Not only is the version with goto much more verbose, it is also much harder to understand. The code lacks clarity because goto is so general. It gives you no context into how it is being used. The reader of the code will have to think about the positioning of all of the gotos before they can think about the overall flow of the program. On the other hand, in the version with the while loop, merely the fact that a while loop is being used gives whoever is reading the code a decent idea of the control flow.

In reality all loops are eventually compiled down to gotos. Whenever the compiler for a language that provides loops sees a loop, it generates code that simulates the loop through goto. You can do the same thing with Lisp macros!

If you don’t know, Lisp macros are compile time functions which take code as their input and return code as their output. When Lisp code is being compiled, all of the macros in the code are called and each one is replaced with its result. This means you can write a macro that looks like a while loop when you use it, but at compile time generates code to simulate a while loop through goto. You are in effect adding while loops to the Lisp compiler! Here is code that defines such a macro:

(defmacro while (test &body body)
  (let ((gtop (gensym))
        (gend (gensym)))
    `(tagbody
       ,gtop
       (if (not ,test)
           (go ,gend))
       ,@body
       (go ,gtop)
       ,gend)))

With this macro, the first code example is now valid lisp code! The while macro takes as arguments a test and a body. It then generates code that uses the method used in the second example to simulate a while loop with goto. You can actually see what the first example looks like after expanding the macro by using the function macroexpand. Here is what the generated code looks like:

(tagbody
  #:g729
  (if (not (peek-char file nil nil))
      (go #:g730))
  (write-char (read-char file))
  (go #:g729)
  #:g730)

The generated code is the exact same as the code in the second example except for the names of the labels. This means the two examples are the same functionally! The only real difference between them is that the first one is expressed in terms of loops, and the second one is expressed in terms of goto. Since it is so much easier to think in terms of loops than goto, there is no reason why you wouldn’t use the first example over the second.

Macros allow you to build any feature you want as long as it is possible to simulate that feature through lower level features. With respect to goto, this means you can build any kind of control flow construct you want by simulating it with goto and then putting a macro on top. In Common Lisp, all of the looping constructs (do, do*, dotimes, dolist, loop) are really just macros that expand into goto. This is what Alan Kay meant when he said “Lisp isn’t a language, it’s a building material”. It bears repeating. In Lisp, you can build any feature you want as long as it is possible to simulate that feature in terms of lower level features.

Defmemo

In my last post I talked about memoization i.e. caching the results of a function. Memoization is a fairly common technique for optimization. It is common enough to warrant writing a macro that makes it easy to define memoized functions. When demonstrating memoization, I had a memoized Fibonacci function that looked like this:1

(let ((table (make-hash-table)))
  (defun fib (n)
    (or (gethash n table)
        (setf (gethash n table)
              (if (<= 0 n 1)
                  n
                  (+ (fib (- n 1))
                     (fib (- n 2))))))))

There are a couple problems with the above code. One problem is the boilerplate. If you wanted ten different memoized functions, you would have to copy lines 1, 3, and 4 for every single memoized function. Some people like to call programmers who do this needless duplication, “human compilers”, since they are writing code that the compiler should be writing for them.

Another issue with the above code is the lack of abstraction. If you wanted to change the caching mechanism to say, only cache the last hundred values, you would have to change the definition of every single function! Ideally you would only need to modify the code in one place in order to change how the caching is implemented.

Defmemo is one way to solve both of these problems. Here is what the above code would look like if it were were to use defmemo:

(defmemo fib (n)
  (if (<= 0 n 1)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))

Defmemo solves both of the problems extremely well. It removes all of the differences between the memoized version on the regular version except for having to use “defmemo” instead of “defun”. Defmemo also solves the abstraction problem by moving all of the code relevant to memoization into the body of defmemo. If you want to change how memoization works, all you have to do is change the code for defmemo.

Now for the implementation of defmemo. The implementation is made up of two separate parts. First, a higher order function, memo, which takes a function as an argument, and returns a memoized version of that function. The second part is the actual macro, defmemo. Instead of just defining the function like defun, defmemo first builds a lambda expression for the body. Then it generates code that calls memo on that lambda function. Finally defmemo uses the result of memo as the implementation of the function being defined.2

Here is the code for memo:34

(defun memo (f)
  (let ((cache (make-hash-table :test #'equalp)))
    (lambda (&rest args)
      (or (gethash args cache)
          (setf (gethash args cache)
                (apply f args))))))

Memo works by returning a function that has an internal hash-table. When that function is called, it first checks its hash-table to see if it has been called with the same arguments before. If so, it returns the value it had calculated the first time it was called.5 If it hasn’t been called with the same arguments before, the function will instead call the function that was passed in to memo, and then store the result of that inside the table. This way, if the memoized function is called with the same arguments a second time, it can just look up the result in the table.

Next, for defmemo itself, we need to generate code that takes the body as a lambda expression, passes that lambda function through memo, and uses that as the implementation of the function. One way to set the implementation of a function to be a lambda function is to use setf with symbol-function.6 For example, here is how you could set the implementation of square to be a lambda function that squares its argument:

(setf (symbol-function 'square) (lambda (x) (* x x)))

(square 5) => 25

Based on the paragraph above, here is the code for defmemo:

(defmacro defmemo (name args &body body)
 `(setf (symbol-function ',name) 
        (memo (lambda ,args ,@body))))

Now instead of defining a function with defun, we can define it with defmemo and it will automatically be memoized! Defmemo is a great example of how you can define your own ways to define functions. Many libraries provide similar features in which you use the same syntax as defun, only with a bit of magic thrown in.

Zap

This post makes use of places. If you are unfamiliar with how places work, see my post Getting Places.

Many languages provide syntactic sugar for evaluating an expression involving a variable and assigning the result of that expression to the variable at the same time. In these languages you can do something such as the following:

x += 5

The above expression both adds five to the value of x and writes that new value back to x. In this post, I’m going to show you how you can write a macro zap that is a generalized version of this technique. With zap the above example would look like the following:

(zap #'+ x 5)

There are a couple things that make zap really cool. First of all, it can be used with any function. For example, if you wanted to cube the value in x, you could use the following:

(zap #'expt x 3)

The other thing that makes zap so awesome is that it can be used on any place. If you want to use zap on the value stored in a hash table with key 5, you can do that:

(zap #'+ (gethash 5 table) 5)

Now that you’ve seen how zap is used, here is how it can be implemented:

(defmacro zap (fn place &rest args)
  (multiple-value-bind 
        (temps exprs stores store-expr access-expr) 
      (get-setf-expansion place)
    `(let* (,@(mapcar #'list temps exprs)
            (,(car stores) 
              (funcall ,fn ,access-expr ,@args)))
       ,store-expr)))

You should be able to see that the code for zap is eerily similar to that of incf (from Getting Places). They are the exact same except instead of binding the gensym that will hold the new value to one plus the value already in the place:

(,(car stores) (+ 1 ,access-expr))

The gensym is bound to the result of calling the function with the value in the place and all of the other arguments passed to zap:

(,(car stores) (funcall ,fn ,access-expr ,@args))

Although zap is just a nice syntactic shortcut, it is a great example of the crazy things you can do with places.

Multiple-value-bind

Common Lisp is a pretty unique language. One of the many features that makes Common Lisp such an awesome language is multiple values. Yes, you read right. In Common Lisp it is possible for a function to return more than a single value. One example of a function that takes advantage of multiple values is floor. Floor takes a number as its argument and returns two values, whatever was passed in rounded down and the remainder.

(floor 3.5)
=>
3
0.5

When you use floor in the manner above, you get two values back, 3 as the first return value, and 0.5 as the second. What’s really cool is that the values besides the first are completely ignored unless you explicitly ask for them.1 This means you can pretend that floor returns only a single value as long as you don’t need the other ones. Notice how in the following example, the + function is not aware of the second value returned by floor:

(+ (floor 3.5) 10)
=> 13

Now you may be wondering, “How can I obtain other values besides the first one?”. Well, there are several macros for doing that, the main one being multiple-value-bind. To use multiple-value-bind, you specify a list of the variables you want to bind each value to, followed by the expression that will return multiple values. Let’s say you want to multiply the two values returned by floor together. Here is how you would do that with multiple-value-bind:

(multiple-value-bind (val remainder) (floor 3.5)
  (* val remainder))
=> 1.5

It is also easy to create your own function that returns multiple values. All you need to do is pass each value you want to return to the values function. Below is a function which returns both twice its argument and three times its argument:

(defun multiples (x)
  (values (* 2 x) (* 3 x)))

(multiples 10)
=>
20
30

There is just one more thing you need to know about multiple values. If the last call of a function is to another that returns multiple values, the first function will return all of the values the second one returns. If you were to write a function that doubles its argument and then uses floor to round it down, that function will return both values that are returned by floor.

(defun double-and-round-down (x)
  (floor (* 2 x)))

(double-and-round-down 5.25)
=>
10
0.5

This behavior may or may not be desired. The standard way to make sure your function only returns a single value is to wrap the function that returns multiple values with a call to valuesValues will pay attention only to the first value and will return just that and nothing else.

(defun double-and-round-down (x)
  (values (floor (* 2 x))))

(double-and-round-down 5.25)
=> 10

And that’s all you need to know to work with multiple values!

Hofeach

Last time I talked about mapeach, a macro which is a simple wrapper around mapcar. After using mapeach a couple times, I found that I wanted ‘each’ version of many other other functions, removefind, and count to name a few. One option I had was to write a macro for every single one of these functions. If I were to have done this, I would have wound up with ‘remove-each’, ‘find-each’, and so on. Instead I took door number two, creating a general macro which I call ‘hofeach’Hofeach, is just like mapeach, except it takes an extra argument for the HOF (higher order function), that you want to use. Below is one possible implementation of hofeach.

(defmacro hofeach (hof var list &body body)
  `(funcall ,hof (lambda (,var) ,@body) ,list))

Here is what code that uses hofeach as a fill in for mapeach looks like:

(hofeach #'mapcar x '(1 2 3)
  (* x x))

=> (1 4 9)

Now we get to specify which HOF we want to use! If we want to keep all of the numbers in a list that are even, here is how we could do that:1 2

(hofeach #'remove-if-not x '(1.2 5 7 2 3.5 6 9)
  (and (integerp x) (evenp x)))

=> (2 6)

So now that I have hofeach, I generally will use it instead of passing a complex lambda expression to a HOF. Most of the time I use hofeach with remove-if-not, but I have also used it with count-if as well. It gives code a nice down and to the right look, which I find pretty easy to read. You get to read the forms in the order that they appear. If you were to use a lambda expression instead, it becomes much more difficult to read since you have to jump around in order to read the code.

Mapeach

Many times when using mapcar, I find myself using a complex lambda expression for the function argument. This makes the code difficult to read since it breaks apart the flow. My code winds up looking like the following:

(mapcar (lambda (x)
          ...)
        list)

First you have to read the possibly massive lambda expression, then you finally find out what you are mapping over. As the lambda expression increases in length, it becomes harder and harder to read. A way to fix this is with the macro mapeach. Mapeach is a macro which is meant to be used when the lambda expression that would be passed to mapcar is much longer than the expression for the list. Mapeach works just like mapcar, but instead provides an alternative syntax which makes it easier to read when the lambda expression is complicated. Here is one possible implementation of mapeach:1

(defmacro mapeach (var list &body body)
  `(mapcar (lambda (,var) ,@body) ,list))

Mapeach, does two things to fix the problem. First it hides the lambda, making it easier to find the important parts of the code. Second, it inverts the order of the arguments, putting the simple list expression first and the complex body second. As a simple example of mapeach, here is how one could square each element in a list using it:

(mapeach x '(1 2 3)
  (* x x))

If one wanted to write the above code by using mapcar, it would look something like the following:

(mapcar (lambda (x)
          (* x x))
        '(1 2 3))

Although it doesn’t shine for this simple example, you can tell that mapeach makes the code a bit clearer. As the body for the lambda expression gets longer and longer, mapeach begins to make the code much easier to understand. I find that mapcar is nice to use only when the expression for the function is short. This happens either when you are either using a named function or you are using some sort of reader macro. Mapeach is another one of those macros that makes what seems like an insignificant difference. Even so, I find that it aids a lot in readability since it puts all of the simple parts in one place.

Automatically Binding Gensyms

One of the most common macros that almost everyone keeps in their utilities file is with-gensymsWith-gensyms is a macro that binds a list of variables to gensyms. That’s it! All with-gensyms does it take a list of symbols and generates code which binds each of those symbols to a gensym.  Although with-gensyms is simple, it removes a lot of boiler plate code. Here is a simple implementation of with-gensyms:

(defmacro with-gensyms (vars &body body)
  `(let ,(loop for v in vars collect `(,v (gensym)))
     ,@body))

Looking at my implementation of accum, here is how one could simplify it by using with-gensyms. Pay attention to how much boiler plate is removed.

(defmacro accum (accfn &body body)
  (with-gensyms (ghead gtail garg)
    `(let* ((,ghead (list nil))
            (,gtail ,ghead))
       (macrolet ((,accfn (,garg)
                    `(setf ,',gtail
                           (setf (cdr ,',gtail)
                                 (list ,,garg)))))
         ,@body
         (cdr ,ghead)))))

By removing so much boiler plate, with-gensyms helps greatly reduce the cognitive load in certain cases. This will be important when I introduce once-only, the next macro I plan to talk about. There are also other variations of with-gensyms such as the one in Alexandria which makes it easier to have base names associated with the gensyms created.

The Ret Macro

The ret macro is simple, but useful. It allows you to bind a variable and simultaneously specify the final value of that variable as the result of the ret expression, hence the name “ret”, a blend of let and return. Here is the implementation of ret:

(defmacro ret (var val &body body)
  `(let ((,var ,val))
     ,@body
     ,var))

And now for an example. If you want to write a function that converts a hash-table into an alist, here is how you could do that using ret.

(defun table->alist (table)
  (ret result '()
    (maphash (lambda (k v)
               (push (cons k v) result))
             table)))

Personally, I find code using ret to be much clearer than the equivalent code using let. When writing the let version, I commonly find myself forgetting to return the value I want to at the end. Ret eliminates this problem by returning the result for me. The only downside I see to using ret is that it has the same problem as all of the “do” macros (dotimes, dolist, etc), they change where the resulting value comes from. Even so, this problem is much more mild with ret because ret will always change the control flow, whereas the “do” macros may or may not change the final value depending on the number of arguments passed to them.