How to Generate Self-Referential Programs

In this post, I am going to show you how to write programs that are self-referential. By self-referential, I mean programs which are able to obtain their own source code without any external input. In other words, they won’t just read from their own files. This post is based on section 6.1 of the book Introduction to the Theory of Computation.

Before we can start generating self-referential programs we are first going to need some techniques for generating programs in general. The first technique we need is a method of taking a given program and writing a second program that outputs the given program. As an example, given (+ 2 2), we would need to write a program that outputs (+ 2 2). In most languages this is easy. One way to do it in Lisp is to put a quote in front of the program:

'(+ 2 2)
=> (+ 2 2)

We are also going to need a function that automates this process. Such a function would take a program as its argument and return a new program that when ran, outputs the program that was originally passed to the function. In most languages doing this is fairly tricky. In Lisp, we can write this function easily through backquote:

(defun code-that-generates (program)
  <code>',program)

(code-that-generates '(+ 2 2))
=&gt; '(+ 2 2)

If you don’t understand how backquote works, you can read this. Even though its for Emacs Lisp, everything there is still applicable to other Lisps. Just make sure that you understand that code-that-generates can be used to generate a program that outputs a given program.

Now that we have these two techniques, we can begin writing programs that are able to refer to themselves. The first self-referential program we will write will be an example of a quine. If you don’t know, a quine is a program that outputs its own source code. The quine we are going to write is made up of two parts, part A and part B, where part A is a function that is applied to part B:

(A B)

To describe how the quine works, it is easiest to start with part B. All that part B needs to do is return the source code of part A:

(A 'A)

Part As job is to take its own source code, and use it to obtain the source code of the entire quine. Since B is a program that outputs A, A can use code-that-generates on its own source code in order to obtain the source code of B. Once A has the source code of both A and B, it becomes trivial to combine the two to obtain the source code of the entire quine. Here is the complete quine, with the call to code-that-generates inlined:

((lambda (a)
   (let ((b </code>',a))
     <code>(,a ,b)))
 '(lambda (a)
    (let ((b </code>',a))
      <code>(,a ,b))))
=&gt;
((lambda (a)
   (let ((b </code>',a))
     <code>(,a ,b)))
 '(lambda (a)
    (let ((b </code>',a))
      <code>(,a ,b))))

Now this is where things start getting interesting. A quine can be thought of as a program that generates its own source code, and immediately returns it. What if instead of immediately returning its own source code, the quine applied a function to it first, and then returned the result of that. The steps for building such a program are almost exactly the same as the steps we took for building the quine. This time, there is a third part F, for the function we want to call. The structure of the program will look like the following:

(F AB)

Where AB has a similar structure to our quine. After breaking AB into the two parts, A and B, the program looks like the following:

(F (A B))

Part B in the above program has the same responsibilities as B in the quine, it returns the source code for A:

(F (A 'A))

Then once A has the source code for itself, it can use code-that-generates to obtain the source code for B. Now that it has the source of A and B, it is easy for it to construct AB. Once part A has the code for AB, it can easily generate the source of the entire program. Here is what the program becomes after filling in everything except F:

(F
 ((lambda (a)
    (let ((b </code>',a))
      (let ((ab <code>(,a ,b)))
        </code>(F ,ab))))
  '(lambda (a)
     (let ((b <code>',a))
       (let ((ab </code>(,a ,b)))
         <code>(F ,ab))))))

What makes this so awesome is that F can be any function we want, and the above program will run F with the source code of the entire program! For example, replacing F with identity causes the program to become a quine:

(identity
 ((lambda (a)
    (let ((b </code>',a))
      (let ((ab <code>(,a ,b)))
        </code>(identity ,ab))))
  '(lambda (a)
    (let ((b <code>',a))
      (let ((ab </code>(,a ,b)))
        <code>(identity ,ab))))))
=&gt;
(identity
 ((lambda (a)
    (let ((b </code>',a))
      (let ((ab <code>(,a ,b)))
        </code>(identity ,ab))))
  '(lambda (a)
    (let ((b <code>',a))
      (let ((ab </code>(,a ,b)))
        <code>(identity ,ab))))))

But we can also do some much more impressive things. We can replace F with a function that lists its argument twice, and get a program that returns a list containing its own source code twice:

((lambda (x) (list x x))
 ((lambda (a)
    (let ((b </code>',a))
      (let ((ab <code>(,a ,b)))
        </code>((lambda (x) (list x x)) ,ab))))
  '(lambda (a)
     (let ((b <code>',a))
       (let ((ab </code>(,a ,b)))
         <code>((lambda (x) (list x x)) ,ab))))))

=&gt;

(((lambda (x) (list x x))
  ((lambda (a)
     (let ((b </code>',a))
       (let ((ab <code>(,a ,b)))
         </code>((lambda (x) (list x x)) ,ab))))
   '(lambda (a)
      (let ((b <code>',a))
        (let ((ab </code>(,a ,b)))
          <code>((lambda (x) (list x x)) ,ab))))))
 ((lambda (x) (list x x))
  ((lambda (a)
     (let ((b </code>',a))
       (let ((ab <code>(,a ,b)))
         </code>((lambda (x) (list x x)) ,ab))))
   '(lambda (a)
      (let ((b <code>',a))
        (let ((ab </code>(,a ,b)))
          <code>((lambda (x) (list x x)) ,ab)))))))

To make writing these self-referential programs easier, we can define a function that fills in F for us. It just requires a little nested backquote trickery.1

(defun self-referential-version-of (f)
  </code>(,f
     ((lambda (a)
        (let ((b <code>',a))
          (let ((ab </code>(,a ,b)))
            <code>(,',f ,ab))))
       '(lambda (a)
          (let ((b </code>',a))
            (let ((ab <code>(,a ,b)))
              </code>(,',f ,ab)))))))

(self-referential-version-of '(lambda (x) (list x x))
=&gt;
((lambda (x) (list x x))
 ((lambda (a)
    (let ((b <code>',a))
      (let ((ab </code>(,a ,b)))
        <code>(,'(lambda (x) (list x x)) ,ab))))
  '(lambda (a)
     (let ((b </code>',a))
       (let ((ab <code>(,a ,b)))
         </code>(,'(lambda (x) (list x x)) ,ab))))))

Now that we’ve got a function that can generate self-referential programs for us, I am going to show you how to build something called a quine-relay. A quine-relay is like a normal quine, except it passes through multiple languages. The quine-relay we are going to write is a Lisp program that outputs a C program that outputs the original Lisp program. All we have to do is write a function that takes its argument and writes a C program that prints the argument it was given. Then we can pass that function to self-referential-version-of to get the quine-relay! That’s it! Here is a program that will generate the quine-relay:

(self-referential-version-of
  '(lambda (self)
     (format t

"#include <stdio.h>~%int main(){printf(\"%s\",~(~s~));}";

             (remove #\newline (prin1-to-string self)))))

I’ve omitted the actual quine-relay for brevity, but you can find it here if you are curious. There are a few idiosyncrasies in the above program and in the quine-relay because of the differences in behavior between Lisp and C. For example, in C you cant have multi-line strings, so it becomes easier to remove all of the newlines from the Lisp program, than it is to keep them.

And that’s all it takes to write self-referential programs. After seeing how easy it is to generate a quine-relay, it shouldn’t be hard to imagine how to write one with many more steps. You may even be able to get up to 100 if you work at it long enough.

  1. You may have noticed the extra comma and quote in front of F in the generated program. Although it doesn’t make a difference semantically it does make a different syntactically. Luckily, all of the code generated by a program with the extra comma and quote will also contain the extra comma and quote, so everything is okay.

Leave a Reply

Your email address will not be published. Required fields are marked *