Building Fizzbuzz in Fractran from the Bottom Up

In this post, I am going to show you how to write Fizzbuzz in the programming language Fractran. If you don’t know, Fractran is an esoteric programming language. That means it is extraordinary difficult to write any program in Fractran. To mitigate this difficultly, instead of writing Fizzbuzz in raw Fractran, what we are going to do is build a language that compiles to Fractran, and then write Fizzbuzz in that language.

This post is broken up into three parts. The first part covers what Fractran is and a way of understanding what a Fractran program does. Part 2 will go over the foundation of the language we will build and how it will map to Fractran. Finally, in Part 3, we will keep adding new features to the language until it becomes easy to write Fizzbuzz in it.


Part 1: Understanding Fractran

Before we can start writing programs in Fractran, we have to first understand what Fractran is. A Fractran program is represented as just a list of fractions. To execute a Fractran program, you start with a variable N=2. You then go through the list of fractions until you find a fraction F, such that N*F is an integer. You then set N=N*F and go back to the beginning of the list of fractions. You keep repeating this process until there is no fraction F such that N*F is an integer.

Since there is no way to print anything with the regular Fractran rules, we are going to add one additional rule on top of the ordinary ones. In addition to the list of fractions, each program will have a mapping from numbers to characters representing the “alphabet” of the program. After multiplying N by F, whenever the new N is a multiple of one of the numbers in the alphabet, that will “print” the character that the number maps to. I have written a function, run-fractran, which implements this version of Fractran and included it here. It takes a list of fractions and an alphabet as an alist and executes the program.

Let’s walk through a simple example. Let’s say we have the following Fractran program:

9/2, 1/5, 5/3

with the alphabet 5->’a’. To run this program, we start with N=2. We then go through the list fractions until we find a fraction F such that N*F is an integer. On this first step, F becomes 9/2, since N*F = 2 * 9/2 = 9 which is an integer. We then set N to N*F so that N now becomes 9. Repeating this process again, we get F=5/3 and N=N*F=15. Since the number 5 is in the alphabet, and N is now a multiple of 5, we output the character that 5 maps to, ‘a’. If we keep repeating these steps, we eventually reach a point where N=1 and we have outputted the string “aa”. Since 1 times any of the fractions does not result in an integer, the program terminates with the output “aa”.

At this point, you may be thinking that writing any program in Fractran is nearly impossible. The truth is that there is a simple trick you can use that makes it much easier program Fractran. All you need to do is look at the prime factorization of all of the numbers. Let’s see what the above Fractran program looks like if we convert every number into a tuple (a,b,c) where a is the how many times 2 divides the number, b is how many times 3 does, and c is how many times 5 does. The program then becomes:

(0, 2, 0) / (1, 0, 0)
(0, 0, 0) / (0, 0, 1)
(0, 0, 1) / (0, 1, 0)

We also have the tuple (0,0,1) mapping to ‘a’ for our alphabet. We start with N = (1,0,0). If you don’t know, multiplying two numbers is the same as adding the counts of each prime factors, and division is the same as subtracting the counts. For example, 2 * 6 = (1,0,0) + (1,1,0) = (2,1,0) = 12. With this way of looking at the program, finding a fraction F such that N*F is an integer becomes finding a “fraction” F such that each element in the tuple N is greater than or equal to the corresponding element in the tuple in the denominator of F. Once we find such F, instead of multiplying N by it, you subtract from each element of N the corresponding value in the denominator of F (equivalent to dividing by the denominator), and add the corresponding value in the numerator (equivalent to multiplying by the numerator). Executing the program with this interpretation proceeds as follows.

We start with N = (1,0,0). Since every value in N is greater than or equal to their corresponding values in the denominator of the first fraction, we subtract every value in the first denominator and then add every value in the numerator to get N = (1,0,0) – (1,0,0) + (0,2,0) = (0,2,0). Repeating this again, F becomes the third fraction. Subtracting the denominator and adding the numerator gets us N = (0,1,1). Then since every value in N is greater than or equal to their corresponding element in (0,0,1), we print ‘a’. The program continues, just like it did for the original Fractran program.

Basically we can think of every prime number as having a “register” which can take on non-negative integer values. Each fraction is an instruction that operates on some of the registers. You can interpret a fraction as saying if the current value of each register is greater than or equal to the the value specified by the denominator (the number of times the prime for that register divides the denominator), you subtract from the registers all of the values in the denominator, add all the values specified in the numerator (the number of times the prime for each register divides the numerator), and then jump back to the first instruction. Otherwise, if any register is less than the value specified in the denominator, continue to the next fraction. For example, the fraction 9/2 can be translated into the following pseudocode:

;; If the register corresponding to the prime number 2 
;; is greater or equal to 1
if reg[2] >= 1
  ;; Decrement it by 1 and increment the register 
  ;; corresponding to 3 by 2. 
  reg[2] = reg[2] - 1
  reg[3] = reg[3] + 2
  goto the beginning of the program
;; Otherwise continue with the rest of the program.

Although programming Fractran is still difficult, this technique suddenly makes writing Fizzbuzz in Fractran tractable.


Part 2: Compiling to Fractran

For our compiler, we are going to need to generate a lot of primes. To do so, we will use a function, new-prime, which will generate a different prime each time it is called.1

(defun prime (n)
  "Is N a prime number?"
  (loop for i from 2 to (isqrt n)
        never (multiple n i)))

(defparameter *next-new-prime* nil)

(defun new-prime ()
  "Returns a new prime we haven't used yet."
  (prog1 *next-new-prime*
    (setf *next-new-prime*
          (loop for i from (+ *next-new-prime* 1)
                if (prime i)
                  return i))))

So now that we’ve got new-prime, we’ve we can start figuring out how we are going to compile to Fractran. The first detail we will need to figure out is how to express control flow in Fractran.  In other words, we need a way to specify which fractions will execute after each other fractions. This is a problem because after a fraction executes, you always jump back to the first fraction.

Expressing control flow actually winds up being surprisingly easy. For each fraction we can designate a register. Then, we only execute a fraction if its register is set. It is easy to have a fraction conditionally execute depending on whether its register is set by using the trick we are using to interpret a Fractran program. All we need to do is multiply the denominator of each fraction by the prime for the register of that fraction. This way, we will pass over a fraction unless its register is set. Also, all we need to do to specify which fraction should execute after a given fraction is to multiply the numerator of the given fraction by the prime of the register for the next fraction. By doing this, after a fraction executes, it will set the register of the next fraction.

In order to keep track of the primes for the current fraction and for the next fraction, we will have two global variables. The first will be the prime number for the current instruction, and the second will be the prime number for the next instruction:

(defparameter *cur-inst-prime* nil)
(defparameter *next-inst-prime* nil)

We will also need a function advance which will advance the values of the variables once we move on to the next instruction.

(defun advance ()
  (setf *cur-inst-prime* *next-inst-prime*
        *next-inst-prime* (new-prime)))

Now that we’ve got a way of expressing control flow, we can start planning out what the language we will build will look like. From this point on, I am going to call the language we are building, Lisptran. An easy way we represent a Lisptran program is as just a list of expressions. We can have several different kinds of expressions each of which does something different.

The simplest kind of expression we will want is an inline fraction. If a Lisptran expression is just a fraction, we can just add that fraction to the Fractran program being generated.

Another kind of expression that would be useful are labels. Whenever a Lisptran expression is a Lisp symbol, we can interpret that as a label. Each label will be converted into that fraction that is the prime of the next instruction after the label divided by the prime of the label. This way we can jump to the instruction after the label by setting the register for the label. In order to make keeping track of the primes of labels easy, we are going to keep a hash-table, *lisptran-labels*, mapping from labels to the primes for those labels. We will also have a function prime-for-label, which will lookup the prime for a label or assign a new prime if one hasn’t been assigned yet:

(defparameter *lisptran-labels* nil)

(defun prime-for-label (label)
  (or (gethash label *lisptran-labels*)
      (setf (gethash label *lisptran-labels*)
            (new-prime))))

One last kind of expression that will be useful are macro calls. A macro call will be a list whose first element is the name of a macro followed by a list of arbitrary Lisp expressions (The expressions don’t have to be Fractran expressions. They can be interpreted however the macro wants them to be.). In order to compile a macro call, we will lookup the function associated with the macro, and call it on the expressions in the rest of the macro call. That function should then return a list of Lisptran expressions which will then be compiled in place of the macro call. After that we just continue compiling the new code generated by the macro expansion.

To keep track of the definitions of macros, we will keep a hash-table *lisptran-macros*, which will map from the name of the macro to the function for that macro. In order to make defining Lisptran macros easy, we can create a Lisp macro deftran, that works in a similar way to defmacro. When defining a macro with deftran, you are really just defining a function which will take the expressions in the macro call, and return a list of Lisptran instructions to be compiled in its place. Here is the definition for deftran:

(defparameter *lisptran-macros* (make-hash-table))

(defmacro deftran (name args &body body)
  "Define a Lisptran macro."
  `(setf (gethash ',name *lisptran-macros*)
         (lambda ,args ,@body)))

And that’s all of the different kinds of expressions we will need in Lisptran.

Although we now have all of the expressions we need, there are a few more pieces of the compiler we need to figure out. For example, we still haven’t figured out how we are going to represent variables yet. Ultimately this is trivial. We can just assign a register to every variable and keep a mapping from variable names to primes in the same way we have the mapping for labels:

(defparameter *lisptran-vars* nil)

(defun prime-for-var (var)
  (or (gethash var *lisptran-vars*)
      (setf (gethash var *lisptran-vars*)
            (new-prime))))

One last piece of the compiler we need to figure out is how we are going to represent the alphabet of the program. One way we can do this is just represent the characters in our alphabet as variables. The alphabet of a program could just be all of the variables that have characters for names and the primes of the registers for those variables. By doing it this way, we can print a character by just incrementing and then immediately decrementing a variable! Here is code that can be used to obtain the alphabet from *lisptran-vars*:

(defun alphabet (vars)
  "Given a hash-table of the Lisptran variables to primes, 
   returns an alist representing the alphabet."
  (loop for var being the hash-keys in vars 
        using (hash-value prime)
        if (characterp var)
          collect (cons var prime)))

Now that we can express control flow, variables, and macros, we have everything we need to write the actual Lisptran to Fractran compiler:

(defun assemble (insts)
  "Compile the given Lisptran program into Fractran. 
   Returns two values. The first is the Fractran program 
   and the second is the alphabet of the program."
  (let* ((*cur-prime* 2)
         (*cur-inst-prime* (new-prime))
         (*next-inst-prime* (new-prime))
         (*lisptran-labels* (make-hash-table))
         (*lisptran-vars* (make-hash-table)))
    (values (assemble-helper insts)
            (alphabet *lisptran-vars*))))

(defun assemble-helper (exprs)
  (if (null insts)
      '()
      (let ((expr (car exprs))
            (rest (cdr exprs)))
        (cond
          ;; If it's a number, we just add it to the 
          ;; Fractran  program and compile the rest 
          ;; of the Lisptran program
          ((numberp expr)
           (cons expr (assemble-helper rest)))

          ;; If it's a symbol, we divide the prime for 
          ;; the next instruction by the prime for the 
          ;; label.
          ((symbolp expr)
           (cons (/ *cur-inst-prime* 
                    (prime-for-label expr))
                 (assemble-helper rest)))

          ;; Otherwise it's a macro call. We look up the 
          ;; macro named by the first symbol in the 
          ;; expression and call it on the rest of the 
          ;; rest of the expressions in the macro call. 
          ;; We then append all of the instructions 
          ;; returned by it to the rest of the program 
          ;; and compile that.
          (:else
            (let ((macrofn (gethash (car inst)
                                    *lisptran-macros*)))
              (assemble-helper (append (apply macrofn
                                              (cdr inst))
                                       rest))))))))

The function assemble takes a Lisptran program and returns two values. It returns the generated Fractran program and the alphabet of that program. assemble first initializes all of the global variables for the program and then goes to assemble-helper which recursively processes the Lisptran program according to the specification above. Using the function run-fractran that I mentioned above, we can write a function that will execute a given Lisptran program as follows:

(defun run-lisptran (insts)
  "Run the given Lisptran program."
  (multiple-value-call #'run-fractran (assemble insts)))

Part 3: Building Lisptran

Now that we’ve completed the core compiler, we can start adding actual features to it. From here on out, we will not touch the core compiler. All we are going to do is define a couple Lisptran macros. Eventually we will have enough macros such that programming Lisptran seems like programming a high level assembly language.

The first operations we are going should define are basic arithmetic operations. For example, addition. In order to add addition to Lisptran, we can define a macro addi, which stands for add immediate. Immediate just means that we know what number we are adding at compile time. The macro addi will take a variable and a number, and will expand into a fraction which will add the given number to the register for the variable. In this case, the denominator for the fraction will just be the prime for the current instruction (execute this instruction when that register is set) and the numerator will be the prime for the next instruction (execute the next instruction after this one) times the prime for the variable raised to the power of the number we are adding (add the immediate to the register). Here is what the definition for addi looks like:

(deftran addi (x y)
  (prog1 (list (/ (* *next* (expt (prime-for-var x) y))
                  *cur*))
    (advance)))

With are also going to want an operation that performs subtraction. It’s a bit tricky, but we can implement a macro subi (subtract immediate) in terms of addi, since adding a number is the same as adding the negative of that number:2

(deftran subi (x y) `((addi x ,(- y))))

Now that we’ve got some macros for performing basic arithmetic, we can start focusing on macros that allow us to express control flow. The first control flow macro we will implement is >=i (jump if greater than or equal to immediate). In order to implement >=i, we will have it expand into three fractions. The first fraction will test if the variable is greater or equal to the immediate. If the test succeeds, we will then advance to the second fraction which will restore the variable (since when a test succeeds, all of the values from the denominator are decremented from the corresponding registers), and then jump to the label passed in to >=i. If the test fails, we will fall through to the third fraction which will just continue onto the next fraction after that.

The denominator of the first fraction will be the prime for current instruction (execute the instruction if that register is set) times the prime for the register raised to the power of the constant (how we test that the register is greater than or equal to the immediate) and the numerator will be the prime for the second instruction (so we go to the second instruction if the test succeeds). The second fraction is just the prime for the label passed into >=i (so we jump to wherever the label designates) divided the prime for that instruction. Lastly, the denominator of the third fraction is the prime for the current instruction (so we fall through to it if the test in the first fraction fails), and the numerator is just the prime for the next instruction so that we continue to that if the test fails:

(deftran >=i (var val label)
  (prog1 (let ((restore (new-prime)))
           (list (/ restore
                    (expt (prime-for-var var) val)
                    *cur-inst-prime*)
                 (/ (* (prime-for-label label)
                       (expt (prime-for-var var) val))
                    restore)
                 (/ *next-inst-prime* *cur-inst-prime*)))
    (advance)))

Believe it or not, but after this point, we won’t need to even think about fractions anymore. Lisptran now has enough of a foundation that all of the further macros we will need can be expressed in terms of addisubi and >=i. The only two functions that actually need to be implemented in terms of Fractran are addi and >=i. That means no more thinking about Fractran. From here on out, all we have is Lisptran!

We can easily define unconditional goto in terms of >=i. Since all of the registers start at 0, we can implement goto as greater than or equal to zero. We use the Lisp function gensym to generate a variable without a name so that the variable doesn’t conflict with any other Lisptran variables:

(deftran goto (label) `((>=i ,(gensym) 0 ,label)))

Then through a combination of >=i and goto, we can define <=i:

(deftran <=i (var val label)
  (let ((gskip (gensym))) 
    `((>=i ,var (+ ,val 1) ,gskip)
      (goto ,label)
      ,gskip)))

Now that we have several macros for doing control flow, we can start building some utilities for printing. As mentioned previously printing a character is the same as incrementing the variable with the character as its name and then immediately decrementing it:

(deftran print-char (char)
  `((addi ,char 1)
    (subi ,char 1)))

Then if we want to write a macro that prints a string, it can just expand into a series of calls to print-char, each of which prints a single character in the string:

(deftran print-string (str)
  (loop for char across str
        collect `(print-char ,char)))

We are also going to need a function to print a number. Writing this with the current state of Lisptran is fairly difficult since we haven’t implemented several utilities such as mod yet, but we can start by implementing a macro print-digit that prints the value of a variable that is between 0 and 9. We can implement it, by having it expand into a series of conditions. The first one will check if the variable is less than or equal to zero. If so it will print the character zero and jump past the rest of the conditions. Otherwise it falls through to the next condition which tests if the variable is less than or equal to one and so on. We don’t have to manually write the code for print-digit because we can use Lisp to generate the code for us:

(deftran print-digit (var)
  (loop with gend = (gensym)
        for i from 0 to 9
        for gprint = (gensym)
        for gskip = (gensym)
        append `((<=i ,var ,i ,gprint)
                 (goto ,gskip)
                 ,gprint
                 (print-char ,(digit-char i))
                 (goto ,gend)
                 ,gskip)
        into result
        finally (return `(,@result ,gend))))

At this point, now that we have macros for performing basic arithmetic, basic control flow, and printing, we can start writing some recognizable programs. For example here is a program that prints the numbers from zero to nine:

(start
 (>=i x 10 end)
 (print-digit x)
 (print-char #\newline)
 (addi x 1)
 (goto start)
 end)

If you are curious I have included the Fractran program generated by this Lisptran program here. It’s hard to believe that the above Lisptran program and the Fractran program are equivalent. They look completely different!

Now that we have a bunch of low level operations, we can start building some higher level ones. You may not have thought of it, but instructions don’t need to just have flat structure. For example, now that we have goto, we can use it to define while loops (just like in Loops in Lisp):

(deftran while (test &rest body)
  (let ((gstart (gensym))
        (gend (gensym)))
    `((goto ,gend)
      ,gstart
      ,@body
      ,gend
      (,@test ,gstart))))

In order to implement while, we are assuming that all predicates take labels as their last argument which is where they will jump to if the predicate succeeds. Now that we have while loops, we can start writing some much more powerful macros around manipulating variables. Here’s two useful ones, one that sets a variable to zero, and one that copies the value in one variable to another:

(deftran zero (var)
  `((while (>=i ,var 1)
      (subi ,var 1))))

(deftran move (to from)
  (let ((gtemp (gensym)))
    `((zero ,to)
      (while (>=i ,from 1)
        (addi ,gtemp 1)
        (subi ,from 1))
      (while (>=i ,gtemp 1)
        (addi ,to 1)
        (addi ,from 1)
        (subi ,gvar 1)))))

For move, we first have to decrement the number we are moving from and increment a temporary variable. Than we restore both the original variable and the variable we are moving the value to at the same time.

With all of these macros, we can finally start focusing on macros that are actually relevant to Fizzbuzz. One operation that is absolutely going to be necessary for Fizzbuzz is mod. We can implement a macro modi by repeatedly subtracting the immediate until the variable is less than the immediate.

(deftran modi (var val)
  `((while (>=i ,var ,val)
      (subi ,var ,val))))

We only need one more real feature before we can start writing Fizzbuzz. We are going to need a way of printing numbers. In order to print an arbitrary number, we are going to need a way of doing integer division. We can implement a macro divi by repeatedly subtracting the immediate until the variable is less than the immediate and keeping track of the number of times we’ve subtracted the immediate.

(deftran divi (x y)
  (let ((gresult (gensym)))
    `((zero ,gresult)
      (while (>=i ,x ,y)
        (addi ,gresult 1)
        (subi ,x ,y))
      (move ,x ,gresult))))

Now for the final macro we will need. A macro for printing numbers. Actually, we are going to cheat a little. Printing numbers winds up being pretty difficult since you have to print the digits from left to right, but you can only look at the lowest digit at a time. To make things easier, we are only to write a macro that is able to print two digit numbers. We won’t need to print 100 since “buzz” will be printed instead.

(deftran print-number (var)
  (let ((gtemp (gensym))
        (gskip (gensym)))
    `((move ,gtemp ,var)
      (divi ,gtemp 10)
      (>=i ,gtemp 0 ,gskip)
      (print-digit ,gtemp)
      ,gskip
      (move ,gtemp ,var)
      (modi ,gtemp 10)
      (print-digit ,gtemp)
      (print-char #\newline))))

Now our language is sufficiently high enough that Fizzbuzz is going to be practically as easy as it will get. Here is an implementation of Fizzbuzz in Fractran.

((move x 1)
 (while (<=i x 100)
   (move rem x)
   (modi rem 15)
   (<=i rem 0 fizzbuzz)

   (move rem x)
   (modi rem 3)
   (<=i rem 0 fizz)

   (move rem x)
   (modi rem 5)
   (<=i rem 0 buzz)

   (print-number x)
   (goto end)

   fizzbuzz
   (print-string "fizzbuzz")
   (goto end)

   fizz
   (print-string "fizz")
   (goto end)

   buzz
   (print-string "buzz")
   (goto end)

   end
   (addi x 1)))

I’ve also included the generated Fractran program here and included all of the full source code for this blog post here.

I find it absolutely amazing that we were able to build a pretty decent language by repeatedly adding more and more features on top of what we already had. To recap, we implemented a basic arithmetic operation (addi) in terms of raw Fractran and then defined a second (subi) in terms of that. From there we defined three macros for doing control flow (>=igoto<=i), with the second two being defined in terms of the first. Then we were then able to define macros for printing (print-charprint-stringprint-digit). At this point we had all of the low level operations we needed so we could start implement while loops (while), a high level control flow construct. With while loops, we were able to define several macros for manipulating variables (zeromove). With these new utilities for manipulating variables we could define more advanced arithmetic operations (modidivi). Then with these new operations we were able to define a way to print an arbitrary two digit number (print-number). Finally, using everything we had up to this point, we were able to write Fizzbuzz. It’s just incredible that we could make a language by always making slight abstractions on top of the operations we already had.

Or=

This post makes use of places. If you are unfamiliar with places, see my post Getting Places.

There are many cases where caching the results of a function (also called memoization), make a function much more efficient. For example a function that calculates the Fibonacci numbers:

(defun fib (n)
  (if (<= 0 n 1)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))

If you try running fib on different values, you will notice that around 35 or so, it starts to take quite a long time to run. The problem is that fib calculates the smaller Fibonacci numbers many more times than it needs to. When calculating the 35th Fibonacci number, the second Fibonacci number is calculated a total of 5702887 times.1

This is where memoization comes in. If the above function were memoized, it would only need to calculate each Fibonacci number once. Then, whenever fib is asked to calculate a number it has already calculated, it can just look up the result in the table. Here is what the above code would look like if it were to take advantage of memoization:

(let ((table (make-hash-table)))
  (defun fib (n)
    (or (gethash n table)
        (setf (gethash n table)
              (if (<= 0 n 1)
                  n
                  (+ (fib (- n 1))
                     (fib (- n 2))))))))

With the memoized version, you will hit a stack overflow before you find a value that takes more than a moment to calculate. The problem with the above implementation is that it has some duplicate code. There are two calls made to gethash. The first call checks to see if the value has already been calculated. If not, fib calculates the value manually, and then uses the second call to store it into the table. The fact that the gethash call is repeated may not seem like a problem, but when the expression for the place is more complicated, it can become a much bigger deal.

Or= is a macro that fixes this problem. It does so by first checking whether its first argument, which should be a place, has a non-nil value.2 If it does, or= will just return that value. Otherwise it evaluates its remaining arguments until one of them evaluates to a non-nil value. Or= will then write the value of that expression into the place designated by the first argument. Here is the above code rewritten to use or=.

(let ((table (make-hash-table)))
  (defun fib (n)
    (or= (gethash n table)
         (if (<= 0 n 1)
             n
             (+ (fib (- n 1))
                (fib (- n 2)))))))

The implementation of or= looks very similar to the incf ‘template’3 that is used when writing a macro that works with places. Here is the implementation of or=:

(defmacro or= (place &rest args)
  (multiple-value-bind 
        (temps exprs stores store-expr access-expr) 
      (get-setf-expansion place)
    `(let* (,@(mapcar #'list temps exprs)
            (,(car stores) (or ,access-expr ,@args)))
       ,store-expr)))

This time, the value being stored to the place is the or of the place and whatever other arguments are passed in. Since or evaluates its arguments lazily, we get the desired behavior of or= – evaluate the expression (and store the result) only if the place doesn’t have a value already. One problem with or= is that it determines if a value has already been stored in the place by testing if the value is non-nil. This can lead to a problem if the value stored in the place is actually nil! As an exercise, try writing a version of or= that takes advantage of the multiple values returned by gethash in order to properly handle nil.

In my next post, I am going to continue with the memoization example and demonstrate how to write a macro defmemo, which makes it easy to define memoized functions.

Defaddress

This post is the first part of a two part series exploring the emulator cl-6502. This post will cover how addressing modes are implemented in cl-6502. The second part will go over the implementation of the opcodes.

cl-6502 is an emulator for the MOS 6502 processor, used in devices such as the Apple II and the NES. As an emulator, cl-6502 has three distinct roles. It needs to be able to convert assembly code into machine code (assembly), it needs to be able to convert machine code back into assembly (disassembly), and it needs to be able to actually interpret the machine code (execution). By using macros in clever ways, cl-6502 is able to create multiple DSLs for defining different components of the emulator. One of those macros is defaddress, which makes it easy to add addressing modes to the emulator. First some background.

Assembly language has what are known as “addressing modes”. Depending on which addressing mode is being used, the argument to the instruction will be calculated in a different manner. The programmer is able to specify different addressing modes by using slightly different syntaxes. As an example here is the same jump instruction just with two different addressing modes:

JMP $0
JMP ($0)

From here on out, I’m going to use the term “operand” to refer to the value given to the instruction before the addressing mode has been taken into account and the term “argument” to refer to the value after the addressing mode has been considered. As you should be able to tell, both instructions above are passed the same operand of zero, but because they are using different addressing modes, they will calculate their arguments in two different ways.

Since the first instruction doesn’t use any extra syntax (except the dollar sign which just means base 16), it uses “absolute” addressing. With absolute addressing the argument is the same as the operand.1 The first instruction can be read as, continue execution at the instruction at address zero.

Since the second instruction has parens around the operand, it uses what is known as “indirect” addressing. For indirect addressing, the operand is actually the memory location of the argument.2 The second instruction can be read as, get the address that is stored at address zero, and continue execution at the instruction at that location in memory. Assuming the value 123 was stored at address zero, the operand would be zero, the argument would be 123, and the instruction would cause execution to be resumed at the instruction at location 123.

In total there are 13 different addressing modes for the 6502. In order to make it easy to define all of these different addressing modes, cl-6502 creates a macro defaddress. Defaddress is a DSL for the sole purpose of defining addressing modes. Each one of the main arguments to defaddress handles one of the jobs (assembly/disassembly/execution) that an emulator has to perform with respect to the addressing mode. As to what the defaddress DSL looks like, here is the code that defines the absolute addressing mode.

(defaddress absolute (:reader "^_$" :writer "$~{~2,'0x~}")
  (get-word (cpu-pc cpu)))

The code above has three distinct parts. The first piece is the reader, which is used to parse the assembly code:

 "^_$" 

The reader argument is a regular expression that recognizes the syntax of the addressing mode being defined, in this case aboslute addressing. The regex is a normal perl compatible regex except it may use an underscore to match (and capture) an operand. The regex above matches a lone operand, which is exactly the syntax for absolute addressing. After the reader is the writer:

"$~{2,'0x~}"

The writer is a format string that is able to reproduce the original assembly (with the proper syntax for the addressing mode) from the machine code. The writer for absolute addressing says to print the operand as a zero padded, two digit, hexadecimal number. Basically, it just prints the lone operand in assembly language without any additional syntax. Since there is no extra syntax, that means the generated code is using absolute addressing.

The last part is the body. The body is a block of code that calculates the argument from the operands.3 For absolute addressing the body is:

 
(get-word (cpu-pc cpu)) 

When this code is ran, the variable cpu will be bound to an object representing the current state of the cpu. The pc of the cpu normally points to the current instruction being executed, but cl-6502 uses a slight trick. By incrementing the pc, it will now point to the first operand of the instruction! All the body does is take the value of the pc (which is the address of the argument/operand), and looks up the value at that address4 to get the actual argument.

As a second example of defaddress, here is the code for indirect addressing:

(defaddress indirect (:reader "^\\(_\\)$" 
                      :writer "($~{~2,'0x~})")
  (get-word (get-word (cpu-pc cpu)) t))

There are only a few differences between the code for indirect and absolute addressing. In the reader and writer, there are now an extra pair of parens around the operand. This is because the syntax for indirect addressing is an operand surrounded by parens. Another difference is with the body. Since there is an extra layer of indirection with indirect addressing, there is an additional call to get-word. For indirect addressing, the body says to calculate the argument, get the value of the pc (the address of the operand or the address of the address of the argument), get the value at that address (the operand or the address of the argument), and then get the value at that address (the actual argument).

Since I have already shown you some examples of how to use defaddress, I am now going to explain how defaddress works. Here is the complete definition of defaddress:

(defmacro defaddress (name (&key reader writer cpu-reg)
                      &body body)
  `(progn 
    (defmethod reader ((mode (eql ',name)))
      ,(cl-ppcre:regex-replace-all 
         "_" reader "([^,()#&]+)"))
     (defmethod writer ((mode (eql ',name))) ,writer)
     (push ',name *address-modes*)
     (defun ,name (cpu) ,@body)
     (defun (setf ,name) (value cpu)
       ,(if cpu-reg
            `(setf ,@body value)
            `(setf (get-byte ,@body) value)))))

I’m going to break down the code for defaddress one part at a time. After explaining a piece does, I will show you what the expansion of that piece looks like when defining absolute addressing. The first part of defaddress handles the reader:

 (defmethod reader ((mode (eql ',name)))
   ,(cl-ppcre:regex-replace-all "_" reader "([^,()#&]+)")) 

This part generates code which will define a method on the generic (virtual) function reader. Reader takes in the name of the mode as an argument and is supposed to return a regex (a true perl compatible regex, i.e. no underscores) that will recognize the mode and extract the operands:

(reader 'absolute)
=> "^([^,()#&]+)$"

To produce the method, defaddress just takes the reader argument, substitutes the underscore with a regex that can be used to recognize operands, and uses that as the value reader should return for the mode being defined. Here is what the piece of code expands into for absolute addressing:

(defmethod reader ((mode (eql 'absolute))) "^([^,()#&]+)$")

The next part does pretty much the exact same thing, only for the writer:

(defmethod writer ((mode (eql ',name))) ,writer)

It generates the code for a method for the generic function writer. Since the format string is used unmodified, defaddress just inserts the string into the body of the function. There result winds up being:

(defmethod writer ((mode (eql 'absolute))) "$~{~2,'0x~}")

Next up is the piece:

 (push ',name *address-modes*) 

This piece of code adds the mode being defined to a list of all of the addressing modes. The list is used to find all of the addressing modes that match the syntax of a given instruction. The snippet simply expands into:

(push 'absolute *address-modes*)

Now for the most important part of defaddress – the code that handles the body:

 (defun ,name (cpu) ,@body) 

It just puts the body inside of a function named by the addressing mode. The function is supposed to take the in the current state of the cpu as an object and return the argument used for the current instruction. Note that the variable cpu is available to the body. This is how the body of defaddress is able to access the cpu object. The expansion winds up looking like:

(defun absolute (cpu) (get-word (cpu-pc cpu)))

There is just one more part, a setf function for the addressing mode:

(defun (setf ,name) (value cpu)
  ,(if cpu-reg
       `(setf ,@body value)
       `(setf (get-byte ,@body) value)))

This code generates a setf function, basically a way to modify the argument of the instruction. Many instructions not only use the argument, but they store a new value to the memory location of the argument. The setf function defined by defaddress is just a way to do that. I’m not going to go in depth about it, but this is the only piece of code that uses the cpu-reg argument. The cpu-reg argument is just used to smooth out some differences between different addressing modes. The code generated by the above code winds up looking like:

(defun (setf absolute) (value cpu)
   (setf (get-byte (get-word (cpu-pc cpu))) value))

As I just said, the setf function defined can be used to set the value of the argument. To do it for absolute addressing, get the operand and set the value at that memory location.56

And that is pretty much everything there is to know about defaddress. In the next post I am going to talk a bout defasm, a macro that makes it easy to define different instructions for the emulator. It piggybacks off of the information provided by defaddress in order to handle all of the instructions in all of the different possible addressing modes.

Once-only

One of the most common mistakes made when writing macros is evaluating one of the arguments multiple times. Not only can this be inefficient, but when side effects are involved, it leads to quirky behavior. Take a macro square, which simply squares its argument (in reality one would use a function to do this):

(defmacro square (x)
  `(* ,x ,x))

The above implementation is buggy. Why? Because the x argument is evaluated twice. To see why this is a bad thing, check out the following code:

(square (incf a))

The above winds up expanding into:

(* (incf a) (incf a))

Which is buggy since it increments a twice. A way to fix this problem is to bind the value of x to a gensym, and then use that gensym throughout the rest of the macro. Here is a bug free definition of square that uses with-gensyms:

(defmacro square (x)
  (with-gensyms (gx)
    `(let ((,gx ,x))
       (* ,gx ,gx))))

Is there a way to automate this? Yes, there is, by using a macro called once-onlyOnce-only is a relatively complicated macro, but it eliminates lots of boilerplate code. Once-only takes a list of expressions, generally arguments to a macro, and makes sure they are evaluated only once in the final macro expansion. Here is an implementation of once-only based on the one from Practical Common Lisp:

(defmacro once-only ((&rest names) &body body)
  (let ((gensyms (loop for n in names collect (gensym))))
    `(with-gensyms (,@gensyms)
      `(let (,,@(loop for g in gensyms
                      for n in names
                      collect ``(,,g ,,n)))
        ,(let (,@(loop for n in names
                       for g in gensyms
                       collect `(,n ,g)))
           ,@body)))))

In order to explain how once-only works, I’m first going to show how to rewrite square using it. From there I will show what square looks like after once-only has been expanded. After that I will show what the macro expansion of square looks like. Finally, I will give an explanation as to what is going on. If you are reading on a computer, I strongly recommed you open this page in another window so you can follow along with the code and the explanation at the same time. Here is an implementation of square that uses once-only:

(defmacro square (x)
  (once-only (x)
    `(* ,x ,x)))

Here is what square looks like after once-only has been expanded inline:

(defmacro square (x)
  (with-gensyms (#:g830)
    `(let (,`(,#:g830 ,x))
       ,(let ((x #:g830))
          `(* ,x ,x)))))

So a usage of square such as the following:

(square (incf x))

will wind up looking like the code below after macro expansion.

(let ((#:g831 (incf x)))
  (* #:g831 #:g831))

So what the heck is going on? In line 2 of once-only, it creates a list of gensyms, one for each of the expressions that should only be evaluated once. We then take these gensyms and on line 3, generate code that will bind them to fresh gensyms. That generated code becomes line 2 of square after once-only has been expanded. We need to do this because we are writing a macro that writes a macro or code that writes code that writes code. So, after once-only has been expanded, square’s body will contain a use of with-gensyms which will bind a bunch of gensyms to new gensyms every time square is ran. These fresh gensyms will eventually be the ones used to store the value of the expressions we want to be evaluated once only.

Now for lines 4-6. By using the double backquote, this code generates code that will generate code that will be part of the expansion of square. Lines 4-6 of once-only become line 3 of the definition of square, which becomes line 1 of the expansion of square. Basically the little segment

``(,,g ,,n)

says to generate code that will generate code (double backquote), that will be a list containing the value of the value of g, and the value of the value of n. The value of g will be one of the gensyms we created in once-only. From line 3 of square after once-only has been expanded, we see that this gensym was #:g830. The value of #:g830 will be another gensym, whatever it was bound to by with-gensyms. From the code will can see that this gensym was #:g831. The value of n will be one of the arguments to once-only. From the original code for square we see that the only argument to once-only is x. Then the value of x, or the value of the value of n, will be whatever is passed as the argument to the square macro, in this case (incf x). Ultimately the code looks like this as it goes through the multiple expansions:

``(,,g ,,n) => `(,#:g830 ,x) => (#:g831 (incf x))

Lines 4-6 take a list of expressions similar to those in the middle of the above process, splices them into a let by using the comma-at, then evaluates each one of them by using the comma in order to evaluate them once more. This works because the single comma in ,,@ actually applies to every element in the spliced list. Here is an example that demonstrates this:

``(,,@ '(x y z)) => `(,x ,y ,z)

Then on line 3 of square after once-only has been expanded, we wind up with the comma followed by a backquote which wind up canceling each other out. So this is how lines 4-6 of once-only get us line 3 of square which then gives us line 1 of the expansion of square.

Now for lines 7-10 of once-only. These lines generate lines 4 and 5 of the code for square after once-only has been expanded. All these lines do is generate code that will bind the given names to the gensyms that will contain their values at runtime. In this case we want to bind x to the gensym #:g831. Since the value of #:g830 is #:g831, we can just bind x to the value of #:g830. Then we just evaluate the body in this environment. By doing this, we bind x to an expression that will give us the same value as the expression previously contained in x! And that is how once-only ultimately works. In the expansion of square, we bind #:g831 to the value of (incf x). Then we bind x to #:g831 so any where we insert the expression x, we get #:g831, a gensym which is bound to the value of the expression that was initially bound to x, but only evaluated once.

Ultimately, once-only is a fairly useful macro. Like with-gensyms it is a utility for writing other macros. Once-only greatly reduces boiler plate and complexity in cases where it is used. It is because of these reasons once-only is one of the most popular macros out there.

Efficiently Building Lists

It is a common problem to need to build up a list of objects and keep all of the objects in the same order they were generated in. The idiom I see everyone use to solve this problem is to push the objects onto the head of a list as they are generated. Unfortunately, doing this reverses the initial order of the elements, so in order to preserve the ordering, a call to either reverse or nreverse is needed. For example, here is code from Alexandria that converts an alist into a plist: 1

(defun alist-plist (alist)
  (let (plist)
    (dolist (pair alist)
      (push (car pair) plist)
      (push (cdr pair) plist))
    (nreverse plist)))

A better method to build up a list is to add the elements to the tail of the list. By adding them to the tail, you avoid having to reverse the list later. This is the same technique that some implementations of Loop use for the “collect” clause in order to efficiently collect objects into lists.  In order to build up a list this way, you need to keep track of the head of the list (what you are ultimately going to return) and the tail of the list (where you add the next element). Doing all of that by hand would be painful. Instead we can write a macro to handle that for us automagically. Here is an implementation of accum2, a macro which makes accumulating elements easy and efficient.

(defmacro accum (accfn &body body)
  (let ((ghead (gensym "HEAD"))
        (gtail (gensym "TAIL"))
        (garg  (gensym "ARG")))
    `(let* ((,ghead (list nil))
            (,gtail ,ghead))
       (macrolet ((,accfn (,garg)
                    `(setf ,',gtail
                           (setf (cdr ,',gtail)
                                 (list ,,garg)))))
         ,@body
         (cdr ,ghead)))))

Accum is an example of a macro that writes a macro, or code that writes code that writes code. In order to explain how accum works, I am going to show how to rewrite alist-plist using accum and then explain how accum expands into code that does what we desire. Here is an implementation of alist-plist that uses accum instead of using nreverse:

(defun alist-plist (alist)
  (accum a
    (dolist (pair alist)
      (a (car pair))
      (a (cdr pair)))))

The above is both simpler and more efficient than the original version! Now for an explanation as to how it works. Accum works by generating a macrolet binding. Macrolet allows for the creation of lexically scoped macros. The lexically scoped macro is named by whatever is passed in as the “accfn” argument to accum, in this case “a”. That macro, named “a”, will take its argument and generate code that will evaluate the argument, and add the result of that to the tail of the list. Similar to how push is a macro that takes an argument and generates code that adds the result of evaluating that to the front of the list passed in as its second argument. After accum is expanded, the code will look like the following:

(defun alist-plist (alist)
  (let* ((#:head1460 (list nil))
         (#:tail1461 #:head1460))
    (macrolet ((a (#:arg1462)
                 `(setf #:tail1461
                        (setf (cdr #:tail1461)
                              (list ,#:arg1462)))))
      (dolist (pair alist)
        (a (car pair))
        (a (cdr pair)))
      (cdr #:head1460))))

As you should be able to see, “a” becomes a local macro which does what I described above.

I find accum to be one of many similar macros. Macros that abstract out a common pattern, and at the same time, are more efficient than what a human would normally write by hand. While many of these macros may be trivial, they both simplify code and make it faster!