Loops in Lisp Part 4: Series

This is part four of Loops in Lisp. Follow one of these links for part one, two, or three.

One of the many advantages of programming in a functional style (by this, I mean manipulating your data through the operations map, fold, and filter) is that your program winds up being made up of a bunch of tiny, composable pieces. Since each piece is so small, usually only a few lines, it becomes trivial to unit test the entire program. Additionally, it is easy to express new features as just the composition of several existing functions. One disadvantage of programming through map and friends is that there is a fairly large time penalty for allocating the intermediate results. For example, every time filter is called on a list, a new list needs to be allocated. These costs add up pretty quickly and can make a functional program much slower than its imperative equivalent.
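
To make the cost concrete, here is a small function written in that style (my own illustration, not taken from any library); both the filter and the map allocate a fresh list that is thrown away as soon as the next stage has consumed it:

(defun sum-even-squares (xs)
  "Sum the squares of the even numbers in XS, written with filter, map, and fold."
  (reduce #'+                                    ; fold
          (mapcar (lambda (x) (* x x))           ; map: conses up a new list
                  (remove-if-not #'evenp xs))))  ; filter: conses up a new list

(sum-even-squares '(1 2 3 4))
=> 20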

One solution to this problem is laziness. Instead of allocating a new list every time an operation is performed on a list, you instead keep track of all of the transformations made on the list. Then when you fold over the list, you perform all of the transformations as you are folding over it. By doing this, you don’t need to allocate intermediate lists. Although laziness doesn’t allocate any intermediate lists, there is still a small cost for keeping track of the laziness. An alternative solution that makes functional programming just as fast as imperative programming is provided by the Series library.1 Series lets you write your program in a functional style without any runtime penalty at all!

Personally, the Series library is my favorite example of the magic that can be pulled off with macros. In short, Series works by taking your functional code and compiling it down into a single loop. In this loop, there is one step per transformation performed on the original list. The loop iterates over the values of the original sequence one at a time. On each iteration, the loop takes a single element, applies all of the transformations to that single element, and then accumulates that value into the result according to the folding operation. This loop requires no additional memory allocation at runtime, and there is no time penalty either! As an example, here is a program that sums the first N squares, written using Series:

(defun integers ()
  "Returns a 'series' of all of the natural numbers."
  (declare (optimizable-series-function))
  (scan-range :from 1))

(defun squares ()
  "Returns a 'series' of all of the square numbers."
  (declare (optimizable-series-function))
  (map-fn t 
          (lambda (x) (* x x)) 
          (integers)))

(defun sum-squares (n)
  "Returns the sum of the first N square numbers."
  (collect-sum (subseries (squares) 0 n)))

(sum-squares 10)
=> 385

The above code certainly looks functional; there are no side effects in sight. Now let’s look at the code generated by Series. Here is what the macroexpansion of collect-sum looks like:

(common-lisp:let* ((#:out-969 n))
  (common-lisp:let ((#:numbers-966
                     (coerce-maybe-fold (- 1 1) 'number))
                    #:items-967
                    (#:index-965 -1)
                    (#:sum-959 0))
    (declare (type number #:numbers-966)
             (type (integer -1) #:index-965)
             (type number #:sum-959))
    (tagbody
       #:ll-970
       (setq #:numbers-966
             (+ #:numbers-966
                (coerce-maybe-fold 1 'number)))
       (setq #:items-967
             ((lambda (x) (* x x)) #:numbers-966))
       (incf #:index-965)
       (locally
          (declare (type nonnegative-integer #:index-965))
         (if (>= #:index-965 #:out-969)
             (go end))
         (if (< #:index-965 0)
             (go #:ll-970)))
       (setq #:sum-959 (+ #:sum-959 #:items-967))
       (go #:ll-970)
     end)
    #:sum-959))

What Series does is look at the entire lifetime of the sequence, from its creation until it is folded. It uses this information to build the above loop, which simultaneously generates the original sequence, maps over it, filters elements out of it, and folds it into the final result. Here is the breakdown of the expansion. Lines 1-9 are just initialization. They define all of the variables the loop will be using and set them to their starting values. The important variables to keep track of are #:NUMBERS-966, #:ITEMS-967, and #:SUM-959. As the code “iterates” over the original sequence, #:NUMBERS-966 is the current value of the original sequence, #:ITEMS-967 is the square of that value, and #:SUM-959 is the sum of the squares so far. The rest of the code is the actual loop.

The loop first takes #:NUMBERS-966, the previous value of the sequence, and increments it in order to set it to the current value of the sequence (since the sequence is the range from 1 to infinity). Next, the loop takes the square of #:NUMBERS-966 to get the ith square number and stores that in #:ITEMS-967. Then the loop checks whether it has taken more than N elements out of the sequence and, if so, terminates. Finally, the loop takes the value in #:ITEMS-967 and accumulates it into #:SUM-959.
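
To make the control flow concrete, here is a hand-written loop that does the same work as the expansion above (a simplified sketch for illustration, not the actual Series output):

(defun sum-squares% (n)
  "Hand-written equivalent of the loop Series generates for SUM-SQUARES."
  (let ((number 0)  ; plays the role of #:NUMBERS-966
        (sum 0))    ; plays the role of #:SUM-959
    (dotimes (i n sum)
      (incf number)                    ; generate the next element of the sequence
      (incf sum (* number number)))))  ; square it and accumulate it into the sum

(sum-squares% 10)
=> 385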

Although this imperative version is equivalent to the original functional code, it is much faster than the functional code would be if it allocated intermediate results or used laziness. This idea of turning transformations on a list into a loop doesn’t just work for this simple example; it also works for much more complicated programs. I just find it incredible that Series is able to take such pretty code and compile it into code that is extremely fast.

Loops in Lisp Part 3: Iterate

This is part 3 of Loops in Lisp. For part 1 on how you can build any kind of looping construct you want out of just goto and macros, click here. For part 2 on Loop, click here.

The Iterate library is pretty awesome. It provides a macro iterate (and an alias for it, iter) that is basically a Lispy version of loop. The most obvious consequence of this is that iterate uses a lot more parens than loop does:

;; Loop code
(loop for i from 1 to 10
      collect i)

;; Iterate code
(iter (for i from 1 to 10)
      (collect i))

Even though all of the extra parens make iterate much uglier than loop, they give iterate all of the advantages of Lisp syntax. One such advantage is the ability to embed iterate clauses within Lisp code and vice versa. While you can’t do this with loop, you can do it with iterate because the syntax of iterate is so similar to the syntax of ordinary Lisp code. Here is what happens when you try to embed a collect clause within Lisp code with loop and with iterate:1

;; Not valid loop code.
(loop for i from 1 to 10
      do (when (evenp i)
           (collect i)))

;; Valid iterate code
(iter (for i from 1 to 10)
      (when (evenp i)
        (collect i)))

Although the ability to seamlessly go between Lisp code and iterate is pretty awesome, the greatest feature provided by iterate is also the entire reason why Lisp syntax has so many parens in the first place. Lisp syntax (and by extension iterate) makes it easy to write macros! Because of this, you can add pretty much any feature you want to iterate. As a simple example, here’s how you could define an iterate clause specifically for looping over the digits of a number:2

(defun digits (n)
  "Returns a list of the digits of N."
  (map 'list #'digit-char-p (princ-to-string n)))

(defmacro-clause (for var in-digits-of n)
  `(for ,var in (digits ,n)))

And here is how you would use it:

(iter (for i in-digits-of 123)
      (sum i))
=> 6

I cannot express how awesome this is. If you want an iterate clause for iterating over SQL queries, you can add it. If you want an iterate clause for looping over your own custom data structure, you can add it. You can add any feature you want all because iterate allows for the use of macros!

Personally, I prefer to use iterate over loop. Even though it is uglier, it is much more extensible than loop because it uses a Lisp-like syntax.

Getting Places

This post will serve as an introduction to writing macros that work with places. I will refer back to it whenever I examine a macro which deals with places.

Places are an incredible part of Common Lisp. In short, a “place” is any location that can hold a value. The obvious example of a place is a variable. Less obvious examples include the elements of an array, or the slots of an object. What makes the concept of places special is that Common Lisp provides a standard interface for reading and writing to them. You can write macros on top of this interface that work for every kind of place. As an example, look at the macro incf. It takes a place as an argument, adds one to its value, and stores the new value back into the place. If you want to increment a variable x, you would use:

(incf x)

And if you wanted to increment the element at index x of a sequence, you would use:

(incf (elt seq x))

They use the exact same syntax even though a variable is very different from an element of a sequence. Because it takes advantage of the interface for places, incf will work on any place, be it a variable, the slot of an object, or a user defined place.
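
For instance, with a made-up counter class, the very same macro call increments a slot:

(defclass counter ()
  ((count :initform 0 :accessor counter-count)))

(let ((c (make-instance 'counter)))
  (incf (counter-count c)) ; the accessor form is a place too
  (counter-count c))
=> 1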

So at this point you are probably wondering how incf works and, more generally, how you write macros that use places. To write such a macro, you need to use the function get-setf-expansion.1 Get-setf-expansion takes an expression representing a place and returns a total of five values (if you are unfamiliar with multiple values, see my post on multiple-value-bind). Altogether, these five values tell you everything you need to know about the place in order to read and write to it.

To show you how you are supposed to use get-setf-expansion, I’m first going to demonstrate how you could use it to write the expansion of incf by hand. After that, I will show code that automates this, which winds up being an implementation of incf. Let’s start by writing the expansion of the example above, the one where the element of a sequence is being incremented. To write that expansion by hand, you would first call get-setf-expansion to obtain all of the information:2

(get-setf-expansion '(elt seq x))

In SBCL this call will return the following values:

;; (1) temps
(#:seq1017 #:x1016)

;; (2) exprs
(seq x) 

;; (3) stores
(#:new1015) 

;; (4) store-expr
(sb-kernel:%setelt #:seq1017 #:x1016 #:new1015) 

;; (5) access-expr
(elt #:seq1017 #:x1016)

From now on, I will refer to each value returned by get-setf-expansion by the name in the comment before it (e.g. temps refers to the first value).

In order to uniquely identify the element of a sequence (the place we are working with in this example), you need two things. You need the sequence itself and the index into the sequence. That is exactly what the two expressions in exprs evaluate to! Since incf needs to use these values multiple times, the two values have to be bound to gensyms in order to prevent multiple evaluation (see my post on once-only for why multiple evaluation is a problem). You are supposed to bind the values of the expressions to the gensyms in temps so that the other expressions returned by get-setf-expansion can use those gensyms to easily determine the place being worked with. The bindings need to be made with let* because it is possible for an expression in exprs to refer to the value of a previous expression in exprs. So the first part of the expansion will bind all of the symbols in temps to values of the expressions in exprs with let*:

(let* ((#:seq1017 seq) (#:x1016 x))
  ...)

Now the gensyms in temps can be used to uniquely identify the place. As I mentioned previously, the other expressions can now easily determine the place through the gensyms. For example, access-expr can be used to retrieve the value currently in the place. Since the place we are dealing with is the element of a sequence, access-expr is just a call to elt using the gensyms in temps as the arguments. We are going to use access-expr in a moment, but first I have to talk about how to write to the place.

In order to write to the place, you need to use stores and store-expr. Stores is a list of gensyms that need to be bound to the values that are to be stored in the place (it is possible for a single place to hold multiple values). In this case we want to bind the gensym in stores to one plus the value already in the place. We can easily obtain the value in the place through access-expr. Once the gensyms have been bound, you can use store-expr to actually write the values in stores to the place. Notice how store-expr is a call to an internal SBCL function, sb-kernel:%setelt, that uses the gensyms in temps and stores as arguments. Presumably sb-kernel:%setelt sets the element of a sequence. After adding the binding for the gensym in stores and store-expr, we wind up with the final expansion, which looks like:3

(let* ((#:seq1017 seq) 
       (#:x1016 x) 
       (#:new1015 (+ 1 (elt #:seq1017 #:x1016))))
  (sb-kernel:%setelt #:seq1017 #:x1016 #:new1015))
 

To review, the above code first binds the gensyms in temps to the values of the expressions in exprs. This allows access-expr and store-expr to use the gensyms in temps in order to determine the place being worked with. Then the code uses access-expr to retrieve the value, adds one to that, and binds that value to the gensym in stores. This is because the value of the gensym in stores is ultimately going to be the one written to the place. Finally the code evaluates store-expr in order to actually store the value in the gensym into the place.

Now here is one possible implementation of incf,4 which automates everything we just did by hand. I called it incf% so that it doesn’t have the same name as the builtin version.

(defmacro incf% (place)
  (multiple-value-bind
        (temps exprs stores store-expr access-expr)
      (get-setf-expansion place)
    `(let* (,@(mapcar #'list temps exprs)
            (,(car stores) (+ 1 ,access-expr)))
       ,store-expr)))

The above code first binds the five values returned by get-setf-expansion to variables. It then generates a let* binding which binds the symbols in temps to the expressions in exprs and also binds the gensym in stores to one plus the result of evaluating access-expr. Finally the above code splices in store-expr to actually write the value. And that is everything there is to incf.
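
As a quick sanity check, here is the sketch in action on the element of a sequence:

(let ((seq (list 1 2 3))
      (x 1))
  (incf% (elt seq x)) ; increments the element at index 1
  seq)
=> (1 3 3)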

Incf is but a single example of what can be done with places. In the next couple of posts, I plan to cover some really cool macros that encapsulate a bunch of common patterns related to places.

Defasm

This post is the second part of a two part series exploring the emulator cl-6502. If you haven’t read the first part exploring the implementation of addressing modes in cl-6502, you can find it here.

This post is going to go over how cl-6502 implements the instruction set of the 6502. Most of the work in defining the instruction set is done by a single macro, defasm. But before I can go into the details of defasm, I have to explain how cl-6502 represents instructions.

cl-6502 represents each instruction as a function inside an array called *opcode-funs*. The function for a specific instruction is indexed by that instruction’s opcode.1 To execute an instruction, cl-6502 looks up the opcode of the current instruction and calls the function at that location inside of *opcode-funs*. There is also a second array, *opcode-meta*, which keeps track of some metadata about each instruction, such as the number of bytes each one takes up. All defasm does is make it easy to generate all of the functions and metadata that wind up inside of those two arrays.
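
To make the dispatch concrete, executing one instruction looks roughly like this (a hypothetical sketch; step-cpu is my own name, and the real cl-6502 code differs in its details):

(defun step-cpu (cpu)
  "Hypothetical sketch: fetch the opcode at the pc and run its handler."
  (let* ((opcode (get-byte (cpu-pc cpu)))        ; fetch the current opcode
         (handler (aref *opcode-funs* opcode)))  ; look up that instruction's function
    (funcall handler cpu)))                      ; execute the instruction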

To show you just how easy it is to implement instructions with defasm, here is the implementation of the adc (add with carry) instruction:

(defasm adc (:docs "Add to Accumulator with Carry")
    ((#x61 6 2 indirect-x)
     (#x65 3 2 zero-page)
     (#x69 2 2 immediate)
     (#x6d 4 3 absolute)
     (#x71 5 2 indirect-y)
     (#x75 4 2 zero-page-x)
     (#x79 4 3 absolute-y)
     (#x7d 4 3 absolute-x))

  (let ((result (+ (cpu-ar cpu) 
                   (getter) 
                   (status-bit :carry))))
    (set-flags-if 
      :carry (> result #xff)
      :overflow (overflow-p result (cpu-ar cpu) (getter))
      :negative (logbitp 7 result)
      :zero (zerop (wrap-byte result)))
    (setf (cpu-ar cpu) (wrap-byte result))))

There are two main parts to the above code. The first part specifies all of the addressing modes the instruction is compatible with along with the metadata for each variant of the instruction (there is a different version of the instruction for every possible addressing mode the instruction can be used with).

After that is the body – the code that actually implements the instruction being defined. The body is responsible for setting all of the appropriate flags and memory locations to the values they should have after executing the instruction. Note that, just like in defaddress, the variable cpu can be used in the body to reference an object that represents the current state of the cpu.

Defasm takes these two pieces, and generates one lambda expression for each variant of the instruction. All of the generated lambda expressions use the same body, except defasm generates some additional code that allows the body to work across all of the different addressing modes.

Now to get into the specifics of the DSL. In the addressing mode part of the DSL, there are four pieces of metadata that need to be associated with each version of the instruction. The first part is the opcode, the machine code representation of the instruction. Next up is the number of cycles it takes for the instruction to execute. After that is the size of the instruction, the number of bytes it takes up in memory. Last is the name of the addressing mode used for that specific variant of the instruction. As an example, here is the metadata for the adc instruction in the indirect-x addressing mode:

(#x61 6 2 indirect-x)

What it is saying is that this version of the instruction has the opcode #x61, takes six cycles to run, takes up two bytes in memory, and uses the indirect-x addressing mode. The fact that an instruction takes a different number of clock cycles and a different amount of space depending on its addressing mode is one reason why assembly languages provide multiple addressing modes in the first place.

For the body, defasm does something very clever to make the body work for every possible addressing mode. Within the body, the functions getter and setter are bound to local functions that can be used to obtain and modify the argument to the instruction. For each variant of the instruction, defasm generates the definitions of these two functions differently so that they will always calculate the correct argument for the given addressing mode.

For example, in the version of adc that uses immediate addressing, getter will just return the value of the operand, but in the version that uses absolute addressing, getter will use the operand as an address and look up the value at that location in memory. In the definition of the adc instruction above, the body uses getter to obtain the argument, adds that to the value in the accumulator, adds in the carry, and then sets all of the appropriate flags and registers depending on the final value it winds up with. Since getter and setter work across all of the different addressing modes, so does the body!

Now let’s look at the actual implementation of defasm:

(defmacro defasm (name (&key (docs "") raw-p (track-pc t))
                  modes &body body)
  `(progn

     ,@(loop for (op cycles bytes mode) in modes collect
         `(setf (aref *opcode-meta* ,op) 
                ',(list name docs cycles bytes mode)))

     ,@(loop for (op cycles bytes mode) in modes collect
         `(setf (aref *opcode-funs* ,op)
                (lambda (cpu)
                  (incf (cpu-pc cpu))
                  (flet ((getter ()
                           ,(make-getter name mode raw-p))
                         (setter (x)
                           (setf (,mode cpu) x)))
                    ,@body)
                  ,@(when track-pc
                     `((incf (cpu-pc cpu) ,(1- bytes))))
                  (incf (cpu-cc cpu) ,cycles))))))

As usual, I’m going to show a snippet of the implementation of defasm and then show what the macroexpansion of that piece looks like. The first part of the implementation handles the addressing modes and metadata:

(loop for (op cycles bytes mode) in modes collect
  `(setf (aref *opcode-meta* ,op) 
         ',(list name docs cycles bytes mode)))

For each addressing mode, this generates code which will store a list containing the metadata into the proper place in the *opcode-meta* array. In other words it takes each part that looks like:

(#x61 6 2 indirect-x)

and generates code that looks like:

(setf (aref *opcode-meta* #x61)
      '(adc "Add to Accumulator with Carry" 6 2 indirect-x))

After that, we have the part that generates the actual lambda expressions for the functions that will be stored in *opcode-funs*:

(loop for (op cycles bytes mode) in modes collect
  `(setf (aref *opcode-funs* ,op)
         (lambda (cpu)
           (incf (cpu-pc cpu))
           (flet ((getter ()
                    ,(make-getter name mode raw-p))
                  (setter (x)
                    (setf (,mode cpu) x)))
             ,@body)
           ,@(when track-pc
               `((incf (cpu-pc cpu) ,(1- bytes))))
           (incf (cpu-cc cpu) ,cycles))))

This code loops over all of the metadata for the different addressing modes and uses this information to generate the expression for each variant of the instruction. As mentioned previously, the function is stored at the index given by the variant’s opcode. As for the actual function itself, it does something along these lines. First, it advances the pc. This is done so that the pc now points to the operand of the instruction. By doing this, the job of defaddress becomes much easier since it can use the pc as a pointer to the operand. Next, the function evaluates the body in an environment with getter and setter bound to functions that can be used to read and write to the argument. After that, it advances the pc forward to the next instruction (unless track-pc was false, which happens for instructions that modify the pc themselves, such as jumps). Finally, the function increments the cycle count by the number of cycles it takes the instruction to execute.

The definitions of getter and setter are really just calls to the function with the same name as the addressing mode associated with the variant of the instruction.2 If you look back at the last post, you will see that defaddress automatically generates these “mode” functions. All they do is calculate the effective argument for the given addressing mode! Exactly what getter does. As an example of what the expansion looks like, here is the lambda expression generated for the adc instruction in the indirect-x addressing mode.

(setf (aref *opcode-funs* #x61)
      (lambda (cpu)
        (incf (cpu-pc cpu))
        (flet ((getter ()
                 (get-byte (indirect-x cpu)))
               (setter (x)
                 (setf (indirect-x cpu) x)))
         (let ((result (+ (cpu-ar cpu) 
                          (getter) 
                          (status-bit :carry))))
          (set-flags-if :carry (> result 255) 
                        :overflow (overflow-p result 
                                              (cpu-ar cpu)
                                              (getter))
                        :negative (logbitp 7 result) 
                        :zero (zerop (wrap-byte result)))
          (setf (cpu-ar cpu) (wrap-byte result))))
        (incf (cpu-pc cpu) 1)
        (incf (cpu-cc cpu) 6)))

And that’s all there is to defasm! There are a couple really cool things you should note about cl-6502. First off, the macros expand into a lot of code. The definition of adc at the beginning of this post expands into roughly 500 lines of code. Here is a link to a gist of it if you want to see it. More incredibly, cl-6502 implements an entire emulator in under 1000 lines of code. cl-6502 is a fantastic example of how effective macros are at creating concise DSLs.

CL-WHO

The CL-WHO library is one of many that make it easy to generate HTML. When first checking out CL-WHO, I thought that it must be at least a couple thousand lines of code long. As it turns out, it is only several hundred. At the core of CL-WHO is with-html-output (hence the name “who”), which allows one to use a DSL for generating HTML. With-html-output works like all macros. At a high level, it takes your code in the DSL, and compiles it into Lisp code that will generate the desired HTML (here are some examples).
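
For readers who have never seen the DSL, here is a small taste of what it looks like (a minimal sketch based on the CL-WHO documentation; see that documentation for the full set of options):

(with-html-output (*standard-output*)
  (:ul :id "numbers"
    (dolist (n '(1 2 3))
      (htm (:li "Number " (str n)))))) ; htm re-enters the DSL, str prints its argument

Running it prints the corresponding <ul> markup to *standard-output*, with the htm and str forms dropping back into the DSL and printing their arguments, respectively.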

With-html-output does little by itself. Almost all of the work is done by three functions: tree-to-template, process-tag, and convert-tag-to-string-list. Most of the time these functions call one another recursively in order to process the entire DSL. It is possible to customize the control flow, but I will get to that later. Here is a link to a gist of the output after tracing all of the functions and using macroexpand-1 to expand a simple example. The example only shows what happens when using basic tags in CL-WHO. It doesn’t show what happens when you embed Lisp expressions in the DSL.

Tree-to-template is the entry point into the compilation process. It loops through the DSL tree, and builds up a “template”. A template is just a list of strings and expressions. The strings in the template contain HTML and are meant to be printed directly to the HTML stream. On the other hand, the expressions contain code that will print objects to that stream. Eventually all of this output put together will be the desired HTML. As tree-to-template loops through the code, if it sees a non-tag, it will just collect that into the list.1 When it does see a tag, tree-to-template calls process-tag to process it, and then concatenates the result of that into the template.

Process-tag will extract the tag as well as the attribute list. Everything after the attribute list makes up the “body” of the tag. How is the body processed? Well, process-tag takes an additional argument, body-fn, which specifies how to process the body. Process-tag will then call convert-tag-to-string-list with the tag, the attribute list, the body, and body-fn. The reason process-tag doesn’t process the body itself is that convert-tag-to-string-list is a generic function, making it possible to customize its behavior.

Convert-tag-to-string-list handles the semantics of the tag. It takes all of the arguments above and returns a list of strings and expressions. That list will become part of the template eventually returned by tree-to-template. Since convert-tag-to-string-list is a generic function, it is possible to extend it. The documentation for CL-WHO gives an example of how one could create a custom “red” tag which changes the font of the text to red, even though there is no such HTML tag. In the default case, convert-tag-to-string-list takes the result from calling body-fn on body and surrounds that with strings for the opening and closing tags. Since convert-tag-to-string-list is customizable, it is possible to change the control flow and ultimately how the body is processed. If one wanted, they could make a call to process-tag, but with a different body-fn argument, changing how the code is processed further up (down?) the tree.
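
As a rough sketch of what such an extension looks like (modeled from memory on the documentation’s example, so treat the details as approximate):

(defmethod convert-tag-to-string-list ((tag (eql :red)) attr-list body body-fn)
  "Emit a made-up :red tag as a span with red text."
  (declare (ignore attr-list))
  (nconc (list "<span style='color: red'>")
         (funcall body-fn body) ; process the body as usual
         (list "</span>")))

With a method like that in place, a (:red ...) form inside with-html-output would be compiled into those strings, while every other tag is handled by the default method.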

With the help of these functions with-html-output converts the DSL into a template. The template is then turned into a list of valid Lisp code.2 With-html-output then wraps the body with a macrolet which binds several local macros. These macros are: htm, fmt, esc, str. These macros make it easier to print objects to the stream used for output. Check out the documentation for CL-WHO for a more detailed description of what these macros do.

I really like CL-WHO. It is a great example of an embedded DSL: a Lisp hacker still has full access to Lisp from within the DSL itself. The only problem I have with CL-WHO is the inability to have macros expand into code for the DSL. This decreases the flexibility of CL-WHO somewhat. The only way I can see to fix this problem would be to use a library such as :hu.dwim.walker to expand all of the macros in advance.