Defasm

This post is the second part of a two part series exploring the emulator cl-6502. If you haven’t read the first part exploring the implementation of addressing modes in cl-6502, you can find it here.

This post is going to go over how cl-6502 implements the instruction set of the 6502. Most of the work in defining the instruction set is done by a single macro, defasm. But before I can go into the details of defasm, I have to explain how cl-6502 represents instructions.

cl-6502 represents each instruction as a function inside an array called *array-funs*. The function for a specific instruction is indexed by that instruction’s opcode.1 To execute an instruction, cl-6502 looks up the opcode of the current instruction and calls the function at that location inside of *array-funs*. There is also a second array, *opcode-metadata*, which keeps track of some metadata about each instruction such as the number of bytes each one takes up. All defasm does is make it easy to generate all of the functions and metadata that wind up inside of those two arrays.

To show you just how easy it is to implement instructions with defasm, here is the implementation of the adc (add with carry) instruction:

(defasm adc (:docs "Add to Accumulator with Carry")
    ((#x61 6 2 indirect-x)
     (#x65 3 2 zero-page)
     (#x69 2 2 immediate)
     (#x6d 4 3 absolute)
     (#x71 5 2 indirect-y)
     (#x75 4 2 zero-page-x)
     (#x79 4 3 absolute-y)
     (#x7d 4 3 absolute-x))

  (let ((result (+ (cpu-ar cpu) 
                   (getter) 
                   (status-bit :carry))))
    (set-flags-if 
      :carry (> result #xff)
      :overflow (overflow-p result (cpu-ar cpu) (getter))
      :negative (logbitp 7 result)
      :zero (zerop (wrap-byte result)))
    (setf (cpu-ar cpu) (wrap-byte result))))

There are two main parts to the above code. The first part specifies all of the addressing modes the instruction is compatible with along with the metadata for each variant of the instruction (there is a different version of the instruction for every possible addressing mode the instruction can be used with).

After that is the body – the code that actually implements the instruction being defined. The body is responsible for setting all of the appropriate flags and memory locations to the values they should have after executing the instruction. Make sure you note that just like in defaddress, the variable cpu can be used in the body to reference an object that represents the current state of the cpu.

Defasm takes these two pieces, and generates one lambda expression for each variant of the instruction. All of the generated lambda expressions use the same body, except defasm generates some additional code that allows the body to work across all of the different addressing modes.

Now to get into the specifics of the DSL. In the addressing mode part of the DSL, there are four pieces of metadata that need to be associated with each version of the instruction. The first part is the opcode, the machine code representation of the instruction. Next up is the number of cycles it takes for the instruction to execute. After that is the size of the instruction, the number of bytes it takes up in memory. Last is the name of the addressing mode used for that specific variant of the instruction. As an example, here is the metadata for the adc instruction in the indirect-x addressing mode:

(#x61 6 2 indirect-x)

What it is saying is that this version of the instruction has the opcode #x61, takes six cycles to run, takes two bytes in memory, and uses the indirect-x addressing mode. The fact that when an instruction is used in different addressing modes, it uses a different number of clock cycles and takes up a different amount of space is one reason why different addressing modes are provided in assembly language.

For the body, defasm does something very clever to have the body work for every possible addressing modes. Within the body, the functions getter and setter are bound to local functions that can be used to obtain and modify the argument to the instruction. For each variant of the instruction, defasm generates the definition of these two functions differently so that they will always calculate the correct argument for the given addressing mode.

For example, in the version of adc that uses immediate addressing, getter will just return the value of the operand, but in the version that uses absolute addressing, getter will use the operand as an address and look up the value at that location in memory. In the definition of the adc instruction above, the body uses getter to obtain the argument, adds that to the value in the accumulator, adds in the carry, and then sets all of the appropriate flags and registers depending on the final value it winds up with. Since getter and setter work across all of the different addressing modes, so does the body!

Now let’s look at the actual implementation of defasm:

(defmacro defasm (name (&key (docs "") raw-p (track-pc t))
                  modes &body body)
  `(progn

     ,@(loop for (op cycles bytes mode) in modes collect
         `(setf (aref *opcode-meta* ,op) 
                ',(list name docs cycles bytes mode)))

     ,@(loop for (op cycles bytes mode) in modes collect
         `(setf (aref *opcode-funs* ,op)
                (lambda (cpu)
                  (incf (cpu-pc cpu))
                  (flet ((getter ()
                           ,(make-getter name mode raw-p))
                         (setter (x)
                           (setf (,mode cpu) x)))
                    ,@body)
                  ,@(when track-pc
                     `((incf (cpu-pc cpu) ,(1- bytes))))
                  (incf (cpu-cc cpu) ,cycles))))))

As usual, I’m going to show a snippet of the implementation of defasm and then show what the macroexpansion of that piece looks like. The first part of the implementation handles the addressing modes and metadata:

(loop for (op cycles bytes mode) in modes collect
  `(setf (aref *opcode-meta* ,op) 
         ',(list name docs cycles bytes mode)))

For each addressing mode, this generates code which will store a list containing the metadata into the proper place in the *opcode-meta* array. In other words it takes each part that looks like:

(#x61 6 2 indirect-x)
and generates code that looks like:
(setf (aref *opcode-meta* #x61)
     '(adc "Add to accumulator with carry" 6 2 indirect-x))

After that we have the part that will generate the actual lambda expressions for the functions that will be stored in *array-funs*:

(loop for (op cycles bytes mode) in modes collect
  `(setf (aref *opcode-funs* ,op)
         (lambda (cpu)
           (incf (cpu-pc cpu))
           (flet ((getter ()
                   ,(make-getter name mode raw-p))
                 (setter (x)
                   (setf (,mode cpu) x)))
             ,@body)
          ,@(when track-pc
              `((incf (cpu-pc cpu) ,(1- bytes))))
          (incf (cpu-cc cpu) ,cycles))))

This code loops over all of the metadata for the different addressing modes and uses this information to generate the expression for each variant of the instruction. As mentioned previously, the function will be stored by the variant’s opcode. As for the actual function itself, it does something along these lines. First, it advances the pc. This is done so that the pc now points to the operand of the instruction. By doing this, the job of defaddress becomes much easier since it can use the pc as a pointer to the operand. Next, the function evaluates the body in an environment with getter and setter bound to functions that can be used to read and write to the argument. After that it will advance the pc forward to the next instruction (unless track-pc was false, which happens for instructions that modify the pc themselves such as jumps). Finally, the function will increment the cycle count by the number of cycles it takes the instruction to execute.

The definitions of getter and setter are really just calls to the function with the same name as the addressing mode associated with the variant of the instruction.2 If you look back at the last post, you will see that defaddress automatically generates these “mode” functions. All they do is calculate the effective argument for the given addressing mode! Exactly what getter does. As an example of what the expansion looks like, here is the lambda expression generated for the adc instruction in the indirect-x addressing mode.

(setf (aref *opcode-funs* #x61)
      (lambda (cpu)
        (incf (cpu-pc cpu))
        (flet ((getter ()
                 (get-byte (indirect-x cpu)))
               (setter (x)
                 (setf (indirect-x cpu) x)))
         (let ((result (+ (cpu-ar cpu) 
                          (getter) 
                          (status-bit :carry))))
          (set-flags-if :carry (> result 255) 
                        :overflow (overflow-p result 
                                              (cpu-ar cpu)
                                              (getter))
                        :negative (logbitp 7 result) 
                        :zero (zerop (wrap-byte result)))
          (setf (cpu-ar cpu) (wrap-byte result))))
        (incf (cpu-pc cpu) 1)
        (incf (cpu-cc cpu) 6)))

And that’s all there is to defasm! There are a couple really cool things you should note about cl-6502. First off, the macros expand into a lot of code. The definition of adc at the beginning of this post expands into roughly 500 lines of code. Here is a link to a gist of it if you want to see it. More incredibly, cl-6502 implements an entire emulator in under 1000 lines of code. cl-6502 is a fantastic example of how effective macros are at creating concise DSLs.

Defaddress

This post is the first part of a two part series exploring the emulator cl-6502. This post will cover how addressing modes are implemented in cl-6502. The second part will go over the implementation of the opcodes.

cl-6502 is an emulator for the MOS 6502 processor, used in devices such as the Apple II and the NES. As an emulator, cl-6502 has three distinct roles. It needs to be able to convert assembly code into machine code (assembly), it needs to be able to convert machine code back into assembly (disassembly), and it needs to be able to actually interpret the machine code (execution). By using macros in clever ways, cl-6502 is able to create multiple DSLs for defining different components of the emulator. One of those macros is defaddress, which makes it easy to add addressing modes to the emulator. First some background.

Assembly language has what are known as “addressing modes”. Depending on which addressing mode is being used, the argument to the instruction will be calculated in a different manner. The programmer is able to specify different addressing modes by using slightly different syntaxes. As an example here is the same jump instruction just with two different addressing modes:

JMP $0
JMP ($0)

From here on out, I’m going to use the term “operand” to refer to the value given to the instruction before the addressing mode has been taken into account and the term “argument” to refer to the value after the addressing mode has been considered. As you should be able to tell, both instructions above are passed the same operand of zero, but because they are using different addressing modes, they will calculate their arguments in two different ways.

Since the first instruction doesn’t use any extra syntax (except the dollar sign which just means base 16), it uses “absolute” addressing. With absolute addressing the argument is the same as the operand.1 The first instruction can be read as, continue execution at the instruction at address zero.

Since the second instruction has parens around the operand, it uses what is known as “indirect” addressing. For indirect addressing, the operand is actually the memory location of the argument.2 The second instruction can be read as, get the address that is stored at address zero, and continue execution at the instruction at that location in memory. Assuming the value 123 was stored at address zero, the operand would be zero, the argument would be 123, and the instruction would cause execution to be resumed at the instruction at location 123.

In total there are 13 different addressing modes for the 6502. In order to make it easy to define all of these different addressing modes, cl-6502 creates a macro defaddress. Defaddress is a DSL for the sole purpose of defining addressing modes. Each one of the main arguments to defaddress handles one of the jobs (assembly/disassembly/execution) that an emulator has to perform with respect to the addressing mode. As to what the defaddress DSL looks like, here is the code that defines the absolute addressing mode.

(defaddress absolute (:reader "^_$" :writer "$~{~2,'0x~}")
  (get-word (cpu-pc cpu)))

The code above has three distinct parts. The first piece is the reader, which is used to parse the assembly code:

 "^_$" 

The reader argument is a regular expression that recognizes the syntax of the addressing mode being defined, in this case aboslute addressing. The regex is a normal perl compatible regex except it may use an underscore to match (and capture) an operand. The regex above matches a lone operand, which is exactly the syntax for absolute addressing. After the reader is the writer:

"$~{2,'0x~}"

The writer is a format string that is able to reproduce the original assembly (with the proper syntax for the addressing mode) from the machine code. The writer for absolute addressing says to print the operand as a zero padded, two digit, hexadecimal number. Basically, it just prints the lone operand in assembly language without any additional syntax. Since there is no extra syntax, that means the generated code is using absolute addressing.

The last part is the body. The body is a block of code that calculates the argument from the operands.3 For absolute addressing the body is:

 
(get-word (cpu-pc cpu)) 

When this code is ran, the variable cpu will be bound to an object representing the current state of the cpu. The pc of the cpu normally points to the current instruction being executed, but cl-6502 uses a slight trick. By incrementing the pc, it will now point to the first operand of the instruction! All the body does is take the value of the pc (which is the address of the argument/operand), and looks up the value at that address4 to get the actual argument.

As a second example of defaddress, here is the code for indirect addressing:

(defaddress indirect (:reader "^\\(_\\)$" 
                      :writer "($~{~2,'0x~})")
  (get-word (get-word (cpu-pc cpu)) t))

There are only a few differences between the code for indirect and absolute addressing. In the reader and writer, there are now an extra pair of parens around the operand. This is because the syntax for indirect addressing is an operand surrounded by parens. Another difference is with the body. Since there is an extra layer of indirection with indirect addressing, there is an additional call to get-word. For indirect addressing, the body says to calculate the argument, get the value of the pc (the address of the operand or the address of the address of the argument), get the value at that address (the operand or the address of the argument), and then get the value at that address (the actual argument).

Since I have already shown you some examples of how to use defaddress, I am now going to explain how defaddress works. Here is the complete definition of defaddress:

(defmacro defaddress (name (&key reader writer cpu-reg)
                      &body body)
  `(progn 
    (defmethod reader ((mode (eql ',name)))
      ,(cl-ppcre:regex-replace-all 
         "_" reader "([^,()#&]+)"))
     (defmethod writer ((mode (eql ',name))) ,writer)
     (push ',name *address-modes*)
     (defun ,name (cpu) ,@body)
     (defun (setf ,name) (value cpu)
       ,(if cpu-reg
            `(setf ,@body value)
            `(setf (get-byte ,@body) value)))))

I’m going to break down the code for defaddress one part at a time. After explaining a piece does, I will show you what the expansion of that piece looks like when defining absolute addressing. The first part of defaddress handles the reader:

 (defmethod reader ((mode (eql ',name)))
   ,(cl-ppcre:regex-replace-all "_" reader "([^,()#&]+)")) 

This part generates code which will define a method on the generic (virtual) function reader. Reader takes in the name of the mode as an argument and is supposed to return a regex (a true perl compatible regex, i.e. no underscores) that will recognize the mode and extract the operands:

(reader 'absolute)
=> "^([^,()#&]+)$"

To produce the method, defaddress just takes the reader argument, substitutes the underscore with a regex that can be used to recognize operands, and uses that as the value reader should return for the mode being defined. Here is what the piece of code expands into for absolute addressing:

(defmethod reader ((mode (eql 'absolute))) "^([^,()#&]+)$")

The next part does pretty much the exact same thing, only for the writer:

(defmethod writer ((mode (eql ',name))) ,writer)

It generates the code for a method for the generic function writer. Since the format string is used unmodified, defaddress just inserts the string into the body of the function. There result winds up being:

(defmethod writer ((mode (eql 'absolute))) "$~{~2,'0x~}")

Next up is the piece:

 (push ',name *address-modes*) 

This piece of code adds the mode being defined to a list of all of the addressing modes. The list is used to find all of the addressing modes that match the syntax of a given instruction. The snippet simply expands into:

(push 'absolute *address-modes*)

Now for the most important part of defaddress – the code that handles the body:

 (defun ,name (cpu) ,@body) 

It just puts the body inside of a function named by the addressing mode. The function is supposed to take the in the current state of the cpu as an object and return the argument used for the current instruction. Note that the variable cpu is available to the body. This is how the body of defaddress is able to access the cpu object. The expansion winds up looking like:

(defun absolute (cpu) (get-word (cpu-pc cpu)))

There is just one more part, a setf function for the addressing mode:

(defun (setf ,name) (value cpu)
  ,(if cpu-reg
       `(setf ,@body value)
       `(setf (get-byte ,@body) value)))

This code generates a setf function, basically a way to modify the argument of the instruction. Many instructions not only use the argument, but they store a new value to the memory location of the argument. The setf function defined by defaddress is just a way to do that. I’m not going to go in depth about it, but this is the only piece of code that uses the cpu-reg argument. The cpu-reg argument is just used to smooth out some differences between different addressing modes. The code generated by the above code winds up looking like:

(defun (setf absolute) (value cpu)
   (setf (get-byte (get-word (cpu-pc cpu))) value))

As I just said, the setf function defined can be used to set the value of the argument. To do it for absolute addressing, get the operand and set the value at that memory location.56

And that is pretty much everything there is to know about defaddress. In the next post I am going to talk a bout defasm, a macro that makes it easy to define different instructions for the emulator. It piggybacks off of the information provided by defaddress in order to handle all of the instructions in all of the different possible addressing modes.

Debugging Lisp Part 5: Miscellaneous

This post is for all of the miscellaneous features that aren’t large enough to get their own individual posts. If you haven’t read all of them, here are the links to the previous posts on recompilation, inspection, class redefinition, and restarts.

One somewhat obscure tool for debugging is SBCL’s trace. SBCL’s trace goes way beyond what most other implementations provide. In SBCL, trace takes several additional keyword arguments.1 For example, trace accepts a keyword argument, :break. The expression passed in as the value of :break will be evaluated every time the traced function is called. When that expression evaluates to true, the debugger will be invoked. For example if you have a Fibonacci function:

(defun fib (n)
  (if (<= 0 n 1)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))

you can use trace to break specifically when fib is called with an argument of zero:

 

ezgif.com-optimize (4)

 

A bit of mangling has to be done (using sb-debug:arg) since the expression refers to variables within the fib function. Trace also accepts several variants of :break, such as :break-after, which evaluate the expression after the function has been called instead of before. There are also arguments :print and :print-after, which are like their break counterparts, only they print the value of the expression before/after the function is called. You could use :print-after to say, print the time (Unix time) whenever fib returns:

 

ezgif.com-optimize (6)

 

For a complete list of all of the arguments that trace can accept, check out this page of the SBCL manual.

Another relatively unknown group of features are the cross referencing commands. The cross referencing commands are commands which lookup all of the places where something is referenced. All of the bindings for the cross referencing commands begin with C-c C-w.2 The cross referencing command I find myself using the most, “slime-who-calls”, which is bound to C-c C-w C-c, shows you all of the places where a function is called from. Here is what it would look like if you were to lookup all of the places where the scan function is used in cl-ppcre and then scroll through them:3

 

ezgif.com-optimize (7)

 

Slime-who-calls makes it easy to figure out how a function is supposed be used. You can pull up all of the usages of a function and just look at them. There are also several analogs of slime-who-calls. There is slime-who-macroexpands (C-c C-w RET), which pulls up all of the places where a macro is used and there is also slime-who-references (C-C C-w C-r) which is the same thing only for variables.

Another important feature is how to pull up the source code of a function on the stack while inside of the debugger. One way to do it is to press the ‘v’ key with the cursor on the frame you want to view the source of. An alternative option is to use M-p (the alt key and the ‘p’ key at the same time) and M-n to move up and down the stack frame. When using these commands instead of the normal C-p and C-n for movement, Slime will automatically pull up the source code as you are moving through the stack. Here is what it would look like if you were to pass a malformed regular expression to cl-ppcre (so that an error will be signaled and you will enter the debugger), and then scroll through the stack trace using M-n:

 

ezgif.com-optimize (9)

 

And now for the most common of all IDE commands, jumping to source. I was recently talking with someone and he mentioned that the only feature he uses an IDE for is to easily find definitions in source code. In Emacs with Slime, it is possible to jump to the source of pretty much anything by hitting “M-.” (that is the control key followed by a period). This command works on functions, variables, classes, and more! When you jump to the source of a generic (virtual) function, you are given a list of all of the different methods that implement that function. For example if you weret to jump to the source of create-matcher-aux (the function that does most of the work in cl-ppcre), here is what you would see:

 

ezgif.com-optimize (8)

 

To jump back to wherever you were previously, use “M-,”.

And that is everything you should need to know about debugging Common Lisp.

Debugging Lisp Part 4: Restarts

This is part four of Debugging Lisp. Here are the previous parts on recompilation, inspecting, and class redefinition. The next post on miscellaneous debugging techniques can be found here.

Many languages provide error handling as two distinct parts, throw and catch. Throw is the part that detects something has gone wrong and in some way signals that an error has occurred. In the process, throw creates an exception object which contains information about the problem. The other part, catch, takes the exception object signaled by throw and attempts to recover from the error.

The issue with throw/catch is that throw acts like an unconditional goto to the catch part. Because of this, all of the state information that is available when throw is used that is not given to the exception object is lost. This becomes problematic if the code that catches the error wants to use some information about what happened when the error occurred in order to recover.

As an example, let’s say you are implementing a library which takes several files and parses a list of numbers from each one. One way to implement this library is as two functions. The first function, read-file, will read the contents of a single file and return a list of the results. The second, read-files, will take a list of files and return a list of the contents of each one. Here is what the code for those two functions might look like if they did not have any error handling:

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          collect (parse-integer line))))

(defun read-files (files)
  (loop for file in files
        collect (read-file file)))

To test the library you have two files. The first file contains the numbers 5, 10, 15, 20, 25 and the second contains 5, 10, 15, 20, a, 30, 40. In order to make sure your library handles errors properly, you decided to put a line which is just “a” in the second file. As it stands, parse-integer will signal an error when it comes across this line. To make testing the library easy, you have stored a list containing the pathnames of the two files in the variable *files*. Here is what happens when you try running the library on the two files:

(read-files *files*)

=> ERROR

An error occurred due to the “a” in the second file. As the designer of the library you have to decide what should happen when a situation like this one comes up. Below are several different options you could choose from if your language only provided catch/throw.

Your first option is to just skip the entry that caused the error. To do this, you could use handler-case, Common Lisp’s version of catch:

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          when (handler-case (parse-integer line)
                 ;; C is the name being used to
                 ;; refer to the exception object.
                 (error (c)
                   (declare (ignore c))
                   nil))
          collect it)))

(read-files *files*)

=> ((5 10 15 20 25) (5 10 15 20 30 40))

Another option is to provide a dynamic variable1 which the user of the library can use to specify a value to be used in place of the malformed entry:

(defvar *malformed-value* nil)

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          when (handler-case (parse-integer line)
                 (error (c)
                   (declare (ignore c))
                   *malformed-value*))
          collect it)))

(let ((*malformed-value* :malformed))
  (read-files *files*))

=> ((5 10 15 20 25) (5 10 15 20 :MALFORMED 30 40))

A third option is to have read-files catch the error and skip the entire file with the malformed entry:

(defun read-files (files)
  (loop for file in files
        when (handler-case (read-file file)
               (error (c)
                 (declare (ignore c))
                 nil))
        collect it))

(read-files *files*)

=> ((5 10 15 20 25))

Your last option is to let the user of the library handle the exception themselves:

(handler-case (read-files *files*)
  (error (c) (do-something)))

To the user, this last option is somewhat useful because it gives them some flexibility into how the error is handled. As mentioned above, the problem with doing this is that it becomes difficult for the user to properly recover from the error. If the user just wanted to skip the one corrupted file, there is no easy way to for them to do that due to the fact that by the time their error handling code is ran, execution would have left read-files. This means all of the state information, such as the remaining files that need to be read from, is completely lost by the time their code catches the exception.

Another problem with catch/throw is that of the four possible ways above you could handle the problem, you only get to choose one of them. Any one of them is in conflict with all of the others. Again, this is because throw acts like goto. Once you decide where you are jumping to, you have no control over what happens next. And, if you let the user handle the error themselves, they have no easy way to handle the error gracefully since all of the state information is lost.

This is where restarts come in. In Common Lisp, catch is provided as two separate pieces: handlers and restarts. A handler is bound by the user of the library in order to specify what should happen when an exception is thrown and a restart is defined by the library in order to provide a recovery option to the user. If you are using a language that supports restarts, you could implement the first three options above as restarts. Then when a user is using the library, they will get to select which of those restarts they want to have run when an error occurs. If they do not want to use any of the restarts, they can run their own code instead. Here is the code for the file reading library, but reimplemented to support three different restarts, one for each of the first three ways to handle errors.

(defun ask (string)
  (princ string *query-io*)
  (read *query-io*))

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          when (restart-case (parse-integer line)
                 (use-value (value)
                   :report "Use a new value."
                   :interactive (lambda ()
                                  (list (ask "Value: ")))
                   value)
                 (skip-entry ()
                   :report "Skip the entry."
                   nil))
          collect it)))

(defun read-files (files)
  (loop for file in files
        when (restart-case (read-file file)
               (skip-file ()
                 :report "Skip the entire file."
                 nil))
        collect it))

;;; The three functions below are predefined
;;; handlers for the most common ways the user
;;; will interact with the restarts.
(defun skip-entry (c)
  (declare (ignore c))
  (invoke-restart 'skip-entry))

(defun skip-file  (c)
  (declare (ignore c))
  (invoke-restart 'skip-file))

(defun use-value-handler (value)
  (lambda (c)
    (declare (ignore c))
    (invoke-restart 'use-value value)))

A restart is defined with the macro restart-case, and invoked by the function invoke-restart. This is a bit of a simplification, but invoking a restart is effectively equivalent to jumping to the body of the restart from where the error was signaled. This means that all of the state stored on the stack before the restart was established is still available when the restart is invoked. This gives the user of the library much finer grained control over what happens when an error is thrown.

To specify what should happen, all the user needs to do is use the macro handler-bindHandler-bind takes an error type and a handler (which should be a function) to call when an error of that type is thrown. The handler can then call invoke-restart in order to invoke one of the restarts provided by the library. As part of the library, there is one handler per restart provided, since those are the most common kinds of handlers. Here is what happens when each of the handlers are used when running the library on the two test files:

(handler-bind ((error #'skip-entry))
  (read-files files*))

=> ((5 10 15 20 25) (5 10 15 20 30 40))

(handler-bind ((error #'skip-file))
  (read-files files*))

=> ((5 10 15 20 25))

(handler-bind ((error (use-value-handler 0)))
  (read-files files*))

=> ((5 10 15 20 25) (5 10 15 20 0 30 40))

The really cool thing about restarts is what happens when the user doesn’t handle the error. When this happens they will enter the Slime Debugger. From there they will be given a list of the restarts that are available to them and they will be able to invoke them as if the error had been handled in the first place! Here is what happens when a user doesn’t handle the error, and then invokes the skip-entry restart on the fly:

 

ezgif.com-optimize (3)

 

What’s really cool about this is that this “interactive restarting” can use it to implement breakpoints! As I said in Part 1, Common Lisp provides breakpoints as a function “break” instead of as a feature of the editor. Here is code that could be used to implement break:

(defun break (&optional (format-control "Break")
              &rest format-arguments)
   (with-simple-restart (continue "Return from BREAK.")
     (let ((*debugger-hook* nil))
       (invoke-debugger
         (make-condition 'simple-condition
           :format-control   format-control
           :format-arguments format-arguments))))
   nil)

The code for break works by signalling an error while providing a “continue” restart. This means that as soon as the function break is called, you will enter the debugger with a restart available which will continue normal execution. Exactly what a breakpoint actually is.

Restarts are another fantastic part of debugging Common Lisp. They give you better control over what happens when an error occurs. And, if your code doesn’t handle the error itself, you can still recover the process by using an interactive restart.

Debugging Lisp Part 3: Redefining Classes

This is part 3 of Debugging Common Lisp. If you haven’t read either of the previous parts, you can find part 1 here, and part 2 hereYou can find part 4, which is on restarts, here.

The Common Lisp Object System (CLOS) is pretty powerful. It gives you multiple inheritance, multiple dispatch, and many different ways to extend the behavior of methods. Underneath, most implementations use the Metaobject Protocol (MOP), a way of defining CLOS in terms of itself. As part of the MOP, classes are implemented as objects with several instance variables. Among those are variables that hold the class’s name, its superclasses, and a list of the class’s own instance variables. If you don’t believe me, take the point class from the previous post:

(defclass point ()
  ((x :accessor point-x :initarg :x :initform 0)
   (y :accessor point-y :initarg :y :initform 0)))

And use the Slime Inspector to inspect the point class object, which can be obtained by calling find-class:

 

ezgif.com-optimize

 

The advantage of using the MOP is that it makes it possible to fine tune the behavior of CLOS by using ordinary object-oriented programming. A great example of this is the filtered-functions library which adds arbitrary predicate based dispatch to CLOS. But enough about the MOP.1 In this post I’m going to talk about one tiny piece of CLOS, update-instance-for-redefined-class.

Update-instance-for-redefined-class is a method which is called whenever a class is redefined (at runtime). By overriding it, you can customize what exactly happens at that point in time. For example, let’s say you are using the above point class to represent complex numbers for some sort of simulation. As part of the simulation, you have a point object saved inside of the *location* variable:

 

ezgif.com-optimize (1)

 

After profiling the simulation, you find that one of the bottlenecks is complex multiplication. Since multiplication of complex numbers is much more efficient when they are represented in polar form, you decide that you want to change the implementation of the point class from Cartesian to polar coordinates. To do that (at runtime), all you need to do is run the following code:

(defmethod update-instance-for-redefined-class :before
     ((pos point) added deleted plist &key)
  (let ((x (getf plist 'x))
        (y (getf plist 'y)))
    (setf (point-rho pos) (sqrt (+ (* x x) (* y y)))
          (point-theta pos) (atan y x))))

(defclass point ()
  ((rho :initform 0 :accessor point-rho)
   (theta :initform 0 :accessor point-theta)))

(defmethod point-x ((pos point))
  (with-slots (rho theta) pos (* rho (cos theta))))

(defmethod point-y ((pos point))
  (with-slots (rho theta) pos (* rho (sin theta))))

Basically, the code extends update-instance-for-redefined-class to calculate the values of rho and theta for the polar implementation in terms of the variables x and y from the Cartesian one. After extending update-instance-for-redefined-class the code then redefines the class, causing all of the existing instances to be changed over to the new implementation.2 Finally, two methods are defined, point-x and point-y, which preserve the interface for the point class.3 After running the code and then inspecting the contents of *location*, you should see:

 

ezgif.com-optimize (2)

 

Even though the object inside of *location* is still the same object, it is now implemented using polar coordinates! To make sure that it was converted from Cartesian to polar correctly, you decide to call point-x on the object to check that the x-coordinate is still the same:

 

ezgif.com-crop

Amazingly, all of the code continues to work even though the implementation of an entire class was completely changed. So anytime you want to change the implementation of a class that is part of a service that needs to be up 24/7 and just happens to be written in Common Lisp, remember to use update-instance-for-redefined-class.

Debugging Lisp Part 2: Inspecting

This is part 2 of Debugging Lisp. If you haven’t read part 1 on dynamic recompilation, you can find it here. For the next post in the series on redefining classes, click here.

In this post I am going to discuss another tool used for debugging Common Lisp – the Slime Inspector. The Slime inspector makes it possible to manipulate objects directly from the repl. You can do many different things with it, including clicking on objects to look at their contents and being able to copy and paste objects in order to reuse them in future function calls.1 Let’s say you have the following point class:

(defclass point ()
  ((x :accessor point-x :initarg :x :initform 0)
   (y :accessor point-y :initarg :y :initform 0)))

If you were to make an instance of the above class:

(make-instance 'point :x 10 :y 20)

You can then right click on it and click on the “inspect” option, or just use the Emacs shortcut “C-c C-v TAB” to peek inside the object:

 

ezgif.com-optimize (5)

 

This will show you the current values of all of the instance variables of the object. Not only can you look at the object’s instance variables, you can modify them as well. Note that the power comes from being able to do all of this from within the debugger at runtime.

 

ezgif.com-optimize (7)

 

To make sure that the value of that object was actually changed, you can copy and paste the point object and then call the point-x function on it.

 

ezgif.com-crop (5)

 

One more really cool tool that hooks into the Inspector is the Slime Trace Dialog. The Slime Trace Dialog is like ordinary trace, but it also allows for inspection on the objects that were passed to or returned from the traced functions. For example, let’s say you are writing a tail call optimized function, sum, that sums all of the numbers in a list.

(defun sum (xs &optional (acc 0))
  (if (null xs)
      acc
      (sum (cdr xs) (+ (car xs) acc))))

(sum '(1 2 3))
=> 6

You can toggle the use the Slime Trace Dialog to trace sum by typing the shortcut “C-c M-t” and then typing in the name of function, “sum”. After tracing it and running the code, you can press “C-c T” to enter the interactive Trace Dialog buffer. From there you can press “G” to refresh it and obtain the most recent trace.

 

ezgif.com-crop (4)

 

The trace will look like the output from ordinary trace, except it will have some addition goodies. As I said above you can inspect all of the arguments and return values. You can also hide/show branches of the trace tree in order to make it easier to find what you are looking for.

 

ezgif.com-optimize (8)

 

The Slime Trace Dialog is invaluable when you have code which is passing lots of objects around and you aren’t exactly sure what the value of each variable in each object is. You can just use the Slime Trace Dialog and have it keep track of all of the information for you.

All in all, the Slime Inspector is another amazing part of the Common Lisp debugging tool set. It comes in handy when the program crashes and you are unaware of the current state of the program. When combined with the rest of the features for debugging Common Lisp, the Slime Inspector is just incredible.

Debugging Lisp Part 1: Recompilation

This post is the start of a series on how to debug Common Lisp code, specifically with Emacs, Slime, and SBCL. If you do not understand Common Lisp, you should still be able to follow along and recognize just how powerful the facilities provided by the Common Lisp debugger are. Nathan Marz asked me to write these posts since he thought many of the tools for debugging Common Lisp were pretty cool.

For the next post in the series on inspecting, click here.

The first thing you need to do in order to get started debugging Common Lisp is to set your Lisp’s optimization qualities. Optimization qualities are basically a group of settings which allow you to specify what the compiler should optimize for. These qualities include speed, space, compilation speed, safety, and debugging. If you do not run the code below, which tells the compiler to optimize for debugging, almost none of the examples in this post will work.

CL-USER> (declaim (optimize (debug 3)))
NIL

CL-USER> (your-program)
...

With the compiler optimized for debugging, it becomes possible to do pretty much everything at runtime. This post will show you how Tom, an experienced Lisp developer would debug and patch a buggy function at runtime. Let’s say that Tom has the following code which implements the well known Fibonacci function:

(defun fib (n)
  (if (<= 0 n 1)
      (/ 1 0)
      (+ (fib (- n 1))
         (fib (- n 2)))))

There’s just one problem, the code isn’t correct! Instead of returning n in the base case, the code winds up dividing by zero. When Tom tries to calculate the tenth Fibonacci with this code, a debugger window pops up because an error was signaled.

 

ezgif.com-crop

 

Realizing that he has entered the debugger, Tom wonders what has gone wrong. In order to find the bug, Tom decides to insert a breakpoint into the function.1 In Common Lisp, breakpoints are implemented as a function called ‘break’.2 To insert his breakpoint, Tom adds a call to break at the beginning of fib. After adding the breakpoint, Tom then puts his cursor next to one of the frames and hits the ‘r’ key in order to restart it. In this case, Tom decided to restart the frame where n was three.

 

ezgif.com-optimize

 

By restarting the frame, Tom basically traveled back in time to the beginning of the frame he restarted. After restarting the frame, the debugger immediately hits the breakpoint Tom had just added. From there Tom steps through the code by hitting the ‘s’ key. He eventually realizes that the base case is implemented incorrectly and that that is why he received the error.

 

ezgif.com-optimize (2)

 

After finding the source of the problem, similar to how he had previously inserted the breakpoint, Tom patches the code. He replaces the base case with n and removes the breakpoint he had previously inserted.

 

ezgif.com-optimize (3)

 

After recompiling the code, Tom once again restarts one of the frames. Since he was previously stepping through code, the debugger starts stepping through the frame Tom decided to restart. Tom just taps the ‘0’  (zero) key in order to invoke the step-continue restart3 and continue normal execution. Because Tom restarted a frame which occurred before the bug, and now that the bug is gone, the code runs as if there had never a bug in the first place!

 

ezgif.com-crop (3)

 

Let’s recap what happened. After the code signaled an error, Tom found himself in the debugger. Tom was able to insert a breakpoint and poke around until he found the source of the problem. After finding the problem, Tom patched the code and restarted the process from a point before it had signaled an error. Because Tom had corrected the code, after he restarted the frame, it acted as if nothing had ever gone wrong!

The ability to recompile code at runtime is just one of the many incredible features provided by Common Lisp. Next time, I’m going to talk about the Slime inspector, which makes it possible to look into and modify objects from within debugger.

Multiple-value-bind

Common Lisp is a pretty unique language. One of the many features that makes Common Lisp such an awesome language is multiple values. Yes, you read right. In Common Lisp it is possible for a function to return more than a single value. One example of a function that takes advantage of multiple values is floor. Floor takes a number as its argument and returns two values, whatever was passed in rounded down and the remainder.

(floor 3.5)
=>
3
0.5

When you use floor in the manner above, you get two values back, 3 as the first return value, and 0.5 as the second. What’s really cool is that the values besides the first are completely ignored unless you explicitly ask for them.1 This means you can pretend that floor returns only a single value as long as you don’t need the other ones. Notice how in the following example, the + function is not aware of the second value returned by floor:

(+ (floor 3.5) 10)
=> 13

Now you may be wondering, “How can I obtain other values besides the first one?”. Well, there are several macros for doing that, the main one being multiple-value-bind. To use multiple-value-bind, you specify a list of the variables you want to bind each value to, followed by the expression that will return multiple values. Let’s say you want to multiply the two values returned by floor together. Here is how you would do that with multiple-value-bind:

(multiple-value-bind (val remainder) (floor 3.5)
  (* val remainder))
=> 1.5

It is also easy to create your own function that returns multiple values. All you need to do is pass each value you want to return to the values function. Below is a function which returns both twice its argument and three times its argument:

(defun multiples (x)
  (values (* 2 x) (* 3 x)))

(multiples 10)
=>
20
30

There is just one more thing you need to know about multiple values. If the last call of a function is to another that returns multiple values, the first function will return all of the values the second one returns. If you were to write a function that doubles its argument and then uses floor to round it down, that function will return both values that are returned by floor.

(defun double-and-round-down (x)
  (floor (* 2 x)))

(double-and-round-down 5.25)
=>
10
0.5

This behavior may or may not be desired. The standard way to make sure your function only returns a single value is to wrap the function that returns multiple values with a call to valuesValues will pay attention only to the first value and will return just that and nothing else.

(defun double-and-round-down (x)
  (values (floor (* 2 x))))

(double-and-round-down 5.25)
=> 10

And that’s all you need to know to work with multiple values!

Hofeach

Last time I talked about mapeach, a macro which is a simple wrapper around mapcar. After using mapeach a couple times, I found that I wanted ‘each’ version of many other other functions, removefind, and count to name a few. One option I had was to write a macro for every single one of these functions. If I were to have done this, I would have wound up with ‘remove-each’, ‘find-each’, and so on. Instead I took door number two, creating a general macro which I call ‘hofeach’Hofeach, is just like mapeach, except it takes an extra argument for the HOF (higher order function), that you want to use. Below is one possible implementation of hofeach.

(defmacro hofeach (hof var list &body body)
  `(funcall ,hof (lambda (,var) ,@body) ,list))

Here is what code that uses hofeach as a fill in for mapeach looks like:

(hofeach #'mapcar x '(1 2 3)
  (* x x))

=> (1 4 9)

Now we get to specify which HOF we want to use! If we want to keep all of the numbers in a list that are even, here is how we could do that:1 2

(hofeach #'remove-if-not x '(1.2 5 7 2 3.5 6 9)
  (and (integerp x) (evenp x)))

=> (2 6)

So now that I have hofeach, I generally will use it instead of passing a complex lambda expression to a HOF. Most of the time I use hofeach with remove-if-not, but I have also used it with count-if as well. It gives code a nice down and to the right look, which I find pretty easy to read. You get to read the forms in the order that they appear. If you were to use a lambda expression instead, it becomes much more difficult to read since you have to jump around in order to read the code.

Mapeach

Many times when using mapcar, I find myself using a complex lambda expression for the function argument. This makes the code difficult to read since it breaks apart the flow. My code winds up looking like the following:

(mapcar (lambda (x)
          ...)
        list)

First you have to read the possibly massive lambda expression, then you finally find out what you are mapping over. As the lambda expression increases in length, it becomes harder and harder to read. A way to fix this is with the macro mapeach. Mapeach is a macro which is meant to be used when the lambda expression that would be passed to mapcar is much longer than the expression for the list. Mapeach works just like mapcar, but instead provides an alternative syntax which makes it easier to read when the lambda expression is complicated. Here is one possible implementation of mapeach:1

(defmacro mapeach (var list &body body)
  `(mapcar (lambda (,var) ,@body) ,list))

Mapeach, does two things to fix the problem. First it hides the lambda, making it easier to find the important parts of the code. Second, it inverts the order of the arguments, putting the simple list expression first and the complex body second. As a simple example of mapeach, here is how one could square each element in a list using it:

(mapeach x '(1 2 3)
  (* x x))

If one wanted to write the above code by using mapcar, it would look something like the following:

(mapcar (lambda (x)
          (* x x))
        '(1 2 3))

Although it doesn’t shine for this simple example, you can tell that mapeach makes the code a bit clearer. As the body for the lambda expression gets longer and longer, mapeach begins to make the code much easier to understand. I find that mapcar is nice to use only when the expression for the function is short. This happens either when you are either using a named function or you are using some sort of reader macro. Mapeach is another one of those macros that makes what seems like an insignificant difference. Even so, I find that it aids a lot in readability since it puts all of the simple parts in one place.