Defaddress

This post is the first part of a two part series exploring the emulator cl-6502. This post will cover how addressing modes are implemented in cl-6502. The second part will go over the implementation of the opcodes.

cl-6502 is an emulator for the MOS 6502 processor, used in devices such as the Apple II and the NES. As an emulator, cl-6502 has three distinct roles. It needs to be able to convert assembly code into machine code (assembly), it needs to be able to convert machine code back into assembly (disassembly), and it needs to be able to actually interpret the machine code (execution). By using macros in clever ways, cl-6502 is able to create multiple DSLs for defining different components of the emulator. One of those macros is defaddress, which makes it easy to add addressing modes to the emulator. First some background.

Assembly language has what are known as “addressing modes”. Depending on which addressing mode is being used, the argument to the instruction will be calculated in a different manner. The programmer is able to specify different addressing modes by using slightly different syntaxes. As an example here is the same jump instruction just with two different addressing modes:

JMP $0
JMP ($0)

From here on out, I’m going to use the term “operand” to refer to the value given to the instruction before the addressing mode has been taken into account and the term “argument” to refer to the value after the addressing mode has been considered. As you should be able to tell, both instructions above are passed the same operand of zero, but because they are using different addressing modes, they will calculate their arguments in two different ways.

Since the first instruction doesn’t use any extra syntax (except the dollar sign which just means base 16), it uses “absolute” addressing. With absolute addressing the argument is the same as the operand.1 The first instruction can be read as, continue execution at the instruction at address zero.

Since the second instruction has parens around the operand, it uses what is known as “indirect” addressing. For indirect addressing, the operand is actually the memory location of the argument.2 The second instruction can be read as, get the address that is stored at address zero, and continue execution at the instruction at that location in memory. Assuming the value 123 was stored at address zero, the operand would be zero, the argument would be 123, and the instruction would cause execution to be resumed at the instruction at location 123.

In total there are 13 different addressing modes for the 6502. In order to make it easy to define all of these different addressing modes, cl-6502 creates a macro defaddress. Defaddress is a DSL for the sole purpose of defining addressing modes. Each one of the main arguments to defaddress handles one of the jobs (assembly/disassembly/execution) that an emulator has to perform with respect to the addressing mode. As to what the defaddress DSL looks like, here is the code that defines the absolute addressing mode.

(defaddress absolute (:reader "^_$" :writer "$~{~2,'0x~}")
  (get-word (cpu-pc cpu)))

The code above has three distinct parts. The first piece is the reader, which is used to parse the assembly code:

 "^_$" 

The reader argument is a regular expression that recognizes the syntax of the addressing mode being defined, in this case aboslute addressing. The regex is a normal perl compatible regex except it may use an underscore to match (and capture) an operand. The regex above matches a lone operand, which is exactly the syntax for absolute addressing. After the reader is the writer:

"$~{2,'0x~}"

The writer is a format string that is able to reproduce the original assembly (with the proper syntax for the addressing mode) from the machine code. The writer for absolute addressing says to print the operand as a zero padded, two digit, hexadecimal number. Basically, it just prints the lone operand in assembly language without any additional syntax. Since there is no extra syntax, that means the generated code is using absolute addressing.

The last part is the body. The body is a block of code that calculates the argument from the operands.3 For absolute addressing the body is:

 
(get-word (cpu-pc cpu)) 

When this code is ran, the variable cpu will be bound to an object representing the current state of the cpu. The pc of the cpu normally points to the current instruction being executed, but cl-6502 uses a slight trick. By incrementing the pc, it will now point to the first operand of the instruction! All the body does is take the value of the pc (which is the address of the argument/operand), and looks up the value at that address4 to get the actual argument.

As a second example of defaddress, here is the code for indirect addressing:

(defaddress indirect (:reader "^\\(_\\)$" 
                      :writer "($~{~2,'0x~})")
  (get-word (get-word (cpu-pc cpu)) t))

There are only a few differences between the code for indirect and absolute addressing. In the reader and writer, there are now an extra pair of parens around the operand. This is because the syntax for indirect addressing is an operand surrounded by parens. Another difference is with the body. Since there is an extra layer of indirection with indirect addressing, there is an additional call to get-word. For indirect addressing, the body says to calculate the argument, get the value of the pc (the address of the operand or the address of the address of the argument), get the value at that address (the operand or the address of the argument), and then get the value at that address (the actual argument).

Since I have already shown you some examples of how to use defaddress, I am now going to explain how defaddress works. Here is the complete definition of defaddress:

(defmacro defaddress (name (&key reader writer cpu-reg)
                      &body body)
  `(progn 
    (defmethod reader ((mode (eql ',name)))
      ,(cl-ppcre:regex-replace-all 
         "_" reader "([^,()#&]+)"))
     (defmethod writer ((mode (eql ',name))) ,writer)
     (push ',name *address-modes*)
     (defun ,name (cpu) ,@body)
     (defun (setf ,name) (value cpu)
       ,(if cpu-reg
            `(setf ,@body value)
            `(setf (get-byte ,@body) value)))))

I’m going to break down the code for defaddress one part at a time. After explaining a piece does, I will show you what the expansion of that piece looks like when defining absolute addressing. The first part of defaddress handles the reader:

 (defmethod reader ((mode (eql ',name)))
   ,(cl-ppcre:regex-replace-all "_" reader "([^,()#&]+)")) 

This part generates code which will define a method on the generic (virtual) function reader. Reader takes in the name of the mode as an argument and is supposed to return a regex (a true perl compatible regex, i.e. no underscores) that will recognize the mode and extract the operands:

(reader 'absolute)
=> "^([^,()#&]+)$"

To produce the method, defaddress just takes the reader argument, substitutes the underscore with a regex that can be used to recognize operands, and uses that as the value reader should return for the mode being defined. Here is what the piece of code expands into for absolute addressing:

(defmethod reader ((mode (eql 'absolute))) "^([^,()#&]+)$")

The next part does pretty much the exact same thing, only for the writer:

(defmethod writer ((mode (eql ',name))) ,writer)

It generates the code for a method for the generic function writer. Since the format string is used unmodified, defaddress just inserts the string into the body of the function. There result winds up being:

(defmethod writer ((mode (eql 'absolute))) "$~{~2,'0x~}")

Next up is the piece:

 (push ',name *address-modes*) 

This piece of code adds the mode being defined to a list of all of the addressing modes. The list is used to find all of the addressing modes that match the syntax of a given instruction. The snippet simply expands into:

(push 'absolute *address-modes*)

Now for the most important part of defaddress – the code that handles the body:

 (defun ,name (cpu) ,@body) 

It just puts the body inside of a function named by the addressing mode. The function is supposed to take the in the current state of the cpu as an object and return the argument used for the current instruction. Note that the variable cpu is available to the body. This is how the body of defaddress is able to access the cpu object. The expansion winds up looking like:

(defun absolute (cpu) (get-word (cpu-pc cpu)))

There is just one more part, a setf function for the addressing mode:

(defun (setf ,name) (value cpu)
  ,(if cpu-reg
       `(setf ,@body value)
       `(setf (get-byte ,@body) value)))

This code generates a setf function, basically a way to modify the argument of the instruction. Many instructions not only use the argument, but they store a new value to the memory location of the argument. The setf function defined by defaddress is just a way to do that. I’m not going to go in depth about it, but this is the only piece of code that uses the cpu-reg argument. The cpu-reg argument is just used to smooth out some differences between different addressing modes. The code generated by the above code winds up looking like:

(defun (setf absolute) (value cpu)
   (setf (get-byte (get-word (cpu-pc cpu))) value))

As I just said, the setf function defined can be used to set the value of the argument. To do it for absolute addressing, get the operand and set the value at that memory location.56

And that is pretty much everything there is to know about defaddress. In the next post I am going to talk a bout defasm, a macro that makes it easy to define different instructions for the emulator. It piggybacks off of the information provided by defaddress in order to handle all of the instructions in all of the different possible addressing modes.

Debugging Lisp Part 5: Miscellaneous

This post is for all of the miscellaneous features that aren’t large enough to get their own individual posts. If you haven’t read all of them, here are the links to the previous posts on recompilation, inspection, class redefinition, and restarts.

One somewhat obscure tool for debugging is SBCL’s trace. SBCL’s trace goes way beyond what most other implementations provide. In SBCL, trace takes several additional keyword arguments.1 For example, trace accepts a keyword argument, :break. The expression passed in as the value of :break will be evaluated every time the traced function is called. When that expression evaluates to true, the debugger will be invoked. For example if you have a Fibonacci function:

(defun fib (n)
  (if (<= 0 n 1)
      n
      (+ (fib (- n 1))
         (fib (- n 2)))))

you can use trace to break specifically when fib is called with an argument of zero:

 

ezgif.com-optimize (4)

 

A bit of mangling has to be done (using sb-debug:arg) since the expression refers to variables within the fib function. Trace also accepts several variants of :break, such as :break-after, which evaluate the expression after the function has been called instead of before. There are also arguments :print and :print-after, which are like their break counterparts, only they print the value of the expression before/after the function is called. You could use :print-after to say, print the time (Unix time) whenever fib returns:

 

ezgif.com-optimize (6)

 

For a complete list of all of the arguments that trace can accept, check out this page of the SBCL manual.

Another relatively unknown group of features are the cross referencing commands. The cross referencing commands are commands which lookup all of the places where something is referenced. All of the bindings for the cross referencing commands begin with C-c C-w.2 The cross referencing command I find myself using the most, “slime-who-calls”, which is bound to C-c C-w C-c, shows you all of the places where a function is called from. Here is what it would look like if you were to lookup all of the places where the scan function is used in cl-ppcre and then scroll through them:3

 

ezgif.com-optimize (7)

 

Slime-who-calls makes it easy to figure out how a function is supposed be used. You can pull up all of the usages of a function and just look at them. There are also several analogs of slime-who-calls. There is slime-who-macroexpands (C-c C-w RET), which pulls up all of the places where a macro is used and there is also slime-who-references (C-C C-w C-r) which is the same thing only for variables.

Another important feature is how to pull up the source code of a function on the stack while inside of the debugger. One way to do it is to press the ‘v’ key with the cursor on the frame you want to view the source of. An alternative option is to use M-p (the alt key and the ‘p’ key at the same time) and M-n to move up and down the stack frame. When using these commands instead of the normal C-p and C-n for movement, Slime will automatically pull up the source code as you are moving through the stack. Here is what it would look like if you were to pass a malformed regular expression to cl-ppcre (so that an error will be signaled and you will enter the debugger), and then scroll through the stack trace using M-n:

 

ezgif.com-optimize (9)

 

And now for the most common of all IDE commands, jumping to source. I was recently talking with someone and he mentioned that the only feature he uses an IDE for is to easily find definitions in source code. In Emacs with Slime, it is possible to jump to the source of pretty much anything by hitting “M-.” (that is the control key followed by a period). This command works on functions, variables, classes, and more! When you jump to the source of a generic (virtual) function, you are given a list of all of the different methods that implement that function. For example if you weret to jump to the source of create-matcher-aux (the function that does most of the work in cl-ppcre), here is what you would see:

 

ezgif.com-optimize (8)

 

To jump back to wherever you were previously, use “M-,”.

And that is everything you should need to know about debugging Common Lisp.

Debugging Lisp Part 4: Restarts

This is part four of Debugging Lisp. Here are the previous parts on recompilation, inspecting, and class redefinition. The next post on miscellaneous debugging techniques can be found here.

Many languages provide error handling as two distinct parts, throw and catch. Throw is the part that detects something has gone wrong and in some way signals that an error has occurred. In the process, throw creates an exception object which contains information about the problem. The other part, catch, takes the exception object signaled by throw and attempts to recover from the error.

The issue with throw/catch is that throw acts like an unconditional goto to the catch part. Because of this, all of the state information that is available when throw is used that is not given to the exception object is lost. This becomes problematic if the code that catches the error wants to use some information about what happened when the error occurred in order to recover.

As an example, let’s say you are implementing a library which takes several files and parses a list of numbers from each one. One way to implement this library is as two functions. The first function, read-file, will read the contents of a single file and return a list of the results. The second, read-files, will take a list of files and return a list of the contents of each one. Here is what the code for those two functions might look like if they did not have any error handling:

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          collect (parse-integer line))))

(defun read-files (files)
  (loop for file in files
        collect (read-file file)))

To test the library you have two files. The first file contains the numbers 5, 10, 15, 20, 25 and the second contains 5, 10, 15, 20, a, 30, 40. In order to make sure your library handles errors properly, you decided to put a line which is just “a” in the second file. As it stands, parse-integer will signal an error when it comes across this line. To make testing the library easy, you have stored a list containing the pathnames of the two files in the variable *files*. Here is what happens when you try running the library on the two files:

(read-files *files*)

=> ERROR

An error occurred due to the “a” in the second file. As the designer of the library you have to decide what should happen when a situation like this one comes up. Below are several different options you could choose from if your language only provided catch/throw.

Your first option is to just skip the entry that caused the error. To do this, you could use handler-case, Common Lisp’s version of catch:

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          when (handler-case (parse-integer line)
                 ;; C is the name being used to
                 ;; refer to the exception object.
                 (error (c)
                   (declare (ignore c))
                   nil))
          collect it)))

(read-files *files*)

=> ((5 10 15 20 25) (5 10 15 20 30 40))

Another option is to provide a dynamic variable1 which the user of the library can use to specify a value to be used in place of the malformed entry:

(defvar *malformed-value* nil)

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          when (handler-case (parse-integer line)
                 (error (c)
                   (declare (ignore c))
                   *malformed-value*))
          collect it)))

(let ((*malformed-value* :malformed))
  (read-files *files*))

=> ((5 10 15 20 25) (5 10 15 20 :MALFORMED 30 40))

A third option is to have read-files catch the error and skip the entire file with the malformed entry:

(defun read-files (files)
  (loop for file in files
        when (handler-case (read-file file)
               (error (c)
                 (declare (ignore c))
                 nil))
        collect it))

(read-files *files*)

=> ((5 10 15 20 25))

Your last option is to let the user of the library handle the exception themselves:

(handler-case (read-files *files*)
  (error (c) (do-something)))

To the user, this last option is somewhat useful because it gives them some flexibility into how the error is handled. As mentioned above, the problem with doing this is that it becomes difficult for the user to properly recover from the error. If the user just wanted to skip the one corrupted file, there is no easy way to for them to do that due to the fact that by the time their error handling code is ran, execution would have left read-files. This means all of the state information, such as the remaining files that need to be read from, is completely lost by the time their code catches the exception.

Another problem with catch/throw is that of the four possible ways above you could handle the problem, you only get to choose one of them. Any one of them is in conflict with all of the others. Again, this is because throw acts like goto. Once you decide where you are jumping to, you have no control over what happens next. And, if you let the user handle the error themselves, they have no easy way to handle the error gracefully since all of the state information is lost.

This is where restarts come in. In Common Lisp, catch is provided as two separate pieces: handlers and restarts. A handler is bound by the user of the library in order to specify what should happen when an exception is thrown and a restart is defined by the library in order to provide a recovery option to the user. If you are using a language that supports restarts, you could implement the first three options above as restarts. Then when a user is using the library, they will get to select which of those restarts they want to have run when an error occurs. If they do not want to use any of the restarts, they can run their own code instead. Here is the code for the file reading library, but reimplemented to support three different restarts, one for each of the first three ways to handle errors.

(defun ask (string)
  (princ string *query-io*)
  (read *query-io*))

(defun read-file (file)
  (with-open-file (in file :direction :input)
    (loop for line = (read-line in nil in)
          until (eq in line)
          when (restart-case (parse-integer line)
                 (use-value (value)
                   :report "Use a new value."
                   :interactive (lambda ()
                                  (list (ask "Value: ")))
                   value)
                 (skip-entry ()
                   :report "Skip the entry."
                   nil))
          collect it)))

(defun read-files (files)
  (loop for file in files
        when (restart-case (read-file file)
               (skip-file ()
                 :report "Skip the entire file."
                 nil))
        collect it))

;;; The three functions below are predefined
;;; handlers for the most common ways the user
;;; will interact with the restarts.
(defun skip-entry (c)
  (declare (ignore c))
  (invoke-restart 'skip-entry))

(defun skip-file  (c)
  (declare (ignore c))
  (invoke-restart 'skip-file))

(defun use-value-handler (value)
  (lambda (c)
    (declare (ignore c))
    (invoke-restart 'use-value value)))

A restart is defined with the macro restart-case, and invoked by the function invoke-restart. This is a bit of a simplification, but invoking a restart is effectively equivalent to jumping to the body of the restart from where the error was signaled. This means that all of the state stored on the stack before the restart was established is still available when the restart is invoked. This gives the user of the library much finer grained control over what happens when an error is thrown.

To specify what should happen, all the user needs to do is use the macro handler-bindHandler-bind takes an error type and a handler (which should be a function) to call when an error of that type is thrown. The handler can then call invoke-restart in order to invoke one of the restarts provided by the library. As part of the library, there is one handler per restart provided, since those are the most common kinds of handlers. Here is what happens when each of the handlers are used when running the library on the two test files:

(handler-bind ((error #'skip-entry))
  (read-files files*))

=> ((5 10 15 20 25) (5 10 15 20 30 40))

(handler-bind ((error #'skip-file))
  (read-files files*))

=> ((5 10 15 20 25))

(handler-bind ((error (use-value-handler 0)))
  (read-files files*))

=> ((5 10 15 20 25) (5 10 15 20 0 30 40))

The really cool thing about restarts is what happens when the user doesn’t handle the error. When this happens they will enter the Slime Debugger. From there they will be given a list of the restarts that are available to them and they will be able to invoke them as if the error had been handled in the first place! Here is what happens when a user doesn’t handle the error, and then invokes the skip-entry restart on the fly:

 

ezgif.com-optimize (3)

 

What’s really cool about this is that this “interactive restarting” can use it to implement breakpoints! As I said in Part 1, Common Lisp provides breakpoints as a function “break” instead of as a feature of the editor. Here is code that could be used to implement break:

(defun break (&optional (format-control "Break")
              &rest format-arguments)
   (with-simple-restart (continue "Return from BREAK.")
     (let ((*debugger-hook* nil))
       (invoke-debugger
         (make-condition 'simple-condition
           :format-control   format-control
           :format-arguments format-arguments))))
   nil)

The code for break works by signalling an error while providing a “continue” restart. This means that as soon as the function break is called, you will enter the debugger with a restart available which will continue normal execution. Exactly what a breakpoint actually is.

Restarts are another fantastic part of debugging Common Lisp. They give you better control over what happens when an error occurs. And, if your code doesn’t handle the error itself, you can still recover the process by using an interactive restart.