This post is the first part of a two-part series exploring the emulator cl-6502. This post will cover how addressing modes are implemented in cl-6502. The second part will go over the implementation of the opcodes.
cl-6502 is an emulator for the MOS 6502 processor, used in devices such as the Apple II and the NES. As an emulator, cl-6502 has three distinct roles. It needs to be able to convert assembly code into machine code (assembly), convert machine code back into assembly (disassembly), and actually interpret the machine code (execution). By using macros in clever ways, cl-6502 is able to create multiple DSLs for defining different components of the emulator. One of those macros is defaddress, which makes it easy to add addressing modes to the emulator. First, some background.
Assembly language has what are known as addressing modes. Depending on which addressing mode is being used, the argument to the instruction will be calculated in a different manner. The programmer is able to specify different addressing modes by using slightly different syntaxes. As an example here is the same jump instruction just with two different addressing modes:
JMP $0
JMP ($0)
From here on out, I’m going to use the term operand to refer to the value given to the instruction before the addressing mode has been taken into account, and the term argument to refer to the value after the addressing mode has been considered. As you should be able to tell, both instructions above are passed the same operand of zero, but because they use different addressing modes, they calculate their arguments in two different ways.
Since the first instruction doesn’t use any extra syntax (except the dollar sign, which just means base 16), it uses absolute addressing. With absolute addressing the argument is the same as the operand.[1] The first instruction can be read as: continue execution at the instruction at address zero.
Since the second instruction has parens around the operand, it uses what is known as indirect addressing. For indirect addressing, the operand is actually the memory location of the argument.[2] The second instruction can be read as: get the address that is stored at address zero, and continue execution at the instruction at that location in memory. Assuming the value 123 was stored at address zero, the operand would be zero, the argument would be 123, and the instruction would cause execution to resume at the instruction at location 123.
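To make the operand/argument distinction concrete, here is a minimal sketch in Common Lisp. It is a toy model, not cl-6502 code: the names *toy-memory*, absolute-argument and indirect-argument are made up for illustration, and each memory cell holds a whole address rather than a single byte.

```lisp
;; Toy model (not cl-6502 code): each cell of this vector holds a whole
;; address, which keeps the example small but is not how real memory works.
(defparameter *toy-memory* (make-array 256 :initial-element 0))

;; Store the value 123 at address zero, as in the example above.
(setf (aref *toy-memory* 0) 123)

(defun absolute-argument (operand)
  "Absolute addressing: the argument is the operand itself."
  operand)

(defun indirect-argument (operand)
  "Indirect addressing: the operand is the memory location of the argument."
  (aref *toy-memory* operand))

(absolute-argument 0) ; JMP $0   would continue at address 0
(indirect-argument 0) ; JMP ($0) would continue at address 123
```

Both calls receive the same operand; only the rule for turning it into an argument differs.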
In total there are 13 different addressing modes for the 6502. In order to make it easy to define all of these different addressing modes, cl-6502 creates a macro defaddress. Defaddress is a DSL for the sole purpose of defining addressing modes. Each one of the main arguments to defaddress handles one of the jobs (assembly/disassembly/execution) that an emulator has to perform with respect to the addressing mode. As to what the defaddress DSL looks like, here is the code that defines the absolute addressing mode.
(defaddress absolute
    (:reader "^_$"
     :writer "$~{~2,'0x~}")
  (get-word (cpu-pc cpu)))
The code above has three distinct parts. The first piece is the reader, which is used to parse the assembly code:
"^_$"
The reader argument is a regular expression that recognizes the syntax of the addressing mode being defined, in this case absolute addressing. The regex is a normal Perl-compatible regex, except that it may use an underscore to match (and capture) an operand. The regex above matches a lone operand, which is exactly the syntax for absolute addressing. After the reader is the writer:
"$~{~2,'0x~}"
The writer is a format string that is able to reproduce the original assembly (with the proper syntax for the addressing mode) from the machine code. The writer for absolute addressing says to print a dollar sign followed by the operand in hexadecimal, with each element zero-padded to two digits. Basically, it just prints the lone operand in assembly language without any additional syntax. Since there is no extra syntax, the generated code uses absolute addressing.
The last part is the body. The body is a block of code that calculates the argument from the operands.[3] For absolute addressing the body is:
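The writer string can be tried out directly with the standard format function. In this small sketch the operand bytes are made-up values, and passing them as a list in display order is an assumption about how the disassembler calls the writer.

```lisp
;; The byte values below are made up for illustration. The ~{ ... ~}
;; directive loops over the list, and ~2,'0x prints each element in
;; hexadecimal, padded on the left with zeros to at least two digits.
(defparameter *absolute-text*
  (format nil "$~{~2,'0x~}" (list #x12 #x34)))
;; *absolute-text* is now "$1234"

(defparameter *padded-text*
  (format nil "$~{~2,'0x~}" (list 0 5)))
;; *padded-text* is now "$0005"
```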
(get-word (cpu-pc cpu))
When this code is run, the variable cpu will be bound to an object representing the current state of the cpu. The pc of the cpu normally points to the current instruction being executed, but cl-6502 uses a slight trick. By incrementing the pc, it makes the pc point to the first operand of the instruction! All the body does is take the value of the pc (which is the address of the argument/operand) and look up the value at that address[4] to get the actual argument.
As a second example of defaddress, here is the code for indirect addressing:
(defaddress indirect
    (:reader "^\\(_\\)$"
     :writer "($~{~2,'0x~})")
  (get-word (get-word (cpu-pc cpu)) t))
There are only a few differences between the code for indirect and absolute addressing. In the reader and writer, there is now an extra pair of parens around the operand. This is because the syntax for indirect addressing is an operand surrounded by parens. Another difference is with the body. Since there is an extra layer of indirection with indirect addressing, there is an additional call to get-word. For indirect addressing, the body says that to calculate the argument, you get the value of the pc (the address of the operand, or the address of the address of the argument), get the value at that address (the operand, or the address of the argument), and then get the value at that address (the actual argument).
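The two bodies can be compared side by side with a self-contained sketch. Everything here is a stand-in: toy-cpu, *toy-ram*, toy-get-word, toy-absolute and toy-indirect are hypothetical names, the real cl-6502 definitions differ, and the extra t argument that the real indirect definition passes to get-word is left out because this post does not describe it. The one hardware fact relied on is that the 6502 stores 16-bit values low byte first.

```lisp
;; Hypothetical stand-ins for the cpu object and memory.
(defstruct toy-cpu (pc 0))

(defparameter *toy-ram*
  (make-array #x10000 :element-type '(unsigned-byte 8) :initial-element 0))

(defun toy-get-word (address)
  "Read a 16-bit value stored low byte first, starting at ADDRESS."
  (+ (aref *toy-ram* address)
     (ash (aref *toy-ram* (logand (1+ address) #xFFFF)) 8)))

(defun toy-absolute (cpu)
  ;; One lookup: the word at the pc is the result.
  (toy-get-word (toy-cpu-pc cpu)))

(defun toy-indirect (cpu)
  ;; Two lookups: the word at the pc is itself an address to read from.
  (toy-get-word (toy-get-word (toy-cpu-pc cpu))))

;; The operand bytes at addresses 1 and 2 encode $0200.
(setf (aref *toy-ram* 1) #x00
      (aref *toy-ram* 2) #x02)
;; The word stored at $0200 is $1234.
(setf (aref *toy-ram* #x0200) #x34
      (aref *toy-ram* #x0201) #x12)

;; The pc has already been moved onto the first operand byte.
(defparameter *toy-cpu* (make-toy-cpu :pc 1))

(toy-absolute *toy-cpu*) ; the operand itself, #x0200
(toy-indirect *toy-cpu*) ; the word stored at the operand, #x1234
```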
Since I have already shown you some examples of how to use defaddress, I am now going to explain how defaddress works. Here is the complete definition of defaddress:
(defmacro defaddress (name (&key reader writer cpu-reg) &body body)
  `(progn
     (defmethod reader ((mode (eql ',name)))
       ,(cl-ppcre:regex-replace-all "_" reader "([^,()#&]+)"))
     (defmethod writer ((mode (eql ',name))) ,writer)
     (push ',name *address-modes*)
     (defun ,name (cpu) ,@body)
     (defun (setf ,name) (value cpu)
       ,(if cpu-reg
            `(setf ,@body value)
            `(setf (get-byte ,@body) value)))))
I’m going to break down the code for defaddress one part at a time. After explaining what a piece does, I will show you what the expansion of that piece looks like when defining absolute addressing. The first part of defaddress handles the reader:
(defmethod reader ((mode (eql ',name))) ,(cl-ppcre:regex-replace-all "_" reader "([^,()#&]+)"))
This part generates code which will define a method on the generic (virtual) function reader. Reader takes in the name of the mode as an argument and is supposed to return a regex (a true perl compatible regex, i.e. no underscores) that will recognize the mode and extract the operands:
(reader 'absolute) => "^([^,()#&]+)$"
To produce the method, defaddress just takes the reader argument, substitutes the underscore with a regex that can be used to recognize operands, and uses that as the value reader should return for the mode being defined. Here is what the piece of code expands into for absolute addressing:
(defmethod reader ((mode (eql 'absolute))) "^([^,()#&]+)$")
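The same idea can be sketched without any libraries. The names toy-reader and expand-underscores are hypothetical, and a hand-written loop stands in for the call to cl-ppcre:regex-replace-all so the example runs in a bare Lisp image. One difference from defaddress: here the substitution happens when the method is called, while the macro performs it once at expansion time and leaves only the finished string in the method.

```lisp
(defun expand-underscores (pattern)
  "Replace every underscore in PATTERN with the operand-capturing group."
  (with-output-to-string (out)
    (loop for ch across pattern
          do (if (char= ch #\_)
                 (write-string "([^,()#&]+)" out)
                 (write-char ch out)))))

(defgeneric toy-reader (mode)
  (:documentation "Return the regex string that recognizes MODE."))

;; EQL specializers dispatch on the symbol itself, so each addressing
;; mode gets its own method on the same generic function.
(defmethod toy-reader ((mode (eql 'absolute)))
  (expand-underscores "^_$"))

(defmethod toy-reader ((mode (eql 'indirect)))
  (expand-underscores "^\\(_\\)$"))

(toy-reader 'absolute) ; "^([^,()#&]+)$"
(toy-reader 'indirect) ; "^\\(([^,()#&]+)\\)$"
```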
The next part does pretty much the exact same thing, only for the writer:
(defmethod writer ((mode (eql ',name))) ,writer)
It generates the code for a method for the generic function writer. Since the format string is used unmodified, defaddress just inserts the string into the body of the function. The result winds up being:
(defmethod writer ((mode (eql 'absolute))) "$~{~2,'0x~}")
Next up is the piece:
(push ',name *address-modes*)
This piece of code adds the mode being defined to a list of all of the addressing modes. The list is used to find all of the addressing modes that match the syntax of a given instruction. The snippet simply expands into:
(push 'absolute *address-modes*)
Now for the most important part of defaddress, the code that handles the body:
(defun ,name (cpu) ,@body)
It just puts the body inside of a function named by the addressing mode. The function is supposed to take in the current state of the cpu as an object and return the argument used for the current instruction. Note that the variable cpu is available to the body. This is how the body of defaddress is able to access the cpu object. The expansion winds up looking like:
(defun absolute (cpu) (get-word (cpu-pc cpu)))
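A cut-down macro shows how the body ends up seeing cpu. This is only a sketch: def-toy-mode, *toy-modes* and pc-plus-one are hypothetical names, the cpu is faked with a property list, and the reader, writer and setf parts of defaddress are left out.

```lisp
(defvar *toy-modes* '())

(defmacro def-toy-mode (name &body body)
  ;; The generated function always names its parameter CPU, so any
  ;; body spliced in below it can refer to that variable directly.
  `(progn
     (push ',name *toy-modes*)
     (defun ,name (cpu) ,@body)))

;; The body mentions CPU even though it never binds it itself.
(def-toy-mode pc-plus-one
  (1+ (getf cpu :pc)))

(pc-plus-one (list :pc 5)) ; returns 6
```

Calling macroexpand-1 on the def-toy-mode form shows the generated progn, with the push and the defun side by side, in the same shape as the expansions shown for absolute addressing.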
There is just one more part, a setf function for the addressing mode:
(defun (setf ,name) (value cpu)
  ,(if cpu-reg
       `(setf ,@body value)
       `(setf (get-byte ,@body) value)))
This code generates a setf function, basically a way to modify the argument of the instruction. Many instructions not only use the argument, but also store a new value to the memory location of the argument. The setf function defined by defaddress is just a way to do that. I’m not going to go in depth about it, but this is the only piece of code that uses the cpu-reg argument. The cpu-reg argument is just used to smooth out some differences between different addressing modes. The code generated by the above code winds up looking like:
(defun (setf absolute) (value cpu) (setf (get-byte (get-word (cpu-pc cpu))) value))
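Here is a self-contained sketch of how such a setf function behaves. The names mini-cpu, *mini-ram*, mini-get-byte, mini-get-word and mini-absolute are hypothetical stand-ins for the real cl-6502 definitions, which are not shown in this post.

```lisp
;; Hypothetical stand-ins for the cpu object and memory.
(defstruct mini-cpu (pc 0))

(defparameter *mini-ram*
  (make-array #x10000 :element-type '(unsigned-byte 8) :initial-element 0))

(defun mini-get-byte (address)
  (aref *mini-ram* address))

(defun (setf mini-get-byte) (value address)
  ;; A setf function receives the new value as its first argument.
  (setf (aref *mini-ram* address) value))

(defun mini-get-word (address)
  "Read a 16-bit value stored low byte first, starting at ADDRESS."
  (+ (aref *mini-ram* address)
     (ash (aref *mini-ram* (logand (1+ address) #xFFFF)) 8)))

(defun mini-absolute (cpu)
  (mini-get-word (mini-cpu-pc cpu)))

(defun (setf mini-absolute) (value cpu)
  ;; Same shape as the expansion above: compute the address the way the
  ;; body does, then store VALUE in the byte at that address.
  (setf (mini-get-byte (mini-get-word (mini-cpu-pc cpu))) value))

;; The operand bytes at addresses 1 and 2 encode $0200.
(setf (aref *mini-ram* 1) #x00
      (aref *mini-ram* 2) #x02)
(defparameter *mini-cpu* (make-mini-cpu :pc 1))

(setf (mini-absolute *mini-cpu*) #x42)
(mini-get-byte #x0200) ; now #x42
```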
As I just said, the setf function defined can be used to set the value of the argument. To do it for absolute addressing, get the operand and set the value at that memory location.[5][6]
And that is pretty much everything there is to know about defaddress. In the next post I am going to talk about defasm, a macro that makes it easy to define different instructions for the emulator. It piggybacks off of the information provided by defaddress in order to handle all of the instructions in all of the different possible addressing modes.
[1] Actually it isn’t. The operand is actually the address of the argument. The real argument to jump is the instruction being jumped to. Just for simplicity’s sake I am removing a layer of indirection. What I am saying only makes sense for jump. For pretty much every other instruction, the operand is the address of the argument.
[2] As I said in the previous footnote, there is actually an extra layer of indirection that I am removing for simplicity.
[3] As I have said in previous footnotes, it should actually calculate the address of the real argument. Jump is just a bit weird since the real argument is the instruction being jumped to, but it needs the address of it to get there. For simplicity’s sake I have been pretending the address of the argument is the actual argument, but this makes sense only for jump and pretty much nothing else.
[4] Get-word is just a function which looks up the 16-bit value at the given memory address.
[5] Get-byte is just like get-word, only it accesses the 8-bit value at the given memory address.
[6] If jump were to modify the argument, it would actually be modifying the instruction being jumped to.