A 6502 Emulator
Introduction⌗
6502 is a 8bit processor with a 16bit address bus. The production launched in 1975, and the processor’s clock ran at around 13 MHz. I like 6502 and I think that it’s a notable processor & architecture for many reasons:
 6502 is simple. The base instruction set is just 56 instructions. The processor has a single accumulator,
A
, and two index registers X
andY
. It also has a stack pointer and an instruction pointer, but these are transparent to the programmer.  6502 is cheap. The processor had just around 3'500 transistors. Consequently, when it was launched, 6502 was the least expensive microprocessor on the market. It initially sold for <~ 1/6 the cost of competing CPUs  for instance, Intel 8080.
 6502 is historically relevant. Its introduction caused rapid decreases in pricing across the processor market. Along with the Zilog Z80, it sparked a series of projects that resulted in the home computer revolution of the early 1980s.
 6502 is relevant even nowadays! Because of it’s simplicity, cost, and power consumption, CMOS form of 6502 (65C02 developed by WDC), the 6502 family is widely used in embedded systems, with estimated production volumes in the hundreds of millions. 6502 shares this fate with 8051.
But.. why are these still relevant? Isn’t 6502 and 8051 too old? Well. Because they’re cheap, because they’re no longer covered by Intel’s and MOS Technology patents, because what’s referred to today as “8051” or “6502” isn’t a replica of what Intel and MOS Technology made in the 1970s and 1980s, but rather an ecosystem of softwarecompatible derivatives: same instruction set, same CPU registers, same memory architecture, and the list goes on and on. The 6502 and 8051 design has been refined by many silicon vendors over the decades who sell 8051based microcontroller products, and because of that, these old 8bit designs still have usefulness and relevance in the today’s market. They’re used in industrial control systems (industry hates changing “working” solutions  we covered that earlier), highperformance(!) devices, low power devices (c8051f98x) and IP cores of FPGAs. Actually, Furby runs 6502, just like many MP3 players, mouses, keyboards and monitors. Everything.
A deep dive⌗
Let’s set the goals first. With our emulator, we will want to achieve the following:
 small, portable code.
 decimal mode implemented.
 100% coverage of legal opcodes.
 emulating hardware glitches.
What isn’t the goal:
 cycle accurancy
 performance
6502 supports four kinds of interrupts  the one raised by the BRK
instruction, IRQ, NMI and a RESET interrupt. 6502 also has a stack. Because of the memory addressing limitations, only 65K of memory is needed. Knowing these, let’s start our emulator.
/* Memory size */
#define MEM 0x10000
/* Stack start address */
#define STK 0x200
/* BRK vector */
#define BRKV 0xFFF0
/* NMI vector */
#define NMIV 0xFFF4
/* Reset vector */
#define RESV 0xFFF8
/* IRQ vector */
#define IRQV 0xFFFB
The vectors below define where is the code handling some interrupt. We’ll keep these values, but if you want to build your own 6502centered machine, you can tweak these to your liking.
Generally speaking, since 6502 programs do a lot of 8bit loads, we will keep our memory as a pointer to a vector of 8bit integers. The program counter (or instruction pointer) is 16bit, and we have 4 another 8bit variables  A
, X
, Y
and SP
. A sharp eye will notice that this limits the stack to 256 bytes, but this isn’t a huge problem.
#include <stdint.h>
#include <assert.h>
/* All the integer types we use. Abstain from using 32bit data types.
We use _Bool for flags (we don't pack them into a single byte for
simplicity, even though it'd be worth it). The only place where signed
8bit arithmetic is needed is relative addressing. Modifying the
code to use only unsigned arithmetic is left as an exercise
to the reader :). */
typedef uint8_t u8;
typedef uint16_t u16;
typedef int8_t i8;
typedef _Bool n1;
/* The memory array, instruction pointer, registers and flags. */
u8*m;u16 pc;u8 a,x,y,sp;
n1 cf,zf,idf,df,bf,vf,nf;
6502 can be considered a predecessor of RISC architectures. It’s well known for having just a single accumulator and a large amount of addressing modes. Let’s look at these first:

Accumulator: the operation works on the accumulator. For example 
ASL A / ROL A / ROR A
(/
representing newline). A lot of assemblers allow omitting the explicitA
, soASL / ROL / ROR
would compile too. 
Absolute: data is accessed using a constant, 16bit address. For example 
LDA $0B26 / STY $1FE7
. 
Absolute with X or Y: data is accessed using a constant, 16bit value added to the
X
orY
register. For example STA $3FFF, X
stores the accumulator at$3FFF + X
, andSTA $3FFF, Y
stores the accumulator at$3FFF + Y
. 
Immediate: the operand is taken from the byte following the opcode. For example 
LDA #$FF

Implied: the instruction takes no arguments. Similar to the accumulator addressing mode. For example 
RTS
,RTI
. 
Indirect: data is accessed by dereferencing a constant pointer. For example 
JMP ($B800)
jumps to the location pointed by a word at$B800
(lower byte$B800
, higher byte$B801
). 
Indirect with X: An 8bit ZP address and the X register are added. If the addition overflows, the address wraps around within ZP. The resulting address is used as a pointer to the data being accessed. This makes the X register an index into a list of pointers. For example 
LDA ($05, X)
. 
Indirect with Y: An 8bit address identifies a pointer. The value of the Y register is added to the address contained in the pointer. Effectively, the pointer is the base address and the Y register is an index past that base address. There’s a (subtle) difference between this and indirect with X addressing mode. For example 
LDA ($05), Y
. 
Relative: An 8bit signed value is added to the current instruction pointer. The branch target must be in the range of
(128, 127)
bytes of the current value. For example BEQ $0300
. 
ZP (zero page): An 8bit address is used within the zero page (the area of memory from offset
$00
addressable using a single byte). For example ORA $F0
. 
ZP with X or Y: analogical to the absolute with X or Y addressing mode.
Now, since 6502 has quite a bit of valid opcodes, I decided that we’ll use a bunch of macros to make the code cleaner:
#define R return
#define ZFN(r,n) static inline r n(void)
#define MFN(r,n,a1) static inline r n(a1 A)
#define DFN(r,n,a1,a2) static inline r n(a1 A, a2 B)
And the explanations:
R
is a shorthand forreturn
.ZFN
is a zero argument function.MFN
is a monadic (one argument) function.DFN
is a dyadic (two argument) function.
When working with packed flags, we’ll need a macro to elegantly unpack nth bit from a bit vector. We’ll call it SHF
, a shorthand for SHift Flags.
#define SHF(x,b)x=(A>>b)&1
As you have seen above, the addressing modes are quite repetitive, so we will add macros that create register & memory addressing modes:
#define REGO(n)MFN(void,n ## R,u8*){*A=n(*A);}
#define MEMO(n)MFN(void,n ## A,u16){u8 v=R8(A);W8(A,n(v));}
REGO
is a shorthand for register operand and MEMO
is a shorthand for memory operand. We assume that both are monadic functions. Functions with register operands will have their name end with R
, while functions with memory operands will have their name end with A
.
The memory operand macro is using R8
and W8
functions  respectively, fetch (read) a byte (8bit value) from memory and write a byte to memory.
Finally, we will define a macro that will be used in the instruction decoding switch..case
statement:
#define IH(V) ;break;case V:
IH
is a shorthand for Instruction Handler. It breaks from the previous case value (to stop fallthrough  we don’t want it anywhere in the decoding switch..case
statement) and starts a new case label.
Memory⌗
Let’s implement a few basic functions for interfacing with memory.
MFN(u8,R8,u16){R m[A];}
MFN(u16,R16,u16){R(m[A+1]<<8)m[A];}
R8
is a monadic function taking the address (a u16
) and returning a byte (u8
) located at that address. R16
does exactly the same, but it reads a littleendian word instead. So, we read the u8
, but we also take the byte that follows it (A+1
) and shift it a byte left, to make a word.
There’s a known 6502 bug that has been patched in 65C02. The low byte is sometimes wrapped without incrementing the high byte. The implementation isn’t much different from R16
.
MFN(u16,RB16,u16){R(m[(A&0xFF00)((A+1)&0xFF)]<<8)m[A];}
Writing to memory is fairly simple. We won’t need W16
in our emulator, so we implement just W8:
DFN(void,W8,u16,u8){m[A]=B;}
Next up, stack operations. We need to support POP8
, PSH8
, POP8
and POP16
. Let’s do that:
MFN(void,PSH8,u8){W8(STK+sp,A);} ZFN(u8,POP8){R R8(++sp+STK);}
MFN(void,PSH16,u16){u16 l=STK+sp;W8(l,A>>8);W8(l1,A&0xFF);sp=2;}
ZFN(u16,POP16){R POP8()(POP8()<<8);}
On 6502, the stack grows downward, so pushing an 8bit value is just writing a byte to a location pointed by sp
accounting the stack offset and decrementing the stack pointer, while popping an 8bit value is just incrementing the sp
, adjusting for the stack offset, and reading a value.
Pushing and popping 16bit numbers is a bit more complex. In PSH16
, we first take the value of stack pointer and fix it with the stack offset. Then, we push the higher bits (extracted using A>>8
) and the lower bits (extracted using A&0xFF
). This operation changes the stack pointer by two (the size of a word). POP16
is fairly straightforward too  popping a value and masking it with another popped value, which will constitute the higher bits of newly created word (A<<8
).
Moving swiftly on, let’s take on the implementation of addressing modes, starting with the zero page:
ZFN(u8,ZPG){R R8(pc++);}ZFN(u8,ZPX){R ZPG()+x;}ZFN(u8,ZPY){R ZPG()+y;}
We read an immediate which is the address within the zero page, then dereference it. ZPX
and ZPY
account for X and Ywise addressing. Absolute addressing requires reading a word. X and Ywise addressing looks exactly the same in this addressing mode. Because absolute addressing itself isn’t used all that often, we won’t implement it as a function.
ZFN(u16,ABS){u16 A=R16(pc);pc+=2;R A;}
ZFN(u16,ABX){R ABS()+x;}ZFN(u16,ABY){R ABS()+y;}
Relative addressing requires the use of signed arithmetic. Because not all architectures support it, it’d be smart to leave a comment in the source code explaining this.
ZFN(i8,REL){R(i8)R8(pc++);} // XXX: Relative addressing requires signed arithmetic.
Finishing the addressing saga, we implement the immediate addressing:
ZFN(u16,IMM){R pc++;}
Flags⌗
6502 has a set of 7 flags. They are generally represented as a single byte, but the 5th flag is left unused (fixed to 1
).
Flag  Description 

NF  Negative flag 
VF  Overflow flag 
**  Unused, fixed to 1 
BF  Break flag 
DF  Decimal flag (BCD) 
IDF  Interrupt (IRQ) disable flag 
ZF  Zero flag 
CF  Carry flag 
A lot of operations will alter the zero and negative flag (ZF
and NF
), so let’s implement a function that adjusts them based on an operation result:
MFN(void,ADJZN,u8){zf=A==0;nf=A>>7;}
Since 6502 has instructions that push and pop the processor’s flags, we need a way to serialize and deserialize flags into a single byte:
ZFN(u8,GFL){R nf<<7vf<<632bf<<4df<<3idf<<2zf<<1cf;}
MFN(void,SFL,u8){SHF(nf,7);SHF(vf,6);SHF(bf,4);SHF(df,3);SHF(idf,2);SHF(zf,1);SHF(cf,0);}
Interrupts⌗
We talked for a brief moment about interrupts, but we didn’t quite explain them in enough detail. Some code will aid us:
MFN(void,INTR,u16){PSH16(pc);PSH8(GFL());pc=R16(A);idf=1;}
ZFN(void,nmi){bf=0;INTR(NMIV);}
ZFN(void,res){bf=0;INTR(RESV);}
ZFN(void,irq){if(idf==0){bf=0;INTR(IRQV);}}
So, nmi
, res
and irq
are a part of our “public” API. nmi
and res
clear the bf
flag and call INTR
with the corresponding vector, meanwhile irq
does it only if idf
(the interrupt/IRQ disable flag) is not set.
The INTR
function is a bit more interesting. We can notice that it:
 pushes the current instruction pointer on the stack
 pushes the flags on the stack
 loads the new instruction pointer from memory at
A
(the argument to INTR, usually it’s the value ofNMIV
,IRQV
, etc…)  disables IRQ/interrupt handling (because we’re already inside an interrupt!)
Arithmetics & bit operations⌗
You probably remember the decimal flag we talked about earlier. BCD (Binary Coded Decimal) numbers can be directly added or subracted on the 6502 by using decimal mode. The basics of decimal mode are generally fairly simple, but the specifics of its operation and use aren’t always well documented or explained.
A byte has 256 possible states  $00
to $FF
. These values may represent any sort of data. The most common way of representing numbers is as a binary number (or more specifically, an unsigned binary integer), where $00 … $FF represents numbers 0 … 255. In BCD, a byte represents a number from 0 to 99, where the byte values represented in hexadecimal are treated as decimals. So, for example $39
is 39, $05
is 5, $99
is 99. The largest value such a decimal can take is thus $99
and it can’t contain hexadecimal digits (ABCDEF
). In other words, the upper digit (0 to 9) of the BCD number is stored in the upper 4 bits of the byte, and the lower digit is stored in the lower 4 bits. These 100 values are called valid BCD numbers. The other 156 possible values of a byte (i.e. where either or both hex digits are A to F) are called invalid BCD numbers. By contrast, all 256 possible values of a byte are valid binary numbers.
When $28
is a BCD number it represents 28 (in decimal), and when $28
is a binary number it represents 40 (in decimal).
The term BCD arithmetic means arthmetic that involves BCD numbers, whereas the term binary arithmetic means arithmetic that involves binary numbers. The same naming convention applies to addition and subtraction. For example, $19 + $28 = $47
using BCD addition (df = 1
), but $19 + $28 = $41
(i.e. 25 + 40 = 65) using binary addition.
6502 code usually assumes that the processor in is binary mode (df = 0
) upon entry, and these programs will often return in binary mode (usually because they don’t affect df
). In this case, BCD calculations are performed by starting with a SED
instruction, followed by the instructions to perform calculation itself, and finishing with an CLD
instruction.
Let’s look at the ADC
instruction implementation now:
MFN(void,ADC,u16){u8 V=R8(A),cy,l,h;if(df){cy=cf;l=(V&0xF)+(a&0xF)+cy;if(l>9)l+=6;
h=(V>>4)+(a>>4)+(l>0xF);vf=~(a^V)&(a^(h<<4))&0x80;nf=vf=zf=cf=0;if(h>9)h+=6;
if(h>0xF)cf=1;a=(h<<4)(l&0xF);if(!a)zf=1;if(h&0x8)nf=1;}else{u16 r=a+V+cf;
ADJZN(r&0xFF);vf=(~(a^V)&(a^r)&0x80);cf=r&0xFF00;a=r&0xFF;}}
Big and scary, but we can manage. First, let’s notice that this implementation basically boils down to:
MFN(void,ADC,u16){u8 V=R8(A),cy,l,h;if(df){
/* decimal implementation */
}else{
/* binary implementation */
}}
V
is the other operand for addition (read from memory). So, because the binary implementation is simpler, we will now zoom on it:
u16 r=a+V+cf;
ADJZN(r&0xFF);
vf=(~(a^V)&(a^r)&0x80);
cf=r&0xFF00;
a=r&0xFF;
First, we calculate the addition result  accumulator, the 2nd operand and the carry. Then, we adjust zf
and nf
. The rest of the code sets flags approperiately. (~(a^V)&(a^r)&0x80)
detects if the addition has overflowed, r&0xFF00
extracts the carry (everything that doesn’t fit to a byte, which could be also rewritten as r>0xFF
) and a=r&0xFF
sets the addition result.
Wait a second. How does this obscure carry formula work? Let the carry out of the full adder adding the least significant bit be called $c_0$. Then, the carry out of the full adder adding the next least significant bit is $c_1$. Thus, the carry out of the full adder adding the most significant bits is $c_{k  1}$. This assumes that we are adding two $k$ bit numbers. We can write the formula as $V = c_{k  1}$. This is effectively XORing the carryin and the carryout of the leftmost full adder. Why does this work? The XOR of the carryin and carryout differ if there’s either a 1 being carried in, and a 0 being carried out, or if there’s a 0 being carried in, and a 1 being carried out.
If you thought that this is hard, the BCD part is even worse. There are a dozen of sequences for performing this operation on binarycoded decimals:
Sequence 1:
AL = (A & $0F) + (B & $0F) + C
 If
AL >= $0A
, thenAL = ((AL + $06) & $0F) + $10
A = (A & $F0) + (B & $F0) + AL
A
can be>= $100
at this point If
A >= $A0
, thenA = A + $60
 The accumulator result is the lower 8 bits of
A
 The carry result is 1 if
A >= $100
, and 0 ifA < $100
Sequence 2:
AL = (A & $0F) + (B & $0F) + C
 If
AL >= $0A
, thenAL = ((AL + $06) & $0F) + $10
A = (A & $F0) + (B & $F0) + AL
, using signed (twos complement) arithmetic The
N
flag result is 1 if bit 7 ofA
is 1, and 0 if bit 7 ifA
is 0  The
V
flag result is 1 ifA < 128
orA > 127
, and 0 if128 <= A <= 127
On 6502, sequence 1 is used to compute A
and C
, while sequence 2 is used to compute N
and V
. Z
is based on the BCD accumulator result. For instance, 99 + 1
in decimal mode yields the accumulator to be equal to $00
:
SED
CLC
LDA #$99
ADC #$01
… and this is the algorithm the bit of code above implements. Moving on, to SBC
:
MFN(void,SBC,u16){u8 V=R8(A),l,h;u16 r;if(df){u8 cy=!cf;r=aVcy;l=(a&0xF)(V&0xF)cy;
if(l>>7)l=6;h=(a>>4)(V>>4)(l>>7);nf=vf=zf=cf=0;if((a^V)&(a^r)&0x80)vf=1;cf=!(r&0xFF00);
if(h>>7)h=6;a=(h<<4)(l&0xF);if(!a)zf=1;if(h&0x8)nf=1;}else{r=aV!cf;ADJZN(r&0xFF);
vf=(a^V)&(a^r)&0x80;cf=!(r&0xFF00);a=r&0xFF;}}
As most of it is isomorphic to the description before, I’ll skip the binary part and describe just the decimal case of this algorithm:
AL = (A & $0F)  (B & $0F) + C1
 If
AL < 0
, thenAL = ((AL  $06) & $0F)  $10
A = (A & $F0)  (B & $F0) + AL
 If
A < 0
, thenA = A  $60
 The accumulator result is the lower 8 bits of
A
On 6502, this algorithm is used to compute the value of A
. C
, V
, N
and Z
flags are computed the same way as in the binary mode.
Time for something more straightforward now. Let’s implement increment and decrement:
MFN(u8,INC,u8){u8 r=A+1;ADJZN(r);R r;}
MFN(u8,DEC,u8){u8 r=A1;ADJZN(r);R r;}
Nothing to see here, just +1 / 1
and adjusting the Z
and N
flags. A bunch of other operations:
/* shift left; adjust the carry flags, perform our
general routine of fixing up Z/N flags. */
MFN(u8,ASL,u8){u8 r=A<<1;cf=A>>7;ADJZN(r);R r;}
/* shift right; adjust the carry, z and n flag.
carry is set, because the least significant bit
might have been shifted out. */
MFN(u8,LSR,u8){u8 r=A>>1;cf=A&1;ADJZN(r);R r;}
/* bitwise operations  and, xor, or */
MFN(void,AND,u16){a&=R8(A);ADJZN(a);}
MFN(void,EOR,u16){a^=R8(A);ADJZN(a);}
MFN(void,ORA,u16){a=R8(A);ADJZN(a);}
/* set the Z flag as though the value in the address tested
were ANDed with the accumulator. The N and V flags are
set to match bits 7 and 6 respectively in the value
stored at the tested address. */
MFN(void,BIT,u16){u8 v=R8(A);vf=(v>>6)&1;zf=(v&a)==0;nf=v>>7;}
/* rotate one bit right & left */
MFN(u8,ROR,u8){u8 r=A>>1;r=cf<<7;cf=A&1;ADJZN(r);R r;}
MFN(u8,ROL,u8){u8 r=A<<1;r=cf;cf=A>>7;ADJZN(r);R r;}
Branching⌗
This will be fairly simple. Let’s start with a conditional jump that jumps if the flag provided is set:
DFN(void,J,i8,n1){if(B)pc+=A;}
We also want an unconditional jump:
MFN(void,U,u16){pc=A;}
A thing that I haven’t explained earlier (when we were discussing interrupts) is “how to return from an interrupt?" The answer is fairly simple:
ZFN(void,RTI){SFL(POP8());pc=POP16();}
First, we pop off the flags, then we pop off the instruction pointer. So we’re doing the exact reverse of what happened in the INTR
function. Speaking of returning, we can implement JSR
and RTS
now:
MFN(void,JSR,u16){PSH16(pc1);pc=A;}
ZFN(void,RTS){pc=POP16()+1;}
… and CMP
:
DFN(void,CMP,u16,u8){u8 v=R8(A);u8 r=Bv;cf=B>=v;ADJZN(r);}
The CMP
instruction is used for comparisons as the mnemonics suggests. The way it works is simple  they just perform a subtraction. Strictly speaking, CMP A
is similar to SEC / SBC N
. Both of these variants affect the N
, Z
, and C
flags in exactly the same way. However, unlike SBC
:
CMP
is not affected by df The accumulator is not affected by the operation
 the V flag is not affected
A useful property of CMP is that it performs an equality comparison and an unsigned comparison. After CMP, the Z
flag contains the equality comparison result and the C
flag contains the unsigned comparison result, specifically:
Z = 0 => A != N
,BNE
will branch.Z = 1 => A == N
.BEQ
will branch.C = 0 => A < N
,BCC
will branch.C = 1 => A >= N
,BCS
will branch.
Additional operand combinations⌗
Many instructions take additional operand combinations:
MEMO(INC)REGO(INC)
MEMO(DEC)REGO(DEC)
MEMO(ASL)
MEMO(LSR)
MEMO(ROR)
MEMO(ROL)
…according to this table:
mnemonic  register operand  memory operand 

INC 
X  X 
DEC 
X  X 
ROL 
X  
ROR 
X  
ASL 
X  
LSR 
X 
State initialisation and step function⌗
We’re nearly done! Everything left now is initialising the processor state and implementing a step function. Let’s do that now:
ZFN(void,init){sp=0xFD;pc=a=x=y=zf=cf=idf=df=bf=vf=nf=0;}
The initialisation functions gives some default value to SP
(just in case the program doesn’t initialise it for any reason), clears all the flags, the instruction pointer, address registers and the accumulator.
Finally, the code for the stepping function follows:
ZFN(void,step){switch(R8(pc++)){
default:
IH(0xA9)LDR(&a,IMM()) /* LDA IMM */ IH(0xA5)LDR(&a,ZPG()) /* LDA ZPG */
IH(0xB5)LDR(&a,ZPX()) /* LDA ZPX */ IH(0xAD)LDR(&a,ABS()) /* LDA ABS */
IH(0xBD)LDR(&a,ABX()) /* LDA ABX */ IH(0xB9)LDR(&a,ABY()) /* LDA ABY */
IH(0xA1)LDR(&a,INX()) /* LDA INX */ IH(0xB1)LDR(&a,INY()) /* LDA INY */
IH(0xA2)LDR(&x,IMM()) /* LDX IMM */ IH(0xA6)LDR(&x,ZPG()) /* LDX ZPG */
IH(0xB6)LDR(&x,ZPY()) /* LDX ZPY */ IH(0xAE)LDR(&x,ABS()) /* LDX ABS */
IH(0xBE)LDR(&x,ABY()) /* LDX ABY */ IH(0xA0)LDR(&y,IMM()) /* LDY IMM */
IH(0xA4)LDR(&y,ZPG()) /* LDY ZPG */ IH(0xB4)LDR(&y,ZPX()) /* LDY ZPX */
IH(0xAC)LDR(&y,ABS()) /* LDY ABS */ IH(0xBC)LDR(&y,ABX()) /* LDY ABX */
IH(0x85)W8(ZPG(),a) /* STA ZPG */ IH(0x95)W8(ZPX(),a) /* STA ZPX */
IH(0x8D)W8(ABS(),a) /* STA ABS */ IH(0x9D)W8(ABX(),a) /* STA ABX */
IH(0x99)W8(ABY(),a) /* STA ABY */ IH(0x81)W8(INX(),a) /* STA INX */
IH(0x91)W8(INY(),a) /* STA INY */ IH(0x86)W8(ZPG(),x) /* STX ZPG */
IH(0x96)W8(ZPY(),x) /* STX ZPY */ IH(0x8E)W8(ABS(),x) /* STX ABS */
IH(0x84)W8(ZPG(),y) /* STY ZPG */ IH(0x94)W8(ZPX(),y) /* STY ZPX */
IH(0x8C)W8(ABS(),y) /* STY ABS */
IH(0xAA)x=a;ADJZN(x) /* TAX */ IH(0xA8)y=a;ADJZN(y) /* TAY */
IH(0xBA)x=sp;ADJZN(x) /* TSX */ IH(0x8A)a=x;ADJZN(a) /* TXA */
IH(0x9A)sp=x /* TXS */ IH(0x98)a=y;ADJZN(a) /* TYA */
IH(0x69)ADC(IMM()) /* ADC IMM */ IH(0x65)ADC(ZPG()) /* ADC ZPG */
IH(0x75)ADC(ZPX()) /* ADC ZPX */ IH(0x6D)ADC(ABS()) /* ADC ABS */
IH(0x7D)ADC(ABX()) /* ADC ABX */ IH(0x79)ADC(ABY()) /* ADC ABY */
IH(0x61)ADC(INX()) /* ADC INX */ IH(0x71)ADC(INY()) /* ADC INY */
IH(0xC6)DECA(ZPG()) /* DEC ZPG */ IH(0xD6)DECA(ZPX()) /* DEC ZPX */
IH(0xCE)DECA(ABS()) /* DEC ABS */ IH(0xDE)DECA(ABX()) /* DEC ABX */
IH(0xCA)DECR(&x) /* DEX */ IH(0x88)DECR(&y) /* DEY */
IH(0xE6)INCA(ZPG()) /* INC ZPG */ IH(0xF6)INCA(ZPX()) /* INC ZPX */
IH(0xEE)INCA(ABS()) /* INC ABS */ IH(0xFE)INCA(ABX()) /* INC ABX */
IH(0xE8)INCR(&x) /* INX */ IH(0xC8)INCR(&y) /* INY */
IH(0xE9)SBC(IMM()) /* SBC IMM */ IH(0xE5)SBC(ZPG()) /* SBC ZPG */
IH(0xF5)SBC(ZPX()) /* SBC ZPX */ IH(0xED)SBC(ABS()) /* SBC ABS */
IH(0xFD)SBC(ABX()) /* SBC ABX */ IH(0xF9)SBC(ABY()) /* SBC ABY */
IH(0xE1)SBC(INX()) /* SBC INX */ IH(0xF1)SBC(INY()) /* SBC INY */
IH(0x29)AND(IMM()) /* AND IMM */ IH(0x25)AND(ZPG()) /* AND ZPG */
IH(0x35)AND(ZPX()) /* AND ZPX */ IH(0x2D)AND(ABS()) /* AND ABS */
IH(0x3D)AND(ABX()) /* AND ABX */ IH(0x39)AND(ABY()) /* AND ABY */
IH(0x21)AND(INX()) /* AND INX */ IH(0x31)AND(INY()) /* AND INY */
IH(0x0A)a=ASL(a) /* ASL ACC */ IH(0x06)ASLA(ZPG()) /* ASL ZPG */
IH(0x16)ASLA(ZPX()) /* ASL ZPX */ IH(0x0E)ASLA(ABS()) /* ASL ABS */
IH(0x1E)ASLA(ABX()) /* ASL ABX */
IH(0x24)BIT(ZPG()) /* BIT ZPG */ IH(0x2C)BIT(ABS()) /* BIT ABS */
IH(0x49)EOR(IMM()) /* EOR IMM */ IH(0x45)EOR(ZPG()) /* EOR ZPG */
IH(0x55)EOR(ZPX()) /* EOR ZPX */ IH(0x4D)EOR(ABS()) /* EOR ABS */
IH(0x5D)EOR(ABX()) /* EOR ABX */ IH(0x59)EOR(ABY()) /* EOR ABY */
IH(0x41)EOR(INX()) /* EOR INX */ IH(0x51)EOR(INY()) /* EOR INY */
IH(0x4A)a=LSR(a) /* LSR ACC */ IH(0x46)LSRA(ZPG()) /* LSR ZPG */
IH(0x56)LSRA(ZPX()) /* LSR ZPX */ IH(0x4E)LSRA(ABS()) /* LSR ABS */
IH(0x5E)LSRA(ABX()) /* LSR ABX */ IH(0x09)ORA(IMM()) /* ORA IMM */
IH(0x05)ORA(ZPG()) /* ORA ZPG */ IH(0x15)ORA(ZPX()) /* ORA ZPX */
IH(0x0D)ORA(ABS()) /* ORA ABS */ IH(0x1D)ORA(ABX()) /* ORA ABX */
IH(0x19)ORA(ABY()) /* ORA ABY */ IH(0x01)ORA(INX()) /* ORA IMM */
IH(0x11)ORA(INY()) /* ORA IMM */ IH(0x2A)a=ROL(a) /* ROL ACC */
IH(0x26)ROLA(ZPG()) /* ROL ZPG */ IH(0x36)ROLA(ZPX()) /* ROL ZPX */
IH(0x2E)ROLA(ABS()) /* ROL ABS */ IH(0x3E)ROLA(ABX()) /* ROL ABX */
IH(0x6A)a=ROR(a) /* ROR ACC */ IH(0x66)RORA(ZPG()) /* ROR ZPG */
IH(0x76)RORA(ZPX()) /* ROR ZPX */ IH(0x6E)RORA(ABS()) /* ROR ABS */
IH(0x7E)RORA(ABX()) /* ROR ABX */ IH(0x90)J(REL(),!cf) /* BCC REL */
IH(0xB0)J(REL(),cf) /* BCS REL */ IH(0xD0)J(REL(),!zf) /* BNE REL */
IH(0xF0)J(REL(),zf) /* BEQ REL */ IH(0x10)J(REL(),!nf) /* BPL REL */
IH(0x30)J(REL(),nf) /* BMI REL */ IH(0x50)J(REL(),!vf) /* BVC REL */
IH(0x70)J(REL(),vf) /* BVS REL */ IH(0x4C)U(ABS()) /* JMP */
IH(0x6C)U(RB16(ABS()))/* JMP */ IH(0x20)JSR(ABS()) /* JSR */
IH(0x40)RTI() /* RTI */ IH(0x60)RTS() /* RTS */
IH(0x38)cf=1 /* SEC */ IH(0x18)cf=0 /* CLC */
IH(0xF8)df=1 /* SED */ IH(0xD8)df=0 /* CLD */
IH(0x78)idf=1 /* SEI */ IH(0x58)idf=0 /* CLI */
IH(0xB8)vf=0 /* CLV */
IH(0xC9)CMP(IMM(),a) /* CMP IMM */ IH(0xC5)CMP(ZPG(),a) /* CMP ZPG */
IH(0xD5)CMP(ZPX(),a) /* CMP ZPX */ IH(0xCD)CMP(ABS(),a) /* CMP ABS */
IH(0xDD)CMP(ABX(),a) /* CMP ABX */ IH(0xD9)CMP(ABY(),a) /* CMP ABY */
IH(0xC1)CMP(INX(),a) /* CMP INX */ IH(0xD1)CMP(INY(),a) /* CMP INY */
IH(0xE0)CMP(IMM(),x) /* CPX IMM */ IH(0xE4)CMP(ZPG(),x) /* CPX ZPG */
IH(0xEC)CMP(ABS(),x) /* CPX ABS */ IH(0xC0)CMP(IMM(),y) /* CPY IMM */
IH(0xC4)CMP(ZPG(),y) /* CPY ZPG */ IH(0xCC)CMP(ABS(),y) /* CPY ABS */
IH(0x48)PSH8(a) /* PHA */ IH(0x68)a=POP8();ADJZN(a) /* PLA */
IH(0x08)bf=1;PSH8(GFL()) /* PHP */ IH(0x28)SFL(POP8()) /* PLP */
IH(0x00)bf=1;pc+=1;INTR(BRKV) /* BRK */
IH(0xEA)break; /* NOP */
}
}
There are many sources for 6502 instruction encodings, but I like this one. Maybe you’ll like it too.
Testing our emulator⌗
To test our emulator, let’s tack some code on the bottom:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef uint32_t u32;
DFN(void, loadmem, const char *, u16) {
u32 size;
FILE * f = fopen(A, "rb"); assert(f);
fseek(f, 0, SEEK_END); size = ftell(f); rewind(f);
assert(size + B <= MEM);
assert(fread(&m[B], sizeof(u8), size, f) == size);
fclose(f);
}
This small routine will read a ROM image to memory and put it on a specified offset.
ZFN(void, test1) {
memset(m, 0, MEM);
loadmem("test1.bin", 0x4000);
init(); res();
for(;;) {
step();
if (pc == 0x45C0) {
printf("%s\n", m[0x0210] == 0xFF ? "passed" : "failed");
return;
}
}
}
Now, we load a testing memory image (available in the source code repository mentioned down below alongside a disassembly and a bunch of other files) at #$4000
. An infinite loop calls step();
until the memory pointer reaches #$45C0
, in which case, if the byte under #$0210
is #$FF
, the test has passed. Otherwise, it has failed.
A tiny bit of driver code will run this example program:
int main(void) {
m = malloc(MEM);
test1();
}
… and, as expected, the tests passed!~
0 [20:23] ~/workspace/6502emu % gcc 6502.c o 6502
0 [20:23] ~/workspace/6502emu % ./6502
passed
Klaus Dormann’s testing suite⌗
This one will be a bit tricky, because Klaus' testing suite makes some assumption about our very own homebrew 6502 computer that aren’t true for our model. But we can easily fix that.
Let’s start off with changing the “tweakable” defaults:
// Stack start address
#define STK 0x100
// BRK vector
#define BRKV 0xFFFE
// NMI vector
#define NMIV 0xFFFA
// Reset vector
#define RESV 0xFFFC
// IRQ vector
#define IRQV 0xFFFE
… and then create a new testing routine:
ZFN(void, test2) {
memset(m, 0, MEM);
loadmem("6502_functional_test.bin", 0);
init(); pc = 0x400;
u16 prev_pc = 0, ins = 0;
for(;;) {
step();
if (prev_pc == pc) {
if(pc == 0x3469) {
printf("passed");
break;
}
printf("failed (trap at $%04X)", pc);
break;
}
prev_pc = pc;
ins++;
if(ins >= 100000) {
printf("test took too long, broke at $%04X", pc);
break;
}
}
}
The program’s expected entry point lies at #$0400
. If the program ends up in an infinite loop at any point, we assume that the test either passed or failed (which is why we keep track of prev_pc
; if something goes really wrong, we also have a check for the test taking too many cycles  ins
and the if
statement near the end guards against that).
If the program loops at pc=#$3469
, we assume it succeeded. Let’s try it now:
0 [20:40] ~/workspace/6502emu % gcc main.c o 6502
0 [21:40] ~/workspace/6502emu % ./6502
passed%
… and it works! This means that our 6502 emulator is a perfect replica in terms of functionality of a genuine 6502. Let’s look at our checklist:
Let’s set the goals first. With our emulator, we will want to achieve the following:
 small, portable code (check)
 decimal mode implemented (check)
 100% coverage of legal opcodes (check)
 emulating hardware glitches (check) What isn’t the goal:
 cycle accurancy
 performance
Making a disassembler⌗
When I was digging through 6502 testing suites, I found a small binary blob that had a description attached to it (it explained what kind of hardware does it expect). I really wanted to know what’s exactly inside of it, and to do that I’d have to disassemble the hex code.
I thought it’d be a nice project to make my own 6502 disassembler to go with the emulator. So, let’s do that:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Instruction tables. Copied from some online source.
enum { IMP, IMPA, MARK2, BRA, IMM, ZP, ZPX, ZPY, INDX, INDY, IND, MARK3, ABS, ABSX, ABSY, IND16, IND1X };
enum { I_ADC, I_AND, I_ASL, I_BCC, I_BCS, I_BEQ, I_BIT, I_BMI, I_BNE, I_BPL, I_BRA, I_BRK, I_BVC, I_BVS,
I_CLC, I_CLD, I_CLI, I_CLV, I_CMP, I_CPX, I_CPY, I_DEC, I_DEX, I_DEY, I_EOR, I_INC, I_INX, I_INY,
I_JMP, I_JSR, I_LDA, I_LDX, I_LDY, I_LSR, I_NOP, I_ORA, I_PHA, I_PHP, I_PHX, I_PHY, I_PLA, I_PLP,
I_PLX, I_PLY, I_ROL, I_ROR, I_RTI, I_RTS, I_SBC, I_SEC, I_SED, I_SEI, I_STA, I_STP, I_STX, I_STY,
I_STZ, I_TAX, I_TAY, I_TRB, I_TSB, I_TSX, I_TXA, I_TXS, I_TYA, I_WAI, I_XXX };
char ops[] =
"ADCANDASLBCCBCSBEQBITBMIBNEBPLBRABRKBVCBVS"
"CLCCLDCLICLVCMPCPXCPYDECDEXDEYEORINCINXINY"
"JMPJSRLDALDXLDYLSRNOPORAPHAPHPPHXPHYPLAPLP"
"PLXPLYROLRORRTIRTSSBCSECSEDSEISTASTPSTXSTY"
"STZTAXTAYTRBTSBTSXTXATXSTYAWAI???";
char insname[256] = {
I_BRK, I_ORA, I_XXX, I_XXX, I_TSB, I_ORA, I_ASL, I_XXX, I_PHP, I_ORA, I_ASL, I_XXX, I_TSB, I_ORA, I_ASL, I_XXX,
I_BPL, I_ORA, I_ORA, I_XXX, I_TRB, I_ORA, I_ASL, I_XXX, I_CLC, I_ORA, I_INC, I_XXX, I_TRB, I_ORA, I_ASL, I_XXX,
I_JSR, I_AND, I_XXX, I_XXX, I_BIT, I_AND, I_ROL, I_XXX, I_PLP, I_AND, I_ROL, I_XXX, I_BIT, I_AND, I_ROL, I_XXX,
I_BMI, I_AND, I_AND, I_XXX, I_BIT, I_AND, I_ROL, I_XXX, I_SEC, I_AND, I_DEC, I_XXX, I_BIT, I_AND, I_ROL, I_XXX,
I_RTI, I_EOR, I_XXX, I_XXX, I_XXX, I_EOR, I_LSR, I_XXX, I_PHA, I_EOR, I_LSR, I_XXX, I_JMP, I_EOR, I_LSR, I_XXX,
I_BVC, I_EOR, I_EOR, I_XXX, I_XXX, I_EOR, I_LSR, I_XXX, I_CLI, I_EOR, I_PHY, I_XXX, I_XXX, I_EOR, I_LSR, I_XXX,
I_RTS, I_ADC, I_XXX, I_XXX, I_STZ, I_ADC, I_ROR, I_XXX, I_PLA, I_ADC, I_ROR, I_XXX, I_JMP, I_ADC, I_ROR, I_XXX,
I_BVS, I_ADC, I_ADC, I_XXX, I_STZ, I_ADC, I_ROR, I_XXX, I_SEI, I_ADC, I_PLY, I_XXX, I_JMP, I_ADC, I_ROR, I_XXX,
I_BRA, I_STA, I_XXX, I_XXX, I_STY, I_STA, I_STX, I_XXX, I_DEY, I_BIT, I_TXA, I_XXX, I_STY, I_STA, I_STX, I_XXX,
I_BCC, I_STA, I_STA, I_XXX, I_STY, I_STA, I_STX, I_XXX, I_TYA, I_STA, I_TXS, I_XXX, I_STZ, I_STA, I_STZ, I_XXX,
I_LDY, I_LDA, I_LDX, I_XXX, I_LDY, I_LDA, I_LDX, I_XXX, I_TAY, I_LDA, I_TAX, I_XXX, I_LDY, I_LDA, I_LDX, I_XXX,
I_BCS, I_LDA, I_LDA, I_XXX, I_LDY, I_LDA, I_LDX, I_XXX, I_CLV, I_LDA, I_TSX, I_XXX, I_LDY, I_LDA, I_LDX, I_XXX,
I_CPY, I_CMP, I_XXX, I_XXX, I_CPY, I_CMP, I_DEC, I_XXX, I_INY, I_CMP, I_DEX, I_WAI, I_CPY, I_CMP, I_DEC, I_XXX,
I_BNE, I_CMP, I_CMP, I_XXX, I_XXX, I_CMP, I_DEC, I_XXX, I_CLD, I_CMP, I_PHX, I_STP, I_XXX, I_CMP, I_DEC, I_XXX,
I_CPX, I_SBC, I_XXX, I_XXX, I_CPX, I_SBC, I_INC, I_XXX, I_INX, I_SBC, I_NOP, I_XXX, I_CPX, I_SBC, I_INC, I_XXX,
I_BEQ, I_SBC, I_SBC, I_XXX, I_XXX, I_SBC, I_INC, I_XXX, I_SED, I_SBC, I_PLX, I_XXX, I_XXX, I_SBC, I_INC, I_XXX
};
char addrmode[256] = {
IMP, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMPA, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZP, ZPX, ZPX, IMP, IMP, ABSY, IMPA, IMP, ABS, ABSX, ABSX, IMP,
ABS, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMPA, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZPX, ZPX, ZPX, IMP, IMP, ABSY, IMPA, IMP, ABSX, ABSX, ABSX, IMP,
IMP, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMPA, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZP, ZPX, ZPX, IMP, IMP, ABSY, IMP, IMP, ABS, ABSX, ABSX, IMP,
IMP, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMPA, IMP, IND16, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZPX, ZPX, ZPX, IMP, IMP, ABSY, IMP, IMP, IND1X, ABSX, ABSX, IMP,
BRA, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMP, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZPX, ZPX, ZPY, IMP, IMP, ABSY, IMP, IMP, ABS, ABSX, ABSX, IMP,
IMM, INDX, IMM, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMP, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZPX, ZPX, ZPY, IMP, IMP, ABSY, IMP, IMP, ABSX, ABSX, ABSY, IMP,
IMM, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMP, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZP, ZPX, ZPX, IMP, IMP, ABSY, IMP, IMP, ABS, ABSX, ABSX, IMP,
IMM, INDX, IMP, IMP, ZP, ZP, ZP, IMP, IMP, IMM, IMP, IMP, ABS, ABS, ABS, IMP,
BRA, INDY, IND, IMP, ZP, ZPX, ZPX, IMP, IMP, ABSY, IMP, IMP, ABS, ABSX, ABSX, IMP
};
I copied a bunch of instruction tables from the internet. Now, we can dive into the actual disassembler:
uint16_t disassemble(uint16_t addr) {
uint16_t temp, mode, ind;
uint8_t p1, p2;
uint8_t op = getchar();
mode = addrmode[op];
p1 = (mode > MARK2) ? getchar() : 0;
p2 = (mode > MARK3) ? getchar() : 0;
First, we determine the size of an instruction and we load it into our variables.
ind = insname[op] * 3;
printf("%04X ", addr);
if(mode > MARK3)
printf("%02X %02X %02X ", op, p1, p2);
else if(mode > MARK2)
printf("%02X %02X ", op, p1);
else
printf("%02X ", op);
for(temp = 0; temp < 3; temp++)
putchar(ops[ind + temp]);
putchar(' ');
… then, we display the hexadecimal dump of said instruction alongside its address.
switch (mode) {
case IMPA: putchar('A'); break;
case BRA: temp = addr + 2 + (signed char) p1; printf("%04X", temp); addr++; break;
case IMM: printf("#$%02X", p1); addr++; break;
case ZP: printf("$%02X", p1); addr++; break;
case ZPX: printf("$%02X,X", p1); addr++; break;
case ZPY: printf("$%02X,Y", p1); addr++; break;
case IND: printf("($%02X)", p1); addr++; break;
case INDX: printf("($%02X,X)", p1); addr++; break;
case INDY: printf("($%02X),Y", p1); addr++; break;
case ABS: printf("$%02X%02X", p2, p1); addr += 2; break;
case ABSX: printf("$%02X%02X,X", p2, p1); addr += 2; break;
case ABSY: printf("$%02X%02X,Y", p2, p1); addr += 2; break;
case IND16: printf("($%02X%02X)", p2, p1); addr += 2; break;
case IND1X: printf("($%02X%02X,X)", p2, p1); addr += 2; break;
}
Since the addressing modes on 6502 are really simple, and this is all it takes to disassemble a mnemonic, we can just switch over the addressing mode used by that particular opcode. Finally, we skip to a new line, go to the next instruction, and return the current address.
putchar('\n');
addr++;
return addr;
}
Now, we add a main function stub that runs disassemble
on itself as long as there’s input on stdin
:
int main(int argc, char * argv[]) {
uint16_t addr = 0;
while(!feof(stdin))
addr = disassemble(addr);
}
… but, wait. Some programs, like the one above, required us to put them on a specific offset in memory. Our assembler should also support that. So let’s do that  it’s just a simple patch that comes before the while
loop:
uint16_t addr = argc == 2 ? atoi(argv[1]) : 0;
printf(" * = $%04X\n", addr);
There’s a small problem with this code  if I want my code on offset #$0400
, I don’t want to have to convert it to decimal. It’s just inconvenient. So, let’s replace atoi
with a call to our function that detects the base and converts the number:
static int smart_cvt(char * number) {
int base = 10;
if(!strncmp(number, "0x", 2)) base = 16;
else if(!strncmp(number, "0b", 2)) base = 2;
else if(!strncmp(number, "0o", 2)) base = 8;
return strtol((base != 10) * 2 + number, NULL, base);
}
int main(int argc, char * argv[]) {
uint16_t addr = argc == 2 ? smart_cvt(argv[1]) : 0;
printf(" * = $%04X\n", addr);
while(!feof(stdin))
addr = disassemble(addr);
}
Done! Now we can run it on this mysterious program:
* = $4000
4000 A9 00 LDA #$00
4002 8D 10 02 STA $0210
4005 A9 55 LDA #$55
4007 8D 00 02 STA $0200
400A A9 AA LDA #$AA
400C 8D 01 02 STA $0201
400F A9 FF LDA #$FF
4011 8D 02 02 STA $0202
[...]
4070 91 60 STA ($60),Y
4072 A9 7E LDA #$7E
4074 B1 60 LDA ($60),Y
4076 9D FF 07 STA $07FF,X
4079 A9 7E LDA #$7E
407B BD FF 07 LDA $07FF,X
407E 99 FF 07 STA $07FF,Y
4081 A9 7E LDA #$7E
4083 B9 FF 07 LDA $07FF,Y
4086 81 36 STA ($36,X)
4088 A9 7E LDA #$7E
408A A1 36 LDA ($36,X)
408C 86 50 STX $50
[...]
42E5 A9 00 LDA #$00
42E7 60 RTS
42E8 95 0D STA $0D,X
42EA A5 40 LDA $40
42EC CD 04 02 CMP $0204
42EF F0 08 BEQ $42F9
42F1 A9 04 LDA #$04
42F3 8D 10 02 STA $0210
42F6 4C C0 45 JMP $45C0
42F9 A9 35 LDA #$35
42FB AA TAX
42FC CA DEX
42FD CA DEX
42FE E8 INX
42FF 8A TXA
4300 A8 TAY
4301 88 DEY
4302 88 DEY
4303 C8 INY
4304 98 TYA
4305 AA TAX
4306 A9 20 LDA #$20
4308 9A TXS
4309 A2 10 LDX #$10
430B BA TSX
430C 8A TXA
430D 85 40 STA $40
430F A5 40 LDA $40
4311 CD 05 02 CMP $0205
4314 F0 08 BEQ $431E
4316 A9 05 LDA #$05
4318 8D 10 02 STA $0210
431B 4C C0 45 JMP $45C0
431E 2A ROL A
… and it seems like it worked!