The Zilog Z80 is an 8-bit microprocessor developed by Federico Faggin and his 11 employees in early 1975. The first working samples were delivered in March 1976, and Z80 was officially introduced on the market in July 1976.

The Z80 is a backwards-compatible enhancement of the Intel 8080. Z80 quickly became one of the most widely used CPUs in desktop computers and home computers from the 1970s to the 1980s. It was also common in military applications, musical equipment such as synthesizers, calculators such as the classy TI83+, and arcade games such as Pac-Man.


I like to define a few convenience macros for my small C projects to make the code more succinct:

#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

// ---------------------------------------

typedef int8_t i8;
typedef uint8_t u8;
typedef uint16_t u16;
typedef _Bool bv;
#define SI static inline
#define R return

// Defining a single-expression function body.
// Equivalent declarations:
// u8 f(u8 a)_(a * 2)
// u8 f(u8 a) { return a * 2; }
#define _(a...) {return({a;});}

Next up, we define the processor state. Generally speaking, there are a few things we need to take care of:

  • The processor’s memory.
  • The instruction pointer, stack pointer and index registers (ix, iy).
  • The WZ register1. We introduce it for compatibility with some of the Z80’s undocumented features. For example, the BIT n,(HL) instruction is known to notoriously leak bits 11 and 13 of wz into bits 3 and 5 of the WZ register.
  • The exchange registers. The z80 actually has two sets of general purpose registers and we can switch between those freely. They were initially designed to make handling interrupts easier, but programmers found a way to use and abuse them for sizecoding or performance tricks.
  • Interrupt vectors and the refresh register. While the refresh register is not very useful, the program can load it to A and depend on its behaviour, so we must implement it correctly.
  • The flags: sign, zero, halfcarry, parity, negative, carry. My emulator also adds flags yf and xf (3rd and 5th bits of the result). They are undocumented, but we must implement them, since the program can perceive their values.

The Z80 microprocessor has two interrupt lines, a software-masked INT line and an unmasked NMI line. The unmasked interrupt cannot be disabled by the programmer and is accepted whenever a peripheral device requests it. The second type of interrupt, masked INT, is usually reserved for very important functions that the programmer can selectively block or unblock. This allows the programmer to block this interrupt at times when their program has time constraints that do not allow it to handle interrupts. The Z80 microprocessor has an Interrupt Flip-Flop (IFF) interrupt handler activation flip-flop, which is set or reset by the programmer using the EI (Enable Interrupt) and DI (Disable Interrupt) instructions. When the IFF interrupt is reset, the microprocessor does not accept INT interrupts. The state of the IFF1 metastable is used to block interrupt handling, while the IFF2 metastable is used as a temporary place to remember the state of the IFF1 metastable. A reset of the microprocessor forces both the IFF1 and IFF2 metastables to reset to zero, which blocks interrupt handling. This support can obviously be enabled programmatically at any time with the EI instruction. When an EI instruction is in the process of execution, no active interrupt is accepted until the next instruction after EI is executed. This delay of one instruction is necessary when the next instruction is a return instruction. Interrupts will not be handled until the execution of the return instruction is complete. The EI instruction sets both the IFF1 and IFF2 interrupts to the on state. When the microprocessor accepts a masked interrupt, both IFF1 and IFF2 are automatically reset to zero, which blocks handling of further interrupts until the programmer issues a new EI instruction. Note that in all previous cases IFF1 and IFF2 are always equal.

The purpose of IFF2 is to store the state of IFF1 when an unmasked interrupt occurs. When it is accepted, IFF1 resets to prevent the acceptance of further interrupts until the programmer re-enables their handling. In this way, when an unmasked interrupt is accepted, the masked interrupts are disabled, but the previous state of IFF1 is remembered, so the complete state of the microprocessor before the unmasked interrupt can be restored at any time. With the instructions LD A,I and LD A,R, the state of the IFF2 interrupt is copied to the parity marker, where it can be tested or stored. The second method of reproducing the state of IFF1 is to execute the RETN (return from a NMI) instruction. This instruction signifies that the procedure for handling the non-maskable interrupt has been completed and the contents of IFF2 are now copied back to IFF1, so that the state of this interrupt is recreated automatically from before the non-maskable interrupt was accepted.

// ---------------------------------------

// The z80 memory.
u8 * m = NULL;

// Instruction counter, stack pointer, index registers.
u16 pc, sp, ix, iy;

// "WZ" register.
u16 wz;

// Main and exchange registers.
u8 a, b, c, d, e, h, l;
u8 exa, exb, exc, exd, exe, exh, exl, exf;

// Interrupt vectors and the refresh register.
// Note that the refresh register is useless but since
// it can be loaded to a, we must implement it properly.
// Some programs use it as a semi-random entropy source.
u8 i, r;

// Flags: sign, zero, halfcarry, parity, negative, carry.
// yf and xf are undocumented flags. In this emulator,
// they are hardwired to 3rd and 5th bits of the result.
bv sf, zf, yf, hf, xf, pf, nf, cf;

// Interrupt flip-flops. `iff_set' is set when `EI' is executed.
u8 iff_set;
bv iff1, iff2;

// The interrupt mode. Set by `IM', is either 0, 1 or 2.
// Mode 2 is rarely used, but it needs to be implemented.
u8 im;
u8 int_vec;
bv int_pending, nmi_pending;

bv halted;

Let’s define the API of our emulator now. We want to be supplied port I/O handlers by the client application and expose a few functions to reset the CPU, perform an emulation step, or generate a NMI or an interrupt.

// ---------------------------------------
// API

// Port I/O functions supplied by the user.
extern u8 port_in(u8 port);
extern void port_out(u8 port, u8 val);

// Exposed API.
void init();
void step();
void gen_nmi();
void gen_int(u8 data);

It is important to define a few convenience functions when delaing with emulator code, starting with memory accesses, extracting a higher/lower byte of a word and extracting a given bit from a word. All of these can be implemented as straightforward macros.

// ---------------------------------------

// Extracting n-th bit of a number, byte-wise memory access.
#define bit(n, val) (((val) >> (n)) & 1)
#define r8(a) m[a]
#define w8(a, v) m[a] = v

// Higher/lower byte of a u16.
#define HI(a) ((a) >> 8)
#define LO(a) ((a) & 0xFF)

Going further, we need to define functions that let us access data with word granularity:

// Memory access with word granularity.
SI u16 r16(u16 a) _ ((r8(a + 1) << 8) | r8(a))
SI void w16(u16 a, u16 v) { w8(a, LO(v)); w8(a + 1, HI(v)); }

The Z80 also has a 16-bit stack, so we must implement the approperiate operations for it.

// Stack operations.
SI void psh(u16 v) { w16(sp -= 2, v); }
SI u16 pop() _ (r16((sp += 2) - 2))

The Z80 is known for uniquely dealing with instruction “pairs” to form together 16-bit numbers that use one register as the lower byte, and another register as the higher byte.

// 16-bit register pairs.
SI u16 get_bc() _ ((b << 8) | c)
SI u16 get_hl() _ ((h << 8) | l)
SI u16 get_de() _ ((d << 8) | e)
SI u16 get_af() _ ((a << 8) | get_f())

SI void set_bc(u16 v) { b = HI(v); c = LO(v); }
SI void set_hl(u16 v) { h = HI(v); l = LO(v); }
SI void set_de(u16 v) { d = HI(v); e = LO(v); }

To facilitate working with the rest of the processor state, the following straightforward functions simplify working with the flags register and obtaining instruction data from the program:

// Querying program data and incrementing the instruction pointer.
SI u8 p8() _ (r8(pc++))
SI u16 p16() _ (r16((pc += 2) - 2))

// Setting / getting the "full" flags register.
// Order: cf nf pf xf hf yf zf sf.
SI u8 get_f() _ (
  cf | (nf << 1) | (pf << 2) | (xf << 3)
| (hf << 4) | (yf << 5) | (zf << 6) | (sf << 7))

SI void set_f(u8 val) {
  cf = bit(0, val);
  nf = bit(1, val);
  pf = bit(2, val);
  xf = bit(3, val);
  hf = bit(4, val);
  yf = bit(5, val);
  zf = bit(6, val);
  sf = bit(7, val);

Instruction implementation

We can start implementing z80 instructions now, starting with a few utility functions that will be useful later:

// ---------------------------------------

// increments R, preserving the highest bit.
SI void inc_r() { r = (r & 0x80) | ((r + 1) & 0x7f); }

// find the carry bit in addition on bits n and n-1
SI bv carry(int n, u16 a, u16 b, bv cy) _ (bit(n, (a + b + cy) ^ a ^ b))

// Check whether the parity of a byte is even.
SI bv parity(u8 x) {
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return (~x) & 1;

The implementation of jumps, calls, conditional jumps, conditional calls and returns is easy. We need to take care of the WZ register, the nuissances of which are described in the first footnote1.

// Branching
SI void jmp(u16 a) { wz = pc = a; }
SI void call(u16 dest) { psh(pc); jmp(dest); }
SI void cjmp(bv c) { u16 t = p16(); if (c) jmp(t); wz = t; }
SI void ccall(bv c) { u16 t = p16(); if (c) call(t); wz = t; }
SI void ret() { jmp(pop()); }
SI void cret(bv c) { if (c) ret(); }

Z80 also employs relative jumps, which are cheaper to encode (8-bit displacement instead of 16-bit absolute address):

// Relative jumps.
SI void jr(i8 disp) { wz = pc += disp; }
SI void cjr(bv c) { if (c) jr(p8()); else pc++; }

It is common for z80 instructions to update the flags register in a very monotone way (in particular, setting the sign and zero flags) and seemingly randomly tweak the “extra” xf and yf flags. Let’s implement these:

// Updating XF and YF in two ways.
SI void xyf1(u8 x) { xf = bit(3, x); yf = bit(5, x); }
SI void xyf2(u8 x) { xf = bit(3, x); yf = bit(1, x); }

// Adjust S and Z flags.
SI void szf8(u8 x) { zf = x == 0; sf = x >> 7; }
SI void szf16(u16 x) { zf = x == 0; sf = x >> 15; }

Now we can implement some arithmetics, starting with addition and subtraction, which is implemented as addition of the negated value:

// Arithmetics
SI u8 add8(u8 a, u8 b, bv cy) {
  u8 res = a + b + cy;
  hf = carry(4, a, b, cy);
  pf = carry(7, a, b, cy) != carry(8, a, b, cy);
  cf = carry(8, a, b, cy);
  nf = 0;
  R res;

SI u8 sub8(u8 a, u8 b, bv cy) {
  u8 v = add8(a, ~b, !cy);
  cf = !cf; hf = !hf;
  nf = 1;
  R v;

16-bit variants of these instructions & the increment and decrement instructions trivially follow:

SI u16 add16(u16 a, u16 b, bv cy) {
  u8 lo = add8(a, b, cy);
  u16 res = (add8(HI(a), HI(b), cf) << 8) | lo;
  zf = res == 0; wz = a + 1; R res;

SI u16 sub16(u16 a, u16 b, bv cy) {
  u8 lo = sub8(a, b, cy);
  u16 res = (sub8(HI(a), HI(b), cf) << 8) | lo;
  zf = res == 0; wz = a + 1; R res;

SI u8 inc(u8 a) { bv ncf = cf; u8 q = add8(a, 1, 0); cf = ncf; R q; }
SI u8 dec(u8 a) { bv ncf = cf; u8 q = sub8(a, 1, 0); cf = ncf; R q; }

A recurring theme in the z80 instruction set is adding something to the 16-bit register pairs, so we implement two convenience functions for it. One is specialised in adding a word to the HL register, and the other is more general and can add a word to any 16-bit register we pass to it via a pointer. Additionally, we add two convenience functions to add/subtract to/from the HL register with carry.

// Add a word to hl.
SI void addhl(u16 val) {
  bv nsf = sf; bv nzf = zf; bv npf = pf;
  set_hl(add16(get_hl(), val, 0));
  sf = nsf; zf = nzf; pf = npf;

// Add a word to a native 16-bit register.
SI void addiz(u16 * reg, u16 val) {
  bv nsf = sf; bv nzf = zf; bv npf = pf;
  *reg = add16(*reg, val, 0);
  sf = nsf; zf = nzf; pf = npf;

// Add/subtract to HL with carry.
SI void adchl(u16 v) { u16 q = add16(get_hl(), v, cf); szf16(q); set_hl(q); }
SI void sbchl(u16 v) { u16 q = sub16(get_hl(), v, cf); szf16(q); set_hl(q); }

The z80 has a “comparison instruction” to compare with A (which is actually defined using subtraction), so we define it too:

// compare with A.
SI void cmpa(u8 val) { sub8(a, val, 0); xyf1(val); }

All bit operations on z80 impact the flags register in the same way, so we can implement a helper macro for putting these together:

// Bit operations.
#define bit_op(name, op, halfcarry) \
  SI void name(u8 val) { \
    u8 res = a op val; \
    szf8(res); \
    xyf1(res); \
    hf = halfcarry; \
    pf = parity(res); \
    nf = cf = 0; \
    a = res; \

bit_op(land, &, 1)
bit_op(lor,  |, 0)
bit_op(lxor, ^, 0)

#undef bit_op

The 0xCB opcodes (shifts) similarly share the flag update pattern characteristics, so we implement a helper macro for them too:

// A stub for 0xCB opcodes.
#define op_cbh(name, op...) \
  SI u8 name(u8 val) { \
    op \
    szf8(val); \
    xyf1(val); \
    pf = parity(val); \
    nf = hf = 0; \
    R val; \

// Rotate left/right with carry.
op_cbh(rlc, bv old = val >> 7; val = (val << 1) | old; cf = old;)
op_cbh(rrc, bv old = val & 1; val = (val >> 1) | (old << 7); cf = old;)

// Simple shifts.
op_cbh(rl, bv ncf = cf; cf = val >> 7; val = (val << 1) | ncf;)
op_cbh(rr, bv nc = cf; cf = val & 1; val = (val >> 1) | (nc << 7);)

// Shift and preserve sign (possibly: set the bit to 1).
op_cbh(sla, cf = val >> 7; val <<= 1;)
op_cbh(sll, cf = val >> 7; val <<= 1; val |= 1;)
op_cbh(sra, cf = val & 1; val = (val >> 1) | (val & 0x80);)
op_cbh(srl, cf = val & 1; val >>= 1;)

For the last bit operations, we want to test individual bits in a byte:

// Test n-th bit.
SI u8 bt(u8 val, u8 n) {
  u8 res = val & (1 << n);
  szf8(res); xyf1(val);
  hf = 1; pf = zf; nf = 0; R res;

LDI and LDD will make heavy use of 16-bit registers, so we implement a few helper macros to offset them by some constant value:

// Altering 16-bit register pairs.
#define dbc(n) set_bc(get_bc() + (n))
#define dhl(n) set_hl(get_hl() + (n))
#define dde(n) set_de(get_de() + (n))

The implementations are notoriously simple, though (beside the flag update black magic):

// Copying data.
SI void ldi() {
  u8 val = r8(get_hl());
  w8(get_de(), val);
  dhl(1); dde(1); dbc(-1);
  xyf2(val + a);
  nf = hf = 0;
  pf = get_bc() > 0;

SI void ldd() { ldi(); dhl(-2); dde(-2); }

// Comparison
SI void cpi() {
  bv ncf = cf;
  u8 v = sub8(a, r8(get_hl()), 0);
  dhl(1); dbc(-1);
  xyf2(v - hf);
  pf = get_bc() != 0;
  cf = ncf;
  wz += 1;

SI void cpd() { cpi(); dhl(-2); wz -= 2; }

Moving on, we implement port I/O functions that are merely just wrappers over the user-supplied port_in and port_out functions:

// PIO
// Port input to an arbitrary register
SI void inr(u8 * r) {
  *r = port_in(c);
  pf = parity(*r);
  nf = hf = 0;

// Adjust after port I/O
SI void adji() {
  zf = --b == 0;
  nf = 1;
  wz = get_bc() + 1;

SI void ini() { w8(get_hl(), port_in(c)); adji(); }
SI void outi() { port_out(c, r8(get_hl())); adji(); }

SI void ind() { ini(); dhl(-2); wz = get_bc() - 2; }
SI void outd() { outi(); dhl(-2); wz = get_bc() - 2; }

The decimal adjust is probably one of the most complicated devices that the z80 consists of. Let’s define it in pseudocode and then transcribe it into C:

  • Take A as our binary-coded decimal number.
  • If the second digit is “greater” than 9 or the half-carry flag is set, add 6 to the second digit.
  • If the binary-coded decimal number exceeds 99 or the carry flag is set, add 6 to the first digit.
  • If the negative flag is set, subtract the result from 0 and set the half-carry flag to 1 only if it was already set, and the second digit was less than 6. Otherwise, set the half-carry flag to 1 if the last digit is greater than 9.
  • Add the result to the A register.
  • Set the parity flag to the parity of the A register.
  • Perform standard flag adjustments with respect to A.
// Decimal adjust.
SI void daa() {
  u8 bcd = 0;
  if ((a & 0x0F) > 0x09 || hf)
    bcd += 0x06;
  if (a > 0x99 || cf)
    bcd += 0x60, cf = 1;
  if (nf) {
    hf = hf && (a & 0x0F) < 0x06;
    bcd = -bcd;
  } else {
    hf = (a & 0x0F) > 0x09;
  pf = parity(a += bcd);
  xyf1(a); szf8(a);

As a final piece of the puzzle, we simplify updating the WZ register to the sum of the B and D registers:

// Displacement computation. Updates the WZ pair.
SI u16 dp(u16 b, i8 d) _ (wz = b + d)

Implementing the interface

Let’s implement the interface we have defined earlier, starting with some helper functions to execute particular opcodes/prefixed opcodes:

SI void exec(u8 opc);
SI void exec_cb(u8 opc);
SI void exec_cb2(u8 opc, u16 addr);
SI void exec_ed(u8 opc);
SI void exec_ind(u8 opc, u16 * ir);

The init function simply resets the emulator state:

// Reset the emulator state.
void init() {
  pc = ix = iy = wz = 0;
  sp = 0xFFFF;
  // AF = 0xFFFF, zero the rest.
  a = 0xFF;
  sf = zf = 1;
  xf = yf = 1;
  hf = pf = 1;
  nf = cf = 1;
  b = c = d = e = h = l = 0;
  exa = exb = exc = exd = exe = exh = exl = 0;
  i = r = 0;
  iff_set = 0;
  im = 0;
  iff1 = iff2 = 0;
  halted = 0;
  int_pending = nmi_pending = 0;
  int_vec = 0;

The step function executes a single instruction, taking care of the interrupt flip-flops and correct dispatching of the interrupts:

// Execute an instruction and handle interrupts.
void step() {
  exec(halted ? 0x00 : p8());
  if (iff_set) {
    iff_set = 0;
    iff1 = iff2 = 1;
  if (nmi_pending) {
    nmi_pending = halted = iff1 = 0;
    inc_r(); call(0x66); R;
  if (int_pending && iff1) {
    int_pending = halted = 0;
    iff1 = iff2 = 0;
    // Dispatch the interrupt based on the interrupt mode.
    switch (im) {
    case 0: exec(int_vec); R;
    case 1: call(0x38); R;
    case 2: call(r16((i << 8) | int_vec)); R;

Finally, we implement the emulator API for scheduling interrupts:

// Schedule a NMI
void gen_nmi() {
  nmi_pending = 1;

// Schedule an interrupt.
void gen_int(u8 data) {
  int_pending = 1;
  int_vec = data;

The exec function is supposed to execute a regular opcode, or if the opcode is not regular, call the function that handles the approperiate prefixed opcode. The exec_ind function performs the same operation, but on a restricted set of opcodes further modified by the values of the index registers. Because of the length of both of the functions and their dubious value to the blog post as a whole, I have put it in a collapsed code block (TL;DR: Just a massive

// ---------------------------------------

#define H(n, c...) case n: c; break;

// Adjust WZ after operation.
SI void awz1(u16 val) { wz = (a << 8) | LO(val + 1); }

// Execute a regular opcode.
SI void exec(u8 opc) {
  u16 t1; bv t2; u8 t3;
  switch (opc) {
  // ld X, Y block.
  H(0x47, b=a) H(0x40, b=b) H(0x41, b=c) H(0x42, b=d) H(0x43, b=e) H(0x44, b=h) H(0x45, b=l)
  H(0x57, d=a) H(0x50, d=b) H(0x51, d=c) H(0x52, d=d) H(0x53, d=e) H(0x54, d=h) H(0x55, d=l)
  H(0x67, h=a) H(0x60, h=b) H(0x61, h=c) H(0x62, h=d) H(0x63, h=e) H(0x64, h=h) H(0x65, h=l)
  H(0x4F, c=a) H(0x48, c=b) H(0x49, c=c) H(0x4A, c=d) H(0x4B, c=e) H(0x4C, c=h) H(0x4D, c=l)
  H(0x5F, e=a) H(0x58, e=b) H(0x59, e=c) H(0x5A, e=d) H(0x5B, e=e) H(0x5C, e=h) H(0x5D, e=l)
  H(0x6F, l=a) H(0x68, l=b) H(0x69, l=c) H(0x6A, l=d) H(0x6B, l=e) H(0x6C, l=h) H(0x6D, l=l)
  H(0x7F, a=a) H(0x78, a=b) H(0x79, a=c) H(0x7A, a=d) H(0x7B, a=e) H(0x7C, a=h) H(0x7D, a=l)
  // ld X, imm
  H(0x0E, c=p8()) H(0x06, b=p8())
  H(0x1E, e=p8()) H(0x16, d=p8())
  H(0x2E, l=p8()) H(0x26, h=p8())
  H(0x3E, a=p8()) H(0x36, w8(get_hl(),p8()))
  // ld X, (HL) and ld (HL), X
  H(0x6E, l=r8(get_hl()))
  H(0x7E, a=r8(get_hl())) H(0x46, b=r8(get_hl()))
  H(0x4E, c=r8(get_hl())) H(0x56, d=r8(get_hl()))
  H(0x5E, e=r8(get_hl())) H(0x66, h=r8(get_hl()))
  H(0x77, w8(get_hl(),a)) H(0x70, w8(get_hl(),b))
  H(0x71, w8(get_hl(),c)) H(0x72, w8(get_hl(),d))
  H(0x73, w8(get_hl(),e)) H(0x74, w8(get_hl(),h))
  H(0x75, w8(get_hl(),l))
  // ld (bc), a / ld (de), a / ld *word, a
  H(0x02, w8(get_bc(),a);awz1(get_bc()))
  H(0x12, w8(get_de(),a);awz1(get_de()))
  H(0x32, t1=p16();w8(t1,a);awz1(t1))
  // ld a, (bc) / ld a, (de) / ld a, *word
  H(0x0A, a=r8(get_bc());wz=get_bc()+1)
  H(0x1A, a=r8(get_de());wz=get_de()+1)
  H(0x3A, wz=p16()+1;a=r8(wz-1))
  // ld r16, imm word
  H(0x01, set_bc(p16())) H(0x11, set_de(p16()))
  H(0x21, set_hl(p16())) H(0x31, sp=p16())
  // ld r16, *word
  H(0x2A, set_hl(r16(t1=p16()));wz=t1+1)
  H(0x22, w16(t1=p16(),get_hl());wz=t1+1)
  // ld sp, hl
  H(0xF9, sp=get_hl())
  // ex de/(sp), hl
  H(0xEB, t1=get_de();set_de(get_hl());set_hl(t1))
  H(0xE3, t1=r16(sp);w16(sp,get_hl());set_hl(wz=t1))
  // add block
  #define op(opc, src) case opc: a = add8(a, src, 0); break;
  op(0x87, a)
  op(0x80, b)
  op(0x81, c)
  op(0x82, d)
  op(0x83, e)
  op(0x84, h)
  op(0x85, l)
  op(0x86, r8(get_hl()))
  op(0xC6, p8())
  #undef op
  // add with carry
  #define op(opc, src) case opc: a = add8(a, src, cf); break;
  op(0x8F, a)
  op(0x88, b)
  op(0x89, c)
  op(0x8A, d)
  op(0x8B, e)
  op(0x8C, h)
  op(0x8D, l)
  op(0x8E, r8(get_hl()))
  op(0xCE, p8())
  #undef op
  // sub block
  #define op(opc, src) case opc: a = sub8(a, src, 0); break;
  op(0x97, a)
  op(0x90, b)
  op(0x91, c)
  op(0x92, d)
  op(0x93, e)
  op(0x94, h)
  op(0x95, l)
  op(0x96, r8(get_hl()))
  op(0xD6, p8())
  #undef op
  // sub with carry
  #define op(opc, src) case opc: a = sub8(a, src, cf); break;
  op(0x9F, a)
  op(0x98, b)
  op(0x99, c)
  op(0x9A, d)
  op(0x9B, e)
  op(0x9C, h)
  op(0x9D, l)
  op(0x9E, r8(get_hl()))
  op(0xDE, p8())
  #undef op
  // add hl, ...
  H(0x09, addhl(get_bc())) H(0x19, addhl(get_de()))
  H(0x29, addhl(get_hl())) H(0x39, addhl(sp))
  // di, ei, nop, halt
  H(0xF3, iff1=iff2=0)
  H(0xFB, iff_set=1)
  H(0x00, )
  H(0x76, halted=1)
  // inc/dec block.
  H(0x3C, a=inc(a)) H(0x04, b=inc(b))
  H(0x0C, c=inc(c)) H(0x14, d=inc(d))
  H(0x1C, e=inc(e)) H(0x24, h=inc(h))
  H(0x2C, l=inc(l))
  H(0x34, w8(get_hl(),inc(r8(get_hl()))))
  H(0x3D, a=dec(a)) H(0x05, b=dec(b))
  H(0x0D, c=dec(c)) H(0x15, d=dec(d))
  H(0x1D, e=dec(e)) H(0x25, h=dec(h))
  H(0x2D, l=dec(l))
  H(0x35, w8(get_hl(),dec(r8(get_hl()))))
  // inc/dec r16
  H(0x03, dbc(1)) H(0x13, dde(1)) H(0x23, dhl(1)) H(0x33, ++sp)
  H(0x0B, dbc(-1)) H(0x1B, dde(-1)) H(0x2B, dhl(-1)) H(0x3B, --sp)
  // decimal adjust
  H(0x27, daa())
  // cpl/scf/ccf
  H(0x2F, a=~a;nf=hf=1;xyf1(a))
  H(0x37, cf=1;nf=hf=0;xyf1(a))
  H(0x3F, hf=cf;cf=!cf;nf=0;xyf1(a))
  // rlca, rrca, rrl
  H(0x07, cf=a>>7;a=(a<<1)|cf;nf=hf=0;xyf1(a))
  H(0x17, t2=cf;cf=a>>7;a=(a<<1)|t2;nf=hf=0;xyf1(a))
  H(0x0F, cf=a&1;a=(a>>1)|(cf<<7);nf=hf=0;xyf1(a))
  H(0x1F, t2=cf;cf=a&1;a=(a>>1)|(t2<<7);nf=hf=0;xyf1(a))
  // bit operations
  #define op(opA, opB, opC, opD, opE, opH, opL, opDHL, opI, name) \
    case opA: name(a); break; \
    case opB: name(b); break; \
    case opC: name(c); break; \
    case opD: name(d); break; \
    case opE: name(e); break; \
    case opH: name(h); break; \
    case opL: name(l); break; \
    case opDHL: name(r8(get_hl())); break; \
    case opI: name(p8()); break;
  op(0xA7, 0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xE6, land)
  op(0xB7, 0xB0, 0xB1, 0xB2, 0xB3, 0xB4, 0xB5, 0xB6, 0xF6, lor)
  op(0xAF, 0xA8, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xEE, lxor)
  op(0xBF, 0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0xBE, 0xFE, cmpa)
  #undef op
  // jmp imm
  H(0xC3, jmp(p16()))
  // conditional jumps.
  H(0xC2, cjmp(!zf)) H(0xCA, cjmp(zf))
  H(0xD2, cjmp(!cf)) H(0xDA, cjmp(cf))
  H(0xE2, cjmp(!pf)) H(0xEA, cjmp(pf))
  H(0xF2, cjmp(!sf)) H(0xFA, cjmp(sf))
  H(0x10, cjr(--b)) H(0x18, pc+=(i8)p8())
  H(0x20, cjr(!zf)) H(0x28, cjr(zf))
  H(0x30, cjr(!cf)) H(0x38, cjr(cf))
  H(0xE9, pc=get_hl()) H(0xCD, call(p16()))
  // call/ret
  H(0xC4, ccall(!zf)) H(0xCC, ccall(zf))
  H(0xD4, ccall(!cf)) H(0xDC, ccall(cf))
  H(0xE4, ccall(!pf)) H(0xEC, ccall(pf))
  H(0xF4, ccall(!sf)) H(0xFC, ccall(sf))
  H(0xC9, ret())
  H(0xC0, cret(!zf)) H(0xC8, cret(zf))
  H(0xD0, cret(!cf)) H(0xD8, cret(cf))
  H(0xE0, cret(!pf)) H(0xE8, cret(pf))
  H(0xF0, cret(!sf)) H(0xF8, cret(sf))
  // rst - 1 byte calls to fixed address.
  // similar to 8086's `int', except we
  // call the IVT, not the address in it.
  H(0xC7, call(0x00)) H(0xCF, call(0x08))
  H(0xD7, call(0x10)) H(0xDF, call(0x18))
  H(0xE7, call(0x20)) H(0xEF, call(0x28))
  H(0xF7, call(0x30)) H(0xFF, call(0x38))
  // psh
  H(0xC5, psh(get_bc())) H(0xD5, psh(get_de()))
  H(0xE5, psh(get_hl())) H(0xF5, psh(get_af()))
  // pop
  H(0xC1, set_bc(pop()))
  H(0xD1, set_de(pop()))
  H(0xE1, set_hl(pop()))
  H(0xF1, t1=pop();a=t1>>8;set_f(t1&0xFF))
  // in/out
  H(0xDB, t3=a;a=port_in(p8());wz=(t3<<8)|(a+1))
  H(0xD3, t3=p8();port_out(t3,a);wz=(t3+1)|(a<<8))
  // swapping exhange/main registers.
  case 0x08: {
    u8 na = a, nf = get_f();
    a = exa; set_f(exf);
    exa = na; exf = nf;
  case 0xD9: {
    u8 nb = b, nc = c, nd = d, ne = e, nh = h, nl = l;
    b = exb;
    c = exc;
    d = exd;
    e = exe;
    h = exh;
    l = exl;
    exb = nb;
    exc = nc;
    exd = nd;
    exe = ne;
    exh = nh;
    exl = nl;
  H(0xCB, exec_cb(p8())) H(0xDD, exec_ind(p8(), &ix))
  H(0xED, exec_ed(p8())) H(0xFD, exec_ind(p8(), &iy))
  default: fprintf(stderr, "Unknown opcode: 0x%02X\n", opc);

// Execute an opc operating on IX or IY.
SI void exec_ind(u8 opc, u16 * ir) {
  u16 t1;
  #define IHI HI(*ir)
  #define ILO LO(*ir)
  #define IDP dp(*ir, p8())
  switch (opc) {
  // Stack operations.
  H(0xE1, *ir = pop()) H(0xE5, psh(*ir))
  // Jumps
  H(0xE9, jmp(*ir))
  // Arithmetics.
  H(0x09, addiz(ir, get_bc())) H(0x19, addiz(ir, get_de()))
  H(0x29, addiz(ir, *ir)) H(0x39, addiz(ir, sp))
  // hi/lo math.
  // add/adc a, IHI/ILO
  H(0x84, a = add8(a, IHI, 0)) H(0x85, a = add8(a, ILO, 0))
  H(0x8C, a = add8(a, IHI, cf)) H(0x8D, a = add8(a, ILO, cf))
  // add/adc/sub/sbc a, byte *(ir + imm)
  H(0x86, a = add8(a, r8(IDP), 0)) H(0x8E, a = add8(a, r8(IDP), cf))
  H(0x96, a = sub8(a, r8(IDP), 0)) H(0x9E, a = sub8(a, r8(IDP), cf))
  // sub/sbc a, IHI/ILO
  H(0x94, a = sub8(a, IHI, 0)) H(0x95, a = sub8(a, ILO, 0))
  H(0x9C, a = sub8(a, IHI, cf)) H(0x9D, a = sub8(a, ILO, cf))
  #define op3(opA, opB, opC, f) H(opA, f(r8(IDP))) H(opB, f(IHI)) H(opC, f(ILO))
  op3(0xA6, 0xA4, 0xA5, land)
  op3(0xAE, 0xAC, 0xAD, lxor)
  op3(0xB6, 0xB4, 0xB5, lor)
  op3(0xBE, 0xBC, 0xBD, cmpa)
  #undef op3
  H(0x23, ++*ir) H(0x2B, --*ir)
  // inc/dec *(ix/iy + imm)
  H(0x34, t1=IDP;w8(t1,inc(r8(t1))))
  H(0x35, t1=IDP;w8(t1,dec(r8(t1))))
  // inc/dec IHI/ILO
  H(0x24, *ir=ILO|(inc(IHI)<<8))
  H(0x25, *ir=ILO|(dec(IHI)<<8))
  H(0x2C, *ir=(IHI<<8)|inc(ILO))
  H(0x2D, *ir=(IHI<<8)|dec(ILO))
  // loading IX/IY.
  H(0x2A, *ir=r16(p16()))
  H(0x22, w16(p16(),*ir))
  H(0x21, *ir=p16())
  // Load imm to (ix/iy + imm)
  H(0x36, t1=IDP;w8(t1,p8()))
  // ld (ix/iy + imm), r8
  H(0x70, w8(IDP,b)) H(0x71, w8(IDP,c))
  H(0x72, w8(IDP,d)) H(0x73, w8(IDP,e))
  H(0x74, w8(IDP,h)) H(0x75, w8(IDP,l))
  H(0x77, w8(IDP,a))
  // loading to registers.
  #define op(ob, oc, od, oe, oh, ol, oa, var) \
    H(ob, b = var) H(oc, c = var) H(od, d = var) H(oe, e = var) \
    H(oh, h = var) H(ol, l = var) H(oa, a = var)
  #define op2(ob, oc, od, oe, oa, var) \
    H(ob, b = var) H(oc, c = var) H(od, d = var) H(oe, e = var) \
    H(oa, a = var)
  op(0x46, 0x4E, 0x56, 0x5E, 0x66, 0x6E, 0x7E, r8(IDP))
  op2(0x44, 0x4C, 0x54, 0x5C, 0x7C, IHI)
  op2(0x45, 0x4D, 0x55, 0x5D, 0x7D, ILO)
  #undef op
  #undef op2
  // ld IHI/ILO r/imm/IHI/ILO
  H(0x67, *ir=ILO|(a<<8))
  H(0x60, *ir=ILO|(b<<8))
  H(0x61, *ir=ILO|(c<<8))
  H(0x62, *ir=ILO|(d<<8))
  H(0x63, *ir=ILO|(e<<8))
  H(0x26, *ir=ILO|(p8()<<8))
  H(0x64, )
  H(0x65, *ir=(ILO<<8)|ILO)
  H(0x6F, *ir=(IHI<<8)|a)
  H(0x68, *ir=(IHI<<8)|b)
  H(0x69, *ir=(IHI<<8)|c)
  H(0x6A, *ir=(IHI<<8)|d)
  H(0x6B, *ir=(IHI<<8)|e)
  H(0x2E, *ir=(IHI<<8)|p8())
  H(0x6D, )
  H(0x6C, *ir=(IHI<<8)|IHI)
  // ld sp, ix/iy
  H(0xF9, sp=*ir)
  // ex (sp), ix/iy
  H(0xE3, t1=r16(sp);w16(sp,*ir);wz=*ir=t1)
  // two byte prefix
  H(0xCB, t1=IDP;exec_cb2(p8(),t1))
    // non-prefixed opc & decrement r
    r = (r & 0x80) | ((r - 1) & 0x7f);
  #undef IDP
  #undef ILO
  #undef IHI

One way more interesting part is the implementation of the 0xCB prefix opcode executors: they have a very unique encoding implemented below.

// executes a CB opcode
SI void exec_cb(u8 opc) {
  // Extract some information about what is about to happen:
  u8 dk = (opc >> 6) & 0b11; // operation kind
  u8 da = (opc >> 3) & 0b111; // auxiliary / op0 kind type
  u8 dr = opc & 0b111; // data
  // auxiliary storage for data under hl.
  u8 hlr = 0;
  // obtain pointer to the storage.
  u8* reg = 0;
  switch (dr) {
  H(0,reg=&b) H(1,reg=&c) H(2,reg=&d) H(3,reg=&e) H(4,reg=&h)
  H(5,reg=&l) H(6, hlr=r8(get_hl());reg=&hlr) H(7,reg=&a)
  switch (dk) {
  // Rotation
  case 0:
    switch (da) {
    #define op(n,f) case n: *reg = f(*reg); break;
    op(0, rlc) op(1, rrc)
    op(2, rl) op(3, rr)
    op(4, sla) op(5, sra)
    op(6, sll) op(7, srl)
    #undef op
  // Testing bits
  case 1:
    bt(*reg, da);
    // Odd edge case. See the WZ comment.
    if (dr == 6)
  H(2, *reg&=~(1<<da))
  H(3, *reg|=1<<da)
  // Write back to hl ptr if needed.
  if (reg == &hlr)
    w8(get_hl(), hlr);

Since the version of this function that depends on the IX and IY is almost analogically the same, I decided to also collapse it in a code block.

SI void exec_cb2(u8 opc, u16 addr) {
  u8 val = r8(addr);
  u8 res = 0;
  u8 dk = (opc >> 6) & 0b11; // op kind
  u8 da = (opc >> 3) & 0b111; // auxiliary / op0 kind type
  u8 dr = opc & 0b111; // data
  switch (dk) {
  // Rotation
  case 0:
    switch (da) {
    #define op(n,f) case n: res = f(val); break;
    op(0, rlc) op(1, rrc)
    op(2, rl) op(3, rr)
    op(4, sla) op(5, sra)
    op(6, sll) op(7, srl)
    #undef op
  case 1:
    res = bt(val, da);
  H(2, res=val&~(1<<da))
  H(3, res=val|(1<<da))
  default: fprintf(stderr, "Invalid IX/IY CB-prefixed opcode: %02X\n", opc);
  if (dk != 1 && dr != 6)
    switch (dr) {
    H(0, b=res) H(1, c=res) H(2, d=res) H(3, e=res)
    H(4, h=res) H(5, l=res) H(6, w8(get_hl(), res))
    H(7, a=res)
  if (dk != 1)
    w8(addr, res);

The final part of our emulator is the 0xED opcode handling (primarily I/O, memory, some obscure fluff), implemented below:

SI void rot_ep() {
  nf = hf = 0;
  pf = parity(a);
  wz = get_hl() + 1;

// Execute a ED opcode.
SI void exec_ed(u8 opc) {
  u8 t1; u16 t2;
  switch (opc) {
  H(0x47, i=a)
  H(0x4F, r=a)
  // ld a, i/r
  H(0x57, a=i;szf8(a);hf=nf=0;pf=iff2)
  H(0x5F, a=r;szf8(a);hf=nf=0;pf=iff2)
  // retn
  case 0x45: case 0x55: case 0x65: case 0x75:
  case 0x5D: case 0x6D: case 0x7D:
    iff1 = iff2; ret(); break;
  // reti
  H(0x4D, ret())
  // ldi & ldir, ldd & lddr
  H(0xA0, ldi())
  H(0xB0, ldi();if(get_bc())wz=--pc,--pc;)
  H(0xA8, ldd())
  H(0xB8, ldd();if(get_bc())wz=--pc,--pc;)
  // cpi & cpd
  H(0xA1, cpi()) H(0xA9, cpd())
  // cpir & cpdr
  H(0xB1, cpi();if(get_bc()&&!zf)wz=--pc,--pc;else++wz)
  H(0xB9, cpd();if(get_bc()&&!zf)pc-=2;else++wz)
  // in r, (c)
  H(0x40, inr(&b)) H(0x48, inr(&c))
  H(0x50, inr(&d)) H(0x58, inr(&e))
  H(0x60, inr(&h)) H(0x68, inr(&l))
  // in + discard
  H(0x70, inr(&t1));
  // in a, (c). note different wz behavior
  H(0x78, inr(&a);wz=get_bc()+1)
  // ini/ind
  H(0xA2, ini()) H(0xAA, ind())
  // inir / indr
  H(0xB2, ini();if(b)pc-=2)
  H(0xBA, ind();if(b)pc-=2)
  // out
  H(0x41, port_out(c, b)) H(0x49, port_out(c, c))
  H(0x51, port_out(c, d)) H(0x59, port_out(c, e))
  H(0x61, port_out(c, h)) H(0x69, port_out(c, l))
  H(0x71, port_out(c, 0))
  // again, out a is special-cased for wz.
  H(0x79, port_out(c,a);wz=get_bc()+1)
  // outi/outd + repeated versions
  H(0xA3, outi()) H(0xAB, outd())
  H(0xB3, outi();if(b)pc-=2)
  H(0xBB, outd();if(b)pc-=2)
  // sbc/adc hl, r16
  H(0x42, sbchl(get_bc())) H(0x52, sbchl(get_de()))
  H(0x62, sbchl(get_hl())) H(0x72, sbchl(sp))
  H(0x4A, adchl(get_bc())) H(0x5A, adchl(get_de()))
  H(0x6A, adchl(get_hl())) H(0x7A, adchl(sp))
  // ld [imm], r16
  H(0x43, w16(t2=p16(),get_bc());wz=t2+1)
  H(0x53, w16(t2=p16(),get_de());wz=t2+1)
  H(0x63, w16(t2=p16(),get_hl());wz=t2+1)
  H(0x73, w16(t2=p16(),sp);wz=t2+1)
  // ld r16, [imm]
  H(0x4B, set_bc(r16(t2=p16()));wz=t2+1)
  H(0x5B, set_de(r16(t2=p16()));wz=t2+1)
  H(0x6B, set_hl(r16(t2=p16()));wz=t2+1)
  H(0x7B, sp = r16(t2=p16());wz=t2+1)
  // neg
  case 0x44: case 0x54: case 0x64: case 0x74:
  case 0x4C: case 0x5C: case 0x6C: case 0x7C:
    a = sub8(0, a, 0);
  // im x
  case 0x46: case 0x66: im = 0; break;
  case 0x56: case 0x76: im = 1; break;
  case 0x5E: case 0x7E: im = 2; break;
  // rrd / rld
  case 0x67: {
    u8 na = a, val = r8(get_hl());
    a = (na & 0xF0) | (val & 0xF);
    w8(get_hl(), (val >> 4) | (na << 4));
  case 0x6F: {
    u8 na = a, val = r8(get_hl());
    a = (na & 0xF0) | (val >> 4);
    w8(get_hl(), (val << 4) | (na & 0xF));
  default: fprintf(stderr, "unknown ED opcode: %02X\n", opc); break;

#undef bit
#undef H
#undef HI
#undef LO

Emulator testing

I have been using the zexall and zexdoc tests to verify the emulator and its implementation of the undocumented capabilities. Writing a client application for this purpose turned out to be very simple:

#include "../z80.c"

#define MEM_SIZE 0x10000

static int load_file(const char* filename, u16 addr) {
  FILE* f = fopen(filename, "r");
  fseek(f, 0, SEEK_END);
  size_t s = ftell(f);

  if (s + addr >= MEM_SIZE) {
    fprintf(stderr, "%s too big to fit in memory.\n", filename);
    R 1;

  fread(m + addr, sizeof(u8), s, f);
  R 0;

u8 port_in(u8 port) { R 0; }

void port_out(u8 port, u8 val) {
  if(val == 1) {
    // Service 2: print character.
    if (c == 2) {
      printf("%c", e);

    // Service 9: print a '$'-terminated string.
    if (c == 9) {
      u16 addr = (d << 8) | e;
      while (r8(addr) != '$');

static void run_test(const char* filename) {
  memset(m, 0, MEM_SIZE);

  if (load_file(filename, 0x100) != 0)

  pc = 0x100;

  // write 1 to port 0 - handle BDOS services.
  // ld a, 1
  // out 0, a
  // ret
  m[5] = 0x3E;
  m[6] = 0x01;
  m[7] = 0xD3;
  m[8] = 0x00;
  m[9] = 0xC9;

  do step(); while(pc);

int main(void) {
  m = malloc(MEM_SIZE);


All tests pass, meaning that we are ready for something bigger!


To run MS BASIC we slighly need to modify our code. First, MS BASIC will expect the input to be provided in raw format, so we need to tweak the terminal via termios accordingly:

#include "../z80.c"
#include <termios.h>
#include <sys/ioctl.h>

#define MEM_SIZE 0x10000

void erm() {
  struct termios term;
  tcgetattr(0, &term);
  term.c_lflag &= ~(ICANON | ECHO);
  tcsetattr(0, TCSANOW, &term);

void drm() {
  struct termios term;
  tcgetattr(0, &term);
  term.c_lflag |= ICANON | ECHO;
  tcsetattr(0, TCSANOW, &term);

int crlf = 0;

_Bool kbhit() {
    int bw;
    ioctl(0, FIONREAD, &bw);
    return bw > 0;

Next up, we copy over our standard load_file function from the zexall/zexdoc testing suite client and stub out port I/O (unused):

static int load_file(const char* filename, u16 addr) {
  FILE* f = fopen(filename, "r");
  fseek(f, 0, SEEK_END);
  size_t s = ftell(f);

  if (s + addr >= MEM_SIZE) {
    fprintf(stderr, "%s too big to fit in memory.\n", filename);
    R 1;

  fread(m + addr, sizeof(u8), s, f);
  R 0;

// Stubs.
u8 port_in(u8 port) { R 0; }
void port_out(u8 port, u8 val) { }

At this point, we are expected to implement a few BDOS routines to handle the input and output. The code below accomplishes this and handles restarts.

int basic_started = 0;

// Initialisation routine.
void rst00() {
  sp = 0x20ED;
  printf("BasicBIOS for Microsoft BASIC v4.7 by Kamila Szewczyk.\n");
    goto cold_start;
  printf("Cold/warm boot? (C/W) ");
  int c = getchar();
  if(c == 'W')
    goto warm_start;

    basic_started = 1;
    pc = 0x0150;
    pc = 0x0153;

void rst08() { putchar(a); ret(); }
void rst10() {
  if(crlf) {
    crlf = 0;
    a = '\n';
    ret(); return;
  a = getchar();
  if(a == '\n')
    a = '\r', crlf = 1;
  // Hack!
  if(a == 127) { a = '\b'; printf("\b "); }
void rst18() { zf = !kbhit(); ret(); }
void rst38() { ret(); } // interrupt stub.

Finally, we load the BASIC interpreter into memory and run it, accordingly supplying the BDOS service support:

static void run_basic(const char* filename) {
  memset(m, 0, MEM_SIZE);

  if (load_file(filename, 0x150) != 0)

  pc = 0x0000;

  do {
    case 0x0000: rst00(); break; // rst 0x00
    case 0x0008: rst08(); break; // rst 0x08
    case 0x0010: rst10(); break; // rst 0x10
    case 0x0018: rst18(); break; // rst 0x18
    case 0x0038: rst38(); break; // rst 0x38
    default: step();
  } while(pc);

int main(void) {
  m = malloc(MEM_SIZE);


Finally, I ran the emulator and played a little with the classy MS BASIC:

A MS BASIC sine wave! Kind of!


In this post, we have implemented a Z80 emulator in C. We have also used it to run the zexall/zexdoc tests (all of which pass) and the MS BASIC interpreter. The code is available on GitHub: