I'm implementing a 6502 emulator in C# as part of a project to emulate the
Nintendo Entertainment System.
I've implemented all the 6502 opcodes, and now I'm going back through and
implementing cycle counting, so that each instruction consumes the correct
number of clock cycles.
On the 6502, some instructions take one extra cycle if a "page boundary is
crossed". Different references use slightly different wording, but none that
I've found so far clearly specifies exactly what constitutes a page boundary
crossing. Take post-indexed indirect addressing, for example:
The CPU takes the byte following the opcode. The CPU goes to the specified
zeropage address and reads a word. It adds the value of the Y register to that.
It reads a byte from the location specified by the resulting 16 bit value.
1. There could be a page boundary between the opcode and the operand.
2. I could cross a page boundary going to the zero page from the operand (or
   the opcode, or the start of the next instruction).
3. The word I read there could span a page boundary by starting at address
   0x**FF.
4. Adding the Y register to that word could cause a page boundary crossing.
Of those, only the third and fourth seem likely candidates for the "page
boundary crossing" we're interested in.
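To make the third and fourth candidates concrete, here is a minimal sketch in C (it should translate almost line for line to C#). The names `IndirectYResult` and `indirect_y` are hypothetical, `mem` is assumed to be a flat 64 KiB array standing in for the bus, and the sketch assumes the pointer fetch wraps within the zero page, as 6502 references commonly describe. It only reports both events; it does not decide which one costs a cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint16_t address;      /* effective address: pointer word + Y */
    bool pointer_wrapped;  /* third candidate: pointer word starts at $FF */
    bool index_crossed;    /* fourth candidate: adding Y changed the high byte */
} IndirectYResult;

/* mem is assumed to be a flat 64 KiB array standing in for the bus. */
IndirectYResult indirect_y(const uint8_t *mem, uint8_t zp, uint8_t y)
{
    /* Assumption: the high byte is read from (zp + 1) wrapped within page zero. */
    uint8_t lo = mem[zp];
    uint8_t hi = mem[(uint8_t)(zp + 1)];
    uint16_t base = (uint16_t)(lo | (hi << 8));

    IndirectYResult r;
    r.address = (uint16_t)(base + y);
    r.pointer_wrapped = (zp == 0xFF);
    r.index_crossed = ((base & 0xFF00) != (r.address & 0xFF00));
    return r;
}
```

For example, with $F0 stored at $00FF, $12 stored at $0000, and Y = $20, the pointer word is $12F0 and the effective address is $1310, so both flags come back set.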
In 64doc, the section "Absolute indexed addressing" describes what the CPU is
doing on each cycle of an absolute indexed read instruction. It implies that
the fourth kind of page boundary crossing in the list above is the one that
makes the CPU take an extra cycle: if adding an index to a base address carries
into the high byte, so that the effective address lands in a different page,
then the instruction takes 1 extra cycle.
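Under that reading, the check comes down to comparing the high bytes of the address before and after the index is added. A minimal sketch in C, with hypothetical helper names `page_crossed` and `absolute_indexed_read_cycles`, and assuming the 4-cycle base cost of an absolute indexed read such as LDA:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when base and base + index lie in different 256-byte pages. */
bool page_crossed(uint16_t base, uint8_t index)
{
    uint16_t effective = (uint16_t)(base + index);
    return (base & 0xFF00) != (effective & 0xFF00);
}

/* Cycles for an absolute indexed read: 4, plus 1 if the index carries
   into the high byte of the address. */
int absolute_indexed_read_cycles(uint16_t base, uint8_t index)
{
    return 4 + (page_crossed(base, index) ? 1 : 0);
}
```

With a base of $12F0, an index of $0F stays in page $12 (4 cycles), while an index of $10 lands on $1300 (5 cycles).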
This model has good explanatory power too. The LDA (load accumulator)
instruction has one opcode per addressing mode. For the zero page, zero page,X,
(indirect,X), (indirect),Y, absolute, and absolute indexed modes, the cycle
timings are 3, 4, 6, 5+, 4, 4+ (where '+' means one extra cycle if a page
boundary is crossed). 64doc says that the 6502 optimistically fetches from the
calculated address assuming there was no page boundary crossing, and then
spends the extra cycle, if necessary, re-fetching from the correct page if it
guessed wrong. Look now at the STA (store accumulator) instruction. Its cycle
timings for the same addressing modes are 3, 4, 6, 6, 4, 5. Notice that the
timings are identical, except that in the two '+' modes we always get charged
the extra cycle. Because STA writes to memory, the CPU can't optimistically
write to its best guess of the desired address, and then write again to the
correct address if it turns out it guessed wrong; very sensibly, the store
instructions always wait for the address to be calculated with certainty before
going ahead and writing to the bus.
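The read/write asymmetry can be sketched as a single cycle-count helper. This is only an illustration of the model described above, not a full timing table: `AccessKind` and `indexed_cycles` are hypothetical names, and `read_cycles` is assumed to be the no-crossing read timing for the mode (4 for absolute indexed, 5 for (indirect),Y).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACCESS_READ, ACCESS_WRITE } AccessKind;

/* read_cycles: cycle count of the read form when no page is crossed.
   base:        address before indexing; index: the X or Y register value. */
int indexed_cycles(int read_cycles, AccessKind kind, uint16_t base, uint8_t index)
{
    bool crossed = (base & 0xFF00) != ((uint16_t)(base + index) & 0xFF00);

    if (kind == ACCESS_WRITE) {
        /* Stores cannot speculate, so they always pay the fix-up cycle. */
        return read_cycles + 1;
    }
    /* Reads pay it only when the optimistic fetch hit the wrong page. */
    return read_cycles + (crossed ? 1 : 0);
}
```

For (indirect),Y this gives 5 or 6 cycles for LDA depending on the crossing, and 6 for STA either way, which matches the timings quoted above.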