Commit 6f7ab51

pinobatch and ISSOtm authored
OAM DMA: speed, CGB buses, di, mode 1, and no soda (#483)
Co-authored-by: Eldred Habert <[email protected]>
1 parent 3f1c43e commit 6f7ab51

1 file changed: +56 -24 lines changed
src/OAM_DMA_Transfer.md

Lines changed: 56 additions & 24 deletions
````diff
@@ -3,23 +3,62 @@
 
 ## FF46 — DMA: OAM DMA source address & start
 
-Writing to this register launches a DMA transfer from ROM or RAM to OAM
-(Object Attribute Memory). The written value specifies the
+Writing to this register starts a DMA transfer from ROM or RAM to OAM
+(Object Attribute Memory). The written value specifies the
 transfer source address divided by $100, that is, source and destination are:
 
 ```
 Source: $XX00-$XX9F ;XX = $00 to $DF
 Destination: $FE00-$FE9F
 ```
 
-The transfer takes 160 machine cycles: 152 microseconds in normal speed
-or 76 microseconds in CGB Double Speed Mode. On DMG, during this time,
-the CPU can access only HRAM (memory at $FF80-$FFFE); on CGB, the bus used
-by the source area cannot be used (this isn't understood well at the
-moment; it's recommended to assume same behavior as DMG). For this
-reason, the programmer must copy a short procedure into HRAM, and use
-this procedure to start the transfer from inside HRAM, and wait until
-the transfer has finished:
+The transfer takes 160 machine cycles: 640 dots (1.4 lines) in normal speed,
+or 320 dots (0.7 lines) in CGB Double Speed Mode.
+This is much faster than a CPU-driven copy.
+
+## OAM DMA bus conflicts
+
+On DMG, during OAM DMA, the CPU can access only HRAM (memory at $FF80-$FFFE).
+For this reason, the programmer must copy a short procedure (see below) into HRAM, and use
+this procedure to start the transfer **from inside HRAM**, and wait until
+the transfer has finished.
+
+On CGB, the cartridge and WRAM are on separate buses.
+This means that the CPU can access ROM or cartridge SRAM during OAM DMA from WRAM, or WRAM during OAM DMA from ROM or SRAM.
+However, because a `call` writes a return address to the stack, and the stack and variables are usually in WRAM,
+it's still recommended to busy-wait in HRAM for DMA to finish even on CGB.
+
+::: warning Interrupts
+
+An interrupt writes a return address to the stack and fetches the interrupt handler's instructions from ROM.
+Thus, it's critical to prevent interrupts during OAM DMA, especially in a program that uses timer, serial, or joypad interrupts, since they are not synchronized to the LCD.
+This can be done by executing DMA within the VBlank interrupt handler or through the `di` instruction.
+
+:::
+
+While an OAM DMA is in progress, the PPU cannot read OAM properly either.
+Thus, most programs execute DMA during [Mode 1](<#STAT modes>), inside or immediately after their VBlank handler.
+But it is also possible to execute it during display redraw (Modes 2 and 3),
+allowing to display more than 40 objects on the screen (that is, for
+example 40 objects in the top half, and other 40 objects in the bottom half of
+the screen), at the cost of a couple lines that lack objects.
+If the transfer is started during Mode 3, graphical glitches may happen.
+
+The details:
+
+* If OAM DMA is active during OAM scan (mode 2), most PPU revisions read each object
+  as being off-screen and thus hidden on that line.
+* If OAM DMA is active during rendering (mode 3), the PPU reads whatever 16-bit word
+  the DMA unit is writing to OAM when the object is fetched.
+  This causes an incorrect tile number and attributes for objects already determined to be in range.
+
+<!-- TODO: find Hacktix test ROM -->
+<!-- TODO: keep working on "Red from OAM", a reproducer that races the beam to overwrite tile number and attributes of objects previously seen in Mode 2 -->
+
+## Best practices
+
+This 10-byte routine starts a transfer and waits for it to finish.
+Many games copy a routine like it into HRAM and call it during Mode 1.
 
 ```rgbasm
 run_dma:
@@ -32,30 +71,23 @@ run_dma:
     ret
 ```
 
-Because sprites are not displayed while an OAM DMA transfer is in progress, most
-programs execute this procedure from inside their VBlank
-handler. But it is also possible to execute it during display redraw (Modes 2 and 3),
-allowing to display more than 40 sprites on the screen (that is, for
-example 40 sprites in the top half, and other 40 sprites in the bottom half of
-the screen), at the cost of a couple lines that lack sprites due to the fact that
-during those couple lines the PPU reads OAM as $FF. Besides, graphic glitches may
-happen if an OAM DMA transfer is started during Mode 3.
-
-A more compact procedure is
+If HRAM is tight, this more compact procedure saves 5 bytes of HRAM
+at the cost of a few cycles spent jumping to the tail in HRAM.
 
 ```rgbasm
 run_dma: ; This part is in ROM
     ld a, HIGH(start address)
     ld bc, $2846 ; B: wait time; C: LOW($FF46)
-    jp run_dma_hrampart
+    jp run_dma_tail
 
-run_dma_hrampart:
+run_dma_tail: ; This part is in HRAM
     ldh [c], a
 .wait
     dec b
     jr nz, .wait
     ret
 ```
 
-This saves 5 bytes of HRAM, but is slightly slower in most cases due to
-the jump into the HRAM part.
+If starting a mid-frame transfer, wait for Mode 0 first
+so that the transfer cleanly overlaps Mode 2 on the next two lines,
+making objects invisible on those lines.
````
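
As a reviewer's cross-check, not part of the commit, the timing figures in this diff can be reproduced with some simple arithmetic. The sketch below assumes 456 dots per scanline, a 4,194,304 Hz dot clock that stays the same in Double Speed Mode, and the usual SM83 instruction costs (`dec b` 1 M-cycle, `jr nz` 3 taken or 2 not taken, `ret` 4). The function names are hypothetical, and the model ignores any start-up delay of the DMA unit.

```python
# Back-of-the-envelope timing check; all constants are assumptions stated above.
DOTS_PER_LINE = 456        # dots in one scanline
DOT_CLOCK_HZ = 4_194_304   # dot clock, unchanged by CGB Double Speed Mode
DMA_M_CYCLES = 160         # from the text: one OAM DMA takes 160 machine cycles

# M-cycle costs of the instructions in the wait loop
DEC_B = 1
JR_TAKEN = 3
JR_NOT_TAKEN = 2
RET = 4


def dma_duration(double_speed=False):
    """Return (dots, scanlines, microseconds) for one OAM DMA transfer."""
    dots_per_m_cycle = 2 if double_speed else 4
    dots = DMA_M_CYCLES * dots_per_m_cycle
    return dots, dots / DOTS_PER_LINE, dots / DOT_CLOCK_HZ * 1e6


def wait_loop_m_cycles(b):
    """M-cycles spent in `.wait: dec b / jr nz, .wait` for an initial B of 1..255."""
    if not 1 <= b <= 255:
        raise ValueError("this sketch only models B in 1..255")
    # Every iteration but the last takes the jump; the last one falls through.
    return (b - 1) * (DEC_B + JR_TAKEN) + DEC_B + JR_NOT_TAKEN


if __name__ == "__main__":
    for double_speed in (False, True):
        dots, lines, micros = dma_duration(double_speed)
        print(f"double_speed={double_speed}: {dots} dots, "
              f"{lines:.2f} lines, {micros:.1f} us")
    b = 0x2846 >> 8  # B is the high byte of the `ld bc, $2846` constant
    loop = wait_loop_m_cycles(b)
    print(f"B={b}: loop {loop} M-cycles, {loop + RET} including ret")
```

Under these assumptions the transfer comes to 640 dots (about 1.40 lines, 152.6 microseconds) at normal speed and 320 dots (about 0.70 lines, 76.3 microseconds) in Double Speed Mode, which agrees with both the removed and the added wording. With B = $28 (40), the wait loop in the compact routine takes 159 M-cycles, or 163 including the final `ret`.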
