Commit f8b09ec

authored
Reloc addrs docs (#467)
* Add documentation for the `reloc_addrs` format
* Write symbol_addrs.txt and reloc_addrs.txt in create_config.py
* version bump
* black
* forgor this
* Mention MIPS_26 and MIPS_PC16
* Simplify line
1 parent a6a7542 commit f8b09ec

File tree

6 files changed

+375
-39
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44

55
* Add `pair_segment` option to segments:
66
* Allows pairing the sections of two different segments together, making cross-segment rodata migration possible.
7+
* Now `create_config` for N64 games can create basic `symbol_addrs.txt` and `reloc_addrs.txt` files from the information inferred from its analysis.
78

89
### 0.34.3
910

docs/Advanced-Reloc.md

Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,200 @@
1+
# Advanced: Manual relocation handling
2+
3+
Sometimes the disassembler is unable to determine the correct symbol to be referenced by a given function or data symbol. For cases like this you can use a `reloc_addrs.txt` file to manually tell the disassembler which symbol should be used in each specific instance and how it should be referenced.
4+
5+
Assembly functions and data symbols reference other symbols by using relocations (relocs for short). This section explains how to override the automatic relocations generated by the disassembler.
6+
7+
By default splat will read the `reloc_addrs.txt` file from the root of the project, if it exists, but it is possible to rename the file, move it to a different folder or even provide multiple files for organizational purposes. Please refer to the [`reloc_addrs_path`](Configuration.md#reloc_addrs_path) option for more information.
8+
9+
## Syntax
10+
11+
To override relocs you need to have at least one `reloc_addrs.txt` file. Each line of this file corresponds to a reloc override entry. Empty lines are allowed and comments start with `//`.
12+
13+
The format for defining an entry is:
14+
15+
```ini
16+
rom:0x04B440 reloc:MIPS_HI16 symbol:BonusWait addend:-0x3
17+
```
18+
19+
Each attribute is defined as follows:
20+
21+
- `rom`: The rom address where the instruction or data reference you want to affect is located.
22+
- `symbol`: The symbol that you want to reference at this given place.
23+
- `addend`: Optional. A displacement into the symbol. It can be either positive or negative.
24+
- `reloc`: The relocation kind to be used. It mirrors the standard MIPS relocations. The following values are valid for this attribute:
25+
- Function relocs:
26+
- `MIPS_HI16`: Corresponds to the `%hi` reloc operator.
27+
- `MIPS_LO16`: Corresponds to the `%lo` reloc operator.
28+
- `MIPS_GPREL16`: Corresponds to the `%gp_rel` reloc operator.
29+
- `MIPS_GOT16`: Corresponds to the `%got` reloc operator.
30+
- `MIPS_CALL16`: Corresponds to the `%call16` reloc operator.
31+
- `MIPS_GOT_HI16`: Corresponds to the `%got_hi` reloc operator.
32+
- `MIPS_GOT_LO16`: Corresponds to the `%got_lo` reloc operator.
33+
- `MIPS_CALL_HI16`: Corresponds to the `%call_hi` reloc operator.
34+
- `MIPS_CALL_LO16`: Corresponds to the `%call_lo` reloc operator.
35+
- `MIPS_26`: No direct operator. Used in `jal` (jump and link) and `j` (jump) instructions.
36+
- `MIPS_PC16`: No direct operator. Used in branch instructions.
37+
- Data relocs:
38+
- `MIPS_32`: Corresponds to `.word`.
39+
- `MIPS_GPREL32`: Corresponds to `.gpword`.
40+
- No reloc:
41+
- `MIPS_NONE`: Prevents any reloc from being used at all, making the disassembler use the raw value instead. Useful for false positives from the symbol detector.
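
As an illustration of the entry format described above, the following is a minimal Python sketch of how a single line could be parsed. It is not splat's actual parser: the `RelocEntry` class and the `parse_reloc_line` function are hypothetical names introduced for this example, and the sketch assumes attributes are separated by whitespace and written as `key:value`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelocEntry:
    """Hypothetical container for one reloc override entry."""

    rom: int
    reloc: str
    symbol: str
    addend: int = 0


def parse_reloc_line(line: str) -> Optional[RelocEntry]:
    """Parse one line; return None for empty or comment-only lines."""
    # Drop anything after a `//` comment marker.
    line = line.split("//", 1)[0].strip()
    if not line:
        return None

    # Each attribute is assumed to be a whitespace-separated `key:value` token.
    attrs = {}
    for token in line.split():
        key, _, value = token.partition(":")
        attrs[key] = value

    return RelocEntry(
        rom=int(attrs["rom"], 0),
        reloc=attrs["reloc"],
        symbol=attrs["symbol"],
        # `addend` is optional; base 0 accepts signed hex such as "-0x3".
        addend=int(attrs.get("addend", "0"), 0),
    )


entry = parse_reloc_line("rom:0x04B440 reloc:MIPS_HI16 symbol:BonusWait addend:-0x3")
```

Here `entry.addend` holds the negative displacement as a plain integer, and a line without an `addend` attribute falls back to `0`.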
42+
43+
## Examples
44+
45+
The following are common patterns where providing manual relocs is useful.
46+
47+
### Negative offsets
48+
49+
A common pattern seen in many C functions is accessing a global array with an index subtraction, which makes the compiler emit an assembly access that ends up pointing to a different address than the one you would expect.
50+
51+
Take for example the following C code.
52+
53+
```c
54+
int some_sym = 0;
55+
int some_array[3] = {0};
56+
57+
int get_value(int index) {
58+
return some_array[index - 1];
59+
}
60+
```
61+
62+
Some compilers with some optimization flags enabled may optimize the generated assembly into something that doesn't need to actually perform the `- 1` subtraction, as seen in the following example assembly:
63+
64+
```mips
65+
sll $t6, $a0, 0x2
66+
lui $v0, %hi(some_array - 0x4)
67+
addu $v0, $v0, $t6
68+
jr $ra
69+
lw $v0, %lo(some_array - 0x4)($v0)
70+
```
71+
72+
This then gets built and linked into a final binary. That binary won't have those explicit relocations, it will just have raw addresses, meaning that the code will technically reference the symbol "behind" the array we actually want to use. In this case we end up referring to `some_sym` instead.
73+
74+
A direct disassembly of this binary would look similar to the following assembly. Note there's no mention of `some_array` anywhere.
75+
76+
```mips
77+
/* 0000 80000000 00047080 */ sll $t6, $a0, 0x2
78+
/* 0004 80000004 3C028000 */ lui $v0, %hi(some_sym)
79+
/* 0008 80000008 004E1021 */ addu $v0, $v0, $t6
80+
/* 000C 8000000C 03E00008 */ jr $ra
81+
/* 0010 80000010 8C420020 */ lw $v0, %lo(some_sym)($v0)
82+
```
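
The encoded words in this listing can be checked with a small Python sketch of the usual `%hi`/`%lo` split. The addresses are assumptions inferred from the encodings (`some_sym` at `0x80000020`, with `some_array` placed right after it at `0x80000024`), and `split_hi_lo` is a hypothetical helper, not part of splat.

```python
from typing import Tuple


def split_hi_lo(address: int) -> Tuple[int, int]:
    """Split a 32-bit address into its %hi and %lo 16-bit halves."""
    lo = address & 0xFFFF
    # The low half is sign-extended when the instruction runs, so the high
    # half is rounded up whenever bit 15 of the low half is set.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return hi, lo


# Assumed addresses, inferred from the encoded instructions in the listing.
SOME_SYM = 0x80000020
SOME_ARRAY = SOME_SYM + 4  # `some_sym` is a 4-byte int placed right before it

hi, lo = split_hi_lo(SOME_ARRAY - 0x4)

# Both expressions produce the same immediates, so the binary alone cannot
# tell `some_array - 0x4` apart from `some_sym`.
lui_word = 0x3C020000 | hi  # lui $v0, <hi>
lw_word = 0x8C420000 | lo   # lw $v0, <lo>($v0)
```

Under these assumptions `lui_word` and `lw_word` match the `3C028000` and `8C420020` words shown in the listing, which is why the disassembler has no way to recover `some_array` without extra information.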
83+
84+
While this assembly is very likely to build to a matching binary again, it may cause issues under a few circumstances, like when a project aims to achieve proper shiftability while it hasn't been completely matched.
85+
86+
To fix this disassembly you need to provide reloc entries in a `reloc_addrs.txt` file like the following:
87+
88+
```ini
89+
rom:0x0004 reloc:MIPS_HI16 symbol:some_array addend:-0x4
90+
rom:0x0010 reloc:MIPS_LO16 symbol:some_array addend:-0x4
91+
```
92+
93+
This tells the disassembler to reference `some_array - 0x4` at the instruction at rom address `0x0004` (the `lui` instruction), and that it should use the `%hi` reloc operator to do so. Similar logic applies to the instruction at rom address `0x0010` (the `lw`), except that we told it to use the `%lo` reloc operator instead. This generates assembly like the following:
94+
95+
```mips
96+
/* 0000 80000000 00047080 */ sll $t6, $a0, 0x2
97+
/* 0004 80000004 3C028000 */ lui $v0, %hi(some_array - 0x4)
98+
/* 0008 80000008 004E1021 */ addu $v0, $v0, $t6
99+
/* 000C 8000000C 03E00008 */ jr $ra
100+
/* 0010 80000010 8C420020 */ lw $v0, %lo(some_array - 0x4)($v0)
101+
```
102+
103+
### Segment symbols in code
104+
105+
A common pattern seen in N64 projects is code that references special symbols known as "segment symbols" to load fragments or segments of the ROM into VRAM. N64 games do this because it isn't possible to load the whole ROM into VRAM, and because N64 games lack a proper filesystem.
106+
107+
Segment symbols exist to describe addresses or sizes of specific regions of the ROM, like the start and end of the ROM addresses of a given segment, the expected start and end of the VRAM addresses of that segment, the size of the segment, etc.
108+
109+
Sadly the disassembler is unable to properly disambiguate these segment addresses from other symbols or even plain numbers, so the disassembly of code referencing segment symbols tends to be far from optimal.
110+
111+
Take for example the following C code:
112+
113+
```c
114+
/* segment symbols */
115+
u32 segment_menu_ROM_START[];
116+
u32 segment_menu_ROM_END[];
117+
118+
void load_menu_segment(void *dst) {
119+
load_segment(segment_menu_ROM_START, (u32)segment_menu_ROM_END - (u32)segment_menu_ROM_START, dst);
120+
}
121+
```
122+
123+
After compiling this code, the generated assembly would look like the following:
124+
125+
```mips
126+
addiu $sp, $sp, -0x18
127+
sw $ra, 0x10($sp)
128+
addu $a2, $a0, $zero
129+
lui $a0, %hi(segment_menu_ROM_START)
130+
addiu $a0, $a0, %lo(segment_menu_ROM_START)
131+
lui $a1, %hi(segment_menu_ROM_END)
132+
addiu $a1, $a1, %lo(segment_menu_ROM_END)
133+
jal load_segment
134+
subu $a1, $a1, $a0
135+
lw $ra, 0x10($sp)
136+
addiu $sp, $sp, 0x18
137+
jr $ra
138+
nop
139+
```
140+
141+
But when this gets linked into a ROM those symbols get replaced with their raw numeric values. Since they point to rom or vram addresses at the boundaries of each segment, the disassembler struggles to symbolize them and ends up using generic symbols, like the following:
142+
143+
```mips
144+
/* 0000 80000000 27BDFFE8 */ addiu $sp, $sp, -0x18
145+
/* 0004 80000004 AFBF0010 */ sw $ra, 0x10($sp)
146+
/* 0008 80000008 00803021 */ addu $a2, $a0, $zero
147+
/* 000C 8000000C 3C040000 */ lui $a0, %hi(D_000FB480)
148+
/* 0010 80000010 24840000 */ addiu $a0, $a0, %lo(D_000FB480)
149+
/* 0014 80000014 3C050000 */ lui $a1, %hi(D_00101A80)
150+
/* 0018 80000018 24A50000 */ addiu $a1, $a1, %lo(D_00101A80)
151+
/* 001C 8000001C 0C000000 */ jal load_segment
152+
/* 0020 80000020 00A42823 */ subu $a1, $a1, $a0
153+
/* 0024 80000024 8FBF0010 */ lw $ra, 0x10($sp)
154+
/* 0028 80000028 27BD0018 */ addiu $sp, $sp, 0x18
155+
/* 002C 8000002C 03E00008 */ jr $ra
156+
/* 0030 80000030 00000000 */ nop
157+
```
158+
159+
To fix the disassembly and make it use the proper segment symbols, we add more entries to the `reloc_addrs.txt` file. Note that here we don't need to specify an `addend`, since we just want to refer to the symbol without any other calculation.
160+
161+
```ini
162+
rom:0x000C reloc:MIPS_HI16 symbol:segment_menu_ROM_START
163+
rom:0x0010 reloc:MIPS_LO16 symbol:segment_menu_ROM_START
164+
rom:0x0014 reloc:MIPS_HI16 symbol:segment_menu_ROM_END
165+
rom:0x0018 reloc:MIPS_LO16 symbol:segment_menu_ROM_END
166+
```
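
Since `%hi`/`%lo` overrides always come in pairs, entries like these can be generated with a small helper. The following Python sketch is only an illustration: `hi_lo_entries` is a hypothetical function, and the 4-digit rom formatting simply follows the examples in this document.

```python
from typing import List, Optional


def hi_lo_entries(
    rom_hi: int, rom_lo: int, symbol: str, addend: Optional[int] = None
) -> List[str]:
    """Build the MIPS_HI16/MIPS_LO16 pair of entries for one symbol reference."""
    # The addend is optional; `#x` keeps the sign in front of the `0x` prefix.
    suffix = "" if addend is None else f" addend:{addend:#x}"
    return [
        f"rom:0x{rom_hi:04X} reloc:MIPS_HI16 symbol:{symbol}{suffix}",
        f"rom:0x{rom_lo:04X} reloc:MIPS_LO16 symbol:{symbol}{suffix}",
    ]


lines = hi_lo_entries(0x000C, 0x0010, "segment_menu_ROM_START")
lines += hi_lo_entries(0x0014, 0x0018, "segment_menu_ROM_END")
print("\n".join(lines))
```

The same helper covers the negative offset case by passing the displacement, for example `hi_lo_entries(0x0004, 0x0010, "some_array", -0x4)`.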
167+
168+
### Segment symbols in data
169+
170+
On the other hand, segment symbols may be referenced in data structures like arrays or structs. [Here you can read a bit more about what segment symbols are](#segment-symbols-in-code).
171+
172+
Take for example the following C code and its corresponding compiled assembly:
173+
174+
```c
175+
u32 segment_menu_ROM_START[];
176+
u32 segment_menu_ROM_END[];
177+
178+
u32 *menu_addresses[] = {
179+
segment_menu_ROM_START, segment_menu_ROM_END,
180+
};
181+
```
182+
183+
```mips
184+
.word segment_menu_ROM_START
185+
.word segment_menu_ROM_END
186+
```
187+
188+
But when we try disassembling a rom with this data, the disassembler won't be able to recognize these as segment symbols:
189+
190+
```mips
191+
/* 0100 80000100 000FB480 */ .word D_000FB480 # It may use a D_ symbol
192+
/* 0104 80000104 00101A80 */ .word 0x00101A80 # Or it may even fail completely to symbolize it
193+
```
194+
195+
In this case we can use the `MIPS_32` reloc in our `reloc_addrs` file to fix this kind of issue.
196+
197+
```ini
198+
rom:0x0100 reloc:MIPS_32 symbol:segment_menu_ROM_START
199+
rom:0x0104 reloc:MIPS_32 symbol:segment_menu_ROM_END
200+
```
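
Finding the rom addresses for these entries by hand can be tedious. The following Python sketch shows one possible way to look for candidates, assuming the data is big-endian and the segment symbol values are already known. `find_word_relocs` is a hypothetical helper, not a splat feature, and any hits should be reviewed manually since a matching word may just be a plain number.

```python
import struct
from typing import Dict, List


def find_word_relocs(data: bytes, rom_start: int, symbols: Dict[int, str]) -> List[str]:
    """Emit a MIPS_32 entry for every aligned word that equals a known value."""
    entries = []
    for offset in range(0, len(data) - 3, 4):
        # Words are assumed to be stored as big-endian 32-bit values.
        (word,) = struct.unpack_from(">I", data, offset)
        name = symbols.get(word)
        if name is not None:
            entries.append(
                f"rom:0x{rom_start + offset:04X} reloc:MIPS_32 symbol:{name}"
            )
    return entries


# Values taken from the examples in this document.
segment_symbols = {
    0x000FB480: "segment_menu_ROM_START",
    0x00101A80: "segment_menu_ROM_END",
}

# Stand-in for the bytes of `menu_addresses`, placed at rom address 0x0100.
blob = struct.pack(">II", 0x000FB480, 0x00101A80)
entries = find_word_relocs(blob, 0x0100, segment_symbols)
```

For this stand-in data the helper yields one entry per word, at rom addresses `0x0100` and `0x0104`.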

docs/Configuration.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -203,9 +203,15 @@ symbol_addrs_path: path/to/symbol_addrs
203203

204204

205205

206-
### reloc_addrs_paths
206+
### reloc_addrs_path
207207

208+
Determines the path to the reloc addresses file(s). A `reloc_addrs` file contains metadata to override relocations within the generated assembly. For more information about the syntax and how to use it refer to the corresponding [reloc_addrs chapter](Advanced-Reloc.md).
208209

210+
It's possible to use more than one file by supplying a list instead of a string.
211+
212+
#### Default
213+
214+
`reloc_addrs.txt`
209215

210216
### build_path
211217
Path that built files will be found. Used for generation of the linker script.

src/splat/scripts/create_config.py

Lines changed: 83 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -111,7 +111,7 @@ def create_n64_config(rom_path: Path):
111111
type: header
112112
start: 0x0
113113
114-
- name: boot
114+
- name: ipl3
115115
type: bin
116116
start: 0x40
117117
@@ -122,7 +122,7 @@ def create_n64_config(rom_path: Path):
122122
subsegments:
123123
- [0x1000, hasm]
124124
"""
125-
if rom.entrypoint_info.data_size > 0:
125+
if rom.entrypoint_info.data_size is not None:
126126
segments += f"""\
127127
- [0x{0x1000 + rom.entrypoint_info.entry_size:X}, data]
128128
"""
@@ -139,7 +139,7 @@ def create_n64_config(rom_path: Path):
139139

140140
if rom.entrypoint_info.bss_size is not None:
141141
segments += f"""\
142-
bss_size: 0x{rom.entrypoint_info.bss_size:X}
142+
bss_size: 0x{rom.entrypoint_info.bss_size.value:X}
143143
"""
144144

145145
segments += f"""\
@@ -152,11 +152,13 @@ def create_n64_config(rom_path: Path):
152152
and rom.entrypoint_info.bss_start_address is not None
153153
and first_section_end > main_rom_start
154154
):
155-
bss_start = rom.entrypoint_info.bss_start_address - rom.entry_point + 0x1000
155+
bss_start = (
156+
rom.entrypoint_info.bss_start_address.value - rom.entry_point + 0x1000
157+
)
156158
# first_section_end points to the start of data
157159
segments += f"""\
158160
- [0x{first_section_end:X}, data]
159-
- {{ type: bss, vram: 0x{rom.entrypoint_info.bss_start_address:08X} }}
161+
- {{ type: bss, vram: 0x{rom.entrypoint_info.bss_start_address.value:08X} }}
160162
"""
161163
# Point next segment to the detected end of the main one
162164
first_section_end = bss_start
@@ -180,6 +182,82 @@ def create_n64_config(rom_path: Path):
180182
f.write(header)
181183
f.write(segments)
182184

185+
# Write reloc_addrs.txt file
186+
reloc_addrs = []
187+
if rom.entrypoint_info.bss_start_address is not None:
188+
reloc_addrs.append(
189+
f"rom:0x{rom.entrypoint_info.bss_start_address.rom_hi:06X} reloc:MIPS_HI16 symbol:main_BSS_START"
190+
)
191+
reloc_addrs.append(
192+
f"rom:0x{rom.entrypoint_info.bss_start_address.rom_lo:06X} reloc:MIPS_LO16 symbol:main_BSS_START"
193+
)
194+
reloc_addrs.append("")
195+
if rom.entrypoint_info.bss_size is not None:
196+
reloc_addrs.append(
197+
f"rom:0x{rom.entrypoint_info.bss_size.rom_hi:06X} reloc:MIPS_HI16 symbol:main_BSS_SIZE"
198+
)
199+
reloc_addrs.append(
200+
f"rom:0x{rom.entrypoint_info.bss_size.rom_lo:06X} reloc:MIPS_LO16 symbol:main_BSS_SIZE"
201+
)
202+
reloc_addrs.append("")
203+
if rom.entrypoint_info.bss_end_address is not None:
204+
reloc_addrs.append(
205+
f"rom:0x{rom.entrypoint_info.bss_end_address.rom_hi:06X} reloc:MIPS_HI16 symbol:main_BSS_END"
206+
)
207+
reloc_addrs.append(
208+
f"rom:0x{rom.entrypoint_info.bss_end_address.rom_lo:06X} reloc:MIPS_LO16 symbol:main_BSS_END"
209+
)
210+
reloc_addrs.append("")
211+
if rom.entrypoint_info.stack_top is not None:
212+
reloc_addrs.append(
213+
'// This entry corresponds to the "stack top", which is the end of the array used as the stack for the main segment.'
214+
)
215+
reloc_addrs.append(
216+
"// It is commented out because it was not possible to infer what the start of the stack symbol is, so you'll have to figure it out by yourself."
217+
)
218+
reloc_addrs.append(
219+
"// Once you have found it you can properly name it and specify the length of this stack as the addend value here."
220+
)
221+
reloc_addrs.append(
222+
f"// The address of the end of the stack is 0x{rom.entrypoint_info.stack_top.value:08X}."
223+
)
224+
reloc_addrs.append(
225+
f"// A common size for this stack is 0x2000, so try checking for the address 0x{rom.entrypoint_info.stack_top.value-0x2000:08X}. Note the stack may have a different size."
226+
)
227+
reloc_addrs.append(
228+
f"// rom:0x{rom.entrypoint_info.stack_top.rom_hi:06X} reloc:MIPS_HI16 symbol:main_stack addend:0xXXXX"
229+
)
230+
reloc_addrs.append(
231+
f"// rom:0x{rom.entrypoint_info.stack_top.rom_lo:06X} reloc:MIPS_LO16 symbol:main_stack addend:0xXXXX"
232+
)
233+
reloc_addrs.append("")
234+
if reloc_addrs:
235+
with Path("reloc_addrs.txt").open("w", newline="\n") as f:
236+
print("Writing reloc_addrs.txt")
237+
f.write(
238+
"// Visit https://github.com/ethteck/splat/wiki/Advanced-Reloc for documentation about this file\n"
239+
)
240+
f.write("// entrypoint relocs\n")
241+
contents = "\n".join(reloc_addrs)
242+
f.write(contents)
243+
244+
# Write symbol_addrs.txt file
245+
symbol_addrs = []
246+
symbol_addrs.append(f"entrypoint = 0x{rom.entry_point:08X}; // type:func")
247+
if rom.entrypoint_info.main_address is not None:
248+
symbol_addrs.append(
249+
f"main = 0x{rom.entrypoint_info.main_address.value:08X}; // type:func"
250+
)
251+
if symbol_addrs:
252+
symbol_addrs.append("")
253+
with Path("symbol_addrs.txt").open("w", newline="\n") as f:
254+
print("Writing symbol_addrs.txt")
255+
f.write(
256+
"// Visit https://github.com/ethteck/splat/wiki/Adding-Symbols for documentation about this file\n"
257+
)
258+
contents = "\n".join(symbol_addrs)
259+
f.write(contents)
260+
183261

184262
def create_psx_config(exe_path: Path, exe_bytes: bytes):
185263
exe = psxexeinfo.PsxExe.get_info(exe_path, exe_bytes)
