
Commit 31aa52a

Merge pull request #560 from stravant/wip-ghidra-works
Implement Ghidra Importer
2 parents ea58420 + d0e4b43 commit 31aa52a

14 files changed: +1859 -1 lines changed

configure.py

Lines changed: 2 additions & 1 deletion
@@ -179,6 +179,7 @@
 ]
 if args.debug:
     config.ldflags.append("-g") # Or -gdwarf-2 for Wii linkers
+    config.ldflags.append("-sym full")
 if args.map:
     config.ldflags.append("-mapunused")
     # config.ldflags.append("-listclosure") # For Wii linkers
@@ -226,7 +227,7 @@
 # Debug flags
 if args.debug:
     # Or -sym dwarf-2 for Wii compilers
-    cflags_base.extend(["-sym on", "-DDEBUG=1"])
+    cflags_base.extend(["-sym full", "-DDEBUG=1"])
 else:
     cflags_base.append("-DNDEBUG=1")
 

ghidra_scripts/README.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# Ghidra Importer

## What do you get from using the importer?

`bfbb_import` is a script that takes basic symbols from the original game (in `symbols.txt`) and more detailed symbols from the reverse-engineered code we can compile so far, and imports them into Ghidra for easier reverse engineering.

Results of running the import:

* Full parameter types, return types, parameter names, global variable types, etc. are imported for the contents of cpp files listed as `Matching` in `configure.py`:

  * ![test](gimport/function_with_return.png)

* All struct types referenced in `Matching` files are imported:

  * ![test](gimport/struct_import.png)

* Names and parameter types, but _not_ return types, are imported for the other name-mangled functions in `symbols.txt`:

  * ![test](gimport/function_with_paramn.png)

* All remaining symbols from `symbols.txt` are annotated in the main Ghidra listing via labels.

## Import Instructions

### Step 1: Install Ghidra

Download and "install" a recent version of Ghidra from https://github.com/NationalSecurityAgency/ghidra/releases. "Install" here just means unzipping the folder; there is no global install process for Ghidra.

Note: You may need to install the JDK if you don't have it already. Ghidra will prompt you for it on startup if it is missing.

### Step 2: Install the DOL Extension

Ghidra can't understand GameCube DOL files out of the box. Install the Ghidra GameCube loader from https://github.com/Cuyler36/Ghidra-GameCube-Loader/releases.

### Step 3: Import the DOL

Open Ghidra and choose `File > Import File...`, selecting the DOL file you put in `bfbb/orig/GQPE78/sys/main.dol` when setting up the repo.

Open the imported file and ***allow analysis to run when prompted***. The importer script expects the functions to already have been created by analysis.

### Step 4: Install Ghidrathon

We need to give Ghidra the ability to run Python 3 code, which we do with the Ghidrathon extension. Download Ghidrathon from the releases page: https://github.com/mandiant/Ghidrathon/releases

Follow the installation instructions on that page. You probably don't need to create a venv in this case, but you do need to run `ghidrathon_configure.py`.

### Step 5: Install Importer Script Dependencies

The importer script has a single additional Python package dependency, `elftools` (provided by the `pyelftools` package), which it uses to parse the ELF file. Install it with the following command:

```bash
pip install pyelftools
```
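
If you want to confirm that the dependency is visible before running the importer, a quick check like the following can help. This is only a sketch and not part of the importer: `has_module` is a hypothetical helper, and it assumes you run it with the same Python interpreter that Ghidrathon was configured to use, which may differ from the `python` on your PATH.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # pyelftools installs a top-level module named `elftools`.
    if has_module("elftools"):
        print("pyelftools is available")
    else:
        print("pyelftools is missing; run: pip install pyelftools")
```

If the check reports the module as missing even after installing it, the package was most likely installed into a different interpreter than the one Ghidra is using.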

### Step 6: Add Script Directory

In Ghidra, choose `Window > Script Manager` to open the Script Manager. This is what we will use to run the script.

In the Script Manager, at the top right, click the "Manage Script Directories" button: ![image](manage_script_directories.png)

Click `+` at the top right of the script manager, and add `bfbb/ghidra_scripts` to the list of script directories.

### Step 7: Run the Importer

In the Script Manager, you should now be able to filter for `bfbb_import.py`. Select it and run it through the context menu or the run button at the top of the Script Manager.

Importing takes about as long as a clean build, because the script temporarily has to make a debug build of the executable to get the parameter names and other info from already reverse-engineered functions. (The script restores your previous build settings afterwards.)

### Step 8: (Optionally) Change Additional Files to Matching

The importer script only imports types referenced in files linked into the final DOL file the build generates. To generate matching DOLs, the build normally only links compilation units which are 100% matching.

If you're working on a cpp file with structures you want to import into Ghidra, you're not bound by this limitation! As long as enough of the file you're working on is defined for it to link, you can import things from it.

Temporarily change the file in question to `Matching` in `configure.py` and re-run the importer. Note that if you build with the file changed to `Matching` when it is not a 100% match yet, you will get a "not matching" error at the end of the build. That's expected: the import will still bring in the symbols correctly because it uses the memory mapping in `symbols.txt`.

### Step 9: Enjoy The Results

Most functions should now have name / parameter info rather than just being `FUN_xxxxxxxx`. No more having to look stuff up in `symbols.txt`!

<!-- ## Ghidra Basics

TODO: Basic guide on using Ghidra -->

ghidra_scripts/bfbb_import.py

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
import gimport.extract_info
import gimport.import_info

if __name__ == "__main__":
    extracted_info = gimport.extract_info.extract_info()
    print("Importing info into Ghidra")
    gimport.import_info.import_info(currentProgram(), extracted_info)

ghidra_scripts/gimport/__init__.py

Whitespace-only changes.

ghidra_scripts/gimport/demangle.py

Lines changed: 207 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,207 @@
from typing import Tuple, List
import re
from .dwarf import DW_FT, DwarfSubscriptDataItem
from .gtypes import GType, GPointerType, GFundType, GArrayType


SPECIAL_NAME_TO_OPERATOR = {
    "__as": "=",
    "__ml": "*",
    "__amu": "*=",
    "__mi": "-",
    "__ami": "-=",
    "__dv": "/",
    "__adv": "/=",
    "__pl": "+",
    "__apl": "+=",
    "__nw": "new",
    "__dl": "delete",
    "__aor": "|=",
    "__or": "|",
    "__eq": "==",
    "__ne": "!=",
    "__vc": "<<",
    "__mm": "--",
    "__pp": "++",
    "__rf": "*",
    "__cl": "()",
}


SPECIAL_IGNORE = {
    # Things containing a "T#", don't know what that means
    "setevenodd__FUlPUlUlUlP5BLITST1": True,
    "YUV_blit__FPvUlUlUlT0UlUlUlUlUlUlUlT0P5BLITS": True,
    "YUV_blit_mask__FPvUlUlUlPUcUlT0UlUlUlUlUlUlUlT0P5BLITS": True,

    # Can't disambiguate with normal mangling
    "__end__catch": True,
}


def demangle(mangled_name: str, resolve_ud) -> Tuple[str, List[GType]]:
    # Cut off name
    index = mangled_name.find("__", 1) # 1 instead of 0 to skip __ in names like __ct
    if index == -1:
        # Not a mangled function
        return None
    # Don't know how to demangle some things
    if mangled_name in SPECIAL_IGNORE:
        return None
    name = mangled_name[:index] # Name part only
    without_name = mangled_name[index+2:] # Cut off name

    # Cut off namespacing bits
    namespaces = []
    while len(without_name) > 0 and (without_name[0] == "Q" or str.isdigit(without_name[0])):
        if without_name[0] == "Q":
            qualification_count = int(without_name[1])
            without_name = without_name[2:]
            for i in range(qualification_count):
                (namespace_len_text, rest) = re.match(r"^(\d+)(.*)", without_name).groups()
                namespace_len = int(namespace_len_text)
                namespaces.append(rest[:namespace_len])
                without_name = rest[namespace_len:]
        else:
            (len_str, rest) = re.match(r"^(\d+)(.*)", without_name).groups()
            namespace_len = int(len_str)
            namespaces.append(rest[:namespace_len])
            without_name = rest[namespace_len:]
    this_type = resolve_ud(namespaces[-1]) if namespaces else None

    # Namespaced global variable, not a function
    if len(without_name) == 0:
        return None

    # Handle special names
    if name.startswith("__"):
        if name in SPECIAL_NAME_TO_OPERATOR:
            name = f"operator{SPECIAL_NAME_TO_OPERATOR[name]}"
        elif name == "__ct":
            name = namespaces[-1]
        elif name == "__dt":
            name = f"~{namespaces[-1]}"

    # Add namespaces to name
    name = "::".join(namespaces + [name])

    # C -> Const method.
    is_const = without_name[0] == "C"
    if is_const:
        without_name = without_name[1:]

    # F -> function, no F -> method.
    is_member = without_name[0] != "F"
    whole_text = without_name if is_member else without_name[1:]

    # Easier to handle this here
    if whole_text == "v":
        return (name, [])

    """
    Ann_ Array
    P pointer
    C constant
    Qn qualified name, n parts

    b bool
    c char
    s short
    i int
    l long
    x long long
    f float
    d double
    e vararg
    nn <name> struct
    """
    def parse_type(text: str) -> Tuple[GType, str]:
        if text.startswith("A"):
            (dim, rest) = re.match(r"^A([0-9]+)_(.*)", text).groups()
            (type, rest) = parse_type(rest)
            array_type = GArrayType()
            count = DwarfSubscriptDataItem()
            count.highBound.isConstant = True
            count.highBound.constant = int(dim) + 1
            element_type = DwarfSubscriptDataItem()
            element_type.type = type
            array_type.subscripts = [count, element_type]
            return (array_type, rest)
        elif text.startswith("F"):
            text = text[1:]
            if text.startswith("v"):
                text = text[1:]
            else:
                while text and not text.startswith("_"):
                    (param_type, text) = parse_type(text)
            assert text.startswith("_"), f"Expect _ after function type in {mangled_name}"
            # TODO: Actually handle function type
            return (GPointerType(GFundType(DW_FT.void)), text[1:])
        elif text.startswith("Q"):
            qualification_count = int(text[1])
            text = text[2:]
            parts = []
            for i in range(qualification_count):
                (namespace_len_text, rest) = re.match(r"^(\d+)(.*)", text).groups()
                namespace_len = int(namespace_len_text)
                parts.append(rest[:namespace_len])
                text = rest[namespace_len:]
            return (resolve_ud(parts[-1]), text)
        elif text.startswith("Pv"):
            # Pointer to void is special
            return (GPointerType(GFundType(DW_FT.void)), text[2:])
        elif text.startswith("PCv"):
            return (GPointerType(GFundType(DW_FT.void)), text[3:])
        elif text.startswith("P") or text.startswith("R"):
            (type, rest) = parse_type(text[1:])
            pointer_type = GPointerType(type)
            return (pointer_type, rest)
        elif text.startswith("C"):
            # Constness ignored here
            return parse_type(text[1:])
        elif text.startswith("b"):
            return (GFundType(DW_FT.bool), text[1:])
        elif text.startswith("c"):
            return (GFundType(DW_FT.S8), text[1:])
        elif text.startswith("s"):
            return (GFundType(DW_FT.S16), text[1:])
        elif text.startswith("i"):
            return (GFundType(DW_FT.S32), text[1:])
        elif text.startswith("l"):
            return (GFundType(DW_FT.SLong), text[1:])
        elif text.startswith("x"):
            return (GFundType(DW_FT.S64), text[1:])
        elif text.startswith("f"):
            return (GFundType(DW_FT.F32), text[1:])
        elif text.startswith("d"):
            return (GFundType(DW_FT.F64), text[1:])
        elif text.startswith("Uc"):
            return (GFundType(DW_FT.U8), text[2:])
        elif text.startswith("Us"):
            return (GFundType(DW_FT.U16), text[2:])
        elif text.startswith("Ui"):
            return (GFundType(DW_FT.U32), text[2:])
        elif text.startswith("Ul"):
            return (GFundType(DW_FT.ULong), text[2:])
        else:
            # Handle struct
            if match := re.match(r"^(\d+)(.*)", text):
                (ident_len_text, rest) = match.groups()
                ident_len = int(ident_len_text)
                ident = rest[:ident_len]
                rest = rest[ident_len:]
                return (resolve_ud(ident), rest)
            else:
                print("Unexpected mangle:", text, mangled_name)
                exit(0)

    result = []
    if this_type:
        result.append(GPointerType(this_type))
    while whole_text:
        # End of empty arg list, or variable args
        if whole_text.startswith("v") or whole_text.startswith("e"):
            return (name, result)
        (type, whole_text) = parse_type(whole_text)
        result.append(type)
    return (name, result)
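
To make the mangling scheme handled by `demangle` easier to follow, here is a minimal standalone sketch of its first two stages: splitting off the name and length-prefixed qualifiers, then decoding a parameter list made only of primitive codes. It is not part of the commit. `split_mangled`, `parse_simple_params`, and the symbol names in the usage lines are hypothetical, it returns plain strings rather than `GType` objects, and it ignores arrays, function pointers, and struct parameters.

```python
import re
from typing import List, Optional, Tuple

# Subset of the single-letter type codes listed in the docstring above.
PRIMITIVES = {"b": "bool", "c": "char", "s": "short", "i": "int",
              "l": "long", "x": "long long", "f": "float", "d": "double"}


def split_mangled(mangled: str) -> Optional[Tuple[str, List[str], str]]:
    """Split a mangled symbol into (name, qualifiers, parameter text)."""
    index = mangled.find("__", 1)  # start at 1 so names like __ct are kept whole
    if index == -1:
        return None  # not a mangled function
    name, rest = mangled[:index], mangled[index + 2:]
    if rest.startswith("Q"):
        count = int(rest[1])  # Qn: n length-prefixed parts follow
        rest = rest[2:]
    elif rest[:1].isdigit():
        count = 1  # a single length-prefixed class name
    else:
        count = 0  # free function, no qualifiers
    qualifiers: List[str] = []
    for _ in range(count):
        length_text, tail = re.match(r"^(\d+)(.*)", rest).groups()
        length = int(length_text)
        qualifiers.append(tail[:length])
        rest = tail[length:]
    return name, qualifiers, rest


def parse_simple_params(text: str) -> List[str]:
    """Decode parameter text built from P/R/C/U prefixes and primitive codes."""
    if text.startswith("C"):  # const method marker
        text = text[1:]
    if text.startswith("F"):  # function marker
        text = text[1:]
    params: List[str] = []
    pos = 0
    # A top-level "v" (empty list) or "e" (varargs) ends the parameter list.
    while pos < len(text) and text[pos] not in "ve":
        suffix, unsigned = "", ""
        while text[pos] in "PRCU":
            if text[pos] == "P":
                suffix = "*" + suffix
            elif text[pos] == "R":
                suffix = "&" + suffix
            elif text[pos] == "U":
                unsigned = "unsigned "
            pos += 1  # "C" (const) is skipped, as in demangle.py
        base = "void" if text[pos] == "v" else PRIMITIVES[text[pos]]
        params.append(f"{unsigned}{base}{suffix}")
        pos += 1
    return params


# Usage with made-up symbols:
print(split_mangled("foo__3BarFiPc"))
print(parse_simple_params("FiPc"))
print(split_mangled("__ct__Q23abc3xyzFv"))
```

The length prefix is what lets the parser find where each qualifier ends without any separator character, which is why both this sketch and `demangle` read the digits first and then slice exactly that many characters.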
