Merged
13 changes: 7 additions & 6 deletions Makefile
@@ -17,21 +17,22 @@ BIN_PLUGIN = src/r2/bin_r2unity.$(SOEXT)
PLUGINS = $(CORE_PLUGIN) $(BIN_PLUGIN)
CC = gcc

CFLAGS = -Wall -Wextra -g -I. $(shell pkg-config --cflags r_util 2>/dev/null || echo "")
LDFLAGS = $(shell pkg-config --libs r_util 2>/dev/null || echo "")
CFLAGS = -Wall -Wextra -g -I. $(shell pkg-config --cflags r_util r_bin 2>/dev/null || echo "")
LDFLAGS = $(shell pkg-config --libs r_util r_bin 2>/dev/null || echo "")

# r_core plugin flags (full radare2)
CORE_PLUGIN_CFLAGS = -Wall -Wextra -g -fPIC $(shell pkg-config --cflags r_core 2>/dev/null || echo "")
CORE_PLUGIN_LDFLAGS = $(shell pkg-config --libs r_core 2>/dev/null || echo "")
CORE_PLUGIN_CFLAGS = -Wall -Wextra -g -fPIC $(shell pkg-config --cflags r_core r_bin 2>/dev/null || echo "")
CORE_PLUGIN_LDFLAGS = $(shell pkg-config --libs r_core r_bin 2>/dev/null || echo "")

# r_bin plugin flags
BIN_PLUGIN_CFLAGS = -Wall -Wextra -g -fPIC $(shell pkg-config --cflags r_bin 2>/dev/null || echo "")
BIN_PLUGIN_LDFLAGS = $(shell pkg-config --libs r_bin 2>/dev/null || echo "")

R2_USER_PLUGINS = $(shell r2 -H R2_USER_PLUGINS 2>/dev/null)

LIB_SRCS = $(wildcard src/lib/*.c)
LIB_SRCS = $(wildcard src/lib/*.c) $(wildcard src/lib/bin/*.c)
LIB_OBJS = $(LIB_SRCS:.c=.o)
LEGACY_OBJS = src/lib/elf.o src/lib/macho.o src/lib/pe.o src/lib/native.o
CLI_SRCS = src/main.c
CLI_OBJS = $(CLI_SRCS:.c=.o)
OBJS = $(CLI_OBJS) $(LIB_OBJS)
@@ -81,7 +82,7 @@ $(BIN_PLUGIN): src/r2/bin_r2unity.c $(LIB_SRCS)
$(CC) $(CFLAGS) -c -o $@ $<

clean:
rm -f $(EXEC) $(OBJS) $(PLUGINS) $(CONFIG_H)
rm -f $(EXEC) $(OBJS) $(LEGACY_OBJS) $(PLUGINS) $(CONFIG_H)


.PHONY: all clean plugin install-plugin uninstall-plugin fmt
10 changes: 7 additions & 3 deletions README.md
@@ -16,7 +16,8 @@ IL2CPP binary, and exposes managed metadata for reverse engineering.
Linux, and flat fixture layouts.
- Recovers managed images, assemblies, types, methods, method flags, and `ldstr`
string literals.
- Finds method-pointer tables heuristically in ELF, Mach-O, and PE binaries.
- Resolves method pointers through CodeRegistration (located from r_bin
symbols or an explicit `-O` address), with an r_bin or simple ELF/Mach-O/PE
section-scan fallback for stripped binaries.
- Lists P/Invoke and v29+ reverse-P/Invoke metadata, and emits CycloneDX 1.5
SBOMs for managed assemblies.
- Provides both a core r2 command plugin and an `r_bin` plugin for direct
@@ -67,6 +68,9 @@ The normal inputs are the native IL2CPP binary and the matching
# recover method flags/comments as r2 commands
./r2unity -f /path/to/GameAssembly.dll /path/to/global-metadata.dat > methods.r2

# override a known native registration symbol address
./r2unity -f -O g_CodeRegistration=0x1234 /path/to/GameAssembly.dll /path/to/global-metadata.dat

# list managed strings, interop metadata, or managed-assembly SBOM data
./r2unity -z /path/to/global-metadata.dat
./r2unity -P -j /path/to/GameAssembly.dll /path/to/global-metadata.dat
@@ -100,8 +104,8 @@ classes, imports, libraries, and header fields.
## Current Limits

- v24.0 metadata, v36/v37 metadata, and WebAssembly are not supported.
- Method-pointer recovery is heuristic; manual `-a` / `-c` pointer reads are
not implemented yet.
- Method-pointer recovery needs CodeRegistration symbols/addresses or the
section-scan fallback; manual `-a` pointer reads are not implemented yet.
- P/Invoke and reverse-P/Invoke output is metadata-first and does not fully
recover native wrapper addresses or every `DllImportAttribute` detail.
- SBOM output covers managed assemblies only, not native dependencies or file
106 changes: 106 additions & 0 deletions doc/datvsbin.md
@@ -0,0 +1,106 @@
# IL2CPP Metadata vs Native Binary

Unity IL2CPP builds split managed-code information across two files:

- `global-metadata.dat`
- the native IL2CPP binary (`GameAssembly.dll`, `GameAssembly.so`,
`GameAssembly.dylib`, `libil2cpp.so`, `UnityFramework`, etc.)

They must be read together to recover named native symbols and runtime
layouts.

## `global-metadata.dat`

`global-metadata.dat` is the managed/logical metadata blob. It is
platform-independent for a given IL2CPP run: it stores byte offsets and
dense table indices, not native pointers.

Typical contents:

- metadata magic, version, and table-of-contents header
- assemblies and images
- type definitions for classes, structs, enums, and interfaces
- method definitions: names, signatures, parameters, return types,
tokens, and method indices
- field definitions
- properties and events
- nested-type, interface, and vtable index tables
- generic containers, generic parameters, and method specs
- custom-attribute and default-value blobs, depending on metadata
version
- identifier strings and managed `ldstr` literal payloads
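
For illustration, the first of those items can be checked without parsing any tables. The C sketch below assumes the header starts with a little-endian 32-bit `sanity` value of `0xFAB11BAF` followed by a 32-bit version, as in IL2CPP's `Il2CppGlobalMetadataHeader`; `rd_le32` and `metadata_version` are hypothetical names, not r2unity's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: portable little-endian 32-bit read,
 * independent of host byte order. */
static uint32_t rd_le32(const uint8_t *p) {
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Assumed value of the leading `sanity` field of Il2CppGlobalMetadataHeader. */
#define IL2CPP_METADATA_SANITY 0xFAB11BAFu

/* Hypothetical: returns the header version, or -1 if the buffer is
 * too short or does not start with the expected magic. */
int32_t metadata_version(const uint8_t *buf, size_t len) {
	if (buf == NULL || len < 8) {
		return -1;
	}
	if (rd_le32(buf) != IL2CPP_METADATA_SANITY) {
		return -1;
	}
	return (int32_t)rd_le32(buf + 4);
}
```

For example, a buffer beginning `AF 1B B1 FA 1D 00 00 00` yields version 29.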

It does not contain:

- native method bodies
- method RVAs or function addresses
- final field offsets
- final type sizes
- native vtables, invokers, wrappers, or trampolines
- Unity asset, scene, prefab, or AssetBundle data

In short, the `.dat` file says what managed things exist and how they
are named, typed, and indexed.

## Native IL2CPP binary

The native binary is the physical/runtime side of the same build. It
contains the compiled machine code and the registration structures that
tie the generated code back to metadata indices.

Important native-side data:

- compiled method bodies
- `Il2CppCodeRegistration`
- `Il2CppMetadataRegistration`
- `methodPointers`
- `genericMethodPointers`
- `reversePInvokeWrappers`
- `invokerPointers`
- `codeGenModules`
- `fieldOffsets`
- `typeDefinitionsSizes`
- native type and generic-instantiation tables
- runtime helper code, wrappers, trampolines, and runtime strings

These tables provide the addresses and runtime layouts that
`global-metadata.dat` intentionally does not carry.

## Mapping model

The two files are correlated by indices:

```text
global-metadata.dat              native IL2CPP binary
-------------------              --------------------
MethodDefinition.methodIndex --> CodeRegistration / CodeGenModule methodPointers[index]
TypeDefinition index         --> MetadataRegistration.fieldOffsets[type]
Type/generic indices         --> MetadataRegistration.types and generic tables
```
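
A minimal C sketch of the first row of that mapping, under two stated assumptions: methods without a compiled body carry a negative `methodIndex` (IL2CPP uses -1), and on v24.2+ each `CodeGenModule` table is indexed by the method token's row id minus one rather than by `methodIndex`. The function names are hypothetical, and the pointer tables are assumed to have already been read out of the binary as 64-bit values.

```c
#include <stddef.h>
#include <stdint.h>

/* Pre-v24.2 style: one global table indexed by MethodDefinition.methodIndex.
 * Returns 0 when the method has no body (negative index, assumed -1)
 * or the index is outside the table. */
uint64_t method_va_global(const uint64_t *method_pointers, size_t count,
		int32_t method_index) {
	if (method_index < 0 || (size_t)method_index >= count) {
		return 0;
	}
	return method_pointers[method_index];
}

/* v24.2+ style (assumption): one table per CodeGenModule, indexed by the
 * MethodDef token's row id minus one. Returns 0 if the row is out of range. */
uint64_t method_va_module(const uint64_t *module_pointers, size_t count,
		uint32_t token) {
	uint32_t rid = token & 0x00FFFFFFu; /* low 24 bits hold the row id */
	if (rid == 0 || rid > count) {
		return 0;
	}
	return module_pointers[rid - 1];
}
```

For example, token `0x06000003` has row id 3 and selects the third entry of its module's table.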

For r2unity this means:

- parsing only `global-metadata.dat` can recover names, signatures,
tokens, string literals, and table relationships
- resolving those names to native addresses requires locating
`Il2CppCodeRegistration` in the binary; r2unity accepts an explicit
address, an r2 flag/r_bin symbol, or the r_bin/simple-parser
section-scan fallback
- recovering field offsets and final type sizes requires
`Il2CppMetadataRegistration`
- a complete symbol map needs both files from the same build
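
The ways of locating `Il2CppCodeRegistration` listed above can be read as a precedence chain. This is a hypothetical C sketch, not r2unity's code: it assumes an explicit address beats a flag or symbol lookup, and that reaching the section-scan fallback means no registration address is known at all. The type and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source tags, in the order they are tried. */
typedef enum {
	ANCHOR_EXPLICIT, /* address supplied by the user */
	ANCHOR_SYMBOL,   /* r2 flag or r_bin symbol */
	ANCHOR_SCAN      /* no address: fall back to the section scan */
} AnchorSource;

typedef struct {
	bool has_explicit;
	uint64_t explicit_addr;
	bool has_symbol;
	uint64_t symbol_addr;
} AnchorInputs;

/* Picks the first available source. *out receives the CodeRegistration
 * address, or 0 when the caller must scan sections instead. */
AnchorSource resolve_code_registration(const AnchorInputs *in, uint64_t *out) {
	if (in->has_explicit) {
		*out = in->explicit_addr;
		return ANCHOR_EXPLICIT;
	}
	if (in->has_symbol) {
		*out = in->symbol_addr;
		return ANCHOR_SYMBOL;
	}
	*out = 0;
	return ANCHOR_SCAN;
}
```

For example, passing both an explicit address and a symbol address returns `ANCHOR_EXPLICIT` with the explicit value.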

Today r2unity takes managed structure from `.dat` and native addresses
from the binary:

- `.dat`: image/type/method rows, method indices, names, tokens,
strings, assemblies, P/Invoke metadata, and reverse-P/Invoke
attribute metadata
- binary: executable address ranges, native symbols/flags for
`g_CodeRegistration` and `g_MetadataRegistration`, and method pointer
tables reached from CodeRegistration or the r_bin/simple-parser
fallback scan

The current method-address path only needs `g_CodeRegistration`.
`g_MetadataRegistration` is tracked as the companion anchor for native
layout work such as field offsets and type sizes.
23 changes: 12 additions & 11 deletions doc/future.md
@@ -535,22 +535,23 @@ are always zero; a defensive parser bounds-checks and skips.

## 3. Native binary — pointer arrays we don't follow

`src/lib/elf.c` and `src/lib/macho.c` currently locate exactly one
array: `CodeRegistration.methodPointers` (or its v24.2+ per-image
equivalent, if the heuristic happens to land on the right
`Il2CppCodeGenModule`). The registration structures actually expose
many more pointer arrays, each with its own metadata table partner.
Each entry below is a `{ ulong count; ulong ptr; }` pair inside the
registration (see §3 of `doc/r2unity.md` for the full struct).
`src/lib/bin/native.c` currently recovers method pointers from
`Il2CppCodeRegistration` when the registration anchor is available,
and falls back to a generic section scan backed by r_bin or the simple
ELF/Mach-O/PE parsers for stripped binaries.
The registration structures expose many more pointer arrays, each
with its own metadata table partner. Each entry below is a
`{ ulong count; ulong ptr; }` pair inside the registration (see §3 of
`doc/r2unity.md` for the full struct).
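
Reading one of those pairs amounts to two bounds-checked 64-bit little-endian loads. The C sketch below assumes a 64-bit target and a registration blob already copied out of the binary; `CountPtr`, `rd_le64`, and `read_count_ptr` are hypothetical names. The `ptr` it returns is a virtual address that still has to be translated to a file offset before the table itself can be read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a { ulong count; ulong ptr; } registration pair. */
typedef struct {
	uint64_t count;
	uint64_t ptr;
} CountPtr;

/* Portable little-endian 64-bit read. */
static uint64_t rd_le64(const uint8_t *p) {
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--) {
		v = (v << 8) | p[i];
	}
	return v;
}

/* Reads the pair at byte offset `off`; fails if 16 bytes are not available. */
bool read_count_ptr(const uint8_t *buf, size_t len, size_t off, CountPtr *out) {
	if (buf == NULL || off > len || len - off < 16) {
		return false;
	}
	out->count = rd_le64(buf + off);
	out->ptr = rd_le64(buf + off + 8);
	return true;
}
```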

### 3.1 `methodPointers` (≥v24.1 global; v≥24.2 per-module)

What r2unity extracts today. From v24.2 onwards this field is on
`Il2CppCodeGenModule`, **not** `CodeRegistration`, and there is one
module per image. If r2unity finds a single `{count, ptr}` on a
v24.2+ binary it is only extracting **one image's** methods.
Walking `codeGenModules[]` and enumerating each module is mandatory
for full coverage.
module per image. The structural path walks `codeGenModules[]` and
maps modules back to `.dat` image rows; the fallback section scan may
still find only one image's table because it does not know the
registration structure.
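
One plausible way to map modules back to image rows is by name. The sketch below assumes that each `Il2CppCodeGenModule.moduleName` string (for example `Assembly-CSharp.dll`) equals the name of the corresponding image in `global-metadata.dat`; `match_module_to_image` is a hypothetical helper, and a real implementation might need to match on other fields as well.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index of the metadata image whose name equals module_name,
 * or -1 if no image matches. Assumes names compare byte-for-byte. */
int match_module_to_image(const char *module_name,
		const char *const *image_names, size_t image_count) {
	if (module_name == NULL) {
		return -1;
	}
	for (size_t i = 0; i < image_count; i++) {
		if (image_names[i] != NULL && strcmp(module_name, image_names[i]) == 0) {
			return (int)i;
		}
	}
	return -1;
}
```

Modules with no matching image return -1, so they can be reported instead of being attached to the wrong image.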

### 3.2 `invokerPointers`

82 changes: 39 additions & 43 deletions doc/r2unity.md
@@ -1010,13 +1010,13 @@ File-by-file map:
type/method/image/assembly/referenced-assembly decoders, P/Invoke
and reverse-P/Invoke enumerators, endian-safe LE readers
(`RD_LE32`, `RD_LE16`).
- `src/lib/elf.c` — ELF32/64 loader, dynamic-table walk, relative
relocation application (`DT_REL`, `DT_RELA`, `DT_RELR`),
method-pointer-array heuristic.
- `src/lib/macho.c` — Mach-O 64 loader (thin + FAT first-ARM64),
`LC_SEGMENT_64` walk, method-pointer-array heuristic.
- `src/lib/pe.c` — PE32/PE32+ loader, section walk, method-pointer-
array heuristic.
- `src/lib/bin/native.c` — shared native-binary view,
CodeRegistration/MetadataRegistration anchor resolution, structural
CodeRegistration parsing, RBin adapter, and the generic section-scan
fallback.
- `src/lib/bin/elf.c`, `src/lib/bin/macho.c`, `src/lib/bin/pe.c` —
simple file-backed format parsers used when the RBin path cannot
recover method pointers.
- `src/main.c` — CLI entry point and output emitters.

Every row decoder reads via `r_read_le32`/`r_read_le16` (LE on all
@@ -1027,45 +1027,41 @@ retained for the two string pools.
## 6. Native-binary scanning, in one picture

```text
ELF/Mach-O/PE image on disk
Native IL2CPP image opened by r_bin or a simple ELF/Mach-O/PE mapper
├─ load & parse segments/sections
├─ use r_bin sections/symbols/relocs when available
│ or simple file-backed sections for fallback
│ ↓
segments { vaddr/vmaddr, size, perms, file mapping }
sections { vaddr, size, perms }
│ ↓
│ [text_lo, text_hi) (executable union)
├─ ELF only: apply DT_REL / DT_RELA / DT_RELR relative fixups
│ so data-segment pointer arrays match the runtime
│ state (addends resolved, RELR bitmap expanded).
├─ resolve g_CodeRegistration / g_MetadataRegistration
│ order: CLI -O / r2 eval vars / r2 flags / r_bin symbols
├─ scan each writable/data segment:
pass 1: {count32, pad32, ptr} tuple
(CodeRegistration-shaped anchor pair)
pass 2: {count32, ptr} generic
├─ parse Il2CppCodeRegistration:
v24.2+: match codeGenModules[] to metadata images and
copy each module's methodPointers[]
older: recover the global methodPointers[] pair
└─ accept if a sample of entries at *ptr[] lands in text,
either already (post-relocations) or after + base_vaddr
(raw RVA case). Emit absolute VAs, one per method index.
└─ fallback when forced or unresolved:
scan non-executable data/readable sections for {count, ptr}
pairs whose table entries land in executable code.
Emit absolute VAs, one per method index.
```
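
The diagram's acceptance test for a candidate `{count, ptr}` table (sampled entries must land in the executable range, either directly or after adding the base address for raw RVAs) can be sketched in C as follows. Everything here is an assumption made for illustration: the function name, the sampling stride, and the choice to skip zero entries as null slots rather than reject the table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical acceptance test for a candidate table whose entries were
 * already read from the binary (post-relocation). Samples roughly
 * max_samples entries and skips zeros as assumed null slots. Accepts only
 * if every sampled non-zero entry lands in [text_lo, text_hi) as-is, or
 * every one does after adding base_vaddr (raw RVA case). */
bool table_looks_like_code(const uint64_t *entries, size_t count,
		uint64_t text_lo, uint64_t text_hi, uint64_t base_vaddr,
		size_t max_samples, bool *rebase) {
	if (entries == NULL || count == 0 || max_samples == 0) {
		return false;
	}
	size_t step = count > max_samples ? count / max_samples : 1;
	size_t seen = 0;
	bool abs_ok = true;
	bool rva_ok = true;
	for (size_t i = 0; i < count; i += step) {
		uint64_t v = entries[i];
		if (v == 0) {
			continue; /* assumed null slot, e.g. a method without a body */
		}
		seen++;
		if (v < text_lo || v >= text_hi) {
			abs_ok = false;
		}
		uint64_t r = v + base_vaddr;
		if (r < text_lo || r >= text_hi) {
			rva_ok = false;
		}
	}
	if (seen == 0) {
		return false;
	}
	if (abs_ok) {
		*rebase = false; /* entries are already absolute VAs */
		return true;
	}
	if (rva_ok) {
		*rebase = true; /* caller must add base_vaddr to every entry */
		return true;
	}
	return false;
}
```

Requiring every sampled entry to agree on one form keeps a table that mixes absolute addresses and RVAs from being accepted.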

The heuristic is deliberately weaker than a structural
`Il2CppCodeRegistration` match, but it works on every supported
target and doesn't need symbol tables. It does, however, lock onto
**one** `{count, ptr}` array, which on v24.2+ means one image's
methods, not all of them (§3.1 / §3.7). A proper structural match
that walks `codeGenModules[]` is on the roadmap.
The structural path is preferred because it follows Unity's native
registration structures instead of guessing which `{count, ptr}` pair
is the method-pointer table. The fallback remains useful for stripped
binaries or builds where the registration symbols cannot be resolved.
The simple ELF/Mach-O/PE parsers do not reimplement full symbol-table
parsing; they use explicit registration addresses when provided and
otherwise feed their sections into the fallback scanner.

For ELF the relocation pass matters because the Android linker
produces method-pointer arrays almost entirely as
`R_AARCH64_RELATIVE` (type 1027) entries. Without applying them,
the raw array on disk is a run of zeros. Packed Android relocations
(`DT_ANDROID_RELA`, `DT_ANDROID_RELR`) are not handled yet and
cause the same "empty array" symptom on Play Store builds.

For Mach-O and PE the linker has already materialised concrete
values; no explicit relocation pass is required for the tables
r2unity currently scans.
Relocation handling is delegated to r_bin (`r_bin_patch_relocs`) on
the RBin path. The simple ELF parser also applies the common relative
REL/RELA/RELR forms so stripped Android/Linux inputs still have a
lightweight fallback.
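
Of the relative forms mentioned above, RELR is the least obvious because it packs many addresses into bitmap words. The C sketch below shows the standard ELF64 `DT_RELR` decoding, not r2unity's implementation: an even entry is an address to fix up, and an odd entry is a bitmap whose bit *i* (for *i* ≥ 1) marks the word at `where + (i - 1) * 8`. Each returned address would then be patched as `*addr += load_bias`; `relr_expand` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Expands an ELF64 DT_RELR table into the addresses that need a relative
 * fixup. Returns how many were written; stops early at out_cap. */
size_t relr_expand(const uint64_t *relr, size_t n, uint64_t *out, size_t out_cap) {
	const uint64_t word = sizeof(uint64_t);
	uint64_t where = 0;
	size_t count = 0;
	for (size_t k = 0; k < n; k++) {
		uint64_t entry = relr[k];
		if ((entry & 1) == 0) {
			/* Even entry: the address itself needs a fixup. */
			if (count == out_cap) {
				return count;
			}
			out[count++] = entry;
			where = entry + word;
		} else {
			/* Odd entry: bitmap; after each shift, bit 0 covers where + i * word. */
			for (uint64_t i = 0; (entry >>= 1) != 0; i++) {
				if (entry & 1) {
					if (count == out_cap) {
						return count;
					}
					out[count++] = where + i * word;
				}
			}
			/* One bitmap word covers 63 following slots. */
			where += (8 * word - 1) * word;
		}
	}
	return count;
}
```

For example, the two-entry table `{0x1000, 0x7}` expands to `0x1000`, `0x1008`, and `0x1010`.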

## 7. Data we can extract today vs. data we do not

@@ -1083,7 +1079,7 @@ Already extracted by r2unity (library + CLI):
| referenced assemblies | flat int32 array |
| P/Invoke marker methods | `-P` enumeration |
| reverse-P/Invoke on v29+ | `-R` enumeration via BLOB |
| method-pointer VA (global) | ELF/Mach-O/PE heuristic |
| method-pointer VA | CodeRegistration parse + r_bin/simple-parser section-scan fallback |

Data present on disk / in the binary but **not yet consumed**:

@@ -1109,14 +1105,14 @@ Data present on disk / in the binary but **not yet consumed**:
metadata load in compiled code (§2.12).
- `fieldMarshaledSizes`, `unresolvedVirtualCall*`, WinRT tables,
`exportedTypeDefinitions`, RGCTX tables (§2.14–2.18).
- Native-side `CodeRegistration` and `MetadataRegistration` walk
- Native-side registration data beyond method pointers
`invokerPointers`, `customAttributeGenerators`,
`reversePInvokeWrappers`, `genericMethodPointers`,
`interopData`, `codeGenModules[]`, `types`, `fieldOffsets`,
`typeDefinitionsSizes`, `metadataUsages` (§3).
- Richer native scanning: per-module `methodPointers` on v24.2+,
packed Android relocations, Mach-O FAT multi-slice,
chained-fixups, PE import table.
`interopData`, `types`, `fieldOffsets`, `typeDefinitionsSizes`,
`metadataUsages` (§3).
- Richer native support: packed Android relocations and other loader
details not yet handled by r_bin for a given target, Mach-O FAT
multi-slice selection, chained-fixups, PE import table.

## 8. validation corpus

15 changes: 8 additions & 7 deletions meson.build
@@ -18,26 +18,28 @@ add_project_arguments(
)

r_util_dep = dependency('r_util')
r_bin_dep = dependency('r_bin')

conf = configuration_data()
conf.set_quoted('R2UNITY_VERSION', meson.project_version())
configure_file(output: 'r2unity_config.h', configuration: conf)

lib_inc = include_directories('src/lib')
lib_sources = files(
'src/lib/elf.c',
'src/lib/lib.c',
'src/lib/macho.c',
'src/lib/bin/elf.c',
'src/lib/bin/macho.c',
'src/lib/bin/native.c',
'src/lib/bin/pe.c',
'src/lib/paths.c',
'src/lib/pe.c',
'src/lib/sbom.c',
)

r2unity = executable(
'r2unity',
files('src/main.c') + lib_sources,
include_directories: lib_inc,
dependencies: r_util_dep,
dependencies: [r_util_dep, r_bin_dep],
install: true,
)

@@ -46,7 +48,6 @@ test('r2unity-version', r2unity, args: ['-v'])

plugins_opt = get_option('plugins')
r_core_dep = dependency('r_core', required: plugins_opt)
r_bin_dep = dependency('r_bin', required: plugins_opt)
build_plugins = plugins_opt.enabled() or (
plugins_opt.auto() and r_core_dep.found() and r_bin_dep.found()
)
@@ -72,7 +73,7 @@ if build_plugins
'core_r2unity',
files('src/r2/core_r2unity.c') + lib_sources,
include_directories: lib_inc,
dependencies: r_core_dep,
dependencies: [r_core_dep, r_bin_dep],
name_prefix: '',
install: true,
install_dir: r2_plugindir,
@@ -82,7 +83,7 @@ if build_plugins
'bin_r2unity',
files('src/r2/bin_r2unity.c') + lib_sources,
include_directories: lib_inc,
dependencies: r_bin_dep,
dependencies: [r_bin_dep],
name_prefix: '',
install: true,
install_dir: r2_plugindir,