LibAFL-git-aware

This is a fork of AFLplusplus/LibAFL. The objective is simple: keep coverage-guided fuzzing, but also prefer testcases that run code changed or added recently (based on git blame).

What this adds

It adds an opt-in scheduler score that boosts testcases covering recently changed lines. "Recent" is measured by the commit time that git blame reports for each line.

When to use

This is most useful when your target is under active development and you want to find regressions in recently changed code faster (for example: fuzzing a PR, a release branch, or after a refactor).

Another good workflow is for large codebases with big seed corpora: run your normal coverage-guided harness until the corpus is “stable” (diminishing returns), then switch on the git-aware scheduler and restart the fuzzer using the same corpus. That way, scheduling effort shifts toward inputs that hit recently changed/added lines, which is a nice fit for continuous audit / code review and helps avoid spending most cycles on code last touched years ago.

To enable it in an existing harness, the minimal additions look like:

let edges_observer = StdMapObserver::owned("edges", vec![0u8; 65536]).track_indices();

state.add_metadata(GitRecencyMapMetadata::load_from_file("git_recency_map.bin")?);
state.add_metadata(GitRecencyConfigMetadata::new(2.0)); // optional, default is 2.0

let scheduler = GitAwareStdWeightedScheduler::new(&mut state, &edges_observer);

Best practices:

Commit the changes you care about before building (recency comes from git blame).
Rebuild the target and regenerate the mapping whenever HEAD changes.
For continuous audit/code review, keep a long-running baseline fuzzer, and spin up a git-aware run on each new commit/PR (same harness + same corpus, new build + new mapping).

How it works

At build time, libafl_cc creates a mapping from SanitizerCoverage trace-pc-guard map indexes to git blame timestamps. At runtime, the fuzzer loads that mapping and the scheduler uses it to bias corpus selection.

The mapping file format is: u64 head_time + u64 len + len * u64 entries (all little-endian).
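The layout above is small enough to read by hand. The following is a minimal sketch of a reader and writer for it using only the standard library. RecencyMap, parse_recency_map and encode_recency_map are hypothetical names for illustration, not the fork's own API (the harness should keep using GitRecencyMapMetadata::load_from_file). Rejecting files whose size does not match len exactly is an assumption of this sketch.

```rust
/// In-memory view of the mapping file (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct RecencyMap {
    pub head_time: u64,
    pub entries: Vec<u64>,
}

/// Reads one little-endian u64 from the first 8 bytes of `bytes`.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Parses `u64 head_time + u64 len + len * u64 entries`, all little-endian.
pub fn parse_recency_map(bytes: &[u8]) -> Result<RecencyMap, String> {
    if bytes.len() < 16 {
        return Err("mapping is shorter than the 16-byte header".to_string());
    }
    let head_time = read_u64_le(&bytes[0..8]);
    let len = read_u64_le(&bytes[8..16]);
    // Assumption of this sketch: the file holds exactly `len` entries, no trailing data.
    let expected = len.checked_mul(8).and_then(|n| n.checked_add(16));
    if expected != Some(bytes.len() as u64) {
        return Err(format!("header says {} entries, file has {} bytes", len, bytes.len()));
    }
    let entries = bytes[16..].chunks_exact(8).map(read_u64_le).collect();
    Ok(RecencyMap { head_time, entries })
}

/// Serializes a map back into the same layout (useful for tests).
pub fn encode_recency_map(map: &RecencyMap) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + 8 * map.entries.len());
    out.extend_from_slice(&map.head_time.to_le_bytes());
    out.extend_from_slice(&(map.entries.len() as u64).to_le_bytes());
    for entry in &map.entries {
        out.extend_from_slice(&entry.to_le_bytes());
    }
    out
}
```

A mapping with two entries is 16 + 2 * 8 = 32 bytes, which is a quick sanity check on a generated git_recency_map.bin.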

Implementation details (how it is built)

Build step 1: record “index -> source location” per object file

When you compile with libafl_cc and you set LIBAFL_GIT_RECENCY_MAPPING_PATH, the wrapper loads an LLVM pass while compiling each object file.

That pass records one source location (file:line) per instrumented basic block, in the same order LLVM’s SanitizerCoverage pc-guard pass assigns indices (module function/basic-block iteration order). To avoid blaming sanitizer/instrumentation code, it uses a non-instrumentation instruction’s debug location in the same basic block (currently: the last non-terminator instruction with a valid debug location, skipping known sanitizer/afl helper calls).

It writes a small sidecar file next to the object file:

<object>.libafl_git_recency

This sidecar is a list in pc-guard order for that object: each entry is either file:line or “unknown”.
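To make the sidecar contents concrete, here is a minimal sketch of how such a list could be parsed. The on-disk encoding is not specified above, so this assumes one entry per text line. SidecarEntry, parse_sidecar_entry and parse_sidecar are hypothetical names, not part of the fork.

```rust
/// One sidecar entry: a resolved source location or "unknown" (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum SidecarEntry {
    Location { file: String, line: u32 },
    Unknown,
}

/// Parses a single `file:line` entry; anything else is treated as unknown.
pub fn parse_sidecar_entry(raw: &str) -> SidecarEntry {
    let raw = raw.trim();
    // Split at the LAST ':' so that paths containing ':' still parse.
    if let Some(pos) = raw.rfind(':') {
        let file = &raw[..pos];
        let line = &raw[pos + 1..];
        if !file.is_empty() {
            if let Ok(line) = line.parse::<u32>() {
                if line > 0 {
                    return SidecarEntry::Location { file: file.to_string(), line };
                }
            }
        }
    }
    SidecarEntry::Unknown
}

/// Assumption: one entry per line. Position in the result is the pc-guard
/// order within the object, so unparsable lines are kept as Unknown
/// instead of being dropped, which would shift every later index.
pub fn parse_sidecar(text: &str) -> Vec<SidecarEntry> {
    text.lines().map(parse_sidecar_entry).collect()
}
```

Keeping a placeholder for every basic block matters more than the exact encoding: the list position is the only link between an entry and its guard.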

Build step 2: merge objects at link time, then run `git blame`

At the final link step, the libafl_cc wrapper reads all those sidecar files (for each .o on the link line), merges them in the same order, and turns file:line into a timestamp using git blame --line-porcelain.

We also store the HEAD commit time at build time in the file (head_time). This makes the “age” computation stable for a given build.

If a location cannot be mapped (no debug info, file not in the git repo, blame fails), it is treated as “old” and gets timestamp 0.

To avoid blaming system headers or external code, the link step only blames files that are inside the current git repo root.
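The file:line to timestamp step can be sketched as follows. git blame --line-porcelain prints a header block per line that includes author-time and committer-time fields as Unix timestamps. Which of the two the fork reads is not stated here, so using committer-time is an assumption of this sketch, and committer_time_from_porcelain and blame_line_time are hypothetical helper names. Any failure maps to 0, matching the "old" fallback described above.

```rust
use std::path::Path;
use std::process::Command;

/// Extracts the first `committer-time` value from `git blame --line-porcelain`
/// output. Returns 0 ("old") if the field is missing or malformed.
/// Source lines in porcelain output start with a tab, so they cannot be
/// mistaken for a header field.
pub fn committer_time_from_porcelain(porcelain: &str) -> u64 {
    porcelain
        .lines()
        .find_map(|l| l.strip_prefix("committer-time "))
        .and_then(|ts| ts.trim().parse::<u64>().ok())
        .unwrap_or(0)
}

/// Blames a single line of `file` (path relative to `repo_root`).
/// Any error (git missing, file untracked, line out of range) yields 0.
pub fn blame_line_time(repo_root: &Path, file: &str, line: u32) -> u64 {
    let range = format!("{},{}", line, line);
    let output = Command::new("git")
        .current_dir(repo_root)
        .args(&["blame", "--line-porcelain", "-L", range.as_str(), "--", file])
        .output();
    match output {
        Ok(out) if out.status.success() => {
            committer_time_from_porcelain(&String::from_utf8_lossy(&out.stdout))
        }
        _ => 0,
    }
}
```

Spawning git once per line is slow for large targets; a real implementation would more likely blame each file once and cache the per-line results.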

Runtime: how the scheduler uses the mapping

At fuzzer startup, you load the mapping file into GitRecencyMapMetadata.

Then GitRecencyTestcaseScore wraps the normal weighted score (CorpusWeightTestcaseScore) and applies a boost. For each testcase, it looks at the list of covered map indexes stored in MapIndexesMetadata (this is why you must call .track_indices() on the map observer).

It computes a cached value:

tc_time = max(entries[idx]) for all covered idx

Then it boosts the testcase weight like this:

final = base * (1 + alpha * decay)

The decay is exponential with a fixed half-life of 14 days:

decay = 2^(-(head_time - tc_time) / half_life)

alpha controls how strong the bias is (default is 2.0).

To keep runtime overhead low, tc_time is computed once per testcase and cached inside GitRecencyTestcaseMetadata.
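The formulas above can be sketched in a few lines. This is an illustration under stated assumptions, not the code inside GitRecencyTestcaseScore: timestamps are taken to be Unix seconds (as git reports them), out-of-range indices are skipped, an empty coverage list gives tc_time = 0, and a tc_time newer than head_time is clamped to age 0. The function names are hypothetical.

```rust
/// Fixed half-life of 14 days, expressed in seconds (assumes Unix-second timestamps).
const HALF_LIFE_SECS: f64 = 14.0 * 24.0 * 60.0 * 60.0;

/// tc_time = max(entries[idx]) over all covered indices.
/// Assumption: indices outside the mapping are ignored; no coverage gives 0.
pub fn testcase_time(entries: &[u64], covered: &[usize]) -> u64 {
    covered
        .iter()
        .filter_map(|&idx| entries.get(idx).cloned())
        .max()
        .unwrap_or(0)
}

/// decay = 2^(-(head_time - tc_time) / half_life)
pub fn decay(head_time: u64, tc_time: u64) -> f64 {
    // Assumption: clamp to zero age if tc_time is newer than head_time.
    let age_secs = head_time.saturating_sub(tc_time) as f64;
    (-(age_secs / HALF_LIFE_SECS)).exp2()
}

/// final = base * (1 + alpha * decay)
pub fn boosted_weight(base: f64, alpha: f64, head_time: u64, tc_time: u64) -> f64 {
    base * (1.0 + alpha * decay(head_time, tc_time))
}
```

With the default alpha of 2.0, a testcase that touches a line committed at head_time gets 3x its base weight, one whose newest line is 14 days old gets 2x, and one whose newest line is 28 days old gets 1.5x. Entries with timestamp 0 are so old that the decay is effectively zero, so those testcases keep their base weight.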

How do we know the runtime map index matches the .bin entry?

This only works because the mapping uses the same index that the target uses at runtime.

With SanitizerCoverage trace-pc-guard, each instrumented site has a “guard” variable. At startup, __sanitizer_cov_trace_pc_guard_init assigns each guard a unique number (0, 1, 2, …). libafl_targets then uses that number directly as the index into the coverage map.

So the "basic block ID" you see in MapIndexesMetadata is the same number that was written into the guard, and the mapping file stores entries[that_number].

The tricky part is ordering. The runtime assigns those numbers by walking each object’s guard array, and doing that for all objects. This implementation makes the build-time merge match that order by writing the per-object metadata in the same order SanitizerCoverage instruments basic blocks in that object (matching the guard array order), embedding it in a dedicated section, and then reading/concatenating that section from the final linked output.
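As a toy model of that ordering argument: if each object contributes its entries in guard-array order, and objects are concatenated in the same order the runtime walks them, then a guard's global index is the sum of the sizes of all earlier objects plus its local position. The sketch below only illustrates this arithmetic. merge_objects and global_index are hypothetical names, and the real merge reads the embedded section from the linked binary instead of in-memory vectors.

```rust
/// Concatenates per-object timestamp lists in the order the objects are walked.
pub fn merge_objects(per_object: &[Vec<u64>]) -> Vec<u64> {
    per_object
        .iter()
        .flat_map(|obj| obj.iter().cloned())
        .collect()
}

/// Global map index of the `local`-th guard inside object number `object`,
/// or None if either position does not exist.
pub fn global_index(per_object: &[Vec<u64>], object: usize, local: usize) -> Option<usize> {
    let obj = per_object.get(object)?;
    if local >= obj.len() {
        return None;
    }
    // Offset = total number of guards in all objects that come first.
    let offset: usize = per_object[..object].iter().map(Vec::len).sum();
    Some(offset + local)
}
```

If the build-time order and the runtime order disagree for even one object, every later index is shifted, which is why the mapping has to be regenerated together with the binary.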

Instrumented .a archives are supported as long as they were built with the libafl_cc wrappers so that the embedded metadata section is present (archive members pulled in by the linker contribute both __sancov_guards and the metadata section to the final binary). Uninstrumented .a archives are ignored for mapping purposes. Also, this expects plain pc-guard indexing. If you enable instrumentation modes that transform the index (like n-gram or ctx modes), the map index is no longer the raw guard value, and the mapping will not line up.

How to use it

1a) Build your target and generate the mapping (C/C++)

Build your target using a libafl_cc wrapper so the mapping can be created at the final link step.

Example:

cargo build --release -p forkserver_libafl_cc --bin libafl_cc
cargo build --release -p forkserver_libafl_cc --bin libafl_cxx

export CC="$(pwd)/target/release/libafl_cc"
export CXX="$(pwd)/target/release/libafl_cxx"
export LIBAFL_GIT_RECENCY_MAPPING_PATH="$(pwd)/git_recency_map.bin"

make CC="$CC" CXX="$CXX"

Notes: you must use -fsanitize-coverage=trace-pc-guard and have debug info (-g) so coverage sites can be mapped to file:line. Instrumented .a archives are supported for mapping generation, as long as they were built with the libafl_cc wrappers so the embedded metadata is available in the final linked output.

1b) Build your target and generate the mapping for Rust in-process fuzzing

If your fuzz target is a Rust in-process fuzzer binary, you can instrument it with SanitizerCoverage trace-pc-guard plus the git-recency LLVM plugin, then generate the mapping from the resulting binary. This requires nightly (or RUSTC_BOOTSTRAP=1) because -Zllvm-plugins is unstable.

You can use the following snippet. Step 1 builds libafl_cc in this repo, which produces the pass plugin and the mapgen tool used in the later steps.

# Step 1: Build the pass plugin + mapgen tool (in this repo)
LLVM_CONFIG=llvm-config-20 cargo build -p libafl_cc --release

# Step 2: Build your fuzzer binary (in the target repo you want to `git blame`)
plugin="$(find /path/to/LibAFL-git-aware/target/release -name 'git-recency-pass.so' -type f | head -n 1)" # locates git-recency-pass.so in the build output
rustflags=(
  "-Cdebuginfo=1"
  "-Cpasses=sancov-module libafl-git-recency"
  "-Cllvm-args=--sanitizer-coverage-level=3"
  "-Cllvm-args=--sanitizer-coverage-trace-pc-guard"
  "-Zllvm-plugins=${plugin}"
)
CARGO_ENCODED_RUSTFLAGS="$(IFS=$'\x1f'; echo "${rustflags[*]}")" RUSTC_BOOTSTRAP=1 cargo build --release -p <your-fuzzer-crate>

# Step 3: Generate the mapping for the produced binary (run from the target repo root)
/path/to/LibAFL-git-aware/target/release/libafl_git_recency_mapgen --out git_recency_map.bin --bin target/release/<your-fuzzer-binary>

Then load git_recency_map.bin in your fuzzer state and use the git-aware scheduler (next section). For a complete working example, see scripts/git_aware_reth_bench.sh.

2) Use the git-aware scheduler

Enable index tracking on your map observer (.track_indices()), load the mapping file into state, then use the git-aware weighted scheduler.

Minimal sketch (exact types vary by fuzzer):

let edges_observer = StdMapObserver::owned("edges", vec![0u8; 65536]).track_indices();
state.add_metadata(GitRecencyMapMetadata::load_from_file("git_recency_map.bin")?);
state.add_metadata(GitRecencyConfigMetadata::new(2.0)); // optional, default is 2.0
let scheduler = GitAwareStdWeightedScheduler::new(&mut state, &edges_observer);

Benchmark

You can run the reth benchmark with:

bash scripts/git_aware_reth_bench.sh --trials 3 --budget 120 --warmup 30

This benchmark creates a temporary reth git checkout under /tmp and adds a small LibAFL in-process fuzzer crate plus a freshly committed crashing line marked RECENT_BUG, so git blame treats it as recently changed code. It then builds the target with SanitizerCoverage (trace-pc-guard) and the git-recency LLVM pass to generate a pcguard_index -> git blame timestamp mapping, runs paired baseline vs git-aware trials, and reports the median time-to-first-crash.

Benchmark 1

Time to find the introduced bug with bash scripts/git_aware_reth_bench.sh --trials 5 --warmup 3600 --budget 3600

trial	baseline_s	git-aware_s	winner
1	107.443	52.267	git-aware
2	86.076	60.382	git-aware
3	31.772	124.470	baseline
4	44.089	13.072	git-aware
5	211.480	27.938	git-aware

Benchmark 2

Time to find the introduced bug with bash scripts/git_aware_reth_bench.sh --trials 10 --budget 6000 --input-corpus /tmp/libafl_gitaware_reth_bench.bynvh6UT/warmup/out/queue/

trial	baseline_s	git-aware_s	winner
1	79.683	94.540	baseline
2	233.198	8.517	git-aware
3	84.579	51.279	git-aware
4	5.628	2.262	git-aware
5	84.615	89.174	baseline
6	534.511	10.955	git-aware
7	226.654	2.116	git-aware
8	865.867	16.582	git-aware
9	78.062	159.150	baseline
10	43.719	34.985	git-aware
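The tables list per-trial times only. As a worked summary, the sketch below computes the median time-to-first-crash for each column of Benchmark 1 and Benchmark 2 from the numbers above. The median helper is hypothetical and averages the two middle values for even trial counts; the benchmark script may break ties differently.

```rust
/// Median of a sample; averages the two middle values for even counts.
/// Returns None for an empty sample.
pub fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

// Per-trial times in seconds, copied from the tables above.
pub const BENCH1_BASELINE: [f64; 5] = [107.443, 86.076, 31.772, 44.089, 211.480];
pub const BENCH1_GIT_AWARE: [f64; 5] = [52.267, 60.382, 124.470, 13.072, 27.938];
pub const BENCH2_BASELINE: [f64; 10] = [
    79.683, 233.198, 84.579, 5.628, 84.615, 534.511, 226.654, 865.867, 78.062, 43.719,
];
pub const BENCH2_GIT_AWARE: [f64; 10] = [
    94.540, 8.517, 51.279, 2.262, 89.174, 10.955, 2.116, 16.582, 159.150, 34.985,
];
```

Computed this way, the medians are 86.076 s (baseline) vs 52.267 s (git-aware) for Benchmark 1, and 84.597 s (baseline) vs 25.7835 s (git-aware) for Benchmark 2.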

License

Same license as upstream LibAFL: MIT or Apache-2.0 (see LICENSE-MIT and LICENSE-APACHE).
