diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 3489d6881fb..68edb124012 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -59,13 +59,15 @@ jobs:
         run: ./scripts/check_for_blobs.sh
       - name: Build libafl debug
         run: cargo build -p libafl
+      - name: Build tutorial fuzzer
+        run: (cd fuzzers/baby/tutorial && cargo build)
       - name: Test the book (Linux)
         # TODO: fix books test fail with updated windows-rs
         if: runner.os == 'Linux'
-        run: cd docs && mdbook test -L ../target/debug/deps
+        run: cd docs && mdbook test -L ../fuzzers/baby/tutorial/target/debug/deps
       - name: Test the book (MacOS)
         if: runner.os == 'MacOS'
-        run: cd docs && mdbook test -L ../target/debug/deps $(python3-config --ldflags | cut -d ' ' -f1)
+        run: cd docs && mdbook test -L ../fuzzers/baby/tutorial/target/debug/deps $(python3-config --ldflags | cut -d ' ' -f1)
       - name: Build individual libafl book examples (linux)
         if: runner.os == 'Linux'
         run: cd docs/listings/baby_fuzzer/ && just build-all
diff --git a/docs/README.md b/docs/README.md
index 393df1c64d1..64457b17078 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -2,7 +2,7 @@
 
 This project contains the out-of-source LibAFL documentation as a book.
 
-Here you can find tutorials, examples, and detailed explanations.
+Here you can find the [LibAFL tutorial](./src/tutorial/tutorial.md), examples, and detailed explanations.
 
 For the API documentation instead, run `cargo doc` in the LibAFl root folder.
 
diff --git a/docs/src/getting_started/crates.md b/docs/src/getting_started/crates.md
index 94ad4d52868..ea2dfe93bb5 100644
--- a/docs/src/getting_started/crates.md
+++ b/docs/src/getting_started/crates.md
@@ -76,7 +76,7 @@ Currently, the supported flags are:
 This is a library that provides utils to wrap compilers and create source-level fuzzers.
 At the moment, only the Clang compiler is supported.
 
-To understand it deeper, look through the tutorials and examples.
+To understand it deeper, look through the [tutorial](../tutorial/tutorial.md) and examples.
 
 ### [`libafl_frida`](https://github.com/AFLplusplus/LibAFL/tree/main/crates/libafl_frida)
 
diff --git a/docs/src/tutorial/intro.md b/docs/src/tutorial/intro.md
index 7b6cd323cb6..f30fa9fbdcf 100644
--- a/docs/src/tutorial/intro.md
+++ b/docs/src/tutorial/intro.md
@@ -1,8 +1,6 @@
 # Introduction
 
-> ## Under Construction!
->
-> This section is under construction.
-> Please check back later (or open a PR)
->
-> In the meantime, find the final Lain-based fuzzer in [the fuzzers folder](https://github.com/AFLplusplus/LibAFL/tree/main/fuzzers/baby/tutorial)
+This chapter will walk you through the creation of a fuzzer, step by step.
+If you are new to LibAFL, this is the right place to start.
+
+[Let's start with the first tutorial!](./tutorial.md)
diff --git a/docs/src/tutorial/tutorial.md b/docs/src/tutorial/tutorial.md
index dd34e410894..5162962b16e 100644
--- a/docs/src/tutorial/tutorial.md
+++ b/docs/src/tutorial/tutorial.md
@@ -1,5 +1,594 @@
-# Tutorial
+# `LibAFL` Custom Input Tutorial: Your Own Structure-Aware Fuzzer
 
-In this chapter, we will build a custom fuzzer using the [Lain](https://github.com/microsoft/lain) mutator in Rust.
+Welcome to the `LibAFL` custom inputs tutorial! In this guide, we'll walk you through building a structure-aware fuzzer for a simple C program using a third-party mutation library.
 
-This tutorial will introduce you to writing extensions to LibAFL like Feedbacks and Testcase's metadata.
+We'll be using the `lain` crate for structure-aware mutations and hooking it up to `LibAFL`. The same approach works with any mutator and input type, even ones written in other programming languages.
+
+## The Target
+
+Our target is a simple C program that processes packets.
+The code can be found in [`fuzzers/baby/tutorial/target.c`](https://github.com/AFLplusplus/LibAFL/blob/main/fuzzers/baby/tutorial/target.c):
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define MAX_PACKET_SIZE 0x1000
+
+typedef enum _packet_type {
+  data_read = 0x0,
+  data_write = 0x1,
+  data_reset = 0x2,
+} packet_type;
+
+#pragma pack(1)
+typedef struct _packet_data {
+  packet_type type;
+  uint64_t offset;
+  uint64_t length;
+  char data[0];
+} packet_data;
+
+int LLVMFuzzerTestOneInput(const uint8_t *packet_buffer, size_t packet_length) {
+  ssize_t saved_data_length = 0;
+  char *saved_data = NULL;
+  int err = 0;
+  packet_data *datagram = NULL;
+
+  if (packet_length < sizeof(packet_data) || packet_length > MAX_PACKET_SIZE) {
+    return 1;
+  }
+
+  datagram = (packet_data *)packet_buffer;
+
+  switch (datagram->type) {
+    case data_read:
+      if (saved_data != NULL &&
+          datagram->offset + datagram->length <= saved_data_length) {
+        write(0, packet_buffer + datagram->offset, datagram->length);
+      }
+      break;
+
+    case data_write:
+      // NOTE: Who cares about checking the offset? Nobody would ever provide
+      // bad data
+      if (saved_data != NULL && datagram->length <= saved_data_length) {
+        memcpy(saved_data + datagram->offset, datagram->data, datagram->length);
+      }
+      break;
+
+    case data_reset:
+      if (datagram->length > packet_length - sizeof(*datagram)) { return 1; }
+
+      if (saved_data != NULL) { free(saved_data); }
+
+      saved_data = malloc(datagram->length);
+      saved_data_length = datagram->length;
+
+      memcpy(saved_data, datagram->data, datagram->length);
+      break;
+
+    default:
+      return 1;
+  }
+
+  return 0;
+}
+```
+
+The target defines an `LLVMFuzzerTestOneInput` function, the standard entry point for libFuzzer-style harnesses, and interprets the input buffer as a `packet_data` structure. The `data_write` case contains a vulnerability: it never validates `offset`, so the `memcpy` into `saved_data + datagram->offset` can write past the end of the heap buffer.
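+
+To make the byte layout concrete, here is a minimal, std-only Rust sketch of how the target reads a buffer. It is not part of the LibAFL sources. It assumes that `packet_type` occupies 4 bytes, which is typical for GCC and Clang on x86-64 but implementation-defined in C. Under that assumption, `#pragma pack(1)` makes the header 4 + 8 + 8 = 20 bytes, read as little-endian. The helper names are hypothetical.
+
+```rust
+// Assumed size of the packed header: 4-byte `type` + 8-byte `offset` + 8-byte `length`.
+const HEADER_SIZE: usize = 4 + 8 + 8;
+// Mirrors `MAX_PACKET_SIZE` in `target.c`.
+const MAX_PACKET_SIZE: usize = 0x1000;
+
+// Hypothetical helpers that decode little-endian integers from fixed-size slices.
+fn le_u32(bytes: &[u8]) -> u32 {
+    let mut arr = [0u8; 4];
+    arr.copy_from_slice(bytes);
+    u32::from_le_bytes(arr)
+}
+
+fn le_u64(bytes: &[u8]) -> u64 {
+    let mut arr = [0u8; 8];
+    arr.copy_from_slice(bytes);
+    u64::from_le_bytes(arr)
+}
+
+// Hypothetical helper: returns `(type, offset, length)` for buffers that pass the
+// target's size check, and `None` where `LLVMFuzzerTestOneInput` returns 1 early.
+fn parse_header(buf: &[u8]) -> Option<(u32, u64, u64)> {
+    if buf.len() < HEADER_SIZE || buf.len() > MAX_PACKET_SIZE {
+        return None;
+    }
+    Some((le_u32(&buf[0..4]), le_u64(&buf[4..12]), le_u64(&buf[12..20])))
+}
+```
+
+Inputs shorter than the header or longer than `MAX_PACKET_SIZE` never reach the `switch`, so they cannot exercise any of the packet handlers.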
+
+## The Input
+
+To fuzz this target effectively, we need to define the input structure in Rust. This is done in [`fuzzers/baby/tutorial/src/input.rs`](https://github.com/AFLplusplus/LibAFL/blob/main/fuzzers/baby/tutorial/src/input.rs):
+
+```rust
+# extern crate lain;
+# extern crate libafl;
+# extern crate libafl_bolts;
+# extern crate serde;
+use std::hash::Hash;
+
+use lain::prelude::*;
+use libafl::inputs::{HasTargetBytes, Input};
+use libafl_bolts::{ownedref::OwnedSlice, HasLen};
+use serde::{Deserialize, Serialize};
+
+#[derive(
+    Serialize,
+    Deserialize,
+    Debug,
+    Default,
+    Clone,
+    NewFuzzed,
+    Mutatable,
+    VariableSizeObject,
+    BinarySerialize,
+)]
+pub struct PacketData {
+    pub typ: UnsafeEnum<PacketType, u32>,
+
+    pub offset: u64,
+    pub length: u64,
+
+    #[lain(max = 10)]
+    pub data: Vec<u8>,
+}
+
+impl Fixup for PacketData {
+    fn fixup<R: Rng>(&mut self, _mutator: &mut Mutator<R>) {
+        self.length = self.data.len() as u64;
+    }
+}
+
+#[derive(
+    Serialize, Deserialize, Debug, Copy, Clone, FuzzerObject, ToPrimitiveU32, BinarySerialize, Hash,
+)]
+#[repr(u32)]
+#[derive(Default)]
+pub enum PacketType {
+    #[default]
+    Read = 0x0,
+    Write = 0x1,
+    Reset = 0x2,
+}
+
+impl Input for PacketData {}
+
+impl HasTargetBytes for PacketData {
+    #[inline]
+    fn target_bytes(&self) -> OwnedSlice<'_, u8> {
+        let mut serialized_data = Vec::with_capacity(self.serialized_size());
+        self.binary_serialize::<_, LittleEndian>(&mut serialized_data);
+        OwnedSlice::from(serialized_data)
+    }
+}
+
+impl HasLen for PacketData {
+    fn len(&self) -> usize {
+        self.serialized_size()
+    }
+}
+
+impl Hash for PacketData {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        match self.typ {
+            UnsafeEnum::Invalid(a) => a.hash(state),
+            UnsafeEnum::Valid(a) => a.hash(state),
+        }
+        self.offset.hash(state);
+        self.length.hash(state);
+        self.data.hash(state);
+    }
+}
+```
+
+We use the `lain` crate to derive `NewFuzzed`, `Mutatable`, and other traits. This allows `lain` to generate and mutate `PacketData` structs automatically.
+The `fixup` function keeps the `length` field consistent with the actual length of the `data` vector.
+
+The `HasTargetBytes` implementation serializes the `PacketData` struct into a little-endian byte buffer that can be passed to the C target.
+
+## The Fuzzer
+
+Now let's look at the fuzzer itself in [`fuzzers/baby/tutorial/src/lib.rs`](https://github.com/AFLplusplus/LibAFL/blob/main/fuzzers/baby/tutorial/src/lib.rs).
+
+### The `libafl_main` function
+
+The `libafl_main` function is the entry point of our fuzzer.
+
+```rust
+# extern crate libafl;
+# use std::path::PathBuf;
+# use libafl::Error;
+# fn fuzz(_corpus_dirs: &[PathBuf], _objective_dir: PathBuf, _broker_port: u16) -> Result<(), Error> { Ok(()) }
+#[no_mangle]
+pub extern "C" fn libafl_main() {
+    // ...
+    fuzz(
+        &[PathBuf::from("./corpus")],
+        PathBuf::from("./crashes"),
+        1337,
+    )
+    .expect("An error occurred while fuzzing");
+}
+```
+
+It calls the `fuzz` function with the corpus and objective directories, and a broker port for multi-process fuzzing.
+
+### The `fuzz` function
+
+The `fuzz` function contains the main fuzzer logic.
+
+#### The Harness
+
+```rust
+# extern crate libafl;
+# extern crate libafl_bolts;
+# extern crate serde;
+#
+# use libafl::executors::ExitKind;
+# use libafl::inputs::HasTargetBytes;
+# use libafl_bolts::ownedref::OwnedSlice;
+# use libafl_bolts::AsSlice;
+#
+# // Dummy PacketData
+# #[derive(Debug)]
+# pub struct PacketData {}
+# impl HasTargetBytes for PacketData {
+#     fn target_bytes(&self) -> OwnedSlice<'_, u8> {
+#         OwnedSlice::from(vec![])
+#     }
+# }
+#
+# // Dummy libfuzzer_test_one_input
+# unsafe fn libfuzzer_test_one_input(_buf: &[u8]) {}
+#
+let mut harness = |input: &PacketData| {
+    let target = input.target_bytes();
+    let buf = target.as_slice();
+    // # Safety
+    // We're looking for crashes in there!
+    unsafe {
+        libfuzzer_test_one_input(buf);
+    }
+    ExitKind::Ok
+};
+```
+
+The harness is a closure that takes a `PacketData` input, serializes it, and passes the resulting bytes to `libfuzzer_test_one_input`, which calls the target's `LLVMFuzzerTestOneInput`.
+
+#### Observers, Feedbacks, and Scheduler
+
+```rust
+# extern crate libafl;
+# extern crate libafl_bolts;
+# extern crate libafl_targets;
+# use libafl::{
+#     feedback_or, feedback_or_fast,
+#     feedbacks::{CrashFeedback, MaxMapFeedback, TimeFeedback, TimeoutFeedback, Feedback},
+#     observers::{HitcountsMapObserver, TimeObserver, Observer, CanTrack},
+#     stages::calibrate::CalibrationStage,
+#     Error,
+# };
+# use libafl_bolts::{Named, MaybeOwned};
+# use libafl_targets::std_edges_map_observer;
+# use std::borrow::Cow;
+#
+# #[derive(Default, Debug)]
+# struct PacketLenFeedback;
+# impl PacketLenFeedback { fn new() -> Self { Self {} } }
+# impl<EM, I, OT, S> Feedback<EM, I, OT, S> for PacketLenFeedback {
+#     fn is_interesting(&mut self, _state: &mut S, _manager: &mut EM, _input: &I, _observers: &OT, _exit_kind: &libafl::executors::ExitKind) -> Result<bool, Error> { Ok(false) }
+# }
+# impl Named for PacketLenFeedback {
+#     fn name(&self) -> &Cow<'static, str> {
+#         static NAME: Cow<'static, str> = Cow::Borrowed("PacketLenFeedback");
+#         &NAME
+#     }
+# }
+#
+# fn dummy_func() {
+// Create an observation channel using the coverage map
+let edges_observer =
+    HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();
+
+// Create an observation channel to keep track of the execution time
+let time_observer = TimeObserver::new("time");
+
+let map_feedback = MaxMapFeedback::new(&edges_observer);
+
+let calibration = CalibrationStage::new(&map_feedback);
+
+// Feedback to rate the interestingness of an input
+let mut feedback = feedback_or!(
+    map_feedback,
+    TimeFeedback::new(&time_observer),
+    PacketLenFeedback::new()
+);
+
+// A feedback to choose if an input is a solution or not
+let mut objective = feedback_or_fast!(CrashFeedback::new(), TimeoutFeedback::new());
+# }
+```
+
+We use a `HitcountsMapObserver` to get code coverage, and a `TimeObserver` to
+measure execution time. These are used by `MaxMapFeedback` and `TimeFeedback` respectively. We also have a custom `PacketLenFeedback`, which we'll look at later.
+
+The objective combines `CrashFeedback` and `TimeoutFeedback`, so inputs that crash or time out are saved as solutions.
+
+```rust
+# extern crate libafl;
+# extern crate libafl_bolts;
+# extern crate libafl_targets;
+# extern crate serde;
+# fn dummy() -> Result<(), libafl::Error> {
+# use libafl::{
+#     observers::{HitcountsMapObserver, CanTrack},
+#     schedulers::{powersched::PowerSchedule, PowerQueueScheduler},
+#     state::StdState,
+#     corpus::{InMemoryCorpus, OnDiskCorpus},
+#     feedbacks::MaxMapFeedback,
+#     Error,
+# };
+# use libafl_bolts::rands::StdRand;
+# use libafl_targets::std_edges_map_observer;
+#
+# struct PacketLenMinimizerScheduler;
+# impl PacketLenMinimizerScheduler {
+#     fn new<O, S>(_observer: &O, _scheduler: S) -> Self { Self }
+# }
+#
+# let edges_observer = HitcountsMapObserver::new(unsafe { std_edges_map_observer("edges") }).track_indices();
+# let map_feedback = MaxMapFeedback::new(&edges_observer);
+# let mut feedback = (map_feedback,);
+# let mut objective = ();
+# let mut state = StdState::new(
+#     StdRand::new(),
+#     InMemoryCorpus::new(),
+#     OnDiskCorpus::new("./crashes")?,
+#     &mut feedback,
+#     &mut objective,
+# )?;
+#
+// A minimization+queue policy to get testcases from the corpus
+let scheduler = PacketLenMinimizerScheduler::new(
+    &edges_observer,
+    PowerQueueScheduler::new(&mut state, &edges_observer, PowerSchedule::fast()),
+);
+# Ok(())
+# }
+```
+
+We use a `PowerQueueScheduler` to select the next input to fuzz. We wrap it in a custom `PacketLenMinimizerScheduler`, which favors shorter inputs.
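+
+The idea behind a minimizer scheduler can be sketched without LibAFL. The std-only Rust below is a simplified illustration, not LibAFL's `MinimizerScheduler` implementation. For every coverage-map index it keeps the testcase with the smallest score, here the packet length. The union of those winners forms the favored set. The `Entry` type and the `favored` function are hypothetical.
+
+```rust
+use std::collections::{BTreeMap, BTreeSet};
+
+// Hypothetical corpus entry: an id, the packet length used as the score,
+// and the coverage-map indices the testcase hit.
+struct Entry {
+    id: usize,
+    packet_len: u64,
+    covered: Vec<usize>,
+}
+
+// For each covered map index, remember the entry with the shortest packet
+// (ties keep the earlier entry), then return the ids of all winners.
+fn favored(corpus: &[Entry]) -> BTreeSet<usize> {
+    let mut best: BTreeMap<usize, (u64, usize)> = BTreeMap::new();
+    for entry in corpus {
+        for &idx in &entry.covered {
+            let candidate = (entry.packet_len, entry.id);
+            best.entry(idx)
+                .and_modify(|cur| {
+                    if candidate.0 < cur.0 {
+                        *cur = candidate;
+                    }
+                })
+                .or_insert(candidate);
+        }
+    }
+    best.values().map(|&(_, id)| id).collect()
+}
+```
+
+A long testcase stays in the corpus, but it is only favored if it is the shortest input covering at least one map index.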
+
+#### The Mutator
+
+```rust
+# extern crate lain;
+# extern crate libafl;
+# extern crate libafl_bolts;
+# extern crate serde;
+# fn dummy() {
+# use libafl::{
+#     stages::{power::StdPowerMutationalStage, Stage},
+#     mutators::{MutationResult, Mutator},
+#     state::HasRand,
+#     Error,
+# };
+# use libafl_bolts::{tuples::tuple_list, rands::StdRand};
+# use lain::prelude::{Mutatable, NewFuzzed, FuzzerObject, ToPrimitiveU32, BinarySerialize, VariableSizeObject, UnsafeEnum, Fixup, Rng};
+# use serde::{Deserialize, Serialize};
+# use std::vec::Vec;
+#
+# #[derive(Serialize, Deserialize, Debug, Default, Clone, NewFuzzed, Mutatable, VariableSizeObject, BinarySerialize)]
+# pub struct PacketData {
+#     pub typ: UnsafeEnum<PacketType, u32>,
+#     pub offset: u64,
+#     pub length: u64,
+#     #[lain(max = 10)]
+#     pub data: Vec<u8>,
+# }
+# impl Fixup for PacketData {
+#     fn fixup<R: Rng>(&mut self, _mutator: &mut lain::mutator::Mutator<R>) { self.length = self.data.len() as u64; }
+# }
+# #[derive(Serialize, Deserialize, Debug, Copy, Clone, FuzzerObject, ToPrimitiveU32, BinarySerialize, std::hash::Hash)]
+# #[repr(u32)]
+# #[derive(Default)]
+# pub enum PacketType { #[default] Read = 0x0, Write = 0x1, Reset = 0x2, }
+#
+# pub struct LainMutator;
+# impl LainMutator { fn new() -> Self { Self } }
+# impl<I, S> Mutator<I, S> for LainMutator where I: Mutatable, S: HasRand {
+#     fn mutate(&mut self, _state: &mut S, _input: &mut I) -> Result<MutationResult, Error> { Ok(MutationResult::Mutated) }
+# }
+#
+# struct DummyStage;
+# impl<E, EM, S, Z> Stage<E, EM, S, Z> for DummyStage {
+#     fn perform(
+#         &mut self,
+#         _fuzzer: &mut Z,
+#         _executor: &mut E,
+#         _state: &mut S,
+#         _manager: &mut EM,
+#     ) -> Result<(), Error> {
+#         Ok(())
+#     }
+# }
+# let calibration = DummyStage;
+#
+// Setup a lain mutator with a mutational stage
+let mutator = LainMutator::new();
+
+let power: StdPowerMutationalStage<_, _, PacketData, _, _, _> =
+    StdPowerMutationalStage::new(mutator);
+
+let mut stages = tuple_list!(calibration, power);
+# }
+```
+
+We use a custom `LainMutator` which is a wrapper around `lain`'s
+mutator. This is where the structure-aware mutation happens: instead of flipping raw bytes, `LainMutator` mutates the individual fields of the `PacketData` struct.
+
+### Custom Components
+
+#### `PacketLenFeedback` and `PacketLenMinimizerScheduler`
+
+These are defined in `fuzzers/baby/tutorial/src/metadata.rs`.
+
+```rust
+# extern crate libafl_bolts;
+# extern crate serde;
+# use serde::{Deserialize, Serialize};
+# use libafl_bolts::SerdeAny;
+# use std::fmt::Debug;
+#
+# trait Feedback<EM, I, OT, S> {
+#     fn is_interesting(&mut self, state: &mut S, manager: &mut EM, input: &I, observers: &OT, exit_kind: &ExitKind) -> Result<bool, Error>;
+#     fn append_metadata(&mut self, state: &mut S, manager: &mut EM, observers: &OT, testcase: &mut Testcase<I>) -> Result<(), Error>;
+# }
+# #[derive(Debug)]
+# struct PacketData { length: u64 }
+# #[derive(Debug)]
+# struct PacketLenFeedback { len: u64 }
+# #[derive(Debug)]
+# struct ExitKind;
+# #[derive(Debug)]
+# struct Error;
+# #[derive(Debug)]
+# struct Testcase<I> { _phantom: std::marker::PhantomData<I> }
+# impl<I> Testcase<I> {
+#     fn metadata_map_mut(&mut self) -> &mut Self { self }
+#     fn insert<T>(&mut self, _meta: T) {}
+# }
+#
+#[derive(Debug, SerdeAny, Serialize, Deserialize)]
+pub struct PacketLenMetadata {
+    pub length: u64,
+}
+
+// ...
+
+impl<EM, OT, S> Feedback<EM, PacketData, OT, S> for PacketLenFeedback {
+    #[inline]
+    fn is_interesting(
+        &mut self,
+        _state: &mut S,
+        _manager: &mut EM,
+        input: &PacketData,
+        _observers: &OT,
+        _exit_kind: &ExitKind,
+    ) -> Result<bool, Error> {
+        self.len = input.length;
+        Ok(false)
+    }
+
+    #[inline]
+    fn append_metadata(
+        &mut self,
+        _state: &mut S,
+        _manager: &mut EM,
+        _observers: &OT,
+        testcase: &mut Testcase<PacketData>,
+    ) -> Result<(), Error> {
+        testcase
+            .metadata_map_mut()
+            .insert(PacketLenMetadata { length: self.len });
+        Ok(())
+    }
+}
+```
+
+The `PacketLenFeedback` never marks an input as interesting on its own, because `is_interesting` always returns `false`. It only records the length of the current packet. When the other feedbacks combined in `feedback_or!` add the input to the corpus, `append_metadata` attaches that length to the new testcase as `PacketLenMetadata`. The `PacketLenMinimizerScheduler` then uses this metadata to prioritize shorter inputs.
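+
+The metadata mechanism can be pictured as a map keyed by type. The std-only Rust below is a hypothetical stand-in for a testcase's metadata map. It is not LibAFL's implementation, which also supports serialization. A feedback inserts a `PacketLenMetadata`, and a scoring function later reads it back to rank testcases, where lower is better. `MetadataMap`, `Testcase`, and `packet_len_score` are illustrative names.
+
+```rust
+use std::any::{Any, TypeId};
+use std::collections::HashMap;
+
+// Same shape as the tutorial's metadata type.
+#[derive(Debug, PartialEq)]
+struct PacketLenMetadata {
+    length: u64,
+}
+
+// Hypothetical type-keyed metadata map: at most one value per type.
+#[derive(Default)]
+struct MetadataMap {
+    inner: HashMap<TypeId, Box<dyn Any>>,
+}
+
+impl MetadataMap {
+    // Inserting a value replaces any earlier value of the same type.
+    fn insert<T: Any>(&mut self, value: T) {
+        self.inner.insert(TypeId::of::<T>(), Box::new(value));
+    }
+
+    fn get<T: Any>(&self) -> Option<&T> {
+        self.inner.get(&TypeId::of::<T>())?.downcast_ref::<T>()
+    }
+}
+
+#[derive(Default)]
+struct Testcase {
+    metadata: MetadataMap,
+}
+
+// Scheduler-side view: a shorter packet yields a lower (better) score.
+// Testcases without the metadata rank last.
+fn packet_len_score(testcase: &Testcase) -> u64 {
+    testcase
+        .metadata
+        .get::<PacketLenMetadata>()
+        .map_or(u64::MAX, |m| m.length)
+}
+```
+
+Keying by type lets unrelated feedbacks attach their own metadata to the same testcase without knowing about each other.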
+
+#### `LainMutator`
+
+This is defined in `fuzzers/baby/tutorial/src/mutator.rs`.
+
+```rust
+# use std::fmt::Debug;
+#
+# mod lain {
+#     pub mod mutator {
+#         #[derive(Debug)]
+#         pub struct Mutator<R> { _phantom: std::marker::PhantomData<R> }
+#     }
+# }
+# impl<R> lain::mutator::Mutator<R> {
+#     pub fn rng_mut(&mut self) -> &mut Self { self }
+#     pub fn set_seed(&mut self, _seed: u64) {}
+# }
+#
+# trait Mutator<I, S> {
+#     fn mutate(&mut self, state: &mut S, input: &mut I) -> Result<MutationResult, Error>;
+# }
+# trait Rand {
+#     fn next(&mut self) -> u64;
+# }
+# trait HasRand {
+#     type R: Rand;
+#     fn rand_mut(&mut self) -> &mut Self::R;
+# }
+#
+# impl Rand for () {
+#     fn next(&mut self) -> u64 { 0 }
+# }
+#
+# impl HasRand for () {
+#     type R = ();
+#     fn rand_mut(&mut self) -> &mut () {
+#         self
+#     }
+# }
+#
+# #[derive(Debug)]
+# struct StdRand;
+# #[derive(Debug)]
+# struct PacketData;
+# impl PacketData {
+#     fn mutate(&mut self, _m: &mut lain::mutator::Mutator<StdRand>, _s: Option<()>) {}
+# }
+# #[derive(Debug)]
+# enum MutationResult { Mutated }
+# #[derive(Debug)]
+# struct Error;
+#
+pub struct LainMutator {
+    inner: lain::mutator::Mutator<StdRand>,
+}
+
+impl<S> Mutator<PacketData, S> for LainMutator
+where
+    S: HasRand,
+{
+    fn mutate(&mut self, state: &mut S, input: &mut PacketData) -> Result<MutationResult, Error> {
+        // Lain uses its own instance of StdRand, but we want to keep it in sync with LibAFL's state.
+        self.inner.rng_mut().set_seed(state.rand_mut().next());
+        input.mutate(&mut self.inner, None);
+        Ok(MutationResult::Mutated)
+    }
+    // ...
+}
+```
+
+Before each mutation, `LainMutator` reseeds `lain`'s internal RNG from LibAFL's random number generator, keeping the two in sync. It then calls the `mutate` method that `lain` derived for `PacketData`.
+
+## Building and Running the Fuzzer
+
+Now, let's build and run our fuzzer.
+
+1. **Set the Rust toolchain**:
+   The fuzzer requires a nightly toolchain.
+
+   ```sh
+   cd fuzzers/baby/tutorial
+   rustup override set nightly
+   ```
+
+2. **Build the fuzzer and the target**:
+   We need to build both the fuzzer library and the target C code.
+   The build script `fuzzers/baby/tutorial/build.rs` handles building the target C code and linking it against our fuzzer library.
+
+   First, we need to set up the compiler wrappers.
+
+   ```sh
+   cargo build --bin libafl_cc
+   cargo build --bin libafl_cxx
+   ```
+
+   Then, we set the `CC` and `CXX` environment variables to point to our wrappers.
+
+   ```sh
+   export CC=$(pwd)/target/debug/libafl_cc
+   export CXX=$(pwd)/target/debug/libafl_cxx
+   ```
+
+   Now, we can build the target.
+
+   ```sh
+   make -C ../../.. fuzzers/baby/tutorial/target
+   ```
+
+   This will create a `target` executable in the `fuzzers/baby/tutorial` directory.
+
+3. **Run the fuzzer**:
+
+   ```sh
+   cargo fuzz run --release
+   ```
+
+   After running for a short while, the fuzzer should find the crash and save the crashing input in the `crashes` directory.
+   The baby fuzzer won't restart afterwards. To keep fuzzing after a crash, you need a restarting event manager, such as [`LlmpRestartingEventManager`](https://docs.rs/libafl/latest/libafl/events/llmp/restarting/struct.LlmpRestartingEventManager.html).
+
+## Conclusion
+
+In this tutorial, we've built a structure-aware fuzzer using LibAFL and `lain`. We've seen how to define a custom input structure, use a structure-aware mutator, and customize the fuzzer with custom feedbacks and schedulers.
+
+This is just a starting point. LibAFL is a very flexible framework that allows you to customize every aspect of the fuzzing process. We encourage you to explore the examples and the documentation to learn more.
diff --git a/fuzzers/baby/tutorial/rust-toolchain b/fuzzers/baby/tutorial/rust-toolchain
deleted file mode 100644
index bf867e0ae5b..00000000000
--- a/fuzzers/baby/tutorial/rust-toolchain
+++ /dev/null
@@ -1 +0,0 @@
-nightly
diff --git a/fuzzers/baby/tutorial/src/input.rs b/fuzzers/baby/tutorial/src/input.rs
index 6e0a864aecd..f034a0f5d07 100644
--- a/fuzzers/baby/tutorial/src/input.rs
+++ b/fuzzers/baby/tutorial/src/input.rs
@@ -1,4 +1,3 @@
-#![expect(unexpected_cfgs)] // deriving NewFuzzed etc. introduces these
 use std::hash::Hash;
 
 use lain::prelude::*;
diff --git a/fuzzers/baby/tutorial/src/mutator.rs b/fuzzers/baby/tutorial/src/mutator.rs
index d2813012099..a83cfe5b54e 100644
--- a/fuzzers/baby/tutorial/src/mutator.rs
+++ b/fuzzers/baby/tutorial/src/mutator.rs
@@ -52,7 +52,6 @@ impl LainMutator {
 }
 
 impl Default for LainMutator {
-    #[must_use]
     fn default() -> Self {
         Self::new()
     }