trace2fio — Deterministic Reconstruction of I/O Workloads

Overview

trace2fio.py is a utility that takes an eBPF or syscall-level trace of file I/O activity and reconstructs an equivalent workload description suitable for use with fio. It lets developers, storage engineers, and kernel researchers replay realistic I/O patterns captured from real systems in a deterministic, reproducible way.

Rather than relying on statistical inference or machine learning, trace2fio uses a deterministic state machine that models file descriptor lifecycles. This makes it easy to reason about and verify, while remaining flexible enough to describe complex I/O behaviors.

Motivation

Modern eBPF frameworks can trace fine-grained I/O events such as open, read, write, and fsync across all processes. However, converting those traces into a form usable for benchmarking tools like fio has traditionally required manual analysis or ad-hoc scripts.

trace2fio aims to bridge this gap by:

  • Mapping syscall traces to high-level fio jobs automatically.
  • Preserving key parameters such as block size, read/write mix, sequential/random access, and direct I/O flags.
  • Allowing controlled replay of real-world workloads without intrusive tracing.
  • Providing a deterministic, auditable workflow without machine learning heuristics.

How It Works

The tool parses a syscall trace in CSV or JSONL format with at least the following fields:

ts,pid,comm,syscall,fd,bytes,offset,flags,path,ret
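As an illustrative sketch (not trace2fio.py's actual parser), a record with these fields can be read from the CSV form like this; empty `bytes`/`offset` columns, as on open/fsync/close rows, become `None`:

```python
# Illustrative parsing sketch for the CSV schema above; the real
# trace2fio.py may handle fields differently.
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class TraceEvent:
    ts: float
    pid: int
    comm: str
    syscall: str
    fd: int
    bytes: Optional[int]    # empty for open/fsync/close
    offset: Optional[int]   # empty for open/fsync/close
    flags: str              # e.g. "O_RDWR|O_DIRECT" on open
    path: str
    ret: int

def parse_trace(lines: Iterable[str]) -> List[TraceEvent]:
    """Parse CSV lines (header first) into TraceEvent records."""
    events = []
    for row in csv.DictReader(lines):
        events.append(TraceEvent(
            ts=float(row["ts"]),
            pid=int(row["pid"]),
            comm=row["comm"],
            syscall=row["syscall"],
            fd=int(row["fd"]),
            bytes=int(row["bytes"]) if row["bytes"] else None,
            offset=int(row["offset"]) if row["offset"] else None,
            flags=row["flags"],
            path=row["path"],
            ret=int(row["ret"]),
        ))
    return events
```

The JSONL form would carry the same fields one JSON object per line.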

It then applies a state machine model:

  1. File descriptor tracking — each (pid, fd) pair is tracked through open → read/write → fsync → close.
  2. Operation aggregation — offsets, sizes, and timestamps are aggregated per file.
  3. Pattern inference — block size, sequential vs random pattern, and I/O type (read/write/mixed) are derived.
  4. fio job generation — each file becomes a [job] section in the output .fio file.
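Steps 2-4 can be sketched roughly as follows. This is a hypothetical outline, not trace2fio.py's internals: it assumes parsed events as dicts, takes the block size as the GCD of observed I/O sizes, and calls a file sequential when each operation starts where the previous one ended.

```python
# Hypothetical sketch of aggregation and pattern inference (steps 2-4).
# Assumes each event is a dict with the parsed CSV fields.
from collections import defaultdict
from math import gcd

def infer_jobs(events):
    open_flags = {}               # (pid, fd) -> flags recorded at open
    per_file = defaultdict(list)  # path -> [(syscall, bytes, offset, open flags)]
    for ev in events:
        key = (ev["pid"], ev["fd"])
        if ev["syscall"] == "open":
            open_flags[key] = ev["flags"]
        elif ev["syscall"] in ("read", "write"):
            per_file[ev["path"]].append(
                (ev["syscall"], ev["bytes"], ev["offset"],
                 open_flags.get(key, "")))

    jobs = {}
    for path, ops in per_file.items():
        sizes = [b for _, b, _, _ in ops]
        bs = sizes[0]
        for s in sizes[1:]:
            bs = gcd(bs, s)       # largest block size dividing every I/O
        # Sequential if each op begins where the previous one ended.
        seq = all(o2 == o1 + b1
                  for (_, b1, o1, _), (_, _, o2, _) in zip(ops, ops[1:]))
        kinds = {k for k, _, _, _ in ops}
        rw = "rw" if kinds == {"read", "write"} else kinds.pop()
        if not seq:
            rw = "rand" + rw      # fio's randread/randwrite/randrw
        jobs[path] = {
            "rw": rw,
            "bs": bs,
            "size": sum(sizes),                      # total bytes moved
            "direct": int("O_DIRECT" in ops[0][3]),  # from open flags
        }
    return jobs
```

Run against the example input below, this yields exactly the parameters shown in the generated job: rw=write, bs=4096, size=8192, direct=1.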

The resulting fio configuration can be used to replay the original workload on any system, enabling accurate reproduction and performance analysis.

Example

$ python3 trace2fio.py trace.csv -o workload.fio
$ fio workload.fio

For combined replay of all files as a single job:

$ python3 trace2fio.py trace.csv -o workload.fio --merge
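The README does not show the merged output. As a hedged illustration only (the actual layout trace2fio emits may differ), fio can drive several target files from one job via a colon-separated filename list, so a merged job might look roughly like:

```ini
; Hypothetical merged job -- layout and values are illustrative only.
[merged]
; fio accepts a colon-separated list of files for a single job
filename=file1.dat:file2.dat
rw=write
bs=4096
size=16384
direct=1
```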

Example Input

ts,pid,comm,syscall,fd,bytes,offset,flags,path,ret
0.001,1234,app,open,3,,,O_RDWR|O_DIRECT,/data/file1,3
0.002,1234,app,write,3,4096,0,,/data/file1,4096
0.004,1234,app,write,3,4096,4096,,/data/file1,4096
0.006,1234,app,fsync,3,,,,/data/file1,0
0.007,1234,app,close,3,,,,/data/file1,0

Produces:

[file1]
filename=file1.dat
rw=write
bs=4096
size=8192
direct=1

Design Principles

  • Determinism over heuristics: All inference is rule-based and explainable.
  • Modular expansion: Future versions may add io_uring, iodepth, or multithread detection.
  • Trace neutrality: Works with any tracing backend (bpftrace, perf, strace, LTTng) as long as the CSV/JSONL schema matches.
  • Transparency: Every generated fio parameter can be traced back to source trace data.

Roadmap

  • Add support for async I/O (io_uring events) to infer iodepth and ioengine.
  • Detect temporal phases in traces (warmup, steady state, cooldown).
  • Integrate with bpftrace scripts to capture required fields automatically.
  • Visualize inferred workloads and I/O timelines.

License

MIT License — see LICENSE for details.

Author

Luis Chamberlain — (linux-kdevops project)
