AIPerf generates synthetic datasets for benchmarking LLM inference servers. This tutorial explains how synthetic data is generated for text, images, audio, and video inputs.
Synthetic datasets enable consistent, reproducible benchmarking with full control over input characteristics. Each modality uses a specialized generator:
| Modality | Source Material | Configurable Properties |
|---|---|---|
| Text | Shakespeare corpus | Token length, distribution |
| Images | 4 source images | Width, height, format |
| Audio | Gaussian noise | Duration, sample rate, bit depth, channels |
| Video | Synthetic animations | Resolution, FPS, duration, codec |
All generators use deterministic random sampling for reproducibility (see Reproducibility Guide).
Text prompts are generated by sampling from a pre-tokenized Shakespeare corpus:
- Corpus Loading: The `assets/shakespeare.txt` file is tokenized once at startup
- Character-Based Chunking: Text is split into fixed-size chunks for parallel tokenization
- Deterministic Sampling: Random slices of the tokenized corpus are extracted and decoded into prompts
- Length Control: Prompt lengths follow a normal distribution around specified mean/stddev
Key Feature: Character-based chunking ensures reproducibility across machines with different CPU counts, so the same random seed always produces identical prompts.
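The sampling steps above can be sketched as follows. This is a simplified illustration, not AIPerf's actual implementation: the real generator uses a model tokenizer over the Shakespeare corpus, while here whitespace-split words stand in for tokens so the sketch stays self-contained.

```python
import random

# Stand-in corpus; AIPerf tokenizes assets/shakespeare.txt with a real tokenizer.
corpus = "Friends, Romans, countrymen, lend me your ears; " * 200
tokens = corpus.split()  # words approximate tokens for this sketch

def make_prompt(mean: int, stddev: int, rng: random.Random) -> str:
    """Sample a length from a normal distribution, then take a random
    contiguous slice of the tokenized corpus and decode it to text."""
    length = max(1, round(rng.gauss(mean, stddev)))
    start = rng.randrange(len(tokens) - length)
    return " ".join(tokens[start:start + length])

rng = random.Random(42)              # fixed seed for reproducibility
p1 = make_prompt(150, 30, rng)

# The same seed reproduces the same prompt, regardless of machine.
assert make_prompt(150, 30, random.Random(42)) == p1
```

The key design property mirrored here is that all randomness flows through a single seeded generator, so reruns with the same seed yield byte-identical prompts.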
```bash
aiperf profile \
  --model Qwen/Qwen3-0.6B \
  --url localhost:8000 \
  --endpoint-type chat \
  --synthetic-input-tokens-mean 150 \
  --synthetic-input-tokens-stddev 30 \
  --output-tokens-mean 50 \
  --request-count 10
```

Options:
- `--synthetic-input-tokens-mean`: Mean input token count (default: 550)
- `--synthetic-input-tokens-stddev`: Standard deviation for input length variability (default: 0)
- `--output-tokens-mean`: Mean number of output tokens requested (default: None; the model decides)
- `--output-tokens-stddev`: Standard deviation for output token length (default: 0)
- `--seq-dist`: Distribution of (ISL, OSL) pairs for mixed workload simulation (default: None). See Sequence Length Distributions for format details.
- `--random-seed`: Seed for reproducible prompt generation (default: None)
For shared-prefix benchmarking (e.g., RAG scenarios):
```bash
aiperf profile \
  --model Qwen/Qwen3-0.6B \
  --url localhost:8000 \
  --endpoint-type chat \
  --synthetic-input-tokens-mean 100 \
  --prefix-prompt-length 512 \
  --prefix-prompt-pool-size 10 \
  --request-count 10
```

Each request randomly selects a 512-token prefix from a pool of 10, with a randomly sampled 100-token continuation. See Prefix Synthesis for details.
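The prefix-pool behavior can be sketched like this. The pool contents below are placeholders (real prefixes are 512-token text slices); the point is the selection pattern: a fixed pool built once, with each request reusing one pool entry plus a fresh continuation.

```python
import random

rng = random.Random(0)

# Built once at startup: a fixed pool of shared prefixes.
# Placeholder strings stand in for real 512-token prefixes.
POOL_SIZE = 10
prefix_pool = [f"<prefix-{i}>" for i in range(POOL_SIZE)]

def build_prompt() -> str:
    """Pair one randomly chosen shared prefix with a fresh continuation.
    Prefixes repeat across requests, exercising prefix/KV-cache reuse."""
    prefix = rng.choice(prefix_pool)
    return f"{prefix} + <100-token continuation>"

prompts = [build_prompt() for _ in range(25)]

# With 25 requests drawn from 10 prefixes, prefixes necessarily repeat.
assert len({p.split(" + ")[0] for p in prompts}) <= POOL_SIZE
```

Reusing a small prefix pool is what makes this useful for RAG-style benchmarks: the server sees repeated long prefixes, as it would with a shared system prompt or retrieved context.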
Images are generated by resizing source images from `assets/source_images/`:
- Source Images: 4 source images pre-loaded into memory
- Random Selection: One source image is randomly selected for each generation
- Resizing: Image is resized to target dimensions using PIL (Pillow)
- Format Conversion: Converted to the configured format (PNG, JPEG, or randomly selected)
- Base64 Encoding: Encoded as data URI for API requests
```bash
aiperf profile \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --url localhost:8000 \
  --endpoint-type chat \
  --image-width-mean 512 \
  --image-height-mean 512 \
  --image-width-stddev 50 \
  --image-height-stddev 50 \
  --image-format png \
  --image-batch-size 2 \
  --synthetic-input-tokens-mean 100 \
  --request-count 5
```

Options:
- `--image-width-mean`: Mean width in pixels (default: 0)
- `--image-width-stddev`: Width standard deviation (default: 0)
- `--image-height-mean`: Mean height in pixels (default: 0)
- `--image-height-stddev`: Height standard deviation (default: 0)
- `--image-format`: `png`, `jpeg`, or `random` (default: `png`)
- `--image-batch-size`: Number of images per request (default: 1)
Note: Image generation requires both `--image-width-mean` and `--image-height-mean` to be > 0. Setting either to 0 disables images.
Audio files are generated as synthetic Gaussian noise:
- Parameter Selection: Random selection of sample rate and bit depth from configured lists
- Duration Sampling: Duration follows normal distribution (with rejection sampling for ≥0.01s)
- Noise Generation: Gaussian noise generated as a NumPy array
- Scaling: Clipped to [-1, 1] and scaled to the target bit depth range
- Encoding: Written as WAV or MP3 using the soundfile library
- Base64 Encoding: Encoded as a `<format>,<base64data>` string
Audio Characteristics:
- White noise (all frequencies equally represented)
- Gaussian amplitude distribution
- No structured speech or music content
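The generation steps can be sketched in a few lines. This is a dependency-light approximation: AIPerf writes audio via the soundfile library, while this sketch uses the stdlib `wave` module, fixing mono 16-bit output for brevity.

```python
import base64
import io
import wave

import numpy as np

rng = np.random.default_rng(0)

def generate_audio(duration_s: float, sample_rate_hz: int = 16_000) -> str:
    """Generate mono 16-bit Gaussian white noise; return '<format>,<base64data>'."""
    n = int(duration_s * sample_rate_hz)
    samples = np.clip(rng.normal(0.0, 0.3, n), -1.0, 1.0)  # clip to [-1, 1]
    pcm = (samples * 32767).astype(np.int16)               # scale to 16-bit range
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)               # mono
        wav.setsampwidth(2)               # 16-bit depth
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm.tobytes())
    return "wav," + base64.b64encode(buf.getvalue()).decode("ascii")

payload = generate_audio(0.5)  # half a second of white noise
```

Because the payload is pure noise, it exercises the server's audio decode and preprocessing path without depending on any speech content.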
```bash
aiperf profile \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --url localhost:8000 \
  --endpoint-type chat \
  --audio-length-mean 5.0 \
  --audio-length-stddev 1.0 \
  --audio-sample-rates 16 \
  --audio-depths 16 \
  --audio-format wav \
  --audio-num-channels 1 \
  --audio-batch-size 1 \
  --synthetic-input-tokens-mean 50 \
  --request-count 10
```

Options:
- `--audio-length-mean`: Mean duration in seconds (default: 0.0)
- `--audio-length-stddev`: Duration standard deviation (default: 0.0)
- `--audio-sample-rates`: List of sample rates in kHz to randomly select from (default: `[16.0]`)
- `--audio-depths`: List of bit depths (8, 16, 24, 32) to randomly select from (default: `[16]`)
- `--audio-format`: `wav` or `mp3` (default: `wav`)
- `--audio-num-channels`: 1 (mono) or 2 (stereo) (default: 1)
- `--audio-batch-size`: Number of audio files per request (default: 1)
Note: Set `--audio-length-mean` > 0 to enable audio generation. MP3 supports a limited set of sample rates; use WAV for custom rates.
Video generation is fully documented in Synthetic Video Generation. Key points:
- Synthesis Types: `moving_shapes` (animated geometry), `grid_clock` (grid with animation), or `noise` (random pixels)
- Codecs: CPU (`libvpx-vp9`, `libx264`, `libx265`) or GPU (`h264_nvenc`, `hevc_nvenc`)
- Formats: WebM (default) or MP4
Prerequisite: Video generation requires FFmpeg. For installation instructions, see the Synthetic Video Tutorial.
```bash
aiperf profile \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --url localhost:8000 \
  --endpoint-type chat \
  --video-width 640 \
  --video-height 480 \
  --video-fps 4 \
  --video-duration 5.0 \
  --video-synth-type moving_shapes \
  --video-codec libvpx-vp9 \
  --output-tokens-mean 50 \
  --request-count 5
```

See Synthetic Video Tutorial for complete details.