
[NEW] Add structured dataset support to valkey-benchmark #2765

@VoletiRam

Description


Currently, valkey-benchmark supports only synthetic data generation through placeholders such as __rand_int__ and __data__. This limits realistic performance testing, because synthetic data does not reflect the usage patterns, data distributions, or content characteristics of real application workloads. We need this capability for our full-text search (FTS) work and believe it would also benefit other use cases such as JSON operations, vector similarity search (VSS), and general data modeling.

Proposed Solution

Add a --dataset option to valkey-benchmark that loads structured data from files and introduces field-based placeholders:


valkey-benchmark --dataset products.jsonl -n 50000 \
  HSET product:__field:id__ name "__field:name__" price __field:price__

New Placeholder Syntax

__field:columnname__: Replaced with the value of the named column (columnname) from the selected dataset row.
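
The substitution described above could look roughly like the following. This is a minimal sketch and not the actual valkey-benchmark code: `expand_fields` and `lookup_field` are hypothetical names, and a dataset row is assumed to be two parallel arrays of column names and values. Placeholders that name an unknown column are copied through unchanged, which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_PREFIX "__field:"
#define FIELD_SUFFIX "__"

/* Hypothetical helper: find the value of a column by name in one row. */
static const char *lookup_field(const char *name, size_t name_len,
                                const char *const *names,
                                const char *const *values, size_t nfields) {
    for (size_t i = 0; i < nfields; i++) {
        if (strlen(names[i]) == name_len &&
            memcmp(names[i], name, name_len) == 0)
            return values[i];
    }
    return NULL;
}

/* Expand every __field:NAME__ in tmpl using one row (names/values).
 * Unknown placeholders are left as-is (assumption).
 * Returns the output length, or -1 if out is too small. */
long expand_fields(const char *tmpl, const char *const *names,
                   const char *const *values, size_t nfields,
                   char *out, size_t outcap) {
    assert(tmpl && out && outcap > 0);
    const size_t plen = strlen(FIELD_PREFIX), slen = strlen(FIELD_SUFFIX);
    size_t o = 0;
    while (*tmpl) {
        const char *val = NULL, *end = NULL;
        if (strncmp(tmpl, FIELD_PREFIX, plen) == 0) {
            const char *name = tmpl + plen;
            end = strstr(name, FIELD_SUFFIX); /* closing "__" */
            if (end)
                val = lookup_field(name, (size_t)(end - name),
                                   names, values, nfields);
        }
        if (val) {
            size_t n = strlen(val);
            if (o + n >= outcap) return -1; /* keep room for the NUL */
            memcpy(out + o, val, n);
            o += n;
            tmpl = end + slen;              /* skip past the placeholder */
        } else {
            if (o + 1 >= outcap) return -1;
            out[o++] = *tmpl++;             /* copy one literal byte */
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

With a row where id is 42, name is Widget, and price is 9.99, the HSET template from the proposal would expand to `HSET product:42 name "Widget" price 9.99`. A real implementation would more likely precompute placeholder offsets once per command rather than rescan the template on every request.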

Supported file formats

CSV: Comma-separated; the header row defines the field names - title,content,category

TSV: Tab-separated; the header row defines the field names - title\tcontent\tcategory

Parquet: Columnar binary format (for FTS); requires an external library to parse

JSONL: Each line is a JSON object whose keys are the field names - {"title": "...", "content": "...", "embedding": [...]}; requires an external library to parse
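
For the CSV and TSV cases, mapping a header row to field names can be sketched as below. This is an illustration under stated assumptions: `split_fields` and `field_index` are hypothetical names, the line is split in place, and quoted fields or embedded delimiters are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split one line in place on delim and store up to max field pointers
 * in out. Returns the number of fields stored.
 * Assumption: no quoting or escaping of the delimiter. */
size_t split_fields(char *line, char delim, char **out, size_t max) {
    assert(line && out && delim != '\0');
    line[strcspn(line, "\r\n")] = '\0'; /* drop the trailing newline */
    size_t n = 0;
    char *p = line;
    while (n < max) {
        out[n++] = p;
        char *d = strchr(p, delim);
        if (!d) break;
        *d = '\0';                      /* terminate the current field */
        p = d + 1;
    }
    return n;
}

/* Return the column index of name in the header, or -1 if absent. */
int field_index(char *const *header, size_t nfields, const char *name) {
    for (size_t i = 0; i < nfields; i++)
        if (strcmp(header[i], name) == 0) return (int)i;
    return -1;
}
```

The same `split_fields` call would serve for data rows, with `','` for CSV and `'\t'` for TSV, so that `__field:content__` resolves to the column returned by `field_index(header, n, "content")`.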

Details

  • Pre-load dataset into memory during initialization
  • Thread-safe row selection using atomic counters
  • Extends existing placeholder system in valkey-benchmark.c
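
The atomic row selection listed above might be sketched as follows. The names `dataset_cursor` and `dataset_next_row` are hypothetical, C11 `<stdatomic.h>` is an assumption about the atomics used, and wrapping around when the request count exceeds the row count is also an assumption, since the proposal does not specify that behavior.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared cursor over a dataset that is preloaded once
 * and treated as read-only afterwards. */
typedef struct {
    size_t num_rows;        /* fixed after loading */
    atomic_size_t next_row; /* shared by all benchmark threads */
} dataset_cursor;

/* Hand out row indices 0, 1, 2, ... across threads without a lock.
 * Each call gets a unique ticket; the modulo wraps to the first row
 * once every row has been used (assumption). */
size_t dataset_next_row(dataset_cursor *c) {
    assert(c && c->num_rows > 0);
    size_t ticket = atomic_fetch_add_explicit(&c->next_row, 1,
                                              memory_order_relaxed);
    return ticket % c->num_rows;
}
```

Relaxed ordering is enough in this sketch because the counter only distributes indices and the row data itself never changes after initialization.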

Use Cases

# FTS with real Wikipedia data
valkey-benchmark --dataset wikipedia.csv -n 100000 \
  FT.SEARCH articles "@title:__field:title__"

# E-commerce product catalog
valkey-benchmark --dataset products.csv -n 50000 \
  HSET product:__field:id__ name "__field:name__" category "__field:category__"


Labels: enhancement (New feature or request)
