
Add chdb vs pandas peak-memory benchmark #557

Open
wudidapaopao wants to merge 1 commit into chdb-io:main from wudidapaopao:add_benchmark_tests

@wudidapaopao
Contributor

Self-contained benchmark that auto-generates test data (default 10M rows) and compares chdb SQL-pushdown vs pandas across 10 scenarios including filter, groupby, join, window functions, quantiles, and time-series. Measures peak memory via VmHWM (Linux) or ru_maxrss (macOS) in isolated subprocesses.
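For readers unfamiliar with the measurement approach, here is a minimal sketch of reading peak memory via VmHWM or ru_maxrss from an isolated subprocess. It is not the code in this PR: the names `CHILD_PRELUDE`, `peak_rss_bytes`, and `measure_in_subprocess` are hypothetical, and the workload is passed as a plain Python source string only to keep the example short. It assumes a Unix-like system, since the `resource` module is not available on Windows.

```python
import subprocess
import sys

# Hypothetical helper source that is prepended to every child process.
# It reports the child's own peak resident set size in bytes.
CHILD_PRELUDE = r'''
import resource
import sys

def peak_rss_bytes():
    if sys.platform.startswith("linux"):
        try:
            # VmHWM is the resident-set "high water mark", reported in kB.
            with open("/proc/self/status") as status:
                for line in status:
                    if line.startswith("VmHWM:"):
                        return int(line.split()[1]) * 1024
        except OSError:
            pass  # fall through to getrusage below
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return maxrss if sys.platform == "darwin" else maxrss * 1024
'''


def measure_in_subprocess(workload: str) -> int:
    """Run `workload` (Python source) in a fresh interpreter and return
    that interpreter's peak RSS in bytes.

    A fresh process per scenario keeps one scenario's allocations from
    inflating the peak reported for the next one.
    """
    code = CHILD_PRELUDE + "\n" + workload + "\nprint(peak_rss_bytes())\n"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # The peak is the last line printed, even if the workload printed too.
    return int(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    baseline = measure_in_subprocess("pass")
    # b"x" * n writes every byte, so the pages are actually resident.
    loaded = measure_in_subprocess("data = b'x' * (64 * 1024 * 1024)")
    print(f"baseline: {baseline / 2**20:.1f} MiB")
    print(f"64 MiB workload: {loaded / 2**20:.1f} MiB")
```

Subtracting the baseline from each scenario's peak gives a rough estimate of the memory attributable to the workload rather than to interpreter startup.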

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings

NOTE: If you merge the PR with modified CI, you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Run these jobs only (required builds will be added automatically):

  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Unit tests
  • Performance tests
  • All with aarch64
  • All with ASAN
  • All with TSAN
  • All with Analyzer
  • All with Azure
  • Add your option here

Deny these jobs:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64

Extra options:

  • do not test (only style check)
  • disable merge-commit (no merge from master before tests)
  • disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • 1
  • 2
  • 3
  • 4
