Skip to content

Slower performance than expected for TPC-H SF100 Q01Β #402

@beckernick

Description

@beckernick

Describe the situation
Hi folks! Have been a fan of Clickhouse for a while but just recently came across this project. Super cool.

To take it for a quick test drive, I spun up Jupyterlab to run a few TPC-H queries at SF100 scale on a 2x socket AMD EPYC 7642 48-Core Processor. It was very easy to get up and running.

I did see a little slower performance than I expected, though.

Based on the docs, I expected to see similar performance to e.g., DuckDB when directly querying Parquet files.

Figured I'd file an issue, in case this might be unexpected behavior (or just user error, as I'm new to chDB).

Happy to provide more info, if helpful.

How to reproduce

Setup (TPCH files generated by https://github.com/clflushopt/tpchgen-rs/blob/main/tpchgen-cli/README.md)

conda create -n tpch -c conda-forge python=3.12 jupyterlab
conda activate tpch
pip install tpchgen-cli duckdb chdb
tpchgen-cli -s 100 --format=parquet --output-dir sf100
jupyter-lab

Then, in Jupyterlab:

import duckdb
import chdb

duckdb_q1 = f"""
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    "sf100/lineitem.parquet"
where
    l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
"""

chdb_q1 = f"""
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    file("sf100/lineitem.parquet", 'Parquet')
where
    l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
"""

%time duckdb.sql(duckdb_q1).show()

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ l_returnflag β”‚ l_linestatus β”‚    sum_qty    β”‚  sum_base_price   β”‚   sum_disc_price    β”‚      sum_charge       β”‚      avg_qty       β”‚     avg_price      β”‚       avg_disc       β”‚ count_order β”‚
β”‚   varchar    β”‚   varchar    β”‚ decimal(38,2) β”‚   decimal(38,2)   β”‚    decimal(38,4)    β”‚     decimal(38,6)     β”‚       double       β”‚       double       β”‚        double        β”‚    int64    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ A            β”‚ F            β”‚ 3775127758.00 β”‚  5660776097194.45 β”‚  5377736398183.9374 β”‚  5592847429515.927026 β”‚ 25.499370423275426 β”‚  38236.11698430489 β”‚ 0.050002243530929025 β”‚   148047881 β”‚
β”‚ N            β”‚ F            β”‚   98553062.00 β”‚   147771098385.98 β”‚   140384965965.0348 β”‚   145999793032.775829 β”‚ 25.501556956882876 β”‚   38237.1993888045 β”‚  0.04998528433805397 β”‚     3864590 β”‚
β”‚ N            β”‚ O            β”‚ 7436302976.00 β”‚ 11150725681373.59 β”‚ 10593195308234.8523 β”‚ 11016932248183.655467 β”‚ 25.500009404374193 β”‚ 38237.227646360945 β”‚  0.04999791831562552 β”‚   291619617 β”‚
β”‚ R            β”‚ F            β”‚ 3775724970.00 β”‚  5661603032745.34 β”‚  5378513563915.4097 β”‚  5593662252666.916161 β”‚  25.50006628406532 β”‚ 38236.697258452965 β”‚  0.05000130433965413 β”‚   148067261 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

CPU times: user 1min 31s, sys: 10.9 s, total: 1min 42s
Wall time: 2.68 s

%time chdb.query(chdb_q1).show()
"A","F",3775127758,5660776097194.45,5377736398183.9374,5592847429515.927026,25.499370423275426,38236.1169843049,0.050002243530929025,148047881
"N","F",98553062,147771098385.98,140384965965.0348,145999793032.775829,25.501556956882876,38237.19938880451,0.04998528433805397,3864590
"N","O",7436302976,11150725681373.59,10593195308234.8523,11016932248183.655467,25.500009404374193,38237.227646360945,0.04999791831562552,291619617
"R","F",3775724970,5661603032745.34,5378513563915.4097,5593662252666.916161,25.50006628406532,38236.697258452965,0.05000130433965412,148067261

CPU times: user 11min 5s, sys: 14min 31s, total: 25min 36s
Wall time: 40.8 s
chdb                      3.6.0                    pypi_0    pypi
duckdb                    1.4.1                    pypi_0    pypi

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions