-
-
Notifications
You must be signed in to change notification settings - Fork 90
Description
Describe the situation
Hi folks! Have been a fan of Clickhouse for a while but just recently came across this project. Super cool.
To take it for a quick test drive, I spun up Jupyterlab to run a few TPC-H queries at SF100 scale on a 2x socket AMD EPYC 7642 48-Core Processor. It was very easy to get up and running.
I did see a little slower performance than I expected, though.
Based on the docs, I expected to see similar performance to e.g., DuckDB when directly querying Parquet files.
Figured I'd file an issue, in case this might be unexpected behavior (or just user error, as I'm new to chDB).
Happy to provide more info, if helpful.
How to reproduce
Setup (TPCH files generated by https://github.com/clflushopt/tpchgen-rs/blob/main/tpchgen-cli/README.md)
conda create -n tpch -c conda-forge python=3.12 jupyterlab
conda activate tpch
pip install tpchgen-cli duckdb chdb
tpchgen-cli -s 100 --format=parquet --output-dir sf100
jupyter-labThen, in Jupyterlab:
import duckdb
import chdb
duckdb_q1 = f"""
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
"sf100/lineitem.parquet"
where
l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
"""
chdb_q1 = f"""
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
file("sf100/lineitem.parquet", 'Parquet')
where
l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
"""
%time duckdb.sql(duckdb_q1).show()
ββββββββββββββββ¬βββββββββββββββ¬ββββββββββββββββ¬ββββββββββββββββββββ¬ββββββββββββββββββββββ¬ββββββββββββββββββββββββ¬βββββββββββββββββββββ¬βββββββββββββββββββββ¬βββββββββββββββββββββββ¬ββββββββββββββ
β l_returnflag β l_linestatus β sum_qty β sum_base_price β sum_disc_price β sum_charge β avg_qty β avg_price β avg_disc β count_order β
β varchar β varchar β decimal(38,2) β decimal(38,2) β decimal(38,4) β decimal(38,6) β double β double β double β int64 β
ββββββββββββββββΌβββββββββββββββΌββββββββββββββββΌββββββββββββββββββββΌββββββββββββββββββββββΌββββββββββββββββββββββββΌβββββββββββββββββββββΌβββββββββββββββββββββΌβββββββββββββββββββββββΌββββββββββββββ€
β A β F β 3775127758.00 β 5660776097194.45 β 5377736398183.9374 β 5592847429515.927026 β 25.499370423275426 β 38236.11698430489 β 0.050002243530929025 β 148047881 β
β N β F β 98553062.00 β 147771098385.98 β 140384965965.0348 β 145999793032.775829 β 25.501556956882876 β 38237.1993888045 β 0.04998528433805397 β 3864590 β
β N β O β 7436302976.00 β 11150725681373.59 β 10593195308234.8523 β 11016932248183.655467 β 25.500009404374193 β 38237.227646360945 β 0.04999791831562552 β 291619617 β
β R β F β 3775724970.00 β 5661603032745.34 β 5378513563915.4097 β 5593662252666.916161 β 25.50006628406532 β 38236.697258452965 β 0.05000130433965413 β 148067261 β
ββββββββββββββββ΄βββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββββββ΄ββββββββββββββββββββββ΄ββββββββββββββββββββββββ΄βββββββββββββββββββββ΄βββββββββββββββββββββ΄βββββββββββββββββββββββ΄ββββββββββββββ
CPU times: user 1min 31s, sys: 10.9 s, total: 1min 42s
Wall time: 2.68 s
%time chdb.query(chdb_q1).show()
"A","F",3775127758,5660776097194.45,5377736398183.9374,5592847429515.927026,25.499370423275426,38236.1169843049,0.050002243530929025,148047881
"N","F",98553062,147771098385.98,140384965965.0348,145999793032.775829,25.501556956882876,38237.19938880451,0.04998528433805397,3864590
"N","O",7436302976,11150725681373.59,10593195308234.8523,11016932248183.655467,25.500009404374193,38237.227646360945,0.04999791831562552,291619617
"R","F",3775724970,5661603032745.34,5378513563915.4097,5593662252666.916161,25.50006628406532,38236.697258452965,0.05000130433965412,148067261
CPU times: user 11min 5s, sys: 14min 31s, total: 25min 36s
Wall time: 40.8 schdb 3.6.0 pypi_0 pypi
duckdb 1.4.1 pypi_0 pypi