@gabotechs gabotechs commented Dec 30, 2025

This PR adapts the remote benchmarks to run over datasets other than TPC-H. It was built to qualify the changes shipped in #275, but lives in a separate PR because it is quite big.

These are the results of running the ClickBench dataset on Trino vs. Distributed DataFusion in a remote cluster:

==== Comparison with previous run ====
  q0.sql: prev=1065 ms, new= 165 ms, 6.45x faster ✅
  q1.sql: prev=1619 ms, new=1356 ms, 1.19x faster ✔
  q2.sql: prev=2027 ms, new=1172 ms, 1.73x faster ✅
  q3.sql: prev=1476 ms, new=1155 ms, 1.28x faster ✅
  q4.sql: prev=2041 ms, new=1891 ms, 1.08x faster ✔
  q5.sql: prev=3507 ms, new=2524 ms, 1.39x faster ✅
  q6.sql: prev=1319 ms, new= 146 ms, 9.03x faster ✅
  q7.sql: prev=1554 ms, new=1001 ms, 1.55x faster ✅
  q8.sql: prev=3518 ms, new=2298 ms, 1.53x faster ✅
  q9.sql: prev=6601 ms, new=2602 ms, 2.54x faster ✅
 q10.sql: prev=2162 ms, new=1414 ms, 1.53x faster ✅
 q11.sql: prev=2062 ms, new=1380 ms, 1.49x faster ✅
 q12.sql: prev=3359 ms, new=2533 ms, 1.33x faster ✅
 q13.sql: prev=5450 ms, new=4424 ms, 1.23x faster ✅
 q14.sql: prev=3527 ms, new=2510 ms, 1.41x faster ✅
 q15.sql: prev=2362 ms, new=2072 ms, 1.14x faster ✔
 q16.sql: prev=7081 ms, new=3758 ms, 1.88x faster ✅
 q17.sql: prev=6290 ms, new=3700 ms, 1.70x faster ✅
 q19.sql: prev=1465 ms, new= 931 ms, 1.57x faster ✅
 q20.sql: prev=5034 ms, new=5749 ms, 1.14x slower ✖
 q21.sql: prev=5192 ms, new=5867 ms, 1.13x slower ✖
 q22.sql: prev=8653 ms, new=9622 ms, 1.11x slower ✖
 q23.sql: prev=14331 ms, new=24139 ms, 1.68x slower ❌
 q24.sql: prev=2306 ms, new=1708 ms, 1.35x faster ✅
 q25.sql: prev=1958 ms, new=1571 ms, 1.25x faster ✅
 q26.sql: prev=2517 ms, new=1800 ms, 1.40x faster ✅
 q27.sql: prev=5280 ms, new=6377 ms, 1.21x slower ❌
 q29.sql: prev=8144 ms, new=1725 ms, 4.72x faster ✅
 q30.sql: prev=3310 ms, new=2287 ms, 1.45x faster ✅
 q31.sql: prev=4096 ms, new=2840 ms, 1.44x faster ✅
 q32.sql: prev=15564 ms, new=7608 ms, 2.05x faster ✅
 q33.sql: prev=13225 ms, new=8552 ms, 1.55x faster ✅
 q34.sql: prev=13266 ms, new=8465 ms, 1.57x faster ✅
 q35.sql: prev=4231 ms, new=2700 ms, 1.57x faster ✅
 q36.sql: prev=1459 ms, new= 807 ms, 1.81x faster ✅
 q37.sql: prev=1223 ms, new= 424 ms, 2.88x faster ✅

What's interesting is that queries 20, 21, 22 and 23 are slower because they perform LIKE <expr> operations, which probably means that DataFusion does not do a good job of evaluating those.
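
For readers who want to check the figures above, here is a minimal sketch of how one comparison entry could be derived from the two timings. The function name `compare` is hypothetical, and the 1.2x cutoff between the two marker styles is an assumption inferred from the output above, not taken from the benchmark code.

```python
def compare(prev_ms: int, new_ms: int, threshold: float = 1.2) -> str:
    """Summarize one query as 'N.NNx faster' or 'N.NNx slower' plus a marker.

    `threshold` is an assumed cutoff (inferred from the printed results):
    ratios at or above it get the strong marker, smaller ones the weak marker.
    Both timings are expected to be positive.
    """
    if new_ms <= prev_ms:
        ratio = prev_ms / new_ms
        mark = "✅" if ratio >= threshold else "✔"
        return f"{ratio:.2f}x faster {mark}"
    ratio = new_ms / prev_ms
    mark = "❌" if ratio >= threshold else "✖"
    return f"{ratio:.2f}x slower {mark}"


# Timings copied from the ClickBench results above (q0, q1, q20, q23).
for prev_ms, new_ms in [(1065, 165), (1619, 1356), (5034, 5749), (14331, 24139)]:
    print(compare(prev_ms, new_ms))
```

With those four inputs the sketch reproduces the ratios and markers shown for q0, q1, q20 and q23.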


This is the result for the TPC-DS benchmark:

==== Comparison with previous run ====
  q1.sql: prev=1572 ms, new= 822 ms, 1.91x faster ✅
  q2.sql: prev=1784 ms, new= 506 ms, 3.53x faster ✅
  q3.sql: prev=1315 ms, new= 401 ms, 3.28x faster ✅
  q4.sql: prev=5515 ms, new=6400 ms, 1.16x slower ✖
  q5.sql: prev=2596 ms, new=1014 ms, 2.56x faster ✅
  q6.sql: prev=1879 ms, new=1748 ms, 1.07x faster ✔
  q7.sql: prev=1621 ms, new= 733 ms, 2.21x faster ✅
  q8.sql: prev=1815 ms, new= 771 ms, 2.35x faster ✅
 q10.sql: prev=2092 ms, new=1406 ms, 1.49x faster ✅
 q11.sql: prev=3283 ms, new=4123 ms, 1.26x slower ❌
 q12.sql: prev=1135 ms, new= 447 ms, 2.54x faster ✅
 q13.sql: prev=2371 ms, new=1237 ms, 1.92x faster ✅
 q14.sql: prev=7514 ms, new=2094 ms, 3.59x faster ✅
 q15.sql: prev=1220 ms, new= 557 ms, 2.19x faster ✅
 q16.sql: prev=1510 ms, new=1093 ms, 1.38x faster ✅
 q17.sql: prev=1703 ms, new= 708 ms, 2.41x faster ✅
 q18.sql: prev=1627 ms, new= 763 ms, 2.13x faster ✅
 q19.sql: prev=1641 ms, new= 820 ms, 2.00x faster ✅
 q20.sql: prev=1223 ms, new= 478 ms, 2.56x faster ✅
 q21.sql: prev=2846 ms, new= 590 ms, 4.82x faster ✅
 q22.sql: prev=1825 ms, new= 848 ms, 2.15x faster ✅
 q23.sql: prev=4786 ms, new=2137 ms, 2.24x faster ✅
 q24.sql: prev=2641 ms, new=1228 ms, 2.15x faster ✅
 q25.sql: prev=1793 ms, new= 762 ms, 2.35x faster ✅
 q26.sql: prev=1424 ms, new= 563 ms, 2.53x faster ✅
 q27.sql: prev=2524 ms, new=1311 ms, 1.93x faster ✅
 q28.sql: prev=1565 ms, new= 620 ms, 2.52x faster ✅
 q29.sql: prev=1961 ms, new= 783 ms, 2.50x faster ✅
 q31.sql: prev=2621 ms, new= 910 ms, 2.88x faster ✅
 q32.sql: prev=1221 ms, new= 561 ms, 2.18x faster ✅
 q33.sql: prev=2547 ms, new= 874 ms, 2.91x faster ✅
 q34.sql: prev=1544 ms, new= 717 ms, 2.15x faster ✅
 q35.sql: prev=1935 ms, new=1145 ms, 1.69x faster ✅
 q36.sql: prev=2621 ms, new=1137 ms, 2.31x faster ✅
 q37.sql: prev=1300 ms, new=1223 ms, 1.06x faster ✔
 q38.sql: prev=1821 ms, new= 951 ms, 1.91x faster ✅
 q39.sql: prev=5068 ms, new=1429 ms, 3.55x faster ✅
 q40.sql: prev=1459 ms, new= 602 ms, 2.42x faster ✅
 q41.sql: prev= 783 ms, new= 359 ms, 2.18x faster ✅
 q42.sql: prev=1032 ms, new= 441 ms, 2.34x faster ✅
 q43.sql: prev=1297 ms, new= 422 ms, 3.07x faster ✅
 q44.sql: prev=1523 ms, new= 704 ms, 2.16x faster ✅
 q45.sql: prev=1524 ms, new= 748 ms, 2.04x faster ✅
 q46.sql: prev=1844 ms, new=1094 ms, 1.69x faster ✅
 q47.sql: prev=3724 ms, new=1815 ms, 2.05x faster ✅
 q48.sql: prev=1842 ms, new=1095 ms, 1.68x faster ✅
 q49.sql: prev=2405 ms, new= 717 ms, 3.35x faster ✅
 q50.sql: prev=1520 ms, new= 772 ms, 1.97x faster ✅
 q51.sql: prev=1717 ms, new=1093 ms, 1.57x faster ✅
 q52.sql: prev=1091 ms, new= 487 ms, 2.24x faster ✅
 q53.sql: prev=1406 ms, new= 527 ms, 2.67x faster ✅
 q54.sql: prev=2429 ms, new=1302 ms, 1.87x faster ✅
 q55.sql: prev=1127 ms, new= 623 ms, 1.81x faster ✅
 q56.sql: prev=2285 ms, new= 904 ms, 2.53x faster ✅
 q57.sql: prev=2670 ms, new=1044 ms, 2.56x faster ✅
 q58.sql: prev=2730 ms, new= 920 ms, 2.97x faster ✅
 q59.sql: prev=1957 ms, new= 906 ms, 2.16x faster ✅
 q60.sql: prev=2354 ms, new= 914 ms, 2.58x faster ✅
 q61.sql: prev=2412 ms, new=1641 ms, 1.47x faster ✅
 q62.sql: prev=1470 ms, new=2208 ms, 1.50x slower ❌
 q63.sql: prev=1319 ms, new= 675 ms, 1.95x faster ✅
 q64.sql: prev=5805 ms, new=3392 ms, 1.71x faster ✅
 q65.sql: prev=1580 ms, new= 977 ms, 1.62x faster ✅
 q66.sql: prev=2678 ms, new=1171 ms, 2.29x faster ✅
 q67.sql: prev=1960 ms, new=2155 ms, 1.10x slower ✖
 q68.sql: prev=1846 ms, new=1136 ms, 1.63x faster ✅
 q69.sql: prev=1784 ms, new=1209 ms, 1.48x faster ✅
 q71.sql: prev=1751 ms, new= 988 ms, 1.77x faster ✅
 q72.sql: prev=29048 ms, new=59454 ms, 2.05x slower ❌
 q73.sql: prev=1364 ms, new= 838 ms, 1.63x faster ✅
 q74.sql: prev=2638 ms, new=2098 ms, 1.26x faster ✅
 q75.sql: prev=3267 ms, new=1373 ms, 2.38x faster ✅
 q76.sql: prev=1681 ms, new= 731 ms, 2.30x faster ✅
 q77.sql: prev=2551 ms, new=1305 ms, 1.95x faster ✅
 q78.sql: prev=2511 ms, new=1159 ms, 2.17x faster ✅
 q79.sql: prev=1570 ms, new= 866 ms, 1.81x faster ✅
 q80.sql: prev=3189 ms, new=1505 ms, 2.12x faster ✅
 q81.sql: prev=1353 ms, new= 989 ms, 1.37x faster ✅
 q82.sql: prev=1222 ms, new=1164 ms, 1.05x faster ✔
 q83.sql: prev=1791 ms, new= 661 ms, 2.71x faster ✅
 q84.sql: prev=1354 ms, new= 718 ms, 1.89x faster ✅
 q85.sql: prev=1746 ms, new=1124 ms, 1.55x faster ✅
 q87.sql: prev=1739 ms, new= 867 ms, 2.01x faster ✅
 q88.sql: prev=3301 ms, new=1359 ms, 2.43x faster ✅
 q89.sql: prev=1440 ms, new= 580 ms, 2.48x faster ✅
 q90.sql: prev=1349 ms, new= 617 ms, 2.19x faster ✅
 q91.sql: prev=1690 ms, new= 501 ms, 3.37x faster ✅
 q92.sql: prev=1093 ms, new= 669 ms, 1.63x faster ✅
 q93.sql: prev=1192 ms, new= 539 ms, 2.21x faster ✅
 q94.sql: prev=1484 ms, new= 766 ms, 1.94x faster ✅
 q95.sql: prev=1581 ms, new=1054 ms, 1.50x faster ✅
 q96.sql: prev=1176 ms, new= 532 ms, 2.21x faster ✅
 q97.sql: prev=1238 ms, new= 850 ms, 1.46x faster ✅
 q98.sql: prev=1218 ms, new= 567 ms, 2.15x faster ✅
 q99.sql: prev=1590 ms, new=28869 ms, 18.16x slower ❌
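
With this many queries, the regressions are easy to miss by eye. The following is a rough sketch, not part of the benchmark tooling, of how the slower queries could be pulled out of a report in the format above. The helper name `slower_queries` and the regular expression are assumptions based only on the line layout shown in this PR.

```python
import re

# Matches lines such as " q62.sql: prev=1470 ms, new=2208 ms, 1.50x slower ❌".
# The layout is assumed from the output printed in this PR.
LINE_RE = re.compile(r"^\s*(q\d+)\.sql: prev=\s*(\d+) ms, new=\s*(\d+) ms")


def slower_queries(report: str) -> list[tuple[str, int, int]]:
    """Return (query, prev_ms, new_ms) for every line where new > prev."""
    found = []
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip the header and any blank lines
        name = match.group(1)
        prev_ms, new_ms = int(match.group(2)), int(match.group(3))
        if new_ms > prev_ms:
            found.append((name, prev_ms, new_ms))
    return found


# Three lines copied from the TPC-DS results above.
sample = """\
==== Comparison with previous run ====
 q62.sql: prev=1470 ms, new=2208 ms, 1.50x slower ❌
 q63.sql: prev=1319 ms, new= 675 ms, 1.95x faster ✅
 q99.sql: prev=1590 ms, new=28869 ms, 18.16x slower ❌
"""
print(slower_queries(sample))
```

The comparison uses the raw timings rather than the printed "slower" label, so it does not depend on the marker characters surviving a copy and paste.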

@gabotechs gabotechs force-pushed the gabrielmusat/improve-default-task-estimator branch from da7997e to e5cd77d Compare December 30, 2025 15:49
@gabotechs gabotechs force-pushed the gabrielmusat/adapt-benchmarks-to-more-datasets branch from b70cc76 to 10b13f3 Compare December 30, 2025 15:49
Base automatically changed from gabrielmusat/improve-default-task-estimator to main January 2, 2026 07:13
@gabotechs gabotechs force-pushed the gabrielmusat/adapt-benchmarks-to-more-datasets branch from 10b13f3 to d2241c0 Compare January 2, 2026 07:29
@gabotechs gabotechs merged commit 1fb4daa into main Jan 2, 2026
7 checks passed
@gabotechs gabotechs deleted the gabrielmusat/adapt-benchmarks-to-more-datasets branch January 2, 2026 10:44
