Conversation
The `.tbl` files generated by `tpch-gen.sh` has an additional trailing
column that need to be special-treated by `get_tbl_tpch_table_schema`.
After this the tests are running, but with some failures:
```shell
$ cd <BALLISTA_HOME>/benchmarks
$ TPCH_DATA=data cargo test
...
failures:
tests::q1
tests::q10
tests::q11
tests::q14
tests::q15
tests::q17
tests::q18
tests::q19
tests::q2
tests::q20
tests::q21
tests::q22
tests::q3
tests::q5
tests::q6
tests::q7
tests::q8
tests::q9
...
```
After this PR, some tests pass but still quite some are failing:
```
running 44 tests
07:50:07 [160/4254]
test tests::q15 ... FAILED
test tests::q13 ... FAILED
test tests::q1 ... FAILED
test tests::q18 ... FAILED
test tests::q10 ... FAILED
test tests::q11 ... FAILED
test tests::q12 ... FAILED
test tests::q17 ... FAILED
test tests::q14 ... FAILED
test tests::q19 ... FAILED
test tests::q4 ... FAILED
test tests::q20 ... FAILED
test tests::q3 ... FAILED
test tests::q22 ... FAILED
test tests::q6 ... FAILED
test tests::q21 ... FAILED
test tests::q2 ... FAILED
test tests::run_q1 ... ok
test tests::run_q10 ... ok
test tests::run_2 ... ok
test tests::run_q15 ... ignored
test tests::run_q11 ... ok
test tests::run_q12 ... ok
test tests::run_q13 ... ok
test tests::q5 ... FAILED
test tests::q7 ... FAILED
test tests::run_q14 ... ok
test tests::q9 ... FAILED
test tests::run_q18 ... ok
test tests::run_q17 ... ok
test tests::run_q16 ... ok
test tests::run_q19 ... ok
test tests::run_q20 ... ok
test tests::run_q6 ... ok
test tests::run_q22 ... ok
test tests::run_q21 ... ok
test tests::run_q3 ... ok
test tests::run_q4 ... ok
test tests::q8 ... FAILED
test tests::q16 ... FAILED
test tests::run_q5 ... ok
test tests::run_q9 ... ok
test tests::run_q7 ... ok
test tests::run_q8 ... ok
```
This fixes tests that failed with mismatched record count; for example, ``` thread 'tests::q2' panicked at benchmarks/src/bin/tpch.rs:1026:5: assertion `left == right` failed left: 100 right: 460 ``` The limits was not in the offical sql query but set through instructions at the end of the sql file with ":n 100". For example, the query-2 from the offical TPCH dbgen folder is as follows: ```sql :x :o select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = :1 and p_type like '%:2' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ':3' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ':3' ) order by s_acctbal desc, n_name, s_name, p_partkey; :n 100 ```
It is not supported and `run_q15` is also ignored.
See details below:
```sql
create view revenue0 (supplier_no, total_revenue) as
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1996-01-01'
and l_shipdate < date '1996-01-01' + interval '3' month
group by
l_suppkey;
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey;
drop view revenue0;
```
3508126 to
7b80efd
Compare
|
@andygrove is right person for this review. |
Good point! I reverted the commit that changes the queries, and added limit as a step of the normalization. |
|
Thanks @vmingchen I will try and review this over the weekend |
benchmarks/src/bin/tpch.rs
Outdated
| // Our queries do not have the LIMIT clause, so we need to apply it | ||
| // before comparing the results with the expected results. |
There was a problem hiding this comment.
I don't remember why the queries don't contain limit clauses but I think they should, to match the TPC-H specification.
There was a problem hiding this comment.
Make sense. I have reapplied the limit and reverted the normalization-limit step.
There was a problem hiding this comment.
To provide more context, the ":n 100" lines in the official TPCH sql files are called flags. They are handled by the qsub function in qgen.c file:
/*
* FUNCTION qsub(char *qtag, int flags)
*
* based on the settings of flags, and the template file $QDIR/qtag.sql
* make the following substitutions to turn a query template into EQT
*
* String Converted to Based on
* ====== ============ ===========
* first line database <db_name>; -n from command line
* second line set explain on; -x from command line
* :<number> parameter <number>
* :k set number
* :o output to outpath/qnum.snum----
* -o from command line, SET_OUTPUT
* :s stream number
* :b BEGIN WORK; -a from command line, START_TRAN
* :e COMMIT WORK; -a from command line, END_TRAN
* :q query number
* :n<number> sets rowcount to be returned
*/
void
qsub(char *qtag, int flags)|
Looking at the https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf changes make sense @vmingchen, thanks a lot. I believe there are integration tests which verify this but they are disabled, it would be great if we can enable them after this get merged as a follow up. If @andygrove is ok with this we can merge it |
LGTM |
|
thanks for contributing @vmingchen |
Which issue does this PR close?
#1194
Rationale for this change
Fix unit test in tpch unit tests. Now benchmark test
TPCH_DATA=data/ cargo testis passing.What changes are included in this PR?
There are a list of changes each of which is in a separate commit:
get_tbl_tpch_table_schemato special treat the.tblfiles that are generated bytpch-gen.shand have an additional trailing column.q15()because it contains multiple statements and is not supported and similarly ignored byrun_q15().Are there any user-facing changes?
No