Commit 7c9a76a

Use correct setting for click bench queries in sql_planner benchmark (#19835)
## Which issue does this PR close?

- Closes #19809

## Rationale for this change

The ClickBench partitioned dataset was written by an ancient version of pyarrow that wrote strings with the wrong logical type. To read it correctly, we must automatically convert binary to string. This is also the configuration we run the ClickBench benchmark in: https://github.com/apache/datafusion/blob/cd12d510395eabb7ee51cac0a4cc7c7ffd1ac841/benchmarks/src/clickbench.rs#L184-L183

## What changes are included in this PR?

Change the sql_planner benchmark to use the correct setting.

I tested it manually. Before this change, this command fails:

```shell
cargo bench --profile=dev --bench sql_planner -- q50
...
thread 'main' (38326073) panicked at datafusion/core/benches/sql_planner.rs:62:14:
called `Result::unwrap()` on an `Err` value: Context("type_coercion", Internal("Expect TypeSignatureClass::Native(LogicalType(Native(String), String)) but received NativeType::Binary, DataType: BinaryView"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

After this change, the command passes.
1 parent 54b848c commit 7c9a76a

1 file changed: +5 -0 lines changed
datafusion/core/benches/sql_planner.rs

Lines changed: 5 additions & 0 deletions
@@ -118,6 +118,11 @@ fn register_clickbench_hits_table(rt: &Runtime) -> SessionContext {

     let sql = format!("CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION '{path}'");

+    // ClickBench partitioned dataset was written by an ancient version of pyarrow that
+    // that wrote strings with the wrong logical type. To read it correctly, we must
+    // automatically convert binary to string.
+    rt.block_on(ctx.sql("SET datafusion.execution.parquet.binary_as_string = true;"))
+        .unwrap();
     rt.block_on(ctx.sql(&sql)).unwrap();

     let count =
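
To illustrate why the setting makes the `type_coercion` error go away, here is a minimal, dependency-free sketch. It is a toy model and not DataFusion's implementation: `ColumnType`, `effective_type`, and `check_string_arg` are hypothetical names invented for this example. The assumption is that string columns in the dataset are stored with a binary type, and that `binary_as_string` makes the reader expose them as strings before the planner checks function signatures.

```rust
/// Simplified stand-ins for column types (hypothetical, not DataFusion's types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Binary,
    Utf8,
    Int64,
}

/// Type the planner would see for a column, given the reader option.
/// Only binary columns are affected; every other type passes through unchanged.
fn effective_type(physical: ColumnType, binary_as_string: bool) -> ColumnType {
    match (physical, binary_as_string) {
        (ColumnType::Binary, true) => ColumnType::Utf8,
        (other, _) => other,
    }
}

/// Toy signature check for a function that accepts only string arguments.
fn check_string_arg(ty: ColumnType) -> Result<(), String> {
    if ty == ColumnType::Utf8 {
        Ok(())
    } else {
        Err(format!("expected a string argument but received {ty:?}"))
    }
}

fn main() {
    // Assumption: a text column that the file stores as binary.
    let text_column = ColumnType::Binary;

    let without = check_string_arg(effective_type(text_column, false));
    let with = check_string_arg(effective_type(text_column, true));
    println!("binary_as_string = false: {without:?}");
    println!("binary_as_string = true:  {with:?}");

    // Non-binary columns are not touched by the option.
    println!("Int64 stays {:?}", effective_type(ColumnType::Int64, true));
}
```

In this model the option has to be in effect before the table is registered, which matches the ordering in the diff: the `SET` statement runs before `CREATE EXTERNAL TABLE`.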
