concat_elements_utf8view panics with large buffer on 64bit machines #17857

@samueleresca

Describe the bug

It is possible to trigger a panic in DataFusion on 64-bit machines. DataFusion does not guard against the panic raised by the underlying append_value method of GenericByteViewBuilder. (See affected line)

See the To Reproduce section for the recursive concat query.

A few notes/thoughts:

  • The panic is reproducible in the latest version of DataFusion, but it was also present in previous versions.
  • The panic originates in arrow-rs: on 64-bit systems a usize can be as large as u64::MAX, but the view builder assumes that sizes fit in a u32 (at most u32::MAX). The panic is already documented on the append_value method, but it is never handled by the consumer (DataFusion). Is there a reason for that?
  • I understand the panic happens in arrow-rs, but should DataFusion handle the panic coming from Arrow (e.g. around the append_value call in concat_elements_utf8view)?
  • Should DataFusion somehow limit the append_value calls to prevent the panic from happening?
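
To make the second note concrete, here is a minimal, std-only sketch of the failure mode. It does not reproduce the arrow-rs code; `size_to_u32` is a hypothetical helper that only mirrors the kind of `usize` to `u32` conversion that produces the `TryFromIntError` seen in the trace, and it assumes a 64-bit target.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper (not part of arrow-rs): converts a size to `u32`,
/// returning an error instead of panicking when it does not fit.
fn size_to_u32(size: usize) -> Result<u32, TryFromIntError> {
    u32::try_from(size)
}

fn main() {
    // A small size converts without issue.
    assert_eq!(size_to_u32(1024), Ok(1024));

    // Assumption: 64-bit target, so `usize` can exceed `u32::MAX`.
    let too_large = u32::MAX as usize + 1;
    let result = size_to_u32(too_large);
    assert!(result.is_err());

    // Calling `.unwrap()` on `result` here would panic with
    // "called `Result::unwrap()` on an `Err` value: TryFromIntError(())".
    println!("conversion of {} bytes failed: {:?}", too_large, result);
}
```

Propagating the `Result` instead of unwrapping it is what would let a caller turn this condition into a regular error.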

cc @comphead

To Reproduce

Sample repository: https://github.com/samueleresca/datafusion-byte-view-builder-issue

  1. Run on a 64-bit machine.
  2. Include some dummy data (I attached an example):
     ctx.register_parquet("users", "./data/users_shorten.parquet", ParquetReadOptions::default()).await?;
  3. Run a recursive string concatenation query (see the query in main.rs).
  4. Observe the panic.

Expected behavior

  • Should DataFusion handle the panic?
  • (maybe) DataFusion restricts its calls to the view builder
  • (maybe) arrow-rs handles this more gracefully
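
One possible shape for the "restrict the calls" option, as a sketch only: check the size of the concatenated value before handing it to the builder, and return an error instead of letting the builder panic. `checked_concat_len` and its error message are hypothetical and are not DataFusion or arrow-rs APIs; the sketch also assumes that the combined value length is the size that has to fit in a `u32`, and that the code runs on a 64-bit target.

```rust
use std::convert::TryFrom;

/// Hypothetical pre-check (not DataFusion code): given the byte lengths of
/// the two operands, returns the concatenated length as `u32`, or an error
/// message if that length cannot be represented.
fn checked_concat_len(left_len: usize, right_len: usize) -> Result<u32, String> {
    // Guard against overflow of the addition itself.
    let total = left_len
        .checked_add(right_len)
        .ok_or_else(|| "concatenated length overflows usize".to_string())?;

    // Guard against the u32 limit assumed by the view layout.
    u32::try_from(total)
        .map_err(|_| format!("concatenated value of {} bytes exceeds u32::MAX", total))
}

fn main() {
    // Typical case: short strings fit comfortably.
    assert_eq!(checked_concat_len(3, 4), Ok(7));

    // One byte past the limit is rejected with an error, not a panic.
    let err = checked_concat_len(u32::MAX as usize, 1).unwrap_err();
    println!("rejected: {}", err);
}
```

A caller could map the error string into its own error type and skip the append_value call for that row, which keeps the failure inside the normal error path of the query.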

Additional context

Truncated panic trace happening on my local machine:

thread 'tokio-runtime-worker' (38043518) panicked at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:310:46:
called `Result::unwrap()` on an `Err` value: TryFromIntError(())
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/a454fccb02df9d361f1201b747c01257f58a8b37/library/core/src/result.rs:1855:5
   3: core::result::Result<T,E>::unwrap
             at /Users/samuele/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:1226:23
   4: arrow_array::builder::generic_bytes_view_builder::GenericByteViewBuilder<T>::append_value
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-56.2.0/src/builder/generic_bytes_view_builder.rs:310:46
   5: datafusion_physical_expr::expressions::binary::kernels::concat_elements_utf8view
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary/kernels.rs:159:20
   6: datafusion_physical_expr::expressions::binary::concat_elements
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:1078:40
   7: datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_with_resolved_args
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:848:29
   8: <datafusion_physical_expr::expressions::binary::BinaryExpr as datafusion_physical_expr_common::physical_expr::PhysicalExpr>::evaluate
             at /Users/samuele/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/datafusion-physical-expr-50.0.0/src/expressions/binary.rs:479:14

Labels: bug