Stack Overflow with Deeply Nested Filter Expressions #8900

@orlandohohmeier

Description

Describe the bug

Running a query with a deeply nested filter expression fails with a stack overflow. In our application, the bug first showed up as an EXC_BAD_ACCESS error on macOS. The cause is that the filter expression is normalized recursively using transform_up, so each level of nesting adds stack frames, and a deep enough tree overflows the stack. The same problem probably occurs in other code paths that end up recursing over a deeply nested tree.
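To illustrate why recursion depth tracks nesting depth, here is a minimal sketch on a hypothetical toy `Expr` type (not DataFusion's actual `Expr` or its transform_up implementation). A recursive post-order rewrite in the style of transform_up uses one stack frame per tree level. The equivalent version below is driven by an explicit, heap-allocated work stack and handles very deep left-deep OR chains. The `x OR x => x` rule and all function names are illustrative assumptions.

```rust
// Hypothetical toy expression tree standing in for a filter expression.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Lit(i64),
    Eq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

fn col_eq(name: &str, v: i64) -> Expr {
    Expr::Eq(Box::new(Expr::Col(name.to_string())), Box::new(Expr::Lit(v)))
}

// Same shape as the reproduction loop: ((a = 1 OR a = 1) OR a = 1) OR ...
fn left_deep_or(iterations: usize) -> Expr {
    let mut e = col_eq("a", 1);
    for _ in 0..iterations {
        e = Expr::Or(Box::new(e), Box::new(col_eq("a", 1)));
    }
    e
}

// Example bottom-up rewrite rule: `x OR x` => `x`.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Or(l, r) if l == r => *l,
        other => other,
    }
}

// Recursive post-order rewrite: one call stack frame per tree level.
fn transform_up_recursive(e: Expr, f: &dyn Fn(Expr) -> Expr) -> Expr {
    let rebuilt = match e {
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(transform_up_recursive(*l, f)),
            Box::new(transform_up_recursive(*r, f)),
        ),
        Expr::Or(l, r) => Expr::Or(
            Box::new(transform_up_recursive(*l, f)),
            Box::new(transform_up_recursive(*r, f)),
        ),
        leaf => leaf,
    };
    f(rebuilt)
}

// Work items for the explicit-stack version.
enum Frame {
    Visit(Expr),
    BuildEq,
    BuildOr,
}

// Same post-order rewrite, but the pending work lives in heap-allocated Vecs.
fn transform_up_iterative(root: Expr, f: &dyn Fn(Expr) -> Expr) -> Expr {
    let mut work = vec![Frame::Visit(root)];
    let mut done: Vec<Expr> = Vec::new();
    while let Some(frame) = work.pop() {
        match frame {
            Frame::Visit(Expr::Eq(l, r)) => {
                // Rebuild after both children; left is popped (visited) first.
                work.push(Frame::BuildEq);
                work.push(Frame::Visit(*r));
                work.push(Frame::Visit(*l));
            }
            Frame::Visit(Expr::Or(l, r)) => {
                work.push(Frame::BuildOr);
                work.push(Frame::Visit(*r));
                work.push(Frame::Visit(*l));
            }
            Frame::Visit(leaf) => done.push(f(leaf)),
            Frame::BuildEq => {
                let r = done.pop().expect("right child");
                let l = done.pop().expect("left child");
                done.push(f(Expr::Eq(Box::new(l), Box::new(r))));
            }
            Frame::BuildOr => {
                let r = done.pop().expect("right child");
                let l = done.pop().expect("left child");
                done.push(f(Expr::Or(Box::new(l), Box::new(r))));
            }
        }
    }
    done.pop().expect("result")
}

fn main() {
    // Shallow input: both versions agree.
    let small = left_deep_or(100);
    assert_eq!(
        transform_up_recursive(small.clone(), &simplify),
        transform_up_iterative(small, &simplify)
    );
    // Deep input: the recursive version would likely overflow here.
    let deep = transform_up_iterative(left_deep_or(100_000), &simplify);
    println!("{:?}", deep); // prints Eq(Col("a"), Lit(1))
}
```

The design choice is to trade call-stack depth for heap memory. The work stack still grows linearly with nesting depth, but it is bounded by available memory rather than by the thread's fixed stack size.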

Tested/Reproduced with:
datafusion version = "34.0.0"
macOS = 14.2.1

To Reproduce

Minimal Reproducible Example:

```rust
use datafusion::arrow::array::Int64Array;
use datafusion::arrow::datatypes::DataType;
use datafusion::arrow::datatypes::Field;
use datafusion::arrow::datatypes::Schema;
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::error::Result;
use datafusion::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Single non-nullable Int64 column "a" with the values 1..=10.
    let batch = RecordBatch::try_new(
        Arc::new(Schema::new(vec![Field::new("a", DataType::Int64, false)])),
        vec![Arc::new(Int64Array::from(vec![
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
        ]))],
    )?;
    let df = ctx.read_batch(batch)?;

    // Build a left-deep chain of 1000 nested ORs:
    // ((a = 1 OR a = 1) OR a = 1) OR ...
    let mut expr = col("a").eq(lit(1));
    for _ in 0..1000 {
        expr = expr.or(col("a").eq(lit(1)));
    }

    // Planning and running the query recursively normalizes this filter
    // (via transform_up), which overflows the stack.
    let df = df.filter(expr)?;

    df.show().await
}
```
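The loop above builds a left-deep chain: each iteration makes the previous expression the left child of a new OR, so N iterations produce a tree N + 1 levels deep (counting each `a = 1` comparison as one leaf). The sketch below uses a hypothetical toy `Expr` (not DataFusion's types) to measure that depth without recursion. It contrasts the chain with pairing the same terms into a balanced tree, whose depth grows only logarithmically. Building a balanced tree is at most a user-side mitigation, not a fix for the recursion itself; `balanced_or` and `depth` are illustrative names.

```rust
// Hypothetical toy tree: each `Leaf` stands in for one `a = 1` comparison.
enum Expr {
    Leaf,
    Or(Box<Expr>, Box<Expr>),
}

// Same shape as the reproduction loop: every iteration nests the previous
// expression one level deeper on the left.
fn left_deep_or(iterations: usize) -> Expr {
    let mut e = Expr::Leaf;
    for _ in 0..iterations {
        e = Expr::Or(Box::new(e), Box::new(Expr::Leaf));
    }
    e
}

// Pairs neighbouring terms round by round, so depth grows with log2(terms).
fn balanced_or(mut terms: Vec<Expr>) -> Option<Expr> {
    while terms.len() > 1 {
        let mut next = Vec::with_capacity((terms.len() + 1) / 2);
        let mut it = terms.into_iter();
        while let Some(left) = it.next() {
            match it.next() {
                Some(right) => next.push(Expr::Or(Box::new(left), Box::new(right))),
                None => next.push(left), // odd term carried to the next round
            }
        }
        terms = next;
    }
    terms.pop()
}

// Depth (leaves count as 1), measured with an explicit stack, not recursion.
fn depth(root: &Expr) -> usize {
    let mut max = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((e, d)) = stack.pop() {
        max = max.max(d);
        if let Expr::Or(l, r) = e {
            stack.push((&**l, d + 1));
            stack.push((&**r, d + 1));
        }
    }
    max
}

fn main() {
    let chain = left_deep_or(1000);
    let tree = balanced_or((0..1001).map(|_| Expr::Leaf).collect()).expect("non-empty");
    println!("left-deep depth: {}, balanced depth: {}", depth(&chain), depth(&tree));
    // prints "left-deep depth: 1001, balanced depth: 11"
}
```

With 1001 terms, the balanced form needs only 10 rounds of pairing, so any recursive pass over it stays about 11 frames deep instead of about 1001.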

With an optimized release build, the loop bound in the reproduction above has to be increased to 10000 before the query hits a stack overflow.
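The depth at which the overflow appears depends on how much stack each recursive call uses. This is likely why an unoptimized debug build fails at 1000 levels while a release build, with smaller and inlined frames, needs about 10000. Until the traversal stops recursing, one possible stopgap is to run the deep recursion on a thread with a larger stack. The sketch below shows this with the standard library's `std::thread::Builder::stack_size` on a hypothetical toy tree. The 64 MiB size and the helper names (`with_stack`, `count_leaves`) are illustrative assumptions, and this approach only raises the limit rather than removing it.

```rust
use std::thread;

// Hypothetical toy tree standing in for a deeply nested filter expression.
enum Expr {
    Leaf,
    Or(Box<Expr>, Box<Expr>),
}

fn left_deep_or(iterations: usize) -> Expr {
    let mut e = Expr::Leaf;
    for _ in 0..iterations {
        e = Expr::Or(Box::new(e), Box::new(Expr::Leaf));
    }
    e
}

// Deliberately recursive, like a naive bottom-up traversal.
fn count_leaves(e: &Expr) -> usize {
    match e {
        Expr::Leaf => 1,
        Expr::Or(l, r) => count_leaves(l) + count_leaves(r),
    }
}

// Runs `work` on a dedicated thread with `stack_bytes` of stack.
// Note: a Rust stack overflow aborts the process; it is not a panic that
// `join` could catch, so the stack must be sized generously up front.
fn with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // Build, traverse and drop the tree on the big-stack thread: dropping a
    // deeply nested Box chain is itself recursive.
    let leaves = with_stack(64 * 1024 * 1024, || {
        let tree = left_deep_or(50_000);
        count_leaves(&tree)
    });
    println!("{leaves} leaves"); // prints "50001 leaves"
}
```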

Expected behavior

The query should complete without errors, regardless of how deeply the filter expression is nested.


/cc @nfnt

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
