
Column::bounds_check goes into error path from ProjectionExpr::partition_statistics() #17122

@hareshkh


Describe the bug

When computing partition_statistics during evaluation, the flamegraph shows a lot of time spent in bounds_check(), which is called as part of Column::data_type().


Almost all of the time in bounds_check() is in turn spent in fmt(), which suggests that these calls take the error branch:

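```rust
use arrow::datatypes::Schema;
use datafusion_common::{internal_err, Result};

impl Column {
    /// Returns an error if `self.index` is not a valid field index in `input_schema`.
    fn bounds_check(&self, input_schema: &Schema) -> Result<()> {
        if self.index < input_schema.fields.len() {
            Ok(())
        } else {
            // Error path: the full message, including every field name, is
            // formatted here even if the caller later discards the error.
            internal_err!(
                "PhysicalExpr Column references column '{}' at index {} (zero-based) but input schema only has {} columns: {:?}",
                self.name,
                self.index,
                input_schema.fields.len(),
                input_schema.fields().iter().map(|f| f.name()).collect::<Vec<_>>()
            )
        }
    }
}
```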
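A std-only model of this pattern may make the fmt() frames easier to read. `Schema`, `Column` and the `FORMAT_CALLS` counter below are hypothetical simplifications, not the arrow or DataFusion types: the schema is just a list of field names, and the counter records how often the error message is built. As in the real snippet, the else branch formats the whole message, including the list of field names, before returning, so a caller that only checks `is_ok()` still pays for it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter: how many times the error message was built.
static FORMAT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Simplified stand-in for arrow's `Schema`: only the field names.
pub struct Schema {
    pub fields: Vec<String>,
}

/// Simplified stand-in for DataFusion's physical `Column`.
pub struct Column {
    pub name: String,
    pub index: usize,
}

impl Column {
    /// Same shape as `bounds_check`: the message is formatted eagerly,
    /// before the caller decides what to do with the error.
    pub fn bounds_check(&self, input_schema: &Schema) -> Result<(), String> {
        if self.index < input_schema.fields.len() {
            Ok(())
        } else {
            FORMAT_CALLS.fetch_add(1, Ordering::Relaxed);
            Err(format!(
                "PhysicalExpr Column references column '{}' at index {} (zero-based) but input schema only has {} columns: {:?}",
                self.name,
                self.index,
                input_schema.fields.len(),
                input_schema.fields.iter().collect::<Vec<_>>()
            ))
        }
    }
}

/// Number of error messages built so far.
pub fn format_calls() -> usize {
    FORMAT_CALLS.load(Ordering::Relaxed)
}
```

Probing an out-of-range column three times with `is_err()` builds three full messages, even though none of them is ever shown.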

All occurrences I checked by hand in my example originated from ProjectionExec::partition_statistics().
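One pattern that would produce this profile, offered as a hypothesis and not a confirmed diagnosis: a projection's expressions hold column indices into its input schema, so resolving their types against the projection's output schema fails whenever an input index is at least the number of projected columns. If the caller then swallows the error (for example with `if let Ok(...)`), the only visible cost is the formatting. The simplified `Schema`, `Column` and the `row_width` helper below are hypothetical and are not DataFusion code.

```rust
/// Simplified schema: (name, fixed width in bytes) for each field.
pub struct Schema {
    pub fields: Vec<(String, usize)>,
}

/// Simplified column reference: an index into the *input* schema.
pub struct Column {
    pub name: String,
    pub index: usize,
}

impl Column {
    /// Width of the referenced field, or an error if `index` is out of bounds.
    pub fn width(&self, schema: &Schema) -> Result<usize, String> {
        schema
            .fields
            .get(self.index)
            .map(|(_, w)| *w)
            // The closure only runs on a miss, but then it always formats.
            .ok_or_else(|| {
                format!(
                    "column '{}' at index {} but schema only has {} columns",
                    self.name,
                    self.index,
                    schema.fields.len()
                )
            })
    }
}

/// Hypothetical helper: sums the widths of the projected columns,
/// silently skipping any column whose lookup fails.
pub fn row_width(exprs: &[Column], schema: &Schema) -> usize {
    let mut total = 0;
    for expr in exprs {
        // Any error is formatted and then discarded here.
        if let Ok(w) = expr.width(schema) {
            total += w;
        }
    }
    total
}
```

For a projection of `d, b` over a four-column input, the index of `d` (3) is valid in the input schema but out of bounds in the two-column output schema, so every call against the output schema builds and drops one error.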

To Reproduce

Profile a query that calls ProjectionExec::partition_statistics() with RUST_BACKTRACE enabled (for example, RUST_BACKTRACE=1).
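The RUST_BACKTRACE setting may affect how expensive each error is. In Rust's standard library, `std::backtrace::Backtrace::capture()` only records a stack trace when `RUST_LIB_BACKTRACE` or `RUST_BACKTRACE` enables it (a value other than `0`); otherwise it cheaply returns a disabled backtrace, and symbol resolution for a captured backtrace is deferred until it is formatted. DataFusion provides, to our understanding, a `backtrace` cargo feature for attaching backtraces to errors; whether it was enabled in this build is not stated, so treat that link as an assumption. `ProbeError`, `make_error` and `render` are hypothetical names in a std-only sketch of the pattern.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Hypothetical error type that records a backtrace when it is created.
pub struct ProbeError {
    pub message: String,
    pub backtrace: Backtrace,
}

/// Builds an error. `Backtrace::capture()` consults `RUST_LIB_BACKTRACE`
/// and `RUST_BACKTRACE`; when neither enables capture, it is cheap.
pub fn make_error(message: &str) -> ProbeError {
    ProbeError {
        message: message.to_string(),
        backtrace: Backtrace::capture(),
    }
}

/// Renders the error: the message, followed by the backtrace only if one
/// was actually captured. Formatting a captured backtrace resolves symbols.
pub fn render(err: &ProbeError) -> String {
    match err.backtrace.status() {
        BacktraceStatus::Captured => {
            format!("{}\n\nbacktrace: {}", err.message, err.backtrace)
        }
        // `BacktraceStatus` is non-exhaustive; Disabled and Unsupported land here.
        _ => err.message.clone(),
    }
}
```

With RUST_BACKTRACE unset, `render` returns just the message; with it set (on a supported platform), each error walks the stack when created and resolves symbols when formatted.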

Expected behavior

Column::data_type() should not cause bounds_check() to take the error path for this column.
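A sketch of one alternative: return a small error value and defer building the message to `Display`, so that a lookup the caller expects to miss costs no more than one that succeeds. `OutOfBounds` and `field_width` are hypothetical names; this is not how DataFusion's `bounds_check` is written, and a fix could equally be to call `data_type()` with the schema the column actually indexes into.

```rust
use std::fmt;

/// Hypothetical lightweight error: stores only the numbers needed to
/// describe the problem; no string is built until it is displayed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OutOfBounds {
    pub index: usize,
    pub len: usize,
}

impl fmt::Display for OutOfBounds {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "column index {} (zero-based) out of bounds for schema with {} columns",
            self.index, self.len
        )
    }
}

impl std::error::Error for OutOfBounds {}

/// Hypothetical lookup over a slice of field widths. A miss is as cheap
/// as a hit: the error is two integers, not a formatted string.
pub fn field_width(widths: &[usize], index: usize) -> Result<usize, OutOfBounds> {
    widths
        .get(index)
        .copied()
        .ok_or(OutOfBounds { index, len: widths.len() })
}
```

The trade-off is that the message no longer lists the schema's field names unless the error also stores them, which would bring back an allocation on every miss.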

Additional context

No response

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
