

timsaucer
Contributor

Related

Closes https://github.com/rerun-io/dataplatform/issues/851

What

This PR detects when the push-down filters provided to the table provider check that a component column is not null. When they do, the filter is pushed down into the chunk store query. This reduces the amount of data that comes out of the CPU worker thread and is passed on to the rest of the DataFusion engine.
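The flow described above can be sketched with simplified stand-in types. This is a minimal sketch, not the PR's code. `Expr`, `FilterSupport`, `ChunkQuery`, `not_null_column`, and `push_down_filters` are all hypothetical. `FilterSupport` only mirrors the three variants of DataFusion's `TableProviderFilterPushDown`. Reporting recognized filters as inexact, so that the engine re-applies them, is an assumption made here.

```rust
/// Simplified stand-in for a filter expression (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    IsNotNull(Box<Expr>),
    /// Any predicate this sketch does not try to push down.
    Other(String),
}

/// Per-filter answer, mirroring the three variants of DataFusion's
/// `TableProviderFilterPushDown` (only two are produced here).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FilterSupport {
    Unsupported,
    Inexact,
    Exact,
}

/// Hypothetical chunk store query: only the required non-null columns are modeled.
#[derive(Debug, Default, PartialEq)]
pub struct ChunkQuery {
    pub non_null_columns: Vec<String>,
}

/// Returns the column name if `expr` is `<column> IS NOT NULL`.
/// Only this one form is recognized here, to keep the sketch short.
fn not_null_column(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::IsNotNull(inner) => match &**inner {
            Expr::Column(name) => Some(name.as_str()),
            _ => None,
        },
        _ => None,
    }
}

/// Adds the column of every recognized not-null filter to `query` and reports
/// how each filter was handled. Recognized filters are marked `Inexact`, so the
/// engine still re-applies the original predicate on the rows that come back.
pub fn push_down_filters(filters: &[Expr], query: &mut ChunkQuery) -> Vec<FilterSupport> {
    filters
        .iter()
        .map(|expr| match not_null_column(expr) {
            Some(column) => {
                // Record each column only once, even if several filters name it.
                if !query.non_null_columns.iter().any(|c| c == column) {
                    query.non_null_columns.push(column.to_string());
                }
                FilterSupport::Inexact
            }
            None => FilterSupport::Unsupported,
        })
        .collect()
}
```

Any filter the sketch does not recognize is left entirely to the engine. This keeps the push-down purely an optimization: it can only shrink the data leaving the scan.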

For the purposes of this push-down, these three forms are treated as equivalent:

  • column is not null
  • not (column is null)
  • column != lit(null)
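Under SQL's three-valued logic, the three forms listed above are not literally identical. column != NULL evaluates to NULL (unknown) for every row, so as a filter it keeps no rows at all. The other two keep exactly the non-null rows. What all three share is that none of them ever keeps a row where the column is null. That is why a not-null restriction can safely be pushed into the chunk store query, provided the original predicate is still evaluated afterwards. The tiny std-only model below illustrates this. The helper names and the `Truth` alias are hypothetical, and `None` stands for SQL NULL.

```rust
/// Result of a SQL predicate under three-valued logic: `None` stands for NULL (unknown).
pub type Truth = Option<bool>;

/// `column IS NOT NULL`: always TRUE or FALSE, never NULL.
pub fn is_not_null(value: Option<i64>) -> Truth {
    Some(value.is_some())
}

/// `NOT (column IS NULL)`: negating a definite TRUE/FALSE gives the same answer.
pub fn not_is_null(value: Option<i64>) -> Truth {
    let is_null = Some(value.is_none());
    is_null.map(|b| !b)
}

/// `column != NULL`: any comparison with NULL yields NULL in SQL.
pub fn neq_null(_value: Option<i64>) -> Truth {
    None
}

/// A filter keeps a row only when its predicate is TRUE (FALSE and NULL both drop it).
pub fn keeps_row(result: Truth) -> bool {
    result == Some(true)
}
```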

@timsaucer timsaucer self-assigned this Aug 7, 2025

github-actions bot commented Aug 7, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
f172a07 https://rerun.io/viewer/pr/10829 +nightly +main

Note: This comment is updated whenever you push a commit.

@timsaucer timsaucer added exclude from changelog PRs with this won't show up in CHANGELOG.md feat-dataframe-api Everything related to the dataframe API dataplatform Rerun Data Platform integration labels Aug 7, 2025

@emilk emilk left a comment


Can't say I follow everything happening here, but… looks good :)

Is there a way we can add a test for this without too much hassle?

@@ -190,6 +191,49 @@ impl DataframeQueryTableProvider {
chunk_request,
})
}

fn column_to_selector(column: &Column) -> Option<ComponentColumnSelector> {

Nit/style: we usually prefer selector_from_column (see CODE_STYLE.md).

Mostly for consistency with how we name matrices. Writing let selector = selector_from_column(column); also puts each part of the function name next to the type of the matching input/output variable (instead of swizzling them).
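Here is a small illustration of the <output>_from_<input> naming convention. It uses hypothetical stand-ins: the real Column and ComponentColumnSelector types live in DataFusion and Rerun. The entity:component string format parsed below is an assumption made only for this sketch.

```rust
use std::str::FromStr;

/// Stand-in for a DataFusion column reference (hypothetical).
pub struct Column {
    name: String,
}

impl Column {
    pub fn new(name: &str) -> Self {
        Column { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Stand-in for Rerun's selector; the `entity:component` format is assumed.
#[derive(Debug, PartialEq)]
pub struct ComponentColumnSelector {
    pub entity_path: String,
    pub component: String,
}

impl FromStr for ComponentColumnSelector {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.split_once(':') {
            Some((entity, component)) if !entity.is_empty() && !component.is_empty() => {
                Ok(ComponentColumnSelector {
                    entity_path: entity.to_string(),
                    component: component.to_string(),
                })
            }
            _ => Err(format!("not a component column: {:?}", s)),
        }
    }
}

/// `<output>_from_<input>`: `let selector = selector_from_column(&column);`
/// reads with each name next to the variable of that type.
pub fn selector_from_column(column: &Column) -> Option<ComponentColumnSelector> {
    ComponentColumnSelector::from_str(column.name()).ok()
}
```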

ComponentColumnSelector::from_str(column.name()).ok()
}

fn compute_column_is_not_null_filter(

This could use a docstring explaining what it checks for, especially since the function name can be read two ways (does it return true for the filter foo == 42, since that is not a null filter?).

fn compute_column_is_not_null_filter(

Maybe this would read slightly better?

Suggested change
- fn compute_column_is_not_null_filter(
+ fn compute_column_is_neq_null_filter(

Comment on lines 205 to 233
match expr {
    Expr::IsNotNull(inner) => {
        if let Expr::Column(col) = inner.as_ref() {
            return Ok(Self::column_to_selector(col));
        }
    }
    Expr::Not(inner) => {
        if let Expr::IsNull(col_expr) = inner.as_ref() {
            if let Expr::Column(col) = col_expr.as_ref() {
                return Ok(Self::column_to_selector(col));
            }
        }
    }
    Expr::BinaryExpr(binary) => {
        if binary.op == Operator::NotEq {
            if let (Expr::Column(col), Expr::Literal(sv))
            | (Expr::Literal(sv), Expr::Column(col)) =
                (binary.left.as_ref(), binary.right.as_ref())
            {
                if sv.is_null() {
                    return Ok(Self::column_to_selector(col));
                }
            }
        }
    }
    _ => {}
}

Ok(None)

Nit: this would read nicer with a helper function

Suggested change
if is_neq_null(expr) {
    Ok(Self::column_to_selector(col))
} else {
    Ok(None)
}

That would also allow us to unit-test is_neq_null
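In the suggestion above, col is not in scope once the check is reduced to a bool. One way to make the helper compile is to have it return the matched column instead. The helper is renamed here to column_neq_null, a hypothetical name, to reflect that. This is a minimal sketch with simplified stand-in types, loosely modeled on DataFusion's Expr, BinaryExpr, and Operator, with NULL literals modeled as `Literal(None)`. The selector step is stubbed out as the column name. The doc comment spells out what is matched, including that foo == 42 is not, and the helper is directly unit-testable.

```rust
/// Simplified stand-ins for the DataFusion expression types (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operator {
    Eq,
    NotEq,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    /// A literal; `None` models a NULL scalar value.
    Literal(Option<i64>),
    IsNull(Box<Expr>),
    IsNotNull(Box<Expr>),
    Not(Box<Expr>),
    BinaryExpr { left: Box<Expr>, op: Operator, right: Box<Expr> },
}

/// Returns the column if `expr` only keeps rows where that column is non-null:
/// `col IS NOT NULL`, `NOT (col IS NULL)`, `col != NULL`, or `NULL != col`.
/// Any other filter, such as `foo == 42`, returns `None`.
pub fn column_neq_null(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::IsNotNull(inner) => column_name(inner),
        Expr::Not(inner) => match &**inner {
            Expr::IsNull(col) => column_name(col),
            _ => None,
        },
        // Matching on a tuple covers both operand orders in one arm.
        Expr::BinaryExpr { left, op: Operator::NotEq, right } => match (&**left, &**right) {
            (Expr::Column(name), Expr::Literal(None))
            | (Expr::Literal(None), Expr::Column(name)) => Some(name.as_str()),
            _ => None,
        },
        _ => None,
    }
}

fn column_name(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::Column(name) => Some(name.as_str()),
        _ => None,
    }
}

/// Caller in the style of the review suggestion; the real version would map the
/// column to a `ComponentColumnSelector` instead of returning its name.
pub fn compute_column_is_neq_null_filter(expr: &Expr) -> Result<Option<String>, String> {
    Ok(column_neq_null(expr).map(|name| name.to_string()))
}
```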

Comment on lines 212 to 213
if let Expr::IsNull(col_expr) = inner.as_ref() {
    if let Expr::Column(col) = col_expr.as_ref() {

Can use let-chaining.

Comment on lines 219 to 224
if binary.op == Operator::NotEq {
    if let (Expr::Column(col), Expr::Literal(sv))
    | (Expr::Literal(sv), Expr::Column(col)) =
        (binary.left.as_ref(), binary.right.as_ref())
    {
        if sv.is_null() {

Can use let-chaining.

@timsaucer timsaucer force-pushed the tsaucer/streaming_provider_filtering branch from ddf569b to f172a07 Compare August 11, 2025 13:56
@timsaucer timsaucer merged commit d25341f into main Aug 12, 2025
40 checks passed
@timsaucer timsaucer deleted the tsaucer/streaming_provider_filtering branch August 12, 2025 11:35
2 participants