-
Notifications
You must be signed in to change notification settings - Fork 2k
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
I am rewriting our CDF operation in delta-rs, the code looks roughly like this:
let mut projected = if should_cdc {
operation_count
.clone()
.with_column(
CDC_COLUMN_NAME,
when(col(TARGET_DELETE_COLUMN).is_null(), lit("delete")) // nulls are equal to True
.when(col(DELETE_COLUMN).is_null(), lit("source_delete"))
.when(col(TARGET_COPY_COLUMN).is_null(), lit("copy"))
.when(col(TARGET_INSERT_COLUMN).is_null(), lit("insert"))
.when(col(TARGET_UPDATE_COLUMN).is_null(), lit("update"))
.end()?,
)?
// .drop_columns(&["__delta_rs_path"])? // WEIRD bug caused by interaction with unnest_columns, has to be dropped otherwise throws schema error
.with_column(
"__delta_rs_update_expanded",
when(
col(CDC_COLUMN_NAME).eq(lit("update")),
lit(ScalarValue::List(ScalarValue::new_list(
&[
ScalarValue::Utf8(Some("update_preimage".into())),
ScalarValue::Utf8(Some("update_postimage".into())),
],
&DataType::List(Field::new("element", DataType::Utf8, false).into()),
true,
))),
)
.end()?,
)?
.unnest_columns(&["__delta_rs_update_expanded"])?
.with_column(
CDC_COLUMN_NAME,
when(
col(CDC_COLUMN_NAME).eq(lit("update")),
col("__delta_rs_update_expanded"),
)
.otherwise(col(CDC_COLUMN_NAME))?,
)?
.drop_columns(&["__delta_rs_update_expanded"])?
.select(write_projection_with_cdf)?I noticed that when I do unnest_columns on another column, it complains afterwards about a schema error:
Result::unwrap()` on an `Err` value: Arrow { source: InvalidArgumentError("column types must match schema types, expected Utf8 but found Dictionary(UInt16, Utf8) at column index 7") }
Since I don't need the column, I can safely drop it beforehand, but I don't understand why doesn't Dictionary(UInt16, Utf8) just coerce to utf8?
To Reproduce
Bit difficult but, you can run grab my branch: https://github.com/ion-elgreco/delta-rs/tree/refactor--combine_execution_plans
And then you run the test test_merge_cdc_enabled_simple, with this line commented out: .drop_columns(&["__delta_rs_path"])?
Expected behavior
I guess coerce gracefully?
Additional context
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working