Skip to content

Conversation

@cgiachalis
Copy link
Contributor

@cgiachalis cgiachalis commented Sep 18, 2025

This PR is a first attempt to fix #843.

When an array has factor columns, any new writes that includes only a subset of existing factor levels will not remap these levels accurately.

Unit tests are added to cover the cases of new writes with current factors only (not growing enums).


Before

library(tiledb)

set_return_as_preference("data.frame")

uri <- tempfile()
df1 <- data.frame(id = 1:3, obs = factor(c("A", "B", "C")))
tiledb::fromDataFrame(df1, uri, col_index = 1, tile_domain = c(1L, 5L))
df2 <- data.frame(id = 4:5, obs = factor(c("B", "C")))
arr <- tiledb::tiledb_array(uri)
arr[] <- df2
arr[]


   id obs
1  1   A
2  2   B
3  3   C
4  4   A # <- expected B
5  5   B # <- expected C

After

   id obs
1  1   A
2  2   B
3  3   C
4  4   B
5  5   C

@cgiachalis cgiachalis force-pushed the cg/fix-843-remap-factor-levels branch 2 times, most recently from 862b5e5 to 93cd034 Compare September 27, 2025 10:39
@cgiachalis cgiachalis force-pushed the cg/fix-843-remap-factor-levels branch from 93cd034 to 26de769 Compare September 30, 2025 08:18
Copy link
Member

@mojaveazure mojaveazure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in review; this looks good. I've made one suggestion below, but it's optional. Can you bump the develop version and add an entry to NEWS.md before we merge?

@mojaveazure mojaveazure merged commit 01de444 into TileDB-Inc:main Sep 30, 2025
@cgiachalis cgiachalis deleted the cg/fix-843-remap-factor-levels branch September 30, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enum levels are not remapped for updates with no new levels

2 participants