Commit 4cc77c6

memrecycle(): match factor levels in UTF-8
Previously, by-reference sub-assignment to a factor column could fail to match strings with identical content if they had different encoding bits (even CE_NATIVE UTF-8 vs. CE_UTF8), causing duplicate levels. Fixes: #6886
1 parent 2cb0316 commit 4cc77c6

2 files changed: +7 -0 lines changed

inst/tests/tests.Rraw

Lines changed: 5 additions & 0 deletions
@@ -21113,3 +21113,8 @@ test(2309.09, as.data.table(df, keep.rownames=TRUE), data.table(rn = c("a","b"),
 as.data.frame.no.reset = function(x) x
 DF = structure(list(a = 1:2), class = c("data.frame", "no.reset"), row.names = c(NA, -2L))
 test(2310.01, as.data.table(DF), data.table(a=1:2))
+
+# memrecycle() did not consider string encodings for factor levels #6886
+DT = data.table(factor(rep(enc2utf8('ø'), 3)))
+DT[1,V1 := iconv('ø', to='latin1')]
+test(2311, length(levels(DT$V1)), 1L)

src/assign.c

Lines changed: 2 additions & 0 deletions
@@ -823,8 +823,10 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con
 SEXP targetLevels = PROTECT(getAttrib(target, R_LevelsSymbol)); protecti++;
 SEXP sourceLevels = source; // character source
 if (sourceIsFactor) { sourceLevels=PROTECT(getAttrib(source, R_LevelsSymbol)); protecti++; }
+sourceLevels = PROTECT(coerceUtf8IfNeeded(sourceLevels)); protecti++;
 if (!sourceIsFactor || !R_compute_identical(sourceLevels, targetLevels, 0)) { // !sourceIsFactor for test 2115.6
 const int nTargetLevels=length(targetLevels), nSourceLevels=length(sourceLevels);
+targetLevels = PROTECT(coerceUtf8IfNeeded(targetLevels)); protecti++;
 const SEXP *targetLevelsD=STRING_PTR_RO(targetLevels), *sourceLevelsD=STRING_PTR_RO(sourceLevels);
 SEXP newSource = PROTECT(allocVector(INTSXP, length(source))); protecti++;
 savetl_init();
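
The idea behind the two added lines can be illustrated outside of R with a minimal standalone sketch. It is not the code from this commit: `latin1_to_utf8`, `same_level_raw` and `same_level_utf8` are hypothetical names invented for the illustration, and a plain byte comparison stands in for however `memrecycle()` actually matches levels. The sketch only shows that the same character stored in two encodings compares unequal until both sides are brought to UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of R or data.table): convert a
   NUL-terminated Latin-1 string to a newly allocated UTF-8 string.
   The caller frees the result. Returns NULL if allocation fails. */
static char *latin1_to_utf8(const char *in) {
    assert(in != NULL);
    size_t n = strlen(in);
    char *out = malloc(2 * n + 1);  /* a Latin-1 byte needs at most 2 UTF-8 bytes */
    if (!out) return NULL;
    char *p = out;
    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if (*s < 0x80) {
            *p++ = (char)*s;                    /* ASCII is unchanged */
        } else {
            *p++ = (char)(0xC0 | (*s >> 6));    /* lead byte */
            *p++ = (char)(0x80 | (*s & 0x3F));  /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}

/* Compare an existing UTF-8 level with a Latin-1 value byte for byte,
   without any conversion. Returns 1 on match, 0 otherwise. */
static int same_level_raw(const char *level_utf8, const char *value_latin1) {
    return strcmp(level_utf8, value_latin1) == 0;
}

/* Convert the Latin-1 value to UTF-8 first, then compare.
   Returns 1 on match, 0 on mismatch, -1 if allocation fails. */
static int same_level_utf8(const char *level_utf8, const char *value_latin1) {
    char *converted = latin1_to_utf8(value_latin1);
    if (!converted) return -1;
    int same = strcmp(level_utf8, converted) == 0;
    free(converted);
    return same;
}
```

With 'ø' written as `"\xC3\xB8"` (UTF-8) for the level and `"\xF8"` (Latin-1) for the assigned value, `same_level_raw` reports a mismatch while `same_level_utf8` reports a match. This mirrors the situation the new test sets up, where the expected outcome is a single level rather than two.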
