Skip to content

Commit b6ad1a4

Browse files
authored
only coerce UTF8 factors if needed in memrecycle (#7480)
* only coerce UTF8 factors if needed * remove unreachable code * remove unit test * add NEWS
1 parent e5e10a0 commit b6ad1a4

File tree

2 files changed

+6
-3
lines changed

2 files changed

+6
-3
lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -350,6 +350,8 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
350350

351351
25. By-group operations on missing rows (e.g. `foo[c(i, NA), bar, by=grp]`) now avoid leaving in data from the previous groups, [#7442](https://github.com/Rdatatable/data.table/issues/7442). Thanks @aitap for the report and the fix.
352352

353+
26. Grouping by a factor with many groups is now fast again, fixing a timing regression introduced in [#6890](https://github.com/Rdatatable/data.table/pull/6890) where UTF-8 coercion and level remapping were performed unnecessarily, [#7404](https://github.com/Rdatatable/data.table/issues/7404). Thanks @ben-schwen for the report and fix.
354+
353355
### NOTES
354356

355357
1. The following in-progress deprecations have proceeded:

src/assign.c

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -755,10 +755,11 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con
755755
SEXP targetLevels = PROTECT(getAttrib(target, R_LevelsSymbol)); protecti++;
756756
SEXP sourceLevels = source; // character source
757757
if (sourceIsFactor) { sourceLevels=PROTECT(getAttrib(source, R_LevelsSymbol)); protecti++; }
758-
sourceLevels = PROTECT(coerceUtf8IfNeeded(sourceLevels)); protecti++;
759-
if (!sourceIsFactor || !R_compute_identical(sourceLevels, targetLevels, 0)) { // !sourceIsFactor for test 2115.6
760-
const int nTargetLevels=length(targetLevels), nSourceLevels=length(sourceLevels);
758+
bool needUtf8Coerce = !sourceIsFactor || !R_compute_identical(sourceLevels, targetLevels, 0);
759+
if (needUtf8Coerce) {
760+
sourceLevels = PROTECT(coerceUtf8IfNeeded(sourceLevels)); protecti++;
761761
targetLevels = PROTECT(coerceUtf8IfNeeded(targetLevels)); protecti++;
762+
const int nTargetLevels=length(targetLevels), nSourceLevels=length(sourceLevels);
762763
const SEXP *targetLevelsD=STRING_PTR_RO(targetLevels), *sourceLevelsD=STRING_PTR_RO(sourceLevels);
763764
SEXP newSource = PROTECT(allocVector(INTSXP, length(source))); protecti++;
764765
savetl_init();

0 commit comments

Comments
 (0)