Skip to content

Commit 23b0a92

Browse files
fix(7571): bug fix for narm issue on gforce in int64 case (#7572)
* fix(7571): bug fix for narm issue on gforce in int64 case * fix(7571): test sequencing * fix(7571): updated the NEWS.md * trailing newline * Use $V1 * fix(7571): added db optimize 2L * refine NEWS * fix(7571): add more tests and change to code similar to int for gsum * fix(7571): added more tests for mean * eliminate intermediate variable * NEWS again --------- Co-authored-by: Michael Chirico <[email protected]>
1 parent 24a306a commit 23b0a92

File tree

3 files changed

+60
-6
lines changed

3 files changed

+60
-6
lines changed

NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@
1717
3. Vignettes are now built using `litedown` instead of `knitr`, [#6394](https://github.com/Rdatatable/data.table/issues/6394). Thanks @jangorecki for the suggestion and @ben-schwen and @aitap for the implementation.
1818

1919

20+
4. `sum(<int64 column>)` by group is correct with missing entries and GForce activated ([#7571](https://github.com/Rdatatable/data.table/issues/7571)). Thanks to @rweberc for the report and @manmita for the fix. The issue was caused by a faulty early `break` that spilled between groups, and resulted in silently incorrect results!
21+
2022
## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025
2123

2224
### BREAKING CHANGE

inst/tests/tests.Rraw

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21938,3 +21938,55 @@ dimnames(X) = copy(Xdn)
2193821938
DT = as.data.table(X)
2193921939
test(2354.2, dimnames(X), Xdn)
2194021940
rm(X, Xdn, DT)
21941+
21942+
#7571 issue for na.rm on int64
21943+
if (test_bit64) local({
21944+
# integer64 + GForce grouped sum with na.rm = FALSE
21945+
# Example 1 from issue: ids 1:8, 9, 9; three leading NAs then 4:10
21946+
dt_short = data.table(
21947+
id = c(1:8, 9, 9),
21948+
value = c(rep(NA_integer64_, 3L), as.integer64(4:10))
21949+
)
21950+
test(2355.1, options=c(datatable.optimize=2L),
21951+
dt_short[, sum(value, na.rm = FALSE), by = id]$V1,
21952+
as.integer64(c(NA, NA, NA, 4:8, 19))
21953+
)
21954+
21955+
# Example 2 from issue: ids in pairs, same values; checks multi-row groups
21956+
dt_short2 = data.table(
21957+
id = rep(1:5, each = 2L),
21958+
value = c(rep(NA_integer64_, 3L), as.integer64(4:10))
21959+
)
21960+
test(2355.2, options=c(datatable.optimize=2L),
21961+
dt_short2[, sum(value, na.rm = FALSE), by = id]$V1,
21962+
as.integer64(c(NA, NA, 11, 15, 19))
21963+
)
21964+
21965+
# Test mean for integer64 with NA
21966+
dt_mean = data.table(
21967+
id = c(1,1,2,2,3,3),
21968+
value = as.integer64(c(NA, NA, NA, 20000000, 5, 3))
21969+
)
21970+
test(2355.3, options=c(datatable.optimize=2L),
21971+
dt_mean[, mean(value, na.rm=FALSE), by = id]$V1,
21972+
c(NA, NA, 4)
21973+
)
21974+
21975+
# GForce sum vs base::sum for integer64
21976+
DT = data.table(id = sample(letters, 1000, TRUE), value = as.integer64(sample(c(1:100, NA), 1000, TRUE)))
21977+
gforce = DT[, .(gforce_sum = sum(value)), by=id]
21978+
base = DT[, .(true_sum = base::sum(value)), by=id]
21979+
merged = merge(gforce, base, by="id", all=TRUE)
21980+
test(2355.4, options=c(datatable.optimize=2L),
21981+
merged$gforce_sum, merged$true_sum
21982+
)
21983+
21984+
# GForce mean vs base::mean for integer64
21985+
DTm = data.table(id = sample(letters, 1000, TRUE), value = as.integer64(sample(c(1:100, NA), 1000, TRUE)))
21986+
gforce_m = DTm[, .(gforce_mean = mean(value)), by=id]
21987+
base_m = DTm[, .(true_mean = base::mean(value)), by=id]
21988+
merged_m = merge(gforce_m, base_m, by="id", all=TRUE)
21989+
test(2355.5, options=c(datatable.optimize=2L),
21990+
merged$gforce_mean, merged$true_mean
21991+
)
21992+
})

src/gsumm.c

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -502,13 +502,13 @@ SEXP gsum(SEXP x, SEXP narmArg)
502502
const int64_t *my_gx = gx + b*batchSize + pos;
503503
const uint16_t *my_low = low + b*batchSize + pos;
504504
for (int i=0; i<howMany; i++) {
505-
const int64_t elem = my_gx[i];
506-
if (elem!=INT64_MIN) {
507-
_ans[my_low[i]] += elem;
508-
} else {
509-
_ans[my_low[i]] = INT64_MIN;
510-
break;
505+
if (_ans[my_low[i]] == INT64_MIN) continue;
506+
const int64_t b = my_gx[i];
507+
if (b == INT64_MIN) {
508+
if (!narm) _ans[my_low[i]] = INT64_MIN;
509+
continue;
511510
}
511+
_ans[my_low[i]] += b;
512512
}
513513
}
514514
}

0 commit comments

Comments
 (0)