
Commit a5e2bca

Merge pull request #6164 from Rdatatable/dcast
Upgrade `dcast` behavior from message to warning for missing aggregate function
2 parents 8d1377e + 1ef1a70 commit a5e2bca


4 files changed (5 additions, 3 deletions)


NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -96,6 +96,8 @@
 
 14. `setNumericRounding()` now invisibly returns the old rounding value instead of `NULL`, which is now consistent with similar behavior by `setwd()`, `options()`, etc. Thanks @MichaelChirico for the report and @joshhwuu for the fix.
 
+15. `dcast()` now issues a warning when `fun.aggregate` is used but not provided by the user. `fun.aggregate` defaults to `length` in this case. Previously, only a message was issued. However, relying on this default often signals unexpected duplicates in the data. Therefore, a stricter class of signal was deemed more appropriate, [#5386](https://github.com/Rdatatable/data.table/issues/5386). The warning is classed as `dt_missing_fun_aggregate_warning`, allowing for more targeted handling in user code. Thanks @MichaelChirico for the suggestion and @Nj221102 for the fix.
+
 # data.table [v1.15.0](https://github.com/Rdatatable/data.table/milestone/29) (30 Jan 2024)
 
 ## BREAKING CHANGE
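
Because the warning carries the class `dt_missing_fun_aggregate_warning`, user code can react to it specifically instead of to every warning. The sketch below is kept dependency-free: `fake_dcast()` is a hypothetical stand-in that signals a warning of the same class (followed by an unrelated plain warning), and `quiet_default_agg()` is a hypothetical helper. With a data.table version that includes this change, a real `dcast()` call would take the place of the stand-in.

```r
# Hypothetical stand-in for dcast(): signals a warning with the same class as
# the one described in the NEWS entry, then an unrelated plain warning.
fake_dcast = function() {
  warning(warningCondition(
    "'fun.aggregate' is NULL, but found duplicate row/column combinations, so defaulting to length().",
    class = "dt_missing_fun_aggregate_warning"
  ))
  warning("some unrelated warning")
  "result"
}

# Hypothetical helper: muffle only the default-aggregation warning.
# Any other warning raised while evaluating `expr` still propagates.
quiet_default_agg = function(expr) {
  withCallingHandlers(
    expr,
    dt_missing_fun_aggregate_warning = function(w) invokeRestart("muffleWarning")
  )
}
```

Calling `quiet_default_agg(fake_dcast())` returns `"result"` and reports only `some unrelated warning`. For a stricter pipeline, the same class could instead be given to `tryCatch()` with a handler that calls `stop()`, so unexpected duplicates abort the run instead of being silently counted.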

R/fcast.R

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@ dcast.data.table = function(data, formula, fun.aggregate = NULL, sep = "_", ...,
 if (is.null(fun.call)) {
 oo = forderv(dat, by=varnames, retGrp=TRUE)
 if (attr(oo, 'maxgrpn', exact=TRUE) > 1L) {
-messagef("'fun.aggregate' is NULL, but found duplicate row/column combinations, so defaulting to length(). That is, the variables %s used in 'formula' do not uniquely identify rows in the input 'data'. In such cases, 'fun.aggregate' is used to derive a single representative value for each combination in the output data.table, for example by summing or averaging (fun.aggregate=sum or fun.aggregate=mean, respectively). Check the resulting table for values larger than 1 to see which combinations were not unique. See ?dcast.data.table for more details.", brackify(varnames))
+warningf("'fun.aggregate' is NULL, but found duplicate row/column combinations, so defaulting to length(). That is, the variables %s used in 'formula' do not uniquely identify rows in the input 'data'. In such cases, 'fun.aggregate' is used to derive a single representative value for each combination in the output data.table, for example by summing or averaging (fun.aggregate=sum or fun.aggregate=mean, respectively). Check the resulting table for values larger than 1 to see which combinations were not unique. See ?dcast.data.table for more details.", brackify(varnames), class= "dt_missing_fun_aggregate_warning")
 fun.call = quote(length)
 }
 }
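
Switching from `messagef()` to `warningf()` changes which handlers see the condition: code that silenced the old behavior with `suppressMessages()` will now get a warning instead. The base-R sketch below illustrates this. `warn_classed()` and `default_agg()` are hypothetical helpers built on `warningCondition()` (available since R 3.6.0); they are not data.table's internal `warningf()` or `brackify()`, whose implementations are not part of this diff.

```r
# Hypothetical helper: format a message like sprintf() and signal it as a
# warning that also carries a custom condition class.
warn_classed = function(fmt, ..., class = NULL) {
  warning(warningCondition(sprintf(fmt, ...), class = class))
}

# Hypothetical stand-in for the duplicate-detection branch above: signal the
# classed warning (message abbreviated), then fall back to length().
default_agg = function(varnames) {
  warn_classed(
    "'fun.aggregate' is NULL, but found duplicate row/column combinations, so defaulting to length(). That is, the variables %s used in 'formula' do not uniquely identify rows in the input 'data'.",
    paste0("[", paste(varnames, collapse = ", "), "]"),
    class = "dt_missing_fun_aggregate_warning"
  )
  quote(length)
}

# suppressMessages() does not intercept warnings, so silencing this now needs
# suppressWarnings() or a handler for the warning's class.
agg = suppressWarnings(default_agg(c("a", "c")))
```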

inst/tests/tests.Rraw

Lines changed: 1 addition & 1 deletion
@@ -13657,7 +13657,7 @@ test(1962.087, dcast(DT, a ~ b, value.var = 'b'),
 test(1962.088, dcast(DT[0L, ], a ~ c, value.var = 'b'), data.table(a=numeric(), key="a")) #1215
 test(1962.089, dcast(DT, a ~ c, value.var = 'b'),
 data.table(a = c(1, 2), `2` = c(0L, 2L), `4` = c(2L, 0L), key = 'a'),
-message = ".*'fun\\.aggregate' is NULL.*defaulting to length\\(\\).*\\[a, c\\].*")
+warning = ".*'fun\\.aggregate' is NULL.*defaulting to length\\(\\).*\\[a, c\\].*")
 
 ## IDateTime.R
 x = as.IDate('2018-08-01')
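
The updated test now expects a warning whose text matches the regular expression on the `warning =` line. data.table's `test()` harness is internal and not shown here; as a rough base-R approximation only, the check amounts to capturing the warning and matching its message against the same pattern. The message below is the text of the `warningf()` call in R/fcast.R with `%s` filled in as `[a, c]`, and `capture_warning()` is a hypothetical helper.

```r
# Hypothetical helper: evaluate `expr` and return the first warning it
# signals, or NULL if it completes without warning.
capture_warning = function(expr) {
  tryCatch({ expr; NULL }, warning = function(w) w)
}

msg = "'fun.aggregate' is NULL, but found duplicate row/column combinations, so defaulting to length(). That is, the variables [a, c] used in 'formula' do not uniquely identify rows in the input 'data'. In such cases, 'fun.aggregate' is used to derive a single representative value for each combination in the output data.table, for example by summing or averaging (fun.aggregate=sum or fun.aggregate=mean, respectively). Check the resulting table for values larger than 1 to see which combinations were not unique. See ?dcast.data.table for more details."
pattern = ".*'fun\\.aggregate' is NULL.*defaulting to length\\(\\).*\\[a, c\\].*"

# New behavior: a classed warning is captured and its message matches.
w = capture_warning(warning(warningCondition(msg, class = "dt_missing_fun_aggregate_warning")))
matched = !is.null(w) && grepl(pattern, conditionMessage(w))

# Old behavior: a message is not a warning, so nothing is captured.
old_style = suppressMessages(capture_warning(message(msg)))
```

This is why the expectation had to move from `message =` to `warning =`: under the new behavior, a check for a message would no longer find one.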

man/dcast.data.table.Rd

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 \arguments{
 \item{data}{ A \code{data.table}.}
 \item{formula}{A formula of the form LHS ~ RHS to cast, see Details.}
-\item{fun.aggregate}{Should the data be aggregated before casting? If the formula doesn't identify a single observation for each cell, then aggregation defaults to \code{length} with a message.
+\item{fun.aggregate}{Should the data be aggregated before casting? If the formula doesn't identify a single observation for each cell, then aggregation defaults to \code{length} with a warning of class 'dt_missing_fun_aggregate_warning'.
 
 To use multiple aggregation functions, pass a \code{list}; see Examples. }
 \item{sep}{Character vector of length 1, indicating the separating character in variable names generated during casting. Default is \code{_} for backwards compatibility. }
