Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,21 @@

18. `fwrite` now allows `dec` to be the same as `sep` for edge cases where only one will be written, e.g. 0-row or 1-column tables. [#7227](https://github.com/Rdatatable/data.table/issues/7227). Thanks @MichaelChirico for the report and @venom1204 for the fix.

19. Using `by=` or `keyby=` with a simple numeric or character vector in `j` (e.g. `DT[, 1:2, by=grp]`) used to silently ignore the grouping argument. This now issues a warning to alert the user that grouping is not applied in this syntax and guides them to use the `.SD` idiom instead. [#5397](https://github.com/Rdatatable/data.table/issues/5397). Thanks to @mcol for the report and @venom1204 for the fix.

```r
DT = data.table(a=1:4, grp=c(1,1,2,2))
DT[, 1, by = grp]
# a
# <int>
# 1: 1
# 2: 2
# 3: 3
# 4: 4
# Warning message:
# `by` or `keyby` is ignored when `j` is a numeric vector...
```

### NOTES

1. The following in-progress deprecations have proceeded:
Expand Down
13 changes: 13 additions & 0 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -739,6 +739,13 @@ replace_dot_alias = function(e) {
if (!length(j) && !notj) return( null.data.table() )
if (is.factor(j)) j = as.character(j) # fix for FR: #358
if (is.character(j)) {
if (!missingby) {
warning(
"`by` or `keyby` is ignored when `j` is a character vector used for column selection. ",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not use backticks in warnings (only in markdown), please remove.
also please use one long character string (easier to translate), instead of breaking into several strings/lines.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly it's not so simple:

l = make_linter_from_xpath(
  "
    //SYMBOL_FUNCTION_CALL[
      starts-with(text(), 'stop')
      or starts-with(text(), 'warning')
      or starts-with(text(), 'message')
      or starts-with(text(), 'packageStartupMessage')
    ]
      /parent::expr[
        following-sibling::expr
        //STR_CONST[contains(text(), '`')]
      ]
  ",
  'message with backtick'
)
lint_dir("R", l())

Gives hits

data.table.R:315:9: warning: [l] message with backtick
        stopf("Invalid use of `:=` inside `{}`. `:=` must be the only expression inside `{}` when used in `j`. Instead of: DT[{tmp1 <- ...; tmp2 <- ...; someCol := tmp1 * tmp2}], Use: DT[, someCol := {tmp1 <- ...; tmp2 <- ...; tmp1 * tmp2}]")
        ^~~~~
data.table.R:724:31: warning: [l] message with backtick
      if (jsub %iscall% ":=") stopf("`:=` is only supported under with=TRUE, see ?`:=`.")
                              ^~~~~
data.table.R:2671:40: warning: [l] message with backtick
  if (identical(name, quote(`*tmp*`))) stopf("setalloccol attempting to modify `*tmp*`")
                                       ^~~~~
data.table.R:2887:3: warning: [l] message with backtick
  stopf('Check that is.data.table(DT) == TRUE. Otherwise, `:=` is defined for use in j, once only and in particular ways. See help(":=", "data.table"). A common reason for this error is allocating a new column in `j` and using `<-` instead of `:=`; e.g., `DT[, new_col <- 1]` should be `DT[, new_col := 1]`. Another is using `:=` in a multi-statement `{...}` block; please use `:=` as the only statement in `j`.', class="dt_invalid_let_error")
  ^~~~~
data.table.R:3241:5: warning: [l] message with backtick
    stopf("It looks like you re-used `:=` in argument %d a functional assignment call -- use `=` instead: %s(col1=val1, col2=val2, ...)", jj-1L, call_name)
    ^~~~~
foverlaps.R:3:47: warning: [l] message with backtick
  if (!is.data.table(y) || !is.data.table(x)) stopf("y and x must both be data.tables. Use `setDT()` to convert list/data.frames to data.tables by reference or as.data.table() to convert to data.tables by copying.")
                                              ^~~~~
groupingsets.R:72:5: warning: [l] message with backtick
    stopf("When using `id=TRUE` the 'x' data.table must not have a column named 'grouping'.")
    ^~~~~
groupingsets.R:76:5: warning: [l] message with backtick
    warningf("'sets' contains a duplicate (i.e., equivalent up to sorting) element at index %d; as such, there will be duplicate rows in the output -- note that grouping by A,B and B,A will produce the same aggregations. Use `sets=unique(lapply(sets, sort))` to eliminate duplicates.", idx)
    ^~~~~~~~
groupingsets.R:121:5: warning: [l] message with backtick
    stopf("When using `id=TRUE` the 'j' expression must not evaluate to a column named 'grouping'.")
    ^~~~~
groupingsets.R:123:5: warning: [l] message with backtick
    stopf("There exists duplicated column names in the results, ensure the column passed/evaluated in `j` and those in `by` are not overlapping.")
    ^~~~~
merge.R:32:5: warning: [l] message with backtick
    stopf("`by.x` and `by.y` must be of same length.")
    ^~~~~
merge.R:34:5: warning: [l] message with backtick
    warningf("Supplied both `by` and `by.x`/`by.y`. `by` argument will be ignored.")
    ^~~~~~~~
merge.R:37:7: warning: [l] message with backtick
      stopf("A non-empty vector of column names is required for `by.x` and `by.y`.", class="dt_invalid_input_error")
      ^~~~~
merge.R:39:7: warning: [l] message with backtick
      stopf("The following columns listed in `%s` are missing from %s: %s", "by.x", "x", brackify(by.x[!idx]))
      ^~~~~
merge.R:42:7: warning: [l] message with backtick
      stopf("The following columns listed in `%s` are missing from %s: %s", "by.y", "y", brackify(by.y[!idx]))
      ^~~~~
merge.R:54:7: warning: [l] message with backtick
      stopf("A non-empty vector of column names for `by` is required.")
      ^~~~~
merge.R:56:7: warning: [l] message with backtick
      stopf("The following columns listed in `%s` are missing from %s: %s", "by", "x", brackify(by[!idx]))
      ^~~~~
merge.R:59:7: warning: [l] message with backtick
      stopf("The following columns listed in `%s` are missing from %s: %s", "by", "y", brackify(by[!idx]))
      ^~~~~
rowwiseDT.R:4:5: warning: [l] message with backtick
    stopf("Must provide at least one column (use `name=`). See ?rowwiseDT for details")
    ^~~~~
setops.R:197:7: warning: [l] message with backtick
      warningf("Argument 'tolerance' was forced to lowest accepted value `sqrt(.Machine$double.eps)` from provided %s", format(tolerance, scientific=FALSE))
      ^~~~~~~~
xts.R:11:33: warning: [l] message with backtick
  if (index_nm %chin% names(x)) stopf("Input xts object should not have '%s' column because it would result in duplicate column names. Rename '%s' column in xts or use `keep.rownames` to change the index column name.", index_nm, index_nm)
                                ^~~~~
xts.R:21:36: warning: [l] message with backtick
  if (!xts::is.timeBased(x[[1L]])) stopf("data.table must have a time based column in first position, use `setcolorder` function to change the order, or see ?timeBased for supported types")
                                   ^~~~~

Someday, we should make a guide around message styling (how to refer to functions, arguments, R values, etc) and try to enforce it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ouch, thanks

"Perhaps you intended to use `.SD`? For example: DT[, .SD[, ", deparse(jsub), "], by = ...]",
call. = FALSE
)
}
if (notj) {
if (anyNA(idx <- chmatch(j, names_x)))
warningf(ngettext(sum(is.na(idx)), "column not removed because not found: %s", "columns not removed because not found: %s"),
Expand All @@ -762,6 +769,12 @@ replace_dot_alias = function(e) {
# else the NA in ansvals are for join inherited scope (test 1973), and NA could be in irows from join and data in i should be returned (test 1977)
# in both cases leave to the R-level subsetting of i and x together further below
} else if (is.numeric(j)) {
if (!missingby) {
warning(
"`by` or `keyby` is ignored when `j` is a numeric vector used for column selection. ", "Perhaps you intended to use `.SD`? For example: DT[, .SD[, ", deparse(jsub), "], by = ...]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as above? can this duplication be avoided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried implementing it .

call. = FALSE
)
}
j = as.integer(j)
if (any(w <- (j>ncol(x)))) stopf("Item %d of j is %d which is outside the column number range [1,ncol=%d]", idx <- which.first(w), j[idx], ncol(x))
j = j[j!=0L]
Expand Down
9 changes: 9 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -21620,3 +21620,12 @@ local({
test(2338.9, {fwrite(dd, f, forceDecimal=FALSE); fread(f)}, di)
})

# 5397 - keyby/key ignored if numeric indices used in j
DT = data.table(a=1:4, b=5:8, g=c(1,1,2,2))
test(2339.1, DT[, 1:2, by=g], DT[, 1:2], warning="`by` or `keyby` is ignored")
test(2339.2, DT[, 2:1, keyby=g], DT[, 2:1], warning="`by` or `keyby` is ignored")
test(2339.3, DT[, c("b", "a"), by=g, with=FALSE], DT[, c("b", "a")], warning="`by` or `keyby` is ignored")
expected_sd = data.table(g=c(1,1,2,2), a=1:4, b=5:8)
test(2339.4, DT[, .SD[, 1:2], by=g], expected_sd)
expected_single_int = data.table(g=c(1,2), V1=c(1,1))
test(2339.5, DT[, 1, by=g], expected_single_int)
2 changes: 1 addition & 1 deletion man/data.table.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac

See \href{../doc/datatable-intro.html}{\code{vignette("datatable-intro")}} and \code{example(data.table)}.}

\item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts:
\item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). \emph{Note that \code{by} and \code{keyby} are ignored when \code{j} is a character or numeric vector used for selecting columns (i.e., when the internal \code{with=FALSE} is triggered).} The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts:

\itemize{
\item A single unquoted column name: e.g., \code{DT[, .(sa=sum(a)), by=x]}
Expand Down
Loading