Commit 5c1d1d7

Merge branch 'master' into issue6964
2 parents 77446bf + 1b3998e commit 5c1d1d7

15 files changed: +291 -78 lines changed

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ export(.Last.updated)
 export(fcoalesce)
 export(substitute2)
 #export(DT) # mtcars |> DT(i,j,by) #4872 #5472
+export(fctr)

 S3method("[", data.table)
 S3method("[<-", data.table)

NEWS.md

Lines changed: 25 additions & 1 deletion
@@ -18,6 +18,28 @@

 6. `between()` gains the argument `ignore_tzone=FALSE`. Normally, a difference in time zone between `lower` and `upper` will produce an error, and a difference in time zone between `x` and either of the others will produce a message. Setting `ignore_tzone=TRUE` bypasses the checks, allowing both comparisons to proceed without error or message about time zones.

+7. New helper function `fctr` as an extended version of `factor()`, [#4837](https://github.com/Rdatatable/data.table/issues/4837). Most notably, it supports (1) retaining input level ordering by default, i.e. `levels=unique(x)` as opposed to `levels = sort(unique(x))`; (2) `rev=` to reverse the levels; and (3) `sort=` to allow more feature parity with `factor()`. The choice of default is motivated by convenience in the common case when order of elements needs be preserved, for example when using `dcast` or adding a legend to a plot. This also matches the default sort ordering of groups in `by=`.
+
+```r
+d = data.table(id1=rep(1:2, each=3L), id2=letters[c(4:3,5L,3:5)], v1=1:6)
+dcast(d, id1 ~ factor(id2))
+# id1 c d e
+# 1: 1 2 1 3
+# 2: 2 4 5 6
+dcast(d, id1 ~ fctr(id2))
+# id1 d c e
+# 1: 1 1 2 3
+# 2: 2 5 4 6
+dcast(d, id1 ~ fctr(id2, sort=TRUE)) # same as factor()
+# id1 c d e
+# 1: 1 2 1 3
+# 2: 2 4 5 6
+dcast(d, id1 ~ fctr(id2, rev=TRUE))
+# id1 e c d
+# 1: 1 3 2 1
+# 2: 2 6 4 5
+```
+
 ### BUG FIXES

 1. Custom binary operators from the `lubridate` package now work with objects of class `IDate` as with a `Date` subclass, [#6839](https://github.com/Rdatatable/data.table/issues/6839). Thanks @emallickhossain for the report and @aitap for the fix.
@@ -48,7 +70,9 @@

 14. `data.table()` function is now more aligned with `data.frame()` with respect to the names of the output when one of its inputs is a single-column matrix object, [#4124](https://github.com/Rdatatable/data.table/issues/4124). Thanks @PavoDive for the report and @jangorecki for the PR.

-15. `cube()`, `rollup()`, and `groupingsets()` no longer produce a spurious warning when using `min()` or `max()` as aggregations, [#6964](https://github.com/Rdatatable/data.table/issues/6964). Thanks @ferenci-tamas for the report and @venom1204 for the fix.
+15. Including an `ITime` object as a named input to `data.frame()` respects the provided name, i.e. `data.frame(a = as.ITime(...))` will have column `a`, [#4673](https://github.com/Rdatatable/data.table/issues/4673). Thanks @shrektan for the report and @MichaelChirico for the fix.
+
+16. `cube()`, `rollup()`, and `groupingsets()` no longer produce a spurious warning when using `min()` or `max()` as aggregations, [#6964](https://github.com/Rdatatable/data.table/issues/6964). Thanks @ferenci-tamas for the report and @venom1204 for the fix.

 ### NOTES
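
The NEWS entry for `fctr()` contrasts `levels=unique(x)` with the sorted default of `factor()`. As a rough illustration that does not need the development version of data.table, the same two orderings can be reproduced in base R by passing `levels=` explicitly. The variable names below are illustrative only and are not part of the commit.

```r
# Base R only: reproduce the two level orderings described in the NEWS entry.
id2 = c("d", "c", "e", "c", "d", "e")  # same values as letters[c(4:3, 5L, 3:5)]

# factor() default: levels are sort(unique(x))
sorted_levels = levels(factor(id2))

# First-appearance order, which is what fctr() uses by default per the entry
original_levels = levels(factor(id2, levels = unique(id2)))

print(sorted_levels)
print(original_levels)

# Tabulations follow level order, which is why the dcast() columns differ.
counts = table(factor(id2, levels = unique(id2)))
print(names(counts))
```

Anything that lays out its output by factor level (a tabulation here, `dcast()` columns or a plot legend in the entry) picks up whichever ordering the factor carries.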

R/IDateTime.R

Lines changed: 3 additions & 2 deletions
@@ -209,7 +209,7 @@ as.character.ITime = format.ITime = function(x, ...) {
   res
 }

-as.data.frame.ITime = function(x, ...) {
+as.data.frame.ITime = function(x, ..., optional=FALSE) {
   # This method is just for ggplot2, #1713
   # Avoids the error "cannot coerce class '"ITime"' into a data.frame", but for some reason
   # ggplot2 doesn't seem to call the print method to get axis labels, so still prints integers.
@@ -219,7 +219,8 @@ as.data.frame.ITime = function(x, ...) {
   # ans = list(as.POSIXct(x,tzone="")) # ggplot2 gives "Error: Discrete value supplied to continuous scale"
   setattr(ans, "class", "data.frame")
   setattr(ans, "row.names", .set_row_names(length(x)))
-  setattr(ans, "names", "V1")
+  # require 'optional' support for passing back to e.g. data.frame() without overriding names there
+  if (!optional) setattr(ans, "names", "V1")
   ans
 }
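
The change above relies on `data.frame()` calling `as.data.frame(<input>, optional = TRUE)` for each argument and then preferring any names the method returns over the name given in the call. The sketch below imitates that with two hypothetical toy classes (`toy_old`, `toy_new`) and a hypothetical helper `make_df()`; none of these exist in data.table, and the methods are registered with `registerS3method()` only so the example works regardless of where it is evaluated.

```r
# Hypothetical helper: wrap a vector as a one-column data.frame, optionally named.
make_df = function(x, name) {
  ans = list(x)
  attr(ans, "row.names") = .set_row_names(length(x))
  class(ans) = "data.frame"
  if (!is.null(name)) names(ans) = name
  ans
}

# Always names the column "V1": imitates the method before this commit.
always_named = function(x, ...) make_df(x, "V1")

# Names the column only when optional=FALSE: imitates the method after this commit.
respects_optional = function(x, ..., optional = FALSE) {
  make_df(x, if (!optional) "V1")
}

registerS3method("as.data.frame", "toy_old", always_named)
registerS3method("as.data.frame", "toy_new", respects_optional)

old = structure(1:3, class = "toy_old")
new = structure(1:3, class = "toy_new")

names_old = names(data.frame(COL = old))  # method's "V1" overrides the supplied name
names_new = names(data.frame(COL = new))  # supplied name survives
names_direct = names(as.data.frame(new))  # direct call still gets the "V1" default

print(names_old)
print(names_new)
print(names_direct)
```

Leaving the names unset only when `optional=TRUE` keeps the direct `as.data.frame()` result unchanged, which matches the intent stated in the added comment.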

R/bmerge.R

Lines changed: 8 additions & 3 deletions
@@ -27,6 +27,13 @@ coerce_col = function(dt, col, from_type, to_type, from_name, to_name, from_deta

 bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbose)
 {
+  if (roll != 0.0 && length(icols)) {
+    last_x_idx = tail(xcols, 1L)
+    last_i_idx = tail(icols, 1L)
+    if (is.factor(x[[last_x_idx]]) || is.factor(i[[last_i_idx]]))
+      stopf("Attempting roll join on factor column when joining x.%s to i.%s. Only integer, double or character columns may be roll joined.", names(x)[last_x_idx], names(i)[last_i_idx])
+  }
+
   callersi = i
   i = shallow(i)
   # Just before the call to bmerge() in [.data.table there is a shallow() copy of i to prevent coercions here
@@ -64,9 +71,8 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
     iname = paste0("i.", names(i)[icol])
     if (!x_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", xname, x_merge_type)
     if (!i_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", iname, i_merge_type)
+    # we check factors first because they might have different levels
     if (x_merge_type=="factor" || i_merge_type=="factor") {
-      if (roll!=0.0 && a==length(icols))
-        stopf("Attempting roll join on factor column when joining %s to %s. Only integer, double or character columns may be roll joined.", xname, iname)
       if (x_merge_type=="factor" && i_merge_type=="factor") {
         if (verbose) catf("Matching %s factor levels to %s factor levels.\n", iname, xname)
         set(i, j=icol, value=chmatch(levels(i[[icol]]), levels(x[[xcol]]), nomatch=0L)[i[[icol]]]) # nomatch=0L otherwise a level that is missing would match to NA values
@@ -86,7 +92,6 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
       }
       stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, x_merge_type, iname, i_merge_type)
     }
-    # we check factors first to cater for the case when trying to do rolling joins on factors
     if (x_merge_type == i_merge_type) {
       if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, x_merge_type, xname)
       next
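
The first hunk moves the roll-join factor check from inside the per-column loop to the top of `bmerge()`, where it runs once and inspects only the last pair of join columns. A minimal stand-alone sketch of that guard follows; `check_roll_columns()` is a hypothetical name, plain `stop()` replaces data.table's internal `stopf()`, and data.frames stand in for data.tables.

```r
# Hypothetical stand-alone version of the up-front guard; not data.table API.
check_roll_columns = function(x, i, xcols, icols, roll) {
  if (roll != 0.0 && length(icols)) {
    # Only the last join column is rolled, so only that pair is inspected.
    last_x = utils::tail(xcols, 1L)
    last_i = utils::tail(icols, 1L)
    if (is.factor(x[[last_x]]) || is.factor(i[[last_i]])) {
      stop(sprintf("Attempting roll join on factor column when joining x.%s to i.%s.",
                   names(x)[last_x], names(i)[last_i]), call. = FALSE)
    }
  }
  invisible(TRUE)
}

x = data.frame(id = 1:3, grp = factor(c("a", "b", "c")))
i = data.frame(id = 2L, grp = "b")

# No rolling requested: the guard passes even though x$grp is a factor.
ok = check_roll_columns(x, i, xcols = 1:2, icols = 1:2, roll = 0)

# Rolling requested: the factor in the last join column is rejected immediately.
msg = tryCatch(check_roll_columns(x, i, 1:2, 1:2, roll = TRUE),
               error = conditionMessage)
print(msg)
```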

R/data.table.R

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ replace_dot_alias = function(e) {
     }
     return(x)
   }
-  if (!mult %chin% c("first","last","all")) stopf("mult argument can only be 'first', 'last' or 'all'")
+  if (!mult %chin% c("first", "last", "all")) stopf("mult argument can only be 'first', 'last' or 'all'")
   missingroll = missing(roll)
   if (length(roll)!=1L || is.na(roll)) stopf("roll must be a single TRUE, FALSE, positive/negative integer/double including +Inf and -Inf or 'nearest'")
   if (is.character(roll)) {

R/test.data.table.R

Lines changed: 3 additions & 2 deletions
@@ -229,8 +229,9 @@ test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=F
   # notranslate start
   cat("\n", date(), # so we can tell exactly when these tests ran on CRAN to double-check the result is up to date
       " endian==", .Platform$endian,
-      ", sizeof(long double)==", .Machine$sizeof.longdouble,
-      ", longdouble.digits==", .Machine$longdouble.digits, # 64 normally, 53 for example under valgrind where some high accuracy tests need turning off, #4639
+      ", sizeof(long double)==", format(.Machine$sizeof.longdouble),
+      ", capabilities('long.double')==", capabilities('long.double'), # almost certainly overkill, but that's OK; see #6154
+      ", longdouble.digits==", format(.Machine$longdouble.digits), # 64 normally, 53 for example under valgrind where some high accuracy tests need turning off, #4639
       ", sizeof(pointer)==", .Machine$sizeof.pointer,
       ", TZ==", if (is.na(tz)) "unset" else paste0("'",tz,"'"),
       ", Sys.timezone()=='", suppressWarnings(Sys.timezone()), "'",

R/utils.R

Lines changed: 11 additions & 0 deletions
@@ -219,6 +219,17 @@ rss = function() { #5515 #5517
   # nocov end
 }

+# convert char to factor retaining order #4837
+fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
+  if (!isTRUEorFALSE(sort))
+    stopf("argument 'sort' must be TRUE or FALSE")
+  if (!isTRUEorFALSE(rev))
+    stopf("argument 'rev' must be TRUE or FALSE")
+  if (sort) levels = sort(levels)
+  if (rev) levels = rev(levels)
+  factor(x, levels=levels, ...)
+}
+
 formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
   terms <- terms(f)
   setNames(
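
To try the new helper's behaviour without installing this branch, the function can be approximated in base R. The sketch below is a stand-in named `fctr_sketch()` (a hypothetical name): the data.table internals `isTRUEorFALSE()` and `stopf()` are swapped for base equivalents, and the calls to `sort()` and `rev()` are namespace-qualified for readability because the arguments share those names.

```r
# Base R stand-in for the fctr() added above; not the package function itself.
fctr_sketch = function(x, levels = unique(x), ..., sort = FALSE, rev = FALSE) {
  if (!(isTRUE(sort) || isFALSE(sort))) stop("argument 'sort' must be TRUE or FALSE")
  if (!(isTRUE(rev) || isFALSE(rev))) stop("argument 'rev' must be TRUE or FALSE")
  if (sort) levels = base::sort(levels)  # sorting happens first ...
  if (rev) levels = base::rev(levels)    # ... so rev=TRUE reverses the sorted order
  factor(x, levels = levels, ...)
}

x = c("b", "a", "c")
lv_default  = levels(fctr_sketch(x))                           # order of appearance
lv_rev      = levels(fctr_sketch(x, rev = TRUE))               # appearance order, reversed
lv_sort     = levels(fctr_sketch(x, sort = TRUE))              # same as factor(x)
lv_sort_rev = levels(fctr_sketch(x, sort = TRUE, rev = TRUE))  # descending
bad_rev     = tryCatch(fctr_sketch(x, rev = NA), error = conditionMessage)

print(list(lv_default, lv_rev, lv_sort, lv_sort_rev, bad_rev))
```

Because `sort` is applied before `rev`, combining the two yields descending levels rather than a reversed appearance order, which is the behaviour exercised by test 2322.31 in this commit.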

inst/tests/other.Rraw

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ if (exists("test.data.table",.GlobalEnv,inherits=FALSE) ||
 }

 test = data.table:::test
+coerceAs = data.table:::coerceAs
 INT = data.table:::INT

 if (anyDuplicated(pkgs)) stop("Packages defined to be loaded for integration tests in 'inst/tests/other.Rraw' contains duplicates.")

inst/tests/tests.Rraw

Lines changed: 15 additions & 0 deletions
@@ -21237,3 +21237,18 @@ test(2321.4, names(DT), names(data.frame(a=integer(), b=matrix(1L, nrow=0L, ncol
 ## but respect named column vectors
 test(2321.5, DT <- data.table(a=1:2, cbind(b=3:4)), data.table(a=1:2, b=3:4))
 test(2321.6, names(DT), names(data.frame(a=1:2, cbind(b=3:4))))
+
+# New fctr() helper: like factor() but retaining order by default #4837
+test(2322.01, levels(fctr(c("b","a","c"))), c("b","a","c"))
+test(2322.02, levels(fctr(c(3,1,2))), c("3","1","2"))
+test(2322.11, levels(fctr(c("b","a","c"), rev=TRUE)), c("c","a","b"))
+test(2322.12, levels(fctr(c("b","a","c"), rev=NA)), error="TRUE or FALSE")
+test(2322.21, levels(fctr(c("b","a","c"), sort=TRUE)), c("a","b","c"))
+test(2322.22, levels(fctr(c("b","a","c"), sort=NA)), error="TRUE or FALSE")
+test(2322.31, levels(fctr(c("b","a","c"), rev=TRUE, sort=TRUE)), c("c","b","a"))
+
+# data.frame() uses provided names of ITime inputs
+it <- as.ITime('00:00:00')
+test(2323.1, names(data.frame(COL = it)), "COL")
+test(2323.2, names(data.frame(b = 1, COL = it)), c("b", "COL"))
+test(2323.3, names(as.data.frame(it, optional=TRUE)), NULL)

man/fctr.Rd

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+\name{fctr}
+\alias{fctr}
+\alias{factor}
+\title{Create a factor retaining original ordering}
+\description{
+  Creates a \code{\link[base:factor]{factor}}.
+
+  By default, the output will have its levels in the original order, i.e., \code{levels = unique(x)}, as opposed to \code{factor}'s default where \code{levels = sort(unique(x))}.
+}
+\usage{
+fctr(x, levels=unique(x), ..., sort=FALSE, rev=FALSE)
+}
+\arguments{
+  \item{x}{ Object to be turned into a factor. }
+  \item{levels}{ Levels for the new factor; \code{unique(x)} by default. }
+  \item{\dots}{ Other arguments passed to \code{\link[base:factor]{factor}}. }
+  \item{sort}{ Logical, default \code{FALSE}. Should \code{levels} be sorted? }
+  \item{rev}{ Logical, default \code{FALSE}. Should \code{levels} be reversed? Applied \emph{after} \code{sort}. }
+}
+\value{
+  Factor vector having levels ordered according to the order of elements in input and arguments \code{sort}, \code{rev}.
+}
+\examples{
+levels(factor(c("b","a","c")))
+levels(fctr(c("b","a","c")))
+levels(fctr(c("b","a","c"), rev=TRUE))
+levels(fctr(c("b","a","c"), sort=TRUE))
+levels(fctr(c("b","a","c"), sort=TRUE, rev=TRUE))
+}
+\keyword{ data }
