Merged
1 change: 1 addition & 0 deletions NAMESPACE
@@ -61,6 +61,7 @@ export(.Last.updated)
export(fcoalesce)
export(substitute2)
#export(DT) # mtcars |> DT(i,j,by) #4872 #5472
export(fctr)

S3method("[", data.table)
S3method("[<-", data.table)
22 changes: 22 additions & 0 deletions NEWS.md
@@ -18,6 +18,28 @@

6. `between()` gains the argument `ignore_tzone=FALSE`. Normally, a difference in time zone between `lower` and `upper` will produce an error, and a difference in time zone between `x` and either of the others will produce a message. Setting `ignore_tzone=TRUE` bypasses the checks, allowing both comparisons to proceed without error or message about time zones.

7. New helper function `fctr` as an extended version of `factor()`, [#4837](https://github.com/Rdatatable/data.table/issues/4837). Most notably, it supports (1) retaining input level ordering by default, i.e. `levels=unique(x)` as opposed to `levels=sort(unique(x))`; (2) `rev=` to reverse the levels; and (3) `sort=` to allow more feature parity with `factor()`. The choice of default is motivated by convenience in the common case when the order of elements needs to be preserved, for example when using `dcast` or adding a legend to a plot. This also matches the default ordering of groups in `by=`.

```r
d = data.table(id1=rep(1:2, each=3L), id2=letters[c(4:3,5L,3:5)], v1=1:6)
dcast(d, id1 ~ factor(id2))
#    id1 c d e
# 1:   1 2 1 3
# 2:   2 4 5 6
dcast(d, id1 ~ fctr(id2))
#    id1 d c e
# 1:   1 1 2 3
# 2:   2 5 4 6
dcast(d, id1 ~ fctr(id2, sort=TRUE)) # same as factor()
#    id1 c d e
# 1:   1 2 1 3
# 2:   2 4 5 6
dcast(d, id1 ~ fctr(id2, rev=TRUE))
#    id1 e c d
# 1:   1 3 2 1
# 2:   2 6 4 5
```
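
The level ordering itself can be checked without `dcast`. The following is a minimal sketch in base R only, assuming `fctr` behaves like the definition added to `R/utils.R` in this change; `fctr_sketch` is a hypothetical stand-in name used so the snippet runs without the new export, and its argument check uses `stopifnot` rather than the package's own error helpers.

```r
# Hypothetical stand-in for fctr(), mirroring the definition in R/utils.R.
fctr_sketch = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
  # Assumption: plain stopifnot() replaces the package-internal checks.
  stopifnot(isTRUE(sort) || isFALSE(sort), isTRUE(rev) || isFALSE(rev))
  if (sort) levels = base::sort(levels)  # sorting is applied first
  if (rev) levels = base::rev(levels)    # reversal is applied after sorting
  factor(x, levels=levels, ...)
}

x = c("d", "c", "e", "c", "d", "e")
levels(factor(x))                            # "c" "d" "e" (sorted)
levels(fctr_sketch(x))                       # "d" "c" "e" (order of first appearance)
levels(fctr_sketch(x, rev=TRUE))             # "e" "c" "d"
levels(fctr_sketch(x, sort=TRUE, rev=TRUE))  # "e" "d" "c"
```

Because `rev` is applied after `sort`, combining the two gives descending levels rather than the reverse of the input order.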

### BUG FIXES

1. Custom binary operators from the `lubridate` package now work with objects of class `IDate` as with a `Date` subclass, [#6839](https://github.com/Rdatatable/data.table/issues/6839). Thanks @emallickhossain for the report and @aitap for the fix.
11 changes: 11 additions & 0 deletions R/utils.R
@@ -219,6 +219,17 @@ rss = function() { #5515 #5517
# nocov end
}

# convert char to factor retaining order #4837
fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
  if (!isTRUEorFALSE(sort))
    stopf("argument 'sort' must be TRUE or FALSE")
  if (!isTRUEorFALSE(rev))
    stopf("argument 'rev' must be TRUE or FALSE")
  if (sort) levels = sort(levels)
  if (rev) levels = rev(levels)
  factor(x, levels=levels, ...)
}

formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
terms <- terms(f)
setNames(
9 changes: 9 additions & 0 deletions inst/tests/tests.Rraw
@@ -21237,3 +21237,12 @@ test(2321.4, names(DT), names(data.frame(a=integer(), b=matrix(1L, nrow=0L, ncol
## but respect named column vectors
test(2321.5, DT <- data.table(a=1:2, cbind(b=3:4)), data.table(a=1:2, b=3:4))
test(2321.6, names(DT), names(data.frame(a=1:2, cbind(b=3:4))))

# New fctr() helper: like factor() but retaining order by default #4837
test(2240.01, levels(fctr(c("b","a","c"))), c("b","a","c"))
test(2240.02, levels(fctr(c(3,1,2))), c("3","1","2"))
test(2240.11, levels(fctr(c("b","a","c"), rev=TRUE)), c("c","a","b"))
test(2240.12, levels(fctr(c("b","a","c"), rev=NA)), error="TRUE or FALSE")
test(2240.21, levels(fctr(c("b","a","c"), sort=TRUE)), c("a","b","c"))
test(2240.22, levels(fctr(c("b","a","c"), sort=NA)), error="TRUE or FALSE")
test(2240.31, levels(fctr(c("b","a","c"), rev=TRUE, sort=TRUE)), c("c","b","a"))
30 changes: 30 additions & 0 deletions man/fctr.Rd
@@ -0,0 +1,30 @@
\name{fctr}
\alias{fctr}
\alias{factor}
\title{Create a factor retaining original ordering}
\description{
Creates a \code{\link[base:factor]{factor}}.

By default, the output will have its levels in the original order, i.e., \code{levels = unique(x)}, as opposed to \code{factor}'s default where \code{levels = sort(unique(x))}.
}
\usage{
fctr(x, levels=unique(x), ..., sort=FALSE, rev=FALSE)
}
\arguments{
\item{x}{ Object to be turned into a factor. }
\item{levels}{ Levels for the new factor; \code{unique(x)} by default. }
\item{\dots}{ Other arguments passed to \code{\link[base:factor]{factor}}. }
\item{sort}{ Logical, default \code{FALSE}. Should \code{levels} be sorted? }
\item{rev}{ Logical, default \code{FALSE}. Should \code{levels} be reversed? Applied \emph{after} \code{sort}. }
}
\value{
A factor whose levels follow \code{levels} (by default, the order in which values first appear in \code{x}), after applying \code{sort} and then \code{rev}.
}
\examples{
levels(factor(c("b","a","c")))
levels(fctr(c("b","a","c")))
levels(fctr(c("b","a","c"), rev=TRUE))
levels(fctr(c("b","a","c"), sort=TRUE))
levels(fctr(c("b","a","c"), sort=TRUE, rev=TRUE))
}
\keyword{ data }