
Commit 7f1b3bb

new helper function fctr to retain order of input in levels, closes #… (#4843)
* new helper function fctr to retain order of input in levels, closes #4837
* wording
* fctr gets option to reverse order of levels
* add sort argument to fctr()
* augment NEWS
* more polish
* indent code block
* stop->stopf
* emphasize sort happens first
* polish Rd grammar
* test #ing

---------

Co-authored-by: Michael Chirico
1 parent d2b3ea7 commit 7f1b3bb

5 files changed: 73 additions, 0 deletions


NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ export(.Last.updated)
 export(fcoalesce)
 export(substitute2)
 #export(DT) # mtcars |> DT(i,j,by) #4872 #5472
+export(fctr)

 S3method("[", data.table)
 S3method("[<-", data.table)

NEWS.md

Lines changed: 22 additions & 0 deletions
@@ -18,6 +18,28 @@

 6. `between()` gains the argument `ignore_tzone=FALSE`. Normally, a difference in time zone between `lower` and `upper` will produce an error, and a difference in time zone between `x` and either of the others will produce a message. Setting `ignore_tzone=TRUE` bypasses the checks, allowing both comparisons to proceed without error or message about time zones.

+7. New helper function `fctr` as an extended version of `factor()`, [#4837](https://github.com/Rdatatable/data.table/issues/4837). Most notably, it supports (1) retaining input level ordering by default, i.e. `levels=unique(x)` as opposed to `levels=sort(unique(x))`; (2) `rev=` to reverse the levels; and (3) `sort=` to allow more feature parity with `factor()`. The choice of default is motivated by convenience in the common case where the order of elements needs to be preserved, for example when using `dcast` or adding a legend to a plot. This also matches the default ordering of groups in `by=`.
+
+    ```r
+    d = data.table(id1=rep(1:2, each=3L), id2=letters[c(4:3,5L,3:5)], v1=1:6)
+    dcast(d, id1 ~ factor(id2))
+    #    id1 c d e
+    # 1:   1 2 1 3
+    # 2:   2 4 5 6
+    dcast(d, id1 ~ fctr(id2))
+    #    id1 d c e
+    # 1:   1 1 2 3
+    # 2:   2 5 4 6
+    dcast(d, id1 ~ fctr(id2, sort=TRUE)) # same as factor()
+    #    id1 c d e
+    # 1:   1 2 1 3
+    # 2:   2 4 5 6
+    dcast(d, id1 ~ fctr(id2, rev=TRUE))
+    #    id1 e c d
+    # 1:   1 3 2 1
+    # 2:   2 6 4 5
+    ```
+
 ### BUG FIXES

 1. Custom binary operators from the `lubridate` package now work with objects of class `IDate` as with a `Date` subclass, [#6839](https://github.com/Rdatatable/data.table/issues/6839). Thanks @emallickhossain for the report and @aitap for the fix.

R/utils.R

Lines changed: 11 additions & 0 deletions
@@ -219,6 +219,17 @@ rss = function() { #5515 #5517
   # nocov end
 }

+# convert char to factor retaining order #4837
+fctr = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
+  if (!isTRUEorFALSE(sort))
+    stopf("argument 'sort' must be TRUE or FALSE")
+  if (!isTRUEorFALSE(rev))
+    stopf("argument 'rev' must be TRUE or FALSE")
+  if (sort) levels = sort(levels)
+  if (rev) levels = rev(levels)
+  factor(x, levels=levels, ...)
+}
+
 formula_vars = function(f, x) { # .formula2varlist is not API and seems to have appeared after R-4.2, #6841
   terms <- terms(f)
   setNames(
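
Reviewer note: the order of operations in `fctr()` (sort first, then reverse) can be tried outside the package with a base-R stand-in. The sketch below is not the package code: `fctr_sketch` and `is_flag` are hypothetical names, and plain `stop()` replaces the internal `stopf()` and `isTRUEorFALSE()` helpers, which are assumed here to validate a single non-missing logical.

```r
# Hypothetical base-R stand-in for fctr(); names are illustrative only.
fctr_sketch = function(x, levels=unique(x), ..., sort=FALSE, rev=FALSE) {
  # Assumed equivalent of the internal isTRUEorFALSE() check
  is_flag = function(v) is.logical(v) && length(v) == 1L && !is.na(v)
  if (!is_flag(sort)) stop("argument 'sort' must be TRUE or FALSE")
  if (!is_flag(rev)) stop("argument 'rev' must be TRUE or FALSE")
  # Sorting is applied before reversing, as in the diff above
  if (sort) levels = base::sort(levels)
  if (rev) levels = base::rev(levels)
  factor(x, levels=levels, ...)
}

x = c("b", "a", "c", "a")
lv_default  = levels(fctr_sketch(x))                      # "b" "a" "c"
lv_rev      = levels(fctr_sketch(x, rev=TRUE))            # "c" "a" "b"
lv_sort_rev = levels(fctr_sketch(x, sort=TRUE, rev=TRUE)) # "c" "b" "a"
```

Because `sort` runs first, `sort=TRUE, rev=TRUE` yields descending levels rather than the reversed input order.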

inst/tests/tests.Rraw

Lines changed: 9 additions & 0 deletions
@@ -21237,3 +21237,12 @@ test(2321.4, names(DT), names(data.frame(a=integer(), b=matrix(1L, nrow=0L, ncol
 ## but respect named column vectors
 test(2321.5, DT <- data.table(a=1:2, cbind(b=3:4)), data.table(a=1:2, b=3:4))
 test(2321.6, names(DT), names(data.frame(a=1:2, cbind(b=3:4))))
+
+# New fctr() helper: like factor() but retaining order by default #4837
+test(2322.01, levels(fctr(c("b","a","c"))), c("b","a","c"))
+test(2322.02, levels(fctr(c(3,1,2))), c("3","1","2"))
+test(2322.11, levels(fctr(c("b","a","c"), rev=TRUE)), c("c","a","b"))
+test(2322.12, levels(fctr(c("b","a","c"), rev=NA)), error="TRUE or FALSE")
+test(2322.21, levels(fctr(c("b","a","c"), sort=TRUE)), c("a","b","c"))
+test(2322.22, levels(fctr(c("b","a","c"), sort=NA)), error="TRUE or FALSE")
+test(2322.31, levels(fctr(c("b","a","c"), rev=TRUE, sort=TRUE)), c("c","b","a"))

man/fctr.Rd

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+\name{fctr}
+\alias{fctr}
+\alias{factor}
+\title{Create a factor retaining original ordering}
+\description{
+  Creates a \code{\link[base:factor]{factor}}.
+
+  By default, the output will have its levels in the original order, i.e., \code{levels = unique(x)}, as opposed to \code{factor}'s default where \code{levels = sort(unique(x))}.
+}
+\usage{
+fctr(x, levels=unique(x), ..., sort=FALSE, rev=FALSE)
+}
+\arguments{
+  \item{x}{ Object to be turned into a factor. }
+  \item{levels}{ Levels for the new factor; \code{unique(x)} by default. }
+  \item{\dots}{ Other arguments passed to \code{\link[base:factor]{factor}}. }
+  \item{sort}{ Logical, default \code{FALSE}. Should \code{levels} be sorted? }
+  \item{rev}{ Logical, default \code{FALSE}. Should \code{levels} be reversed? Applied \emph{after} \code{sort}. }
+}
+\value{
+  Factor vector having levels ordered according to the order of elements in input and arguments \code{sort}, \code{rev}.
+}
+\examples{
+levels(factor(c("b","a","c")))
+levels(fctr(c("b","a","c")))
+levels(fctr(c("b","a","c"), rev=TRUE))
+levels(fctr(c("b","a","c"), sort=TRUE))
+levels(fctr(c("b","a","c"), sort=TRUE, rev=TRUE))
+}
+\keyword{ data }
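
Reviewer note: the `\dots` argument forwards extra arguments to `factor()`, so options such as `ordered=TRUE` should combine with the order-of-appearance default. The sketch below uses only base R and spells out the call that `fctr(stages, ordered=TRUE)` reduces to under its defaults, judging by the function body in this commit; the `stages` data is made up for illustration.

```r
# Made-up input whose natural order is the order of first appearance.
stages = c("draft", "review", "published", "review")

# Base-R call that fctr(stages, ordered=TRUE) is expected to reduce to
# when sort=FALSE and rev=FALSE (an assumption read off the diff above).
f = factor(stages, levels=unique(stages), ordered=TRUE)

stage_levels = levels(f)             # "draft" "review" "published"
review_before_published = f[2] < f[3]  # TRUE: "review" precedes "published"
latest_stage = as.character(max(f))  # "published"
```

With `factor()`'s default alphabetical levels, "published" would sort before "review", so comparisons on the ordered factor would not follow the workflow order.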
