Commit d2fa90f

Merge branch 'master' into between-int64
2 parents 53bae8b + 7f14f2e, commit d2fa90f

14 files changed: +123 -39 lines changed

NAMESPACE

Lines changed: 1 addition & 1 deletion
@@ -153,7 +153,7 @@ if (getRversion() >= "3.6.0") {

 # IDateTime support:
 export(as.IDate,as.ITime,IDateTime)
-export(second,minute,hour,yday,wday,mday,week,isoweek,month,quarter,year,yearmon,yearqtr)
+export(second,minute,hour,yday,wday,mday,week,isoweek,isoyear,month,quarter,year,yearmon,yearqtr)

 S3method("[", ITime)
 S3method("+", IDate)

NEWS.md

Lines changed: 16 additions & 1 deletion
@@ -10,7 +10,16 @@

 ### NEW FEATURES

-1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
+1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also matches `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
+
+```r
+DT = data.table(a=c(1L, 2L, 1L), b=c(3L, 1L, 2L))
+sort_by(DT, ~a + b)
+#    a b
+# 1: 1 2
+# 2: 1 3
+# 3: 2 1
+```

 2. `melt()` now supports using `patterns()` with `id.vars`, [#6867](https://github.com/Rdatatable/data.table/issues/6867). Thanks to Toby Dylan Hocking for the suggestion and PR.

@@ -56,6 +65,10 @@

 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.

+14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation.
+
+15. New function `isoyear()` has been implemented as a complement to `isoweek()`, returning the ISO 8601 year corresponding to a given date, [#7154](https://github.com/Rdatatable/data.table/issues/7154). Thanks to @ben-schwen and @MichaelChirico for the suggestion and @venom1204 for the implementation.
+
 ### BUG FIXES

 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.
@@ -86,6 +99,8 @@

 14. Filling columns of class Date with POSIXct (and vice versa) using `shift()` now yields a clear, informative error message specifying the class mismatch, [#5218](https://github.com/Rdatatable/data.table/issues/5218). Thanks @ashbaldry for the report and @ben-schwen for the fix.

+15. `split.data.table()` output list elements retain the S3 class of the generating data.table, e.g. in `l=split(x, ...)` if `x` has class `my_class`, so will `l[[1]]` and so on, [#7105](https://github.com/Rdatatable/data.table/issues/7105). Thanks @m-muecke for the bug report and @MichaelChirico for the fix.
+
 ### NOTES

 1. The following in-progress deprecations have proceeded:

R/IDateTime.R

Lines changed: 1 addition & 1 deletion
@@ -355,7 +355,7 @@ isoweek = function(x) as.integer(format(as.IDate(x), "%V"))
 # nearest_thurs = as.IDate(7L * (as.integer(x + 3L) %/% 7L))
 # year_start = as.IDate(format(nearest_thurs, '%Y-01-01'))
 # 1L + (nearest_thurs - year_start) %/% 7L
-
+isoyear = function(x) as.integer(format(as.IDate(x), "%G"))

 month = function(x) convertDate(as.IDate(x), "month")
 quarter = function(x) convertDate(as.IDate(x), "quarter")
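
Note on `isoyear()`: the one-liner above delegates to the `%G` format code, which yields the year that owns the ISO 8601 week containing the date. As a rough illustration of what that code computes, here is a minimal C sketch, assuming a C99-conforming `strftime`. The helper name `iso_year` is hypothetical and not part of data.table, and the sketch does not claim to mirror how R implements `format()` internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of data.table): return the ISO 8601
 * week-based year for a calendar date, or -1 on failure. */
int iso_year(int year, int month, int day) {
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);

    struct tm tm = {0};
    tm.tm_year = year - 1900;  /* struct tm counts years from 1900 */
    tm.tm_mon = month - 1;     /* and months from 0 */
    tm.tm_mday = day;
    tm.tm_hour = 12;           /* noon keeps clear of DST shifts near midnight */
    tm.tm_isdst = -1;          /* let the C library decide about DST */

    /* mktime() normalizes the struct and fills tm_wday and tm_yday,
     * which "%G" relies on. */
    if (mktime(&tm) == (time_t)-1) return -1;

    char buf[16];
    if (strftime(buf, sizeof buf, "%G", &tm) == 0) return -1;
    return atoi(buf);
}
```

With the dates used in the tests of this commit, `iso_year(2019, 12, 30)` gives 2020 and `iso_year(2016, 1, 1)` gives 2015, while a mid-year date such as `iso_year(2023, 8, 15)` matches the calendar year.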

R/data.table.R

Lines changed: 6 additions & 3 deletions
@@ -2491,7 +2491,7 @@ Ops.data.table = function(e1, e2 = NULL)
 }

 split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TRUE, flatten = TRUE, ..., verbose = getOption("datatable.verbose")) {
-  if (!is.data.table(x)) stopf("x argument must be a data.table")
+  if (!is.data.table(x)) internal_error("x argument to split.data.table must be a data.table") # nocov
   stopifnot(is.logical(drop), is.logical(sorted), is.logical(keep.by), is.logical(flatten))
   # split data.frame way, using `f` and not `by` argument
   if (!missing(f)) {
@@ -2566,8 +2566,11 @@ split.data.table = function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TR
   setattr(ll, "names", nm)
   # handle nested split
   if (flatten || length(by) == 1L) {
-    for (x in ll) .Call(C_unlock, x)
-    lapply(ll, setDT)
+    for (xi in ll) .Call(C_unlock, xi)
+    out = lapply(ll, setDT)
+    # TODO(#2000): just let setDT handle this
+    if (!identical(old_class <- class(x), c("data.table", "data.frame"))) for (xi in out) setattr(xi, "class", old_class)
+    out
     # alloc.col could handle DT in list as done in: c9c4ff80bdd4c600b0c4eff23b207d53677176bd
   } else if (length(by) > 1L) {
     lapply(ll, split.data.table, drop=drop, by=by[-1L], sorted=sorted, keep.by=keep.by, flatten=flatten)

R/wrappers.R

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 # Very small (e.g. one line) R functions that just call C.
 # One file wrappers.R to avoid creating lots of small .R files.

-fcoalesce = function(...) .Call(Ccoalesce, list(...), FALSE)
-setcoalesce = function(...) .Call(Ccoalesce, list(...), TRUE)
+fcoalesce = function(..., nan=NA) .Call(Ccoalesce, list(...), FALSE, nan_is_na(nan))
+setcoalesce = function(..., nan=NA) .Call(Ccoalesce, list(...), TRUE, nan_is_na(nan))

 fifelse = function(test, yes, no, na=NA) .Call(CfifelseR, test, yes, no, na)
 fcase = function(..., default=NA) {

inst/tests/nafill.Rraw

Lines changed: 3 additions & 2 deletions
@@ -114,8 +114,9 @@ test(3.02, setnafill(list(copy(x)), "locf", fill=0L), list(x))
 test(3.03, setnafill(x, "locf"), error="in-place update is supported only for list")
 test(3.04, nafill(letters[1:5], fill=0), error="must be numeric type, or list/data.table")
 test(3.05, setnafill(list(letters[1:5]), fill=0), error="must be numeric type, or list/data.table")
-test(3.06, nafill(x, fill=1:2), error="fill must be a vector of length 1")
-test(3.07, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion"))
+test(3.06, nafill(x, fill=1:2), error="fill must be a vector of length 1.*fcoalesce")
+test(3.07, nafill(x, "locf", fill=1:2), error="fill must be a vector of length 1.*x\\.$")
+test(3.08, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion"))

 # colnamesInt helper
 dt = data.table(a=1, b=2, d=3)

inst/tests/tests.Rraw

Lines changed: 22 additions & 2 deletions
@@ -74,7 +74,6 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
   setfrev = data.table:::setfrev
   shallow = data.table:::shallow # until exported
   .shallow = data.table:::.shallow
-  split.data.table = data.table:::split.data.table
   stopf = data.table:::stopf
   test = data.table:::test
   uniqlengths = data.table:::uniqlengths
@@ -9764,6 +9763,14 @@ test(1639.141, all(sapply(dtL, truelength) > 1000))
 dt <- data.table(x = factor("a"), y = 1)
 test(1639.142, x = split(dt, by = "x"), y = list(a = dt))
 test(1639.143, x = split(dt, by = "y"), y = list(`1` = dt))
+
+# retain a custom class after splitting, #7105
+DT = data.table(x=letters[1:10], y=1:10, z=rnorm(10))
+setattr(DT, "class", c("my_class", class(DT)))
+test(1639.144, "my_class" %in% unlist(lapply(split(DT, by="x"), class)))
+test(1639.145, "my_class" %in% unlist(lapply(split(DT, ~x), class)))
+test(1639.146, "my_class" %in% unlist(lapply(split(DT, by=c("x", "y")), class)))
+test(1639.147, "my_class" %in% unlist(lapply(split(DT, ~x+y), class)))
 rm_all()

 # allow x's cols (specifically x's join cols) to be referred to using 'x.' syntax
@@ -14277,7 +14284,7 @@ test(1984.25, rbindlist(list(DT[1L], DT[2L]), idcol = TRUE), data.table(.id=1:2,
 test(1984.26, setalloccol(`*tmp*`), error='setalloccol attempting to modify `*tmp*`')
 DF = as.data.frame(DT)
 test(1984.27, identical(shallow(DF), DF)) # shallow (which is not exported) works on DF from v1.14.2. identical() to force checking the selfref attribute for #5286.
-test(1984.28, split.data.table(DF), error='argument must be a data.table')
+# 1984.28 was a coverage test converted to 'nocov' of an internal_error instead
 test(1984.29, split(DT, by='a', f='a'), error="passing 'f' argument together with 'by' is not allowed")
 test(1984.30, split(DT), error="Either 'by' or 'f' argument must be supplied")
 setnames(DT, '.ll.tech.split')
@@ -15583,6 +15590,11 @@ test(2060.154, fcoalesce(list(x)), x)
 test(2060.155, setcoalesce(list(x)), x)
 test(2060.156, setcoalesce(list(x,y,z)), ans)
 test(2060.157, x, ans) # setcoalesce updated the first item (x) by reference
+# nan parameter, #4567
+test(2060.158, fcoalesce(c(NA_real_, NaN), 0, nan=NA), c(0, 0))
+test(2060.159, fcoalesce(c(NA_real_, NaN), 0, nan=NaN), c(0, NaN))
+test(2060.160, fcoalesce(c(NA_real_, NaN), c(1, 2), nan=NA), c(1, 2))
+test(2060.161, fcoalesce(c(NA_real_, NaN), c(1, 2), nan=NaN), c(1, NaN))
 # factor of different levels
 x = factor(c('a','b',NA,NA,'b'))
 y = factor(c('b','b','a',NA,'b'))
@@ -21541,3 +21553,11 @@ f = tempfile()
 writeLines(c('a', rep('0x1.ffffp0', 10000L), '0x1.ff\x9fp0', rep('0x1.ffffp0', 20000L)), f)
 test(2334, names(fread(f)), "a")
 unlink(f)
+
+# Tests for new isoyear() helper (complement to isoweek) #7154
+test(2335.1, isoyear(as.IDate("2019-12-30")), 2020L) # End of year edge case
+test(2335.2, isoyear(as.IDate("2016-01-01")), 2015L) # Start of year edge case
+test(2335.3, isoyear(as.IDate("2023-08-15")), 2023L) # Normal mid-year case
+test(2335.4, isoyear(as.IDate(c("2019-12-30", "2016-01-01", "2023-08-15"))),c(2020L, 2015L, 2023L))
+test(2335.5, isoyear("2019-12-30"), 2020L)
+test(2335.6, isoyear(as.Date("2019-12-30")), 2020L)

man/IDateTime.Rd

Lines changed: 16 additions & 1 deletion
@@ -38,6 +38,7 @@
 \alias{mday}
 \alias{week}
 \alias{isoweek}
+\alias{isoyear}
 \alias{month}
 \alias{quarter}
 \alias{year}
@@ -92,6 +93,7 @@ wday(x)
 mday(x)
 week(x)
 isoweek(x)
+isoyear(x)
 month(x)
 quarter(x)
 year(x)
@@ -187,6 +189,8 @@ which specify that the first week of the year is the one containing the first Th
 This convention ensures that week boundaries align consistently with year boundaries,
 accounting for both year transitions and varying day counts per week.

+Similarly, \code{isoyear()} returns the ISO 8601 year corresponding to the ISO week.
+
 }

 \value{
@@ -200,7 +204,7 @@ accounting for both year transitions and varying day counts per week.
 \code{itime} in \code{IDate} and \code{ITime} format.

 \code{second}, \code{minute}, \code{hour}, \code{yday}, \code{wday},
-\code{mday}, \code{week}, \code{month}, \code{quarter},
+\code{mday}, \code{week}, \code{isoweek}, \code{isoyear}, \code{month}, \code{quarter},
 and \code{year} return integer values
 for second, minute, hour, day of year, day of week,
 day of month, week, month, quarter, and year, respectively.
@@ -281,6 +285,17 @@ round(seqdates, "months")
 round(seqtimes, "hours")
 trunc(seqtimes, "hours")

+# Examples for isoyear() and isoweek()
+d1 = as.IDate("2019-12-30")
+year(d1)
+isoweek(d1)
+isoyear(d1)
+
+d2 = as.IDate("2016-01-01")
+year(d2)
+isoweek(d2)
+isoyear(d2)
+
 }
 \keyword{utilities}


man/coalesce.Rd

Lines changed: 6 additions & 2 deletions
@@ -7,10 +7,11 @@ Fill in missing values in a vector by successively pulling from candidate vector
 Written in C, and multithreaded for numeric and factor types.
 }
 \usage{
-  fcoalesce(\dots)
+  fcoalesce(\dots, nan=NA)
 }
 \arguments{
   \item{\dots}{ A set of same-class vectors. These vectors can be supplied as separate arguments or as a single plain list, data.table or data.frame, see examples. }
+  \item{nan}{ Either \code{NaN} or \code{NA}; if \code{NaN}, then \code{NaN} is treated as distinct from \code{NA}, otherwise they are treated the same during replacement (double columns only). }
 }
 \details{
 Factor type is supported only when the factor levels of each item are equal.
@@ -22,7 +23,7 @@ Atomic vector of the same type and length as the first vector, having \code{NA}
 If the first item is \code{NULL}, the result is \code{NULL}.
 }
 \seealso{
-  \code{\link{fifelse}}
+  \code{\link{fifelse}}, \code{\link{nafill}}
 }
 \examples{
 x = c(11L, NA, 13L, NA, 15L, NA)
@@ -31,6 +32,9 @@ z = c(11L, NA, 1L, 14L, NA, NA)
 fcoalesce(x, y, z)
 fcoalesce(list(x,y,z)) # same
 fcoalesce(x, list(y,z)) # same
+x_num = c(NaN, NA_real_, 3.0)
+fcoalesce(x_num, 1) # default: NaN treated as missing -> c(1, 1, 3)
+fcoalesce(x_num, 1, nan=NaN) # preserve NaN -> c(NaN, 1, 3)
 }
 \keyword{ data }


src/coalesce.c

Lines changed: 40 additions & 17 deletions
@@ -6,10 +6,12 @@
   - The replacement of NAs with non-NA values from subsequent vectors
   - The conditional checks within parallelized loops
 */
-SEXP coalesce(SEXP x, SEXP inplaceArg) {
+SEXP coalesce(SEXP x, SEXP inplaceArg, SEXP nan_is_na_arg) {
   if (TYPEOF(x)!=VECSXP) internal_error(__func__, "input is list(...) at R level"); // # nocov
   if (!IS_TRUE_OR_FALSE(inplaceArg)) internal_error(__func__, "argument 'inplaceArg' must be TRUE or FALSE"); // # nocov
+  if (!IS_TRUE_OR_FALSE(nan_is_na_arg)) internal_error(__func__, "argument 'nan_is_na_arg' must be TRUE or FALSE"); // # nocov
   const bool inplace = LOGICAL(inplaceArg)[0];
+  const bool nan_is_na = LOGICAL(nan_is_na_arg)[0];
   const bool verbose = GetVerbose();
   int nprotect = 0;
   if (length(x)==0 || isNull(VECTOR_ELT(x,0))) return R_NilValue; // coalesce(NULL, "foo") return NULL even though character type mismatches type NULL
@@ -102,23 +104,44 @@ SEXP coalesce(SEXP x, SEXP inplaceArg) {
     } else {
       double *xP = REAL(first), finalVal=NA_REAL;
       int k=0;
-      for (int j=0; j<nval; ++j) {
-        SEXP item = VECTOR_ELT(x, j+off);
-        if (length(item)==1) {
-          double tt = REAL(item)[0];
-          if (ISNAN(tt)) continue;
-          finalVal = tt;
-          break;
+      if (nan_is_na) {
+        for (int j=0; j<nval; ++j) {
+          SEXP item = VECTOR_ELT(x, j+off);
+          if (length(item)==1) {
+            double tt = REAL(item)[0];
+            if (ISNAN(tt)) continue;
+            finalVal = tt;
+            break;
+          }
+          valP[k++] = REAL_RO(item);
+        }
+        const bool final = !ISNAN(finalVal);
+        #pragma omp parallel for num_threads(getDTthreads(nrow, true))
+        for (int i=0; i<nrow; ++i) {
+          double val=xP[i];
+          if (!ISNAN(val)) continue;
+          int j=0; while (ISNAN(val) && j<k) val=((double *)valP[j++])[i];
+          if (!ISNAN(val)) xP[i]=val; else if (final) xP[i]=finalVal;
+        }
+      } else {
+        for (int j=0; j<nval; ++j) {
+          SEXP item = VECTOR_ELT(x, j+off);
+          if (length(item)==1) {
+            double tt = REAL(item)[0];
+            if (ISNA(tt)) continue;
+            finalVal = tt;
+            break;
+          }
+          valP[k++] = REAL_RO(item);
+        }
+        const bool final = !ISNA(finalVal);
+        #pragma omp parallel for num_threads(getDTthreads(nrow, true))
+        for (int i=0; i<nrow; ++i) {
+          double val=xP[i];
+          if (!ISNA(val)) continue;
+          int j=0; while (ISNA(val) && j<k) val=((double *)valP[j++])[i];
+          if (!ISNA(val)) xP[i]=val; else if (final) xP[i]=finalVal;
         }
-        valP[k++] = REAL_RO(item);
-      }
-      const bool final = !ISNAN(finalVal);
-      #pragma omp parallel for num_threads(getDTthreads(nrow, true))
-      for (int i=0; i<nrow; ++i) {
-        double val=xP[i];
-        if (!ISNAN(val)) continue;
-        int j=0; while (ISNAN(val) && j<k) val=((double *)valP[j++])[i];
-        if (!ISNAN(val)) xP[i]=val; else if (final) xP[i]=finalVal;
       }
     }
   } break;
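
Note on the `nan` switch: the two branches in this hunk differ only in which missingness test they apply, `ISNAN` (true for both `NA` and `NaN`) or `ISNA` (true for `NA` only). The standalone C sketch below illustrates that distinction without R's headers. It assumes R's usual encoding of `NA_real_` as a NaN whose low 32 bits hold 1954. The names `make_na_real`, `is_na`, `coalesce_one` and `coalesce_double` are hypothetical and used only for illustration, and the loop is serial and handles a single candidate vector, unlike the OpenMP loop in the commit.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a NaN carrying the payload assumed for R's NA_real_
 * (high word 0x7FF00000, low word 1954). */
double make_na_real(void) {
    assert(sizeof(double) == sizeof(uint64_t));
    uint64_t bits = ((uint64_t)0x7FF00000u << 32) | (uint64_t)1954u;
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Analogue of ISNA: true only for a NaN whose low 32 bits equal 1954. */
bool is_na(double x) {
    if (!isnan(x)) return false;
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}

/* Replace x by y when x counts as missing under the chosen rule:
 * nan_is_na == true  -> any NaN (including NA) is missing;
 * nan_is_na == false -> only NA is missing, a plain NaN is kept. */
double coalesce_one(double x, double y, bool nan_is_na) {
    bool missing = nan_is_na ? (isnan(x) != 0) : is_na(x);
    return missing ? y : x;
}

/* Element-wise version over two vectors of length n, updating x in place. */
void coalesce_double(double *x, const double *y, size_t n, bool nan_is_na) {
    for (size_t i = 0; i < n; ++i) x[i] = coalesce_one(x[i], y[i], nan_is_na);
}
```

Under these assumptions, filling `{NaN, NA, 3.0}` with 1 replaces both missing entries when `nan_is_na` is true and leaves the `NaN` in place when it is false, which matches the `c(1, 1, 3)` and `c(NaN, 1, 3)` results documented in `man/coalesce.Rd`.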
