Commit 66cb6d2

Add nan parameter to fcoalesce for NaN/NA distinction control (#7189)
* added nan parameter to fcoalesce
* Style, link ?nafill
* incorporate #7186 insights here too
* duplicate loop for NA and NAN arg
* tests
* added tests for use of vector replacement also
* added news entry

---------

Co-authored-by: Michael Chirico <[email protected]>
1 parent 2b191ae commit 66cb6d2

6 files changed: +56 -22 lines changed


NEWS.md

Lines changed: 2 additions & 0 deletions
@@ -56,6 +56,8 @@

 13. New `mergelist()` and `setmergelist()` similarly work _a la_ `Reduce()` to recursively merge a `list` of data.tables, [#599](https://github.com/Rdatatable/data.table/issues/599). Different join modes (_left_, _inner_, _full_, _right_, _semi_, _anti_, and _cross_) are supported through the `how` argument; duplicate handling goes through the `mult` argument. `setmergelist()` carefully avoids copies where one is not needed, e.g. in a 1:1 left join. Thanks Patrick Nicholson for the FR (in 2013!), @jangorecki for the PR, and @MichaelChirico for extensive reviews and fine-tuning.

+14. `fcoalesce()` and `setcoalesce()` gain `nan` argument to control whether `NaN` values should be treated as missing (`nan=NA`, the default) or non-missing (`nan=NaN`), [#4567](https://github.com/Rdatatable/data.table/issues/4567). This provides full compatibility with `nafill()` behavior. Thanks to @ethanbsmith for the feature request and @Mukulyadav2004 for the implementation.
+
 ### BUG FIXES

 1. `fread()` no longer warns on certain systems on R 4.5.0+ where the file owner can't be resolved, [#6918](https://github.com/Rdatatable/data.table/issues/6918). Thanks @ProfFancyPants for the report and PR.

R/wrappers.R

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@
 # Very small (e.g. one line) R functions that just call C.
 # One file wrappers.R to avoid creating lots of small .R files.

-fcoalesce = function(...) .Call(Ccoalesce, list(...), FALSE)
-setcoalesce = function(...) .Call(Ccoalesce, list(...), TRUE)
+fcoalesce = function(..., nan=NA) .Call(Ccoalesce, list(...), FALSE, nan_is_na(nan))
+setcoalesce = function(..., nan=NA) .Call(Ccoalesce, list(...), TRUE, nan_is_na(nan))

 fifelse = function(test, yes, no, na=NA) .Call(CfifelseR, test, yes, no, na)
 fcase = function(..., default=NA) {

inst/tests/tests.Rraw

Lines changed: 5 additions & 0 deletions
@@ -15586,6 +15586,11 @@ test(2060.154, fcoalesce(list(x)), x)
 test(2060.155, setcoalesce(list(x)), x)
 test(2060.156, setcoalesce(list(x,y,z)), ans)
 test(2060.157, x, ans) # setcoalesce updated the first item (x) by reference
+# nan parameter, #4567
+test(2060.158, fcoalesce(c(NA_real_, NaN), 0, nan=NA), c(0, 0))
+test(2060.159, fcoalesce(c(NA_real_, NaN), 0, nan=NaN), c(0, NaN))
+test(2060.160, fcoalesce(c(NA_real_, NaN), c(1, 2), nan=NA), c(1, 2))
+test(2060.161, fcoalesce(c(NA_real_, NaN), c(1, 2), nan=NaN), c(1, NaN))
 # factor of different levels
 x = factor(c('a','b',NA,NA,'b'))
 y = factor(c('b','b','a',NA,'b'))

man/coalesce.Rd

Lines changed: 6 additions & 2 deletions
@@ -7,10 +7,11 @@ Fill in missing values in a vector by successively pulling from candidate vector
 Written in C, and multithreaded for numeric and factor types.
 }
 \usage{
-fcoalesce(\dots)
+fcoalesce(\dots, nan=NA)
 }
 \arguments{
 \item{\dots}{ A set of same-class vectors. These vectors can be supplied as separate arguments or as a single plain list, data.table or data.frame, see examples. }
+\item{nan}{ Either \code{NaN} or \code{NA}; if \code{NaN}, then \code{NaN} is treated as distinct from \code{NA}, otherwise they are treated the same during replacement (double columns only). }
 }
 \details{
 Factor type is supported only when the factor levels of each item are equal.
@@ -22,7 +23,7 @@ Atomic vector of the same type and length as the first vector, having \code{NA}
 If the first item is \code{NULL}, the result is \code{NULL}.
 }
 \seealso{
-\code{\link{fifelse}}
+\code{\link{fifelse}}, \code{\link{nafill}}
 }
 \examples{
 x = c(11L, NA, 13L, NA, 15L, NA)
@@ -31,6 +32,9 @@ z = c(11L, NA, 1L, 14L, NA, NA)
 fcoalesce(x, y, z)
 fcoalesce(list(x,y,z)) # same
 fcoalesce(x, list(y,z)) # same
+x_num = c(NaN, NA_real_, 3.0)
+fcoalesce(x_num, 1) # default: NaN treated as missing -> c(1, 1, 3)
+fcoalesce(x_num, 1, nan=NaN) # preserve NaN -> c(NaN, 1, 3)
 }
 \keyword{ data }
3640

src/coalesce.c

Lines changed: 40 additions & 17 deletions
@@ -6,10 +6,12 @@
   - The replacement of NAs with non-NA values from subsequent vectors
   - The conditional checks within parallelized loops
 */
-SEXP coalesce(SEXP x, SEXP inplaceArg) {
+SEXP coalesce(SEXP x, SEXP inplaceArg, SEXP nan_is_na_arg) {
   if (TYPEOF(x)!=VECSXP) internal_error(__func__, "input is list(...) at R level"); // # nocov
   if (!IS_TRUE_OR_FALSE(inplaceArg)) internal_error(__func__, "argument 'inplaceArg' must be TRUE or FALSE"); // # nocov
+  if (!IS_TRUE_OR_FALSE(nan_is_na_arg)) internal_error(__func__, "argument 'nan_is_na_arg' must be TRUE or FALSE"); // # nocov
   const bool inplace = LOGICAL(inplaceArg)[0];
+  const bool nan_is_na = LOGICAL(nan_is_na_arg)[0];
   const bool verbose = GetVerbose();
   int nprotect = 0;
   if (length(x)==0 || isNull(VECTOR_ELT(x,0))) return R_NilValue; // coalesce(NULL, "foo") return NULL even though character type mismatches type NULL
@@ -102,23 +104,44 @@ SEXP coalesce(SEXP x, SEXP inplaceArg) {
     } else {
       double *xP = REAL(first), finalVal=NA_REAL;
       int k=0;
-      for (int j=0; j<nval; ++j) {
-        SEXP item = VECTOR_ELT(x, j+off);
-        if (length(item)==1) {
-          double tt = REAL(item)[0];
-          if (ISNAN(tt)) continue;
-          finalVal = tt;
-          break;
+      if (nan_is_na) {
+        for (int j=0; j<nval; ++j) {
+          SEXP item = VECTOR_ELT(x, j+off);
+          if (length(item)==1) {
+            double tt = REAL(item)[0];
+            if (ISNAN(tt)) continue;
+            finalVal = tt;
+            break;
+          }
+          valP[k++] = REAL_RO(item);
+        }
+        const bool final = !ISNAN(finalVal);
+        #pragma omp parallel for num_threads(getDTthreads(nrow, true))
+        for (int i=0; i<nrow; ++i) {
+          double val=xP[i];
+          if (!ISNAN(val)) continue;
+          int j=0; while (ISNAN(val) && j<k) val=((double *)valP[j++])[i];
+          if (!ISNAN(val)) xP[i]=val; else if (final) xP[i]=finalVal;
+        }
+      } else {
+        for (int j=0; j<nval; ++j) {
+          SEXP item = VECTOR_ELT(x, j+off);
+          if (length(item)==1) {
+            double tt = REAL(item)[0];
+            if (ISNA(tt)) continue;
+            finalVal = tt;
+            break;
+          }
+          valP[k++] = REAL_RO(item);
+        }
+        const bool final = !ISNA(finalVal);
+        #pragma omp parallel for num_threads(getDTthreads(nrow, true))
+        for (int i=0; i<nrow; ++i) {
+          double val=xP[i];
+          if (!ISNA(val)) continue;
+          int j=0; while (ISNA(val) && j<k) val=((double *)valP[j++])[i];
+          if (!ISNA(val)) xP[i]=val; else if (final) xP[i]=finalVal;
         }
-        valP[k++] = REAL_RO(item);
-      }
-      const bool final = !ISNAN(finalVal);
-      #pragma omp parallel for num_threads(getDTthreads(nrow, true))
-      for (int i=0; i<nrow; ++i) {
-        double val=xP[i];
-        if (!ISNAN(val)) continue;
-        int j=0; while (ISNAN(val) && j<k) val=((double *)valP[j++])[i];
-        if (!ISNAN(val)) xP[i]=val; else if (final) xP[i]=finalVal;
       }
     }
   } break;
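
The double branch above can be tried outside of R with a small standalone sketch. This is not the commit's code. It is single-threaded, it folds the two duplicated loops into one loop with a missingness predicate for brevity, and it assumes that `NA_real_` is a NaN with 1954 in its low 32 bits. The names `r_na`, `is_r_na`, `is_missing`, and `coalesce_double` are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: NA_real_ is a NaN whose low 32 bits hold 1954. */
static double r_na(void) {
  const uint64_t bits = ((uint64_t)0x7FF00000u << 32) | 1954u;
  double d;
  memcpy(&d, &bits, sizeof d);
  return d;
}

static bool is_r_na(double x) {
  if (!isnan(x)) return false;
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);
  return (uint32_t)(bits & 0xFFFFFFFFu) == 1954u;
}

/* true: NA and NaN are both missing; false: only NA is missing. */
static bool is_missing(double x, bool nan_is_na) {
  return nan_is_na ? (isnan(x) != 0) : is_r_na(x);
}

/* Fill missing entries of x (length n) in place. The k candidate vectors are
   tried in order; if an entry is still missing and has_final is set, the
   scalar final_val is used instead. */
static void coalesce_double(double *x, size_t n,
                            const double *const *cands, size_t k,
                            bool has_final, double final_val, bool nan_is_na) {
  for (size_t i = 0; i < n; ++i) {
    double val = x[i];
    if (!is_missing(val, nan_is_na)) continue;
    for (size_t j = 0; j < k && is_missing(val, nan_is_na); ++j) val = cands[j][i];
    if (!is_missing(val, nan_is_na)) x[i] = val;
    else if (has_final) x[i] = final_val;
  }
}

/* Mirrors the inputs of tests 2060.158 to 2060.161 in the diff. */
static void check_examples(void) {
  const double y[2] = {1.0, 2.0};
  const double *cands[1] = {y};

  double a[2] = {r_na(), NAN};
  coalesce_double(a, 2, NULL, 0, true, 0.0, true);    /* like nan=NA  */
  assert(a[0] == 0.0 && a[1] == 0.0);

  double b[2] = {r_na(), NAN};
  coalesce_double(b, 2, NULL, 0, true, 0.0, false);   /* like nan=NaN */
  assert(b[0] == 0.0 && isnan(b[1]));

  double c[2] = {r_na(), NAN};
  coalesce_double(c, 2, cands, 1, false, 0.0, true);  /* like nan=NA  */
  assert(c[0] == 1.0 && c[1] == 2.0);

  double d[2] = {r_na(), NAN};
  coalesce_double(d, 2, cands, 1, false, 0.0, false); /* like nan=NaN */
  assert(d[0] == 1.0 && isnan(d[1]));
}
```

The sketch checks the predicate on every element, whereas the commit keeps two specialized loops ("duplicate loop for NA and NAN arg" in the commit message) so that the `nan_is_na` test is made once, outside the loops.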

src/data.table.h

Lines changed: 1 addition & 1 deletion
@@ -251,7 +251,7 @@ SEXP nafillR(SEXP obj, SEXP type, SEXP fill, SEXP nan_is_na_arg, SEXP inplace, S
 SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP incbounds, SEXP NAbounds, SEXP check);

 // coalesce.c
-SEXP coalesce(SEXP x, SEXP inplace);
+SEXP coalesce(SEXP x, SEXP inplace, SEXP nan_is_na_arg);

 // utils.c
 bool within_int32_repres(double x);
