Commit 276cdeb

Merge branch 'master' into frev
2 parents e27a6f3 + 54f9048 commit 276cdeb

10 files changed: +86 -12 lines changed

NEWS.md

Lines changed: 5 additions & 1 deletion
@@ -20,6 +20,8 @@
 
 4. Namespace-qualifying `data.table::shift()`, `data.table::first()`, or `data.table::last()` will not deactivate GForce, [#5942](https://github.com/Rdatatable/data.table/issues/5942). Thanks @MichaelChirico for the proposal and fix. Namespace-qualifying other calls like `stats::sum()`, `base::prod()`, etc., continue to work as an escape valve to avoid GForce, e.g. to ensure S3 method dispatch.
 
+5. `transpose` gains `list.cols=` argument, [#5639](https://github.com/Rdatatable/data.table/issues/5639). Use this to return output with list columns and avoid type promotion (an exception is `factor` columns which are promoted to `character` for consistency between `list.cols=TRUE` and `list.cols=FALSE`). This is convenient for creating a row-major representation of a table. Thanks to @MLopez-Ibanez for the request, and Benjamin Schwendinger for the PR.
+
 ## BUG FIXES
 
 1. `unique()` returns a copy in the case when `nrow(x) <= 1` instead of a mutable alias, [#5932](https://github.com/Rdatatable/data.table/pull/5932). This is consistent with existing `unique()` behavior when the input has no duplicates but more than one row. Thanks to @brookslogan for the report and @dshemetov for the fix.
@@ -30,7 +32,9 @@
 
 4. `dcast(fill=NULL)` only computes default fill value if necessary, which eliminates some previous warnings (for example, when fun.aggregate=min or max, warning was NAs introduced by coercion to integer range) which were potentially confusing, [#5512](https://github.com/Rdatatable/data.table/issues/5512), [#5390](https://github.com/Rdatatable/data.table/issues/5390). Thanks to Toby Dylan Hocking for the fix.
 
-3. New `frev(x, copy=TRUE)` as a faster analogue to `base::rev()` for atomic vectors/lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). Thanks to Benjamin Schwendinger for suggesting and implementing.
+5. `fwrite(x, row.names=TRUE)` with `x` a `matrix` writes `row.names` when present, not row numbers, [#5315](https://github.com/Rdatatable/data.table/issues/5315). Thanks to @Liripo for the report, and @ben-schwen for the fix.
+
+6. New `frev(x, copy=TRUE)` as a faster analogue to `base::rev()` for atomic vectors/lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). Thanks to Benjamin Schwendinger for suggesting and implementing.
 
 ```R
 x = sample(2e8)

R/fwrite.R

Lines changed: 7 additions & 1 deletion
@@ -38,7 +38,13 @@ fwrite = function(x, file="", append=FALSE, quote="auto",
   # validate arguments
   if (is.matrix(x)) { # coerce to data.table if input object is matrix
     messagef("x being coerced from class: matrix to data.table")
-    x = as.data.table(x)
+    # keep row.names for matrix input #5315
+    if (row.names && !is.null(rownames(x))) {
+      row.names = FALSE
+      x = as.data.table(x, keep.rownames="")
+    } else {
+      x = as.data.table(x)
+    }
   }
   stopifnot(is.list(x),
     identical(quote,"auto") || isTRUEorFALSE(quote),
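
Review note: the new branch promotes a matrix's row names to a leading column with an empty header, and otherwise falls back to row numbers. The base R sketch below mimics only that header and row logic on a small matrix; `sketch_write_matrix` is a hypothetical helper written for illustration, not part of data.table, and it ignores quoting, separators and every other `fwrite` option.

```r
# Hypothetical helper: builds the CSV lines a matrix would produce,
# using row names when present and row numbers otherwise.
sketch_write_matrix <- function(m, row.names = FALSE) {
  cols <- colnames(m)
  if (is.null(cols)) cols <- paste0("V", seq_len(ncol(m)))  # default names V1, V2, ...
  header <- paste(cols, collapse = ",")
  body <- unname(apply(m, 1L, paste, collapse = ","))       # one string per row
  if (row.names) {
    rn <- rownames(m)
    if (is.null(rn)) rn <- seq_len(nrow(m))                 # fall back to row numbers
    header <- paste('""', header, sep = ",")                # empty name for the first column
    body <- paste(rn, body, sep = ",")
  }
  c(header, body)
}

M <- matrix(1:4, nrow = 2)
sketch_write_matrix(M, row.names = TRUE)   # rows start with 1 and 2
rownames(M) <- c("a", "b")
sketch_write_matrix(M, row.names = TRUE)   # rows start with a and b
```

The two calls correspond to the expectations encoded in tests 1658.401 and 1658.403 of this commit.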

R/test.data.table.R

Lines changed: 13 additions & 0 deletions
@@ -395,6 +395,10 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,no
       catf("Test %s did not produce correct output:\n", numStr)
       catf("Expected: <<%s>>\n", encodeString(output)) # \n printed as '\\n' so the two lines of output can be compared vertically
       catf("Observed: <<%s>>\n", encodeString(out))
+      if (anyNonAscii(output) || anyNonAscii((out))) {
+        catf("Expected (raw): <<%s>>\n", paste(charToRaw(output), collapse = " "))
+        catf("Observed (raw): <<%s>>\n", paste(charToRaw(out), collapse = " "))
+      }
       fail = TRUE
       # nocov end
     }
@@ -403,6 +407,10 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,no
       catf("Test %s produced output but should not have:\n", numStr)
       catf("Expected absent (case insensitive): <<%s>>\n", encodeString(notOutput))
       catf("Observed: <<%s>>\n", encodeString(out))
+      if (anyNonAscii(notOutput) || anyNonAscii((out))) {
+        catf("Expected absent (raw): <<%s>>\n", paste(charToRaw(notOutput), collapse = " "))
+        catf("Observed (raw): <<%s>>\n", paste(charToRaw(out), collapse = " "))
+      }
       fail = TRUE
       # nocov end
     }
@@ -448,6 +456,10 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,no
         # head.matrix doesn't restrict columns
         if (length(d <- dim(x))) do.call(`[`, c(list(x, drop = FALSE), lapply(pmin(d, 6L), seq_len)))
         else print(head(x))
+        if (typeof(x) == 'character' && anyNonAscii(x)) {
+          cat("Non-ASCII string detected, raw representation:\n")
+          print(lapply(head(x), charToRaw))
+        }
       }
     }
     failPrint(x, deparse(xsub))
@@ -466,3 +478,4 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,no
   invisible(!fail)
 }
 
+anyNonAscii = function(x) anyNA(iconv(x, to="ASCII")) # nocov
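
Review note: the helper relies on `iconv()` returning `NA` for strings it cannot convert to ASCII, and the extra diagnostics print the bytes with `charToRaw()` so that strings that look identical on screen can be compared byte by byte. A minimal sketch of both pieces, assuming a platform where `iconv(to="ASCII")` does not transliterate; the sample strings are made up for illustration.

```r
# Copied from the diff above: TRUE if any element cannot be converted to ASCII.
anyNonAscii = function(x) anyNA(iconv(x, to="ASCII"))

plain  <- "cafe"
accent <- enc2utf8("caf\u00e9")   # same word with U+00E9, stored as UTF-8

anyNonAscii(plain)    # no conversion failure expected
anyNonAscii(accent)   # conversion to ASCII is expected to fail, giving NA

# Raw view, as printed by the new "(raw)" diagnostics: one hex pair per byte.
paste(charToRaw(plain), collapse = " ")
paste(charToRaw(accent), collapse = " ")   # the accented letter takes two bytes in UTF-8
```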

R/transpose.R

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-transpose = function(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL) {
+transpose = function(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL, list.cols=FALSE) {
   if (!is.null(make.names)) {
     stopifnot(length(make.names)==1L)
     if (is.character(make.names)) {
@@ -14,7 +14,7 @@ transpose = function(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names
     colnames = as.character(l[[make.names]])
     l = if (is.data.table(l)) l[,-make.names,with=FALSE] else l[-make.names]
   }
-  ans = .Call(Ctranspose, l, fill, ignore.empty, keep.names)
+  ans = .Call(Ctranspose, l, fill, ignore.empty, keep.names, list.cols)
   if (!is.null(make.names)) setattr(ans, "names", c(keep.names, colnames))
   else if (is.data.frame(l)) # including data.table but not plain list
     setattr(ans, "names", c(keep.names, paste0("V", seq_len(length(ans)-length(keep.names)))))
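
Review note: the R wrapper only forwards `list.cols` to C, so the behaviour is easiest to see from a base R analogue. The sketch below is an assumption-laden illustration of what `list.cols=TRUE` is meant to return for plain lists, including padding short elements with `fill`; `transpose_list_sketch` is a hypothetical name, and the sketch does not handle `ignore.empty`, `keep.names`, `make.names` or factors.

```r
# Hypothetical base R analogue of transpose(l, list.cols=TRUE) for plain lists.
transpose_list_sketch <- function(l, fill = NA) {
  maxlen <- max(lengths(l), 0L)
  lapply(seq_len(maxlen), function(j) {
    # take the j-th item of every input element, or `fill` if that element is too short
    lapply(l, function(li) if (j <= length(li)) li[[j]] else fill)
  })
}

str(transpose_list_sketch(list(1:3, c("a", "b", "c"))))
str(transpose_list_sketch(list(1:2, c("a", "b", "c")), fill = 3L))
```

Both calls are modelled on tests 1477.14 and 1477.19 added in this commit, where the expected result is `list(list(1L,"a"), list(2L,"b"), list(3L,"c"))`.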

inst/tests/tests.Rraw

Lines changed: 21 additions & 1 deletion
@@ -6921,10 +6921,22 @@ ll = sapply(ll, paste, collapse=",")
 test(1477.07, transpose(strsplit(ll, ",", fixed=TRUE)), tstrsplit(ll, ",", fixed=TRUE))
 test(1477.08, transpose(1:5), error="l must be a list")
 test(1477.09, transpose(list(as.complex(c(1, 1+5i)))), error="Unsupported column type")
-test(1477.10, transpose(list(list(1:5))), error="Item 1 of list input is")
+test(1477.10, transpose(list(x~y)), error="Item 1 of list input is")
 test(1477.11, transpose(as.list(1:5), fill=1:2), error="fill must be a length 1 vector")
 test(1477.12, transpose(as.list(1:5), ignore.empty=NA), error="ignore.empty should be logical TRUE/FALSE")
 test(1477.13, transpose(list()), list())
+# return list columns #5639
+la = list(as.list(1:3), list("a","b","c"))
+lb = list(list(1L,"a"), list(2L,"b"), list(3L,"c"))
+test(1477.14, transpose(list(1:3, c("a","b","c")), list.cols=TRUE), lb)
+test(1477.15, transpose(list(1:3, c("a","b","c")), list.cols=FALSE), lapply(lb, unlist))
+test(1477.16, transpose(la, list.cols=TRUE), lb)
+test(1477.17, transpose(lb, list.cols=TRUE), la)
+test(1477.18, transpose(list(list(1L,"a"), list(2L), list(3L,"c")), list.cols=TRUE, fill="b"), la)
+test(1477.19, transpose(list(1:2, c("a","b","c")), list.cols=TRUE, fill=3L), lb)
+test(1477.20, transpose(list(factor(letters[1:3])), list.cols=TRUE), list(list("a"), list("b"), list("c")))
+test(1477.21, transpose(list(factor(letters[1:3])), list.cols=FALSE), list("a", "b", "c"))
+test(1477.22, transpose(la, list.cols=NA), error="list.cols should be logical TRUE/FALSE.")
 
 # #480 `setDT` and 'lapply'
 ll = list(data.frame(a=1), data.frame(x=1, y=2), NULL, list())
@@ -9912,6 +9924,14 @@ test(1658.37, fwrite(matrix("foo"), quote=TRUE), output='"V1"\n.*"foo"', message
 test(1658.38, fwrite(matrix(1:4, nrow=2, ncol=2), quote = TRUE), output = '"V1","V2"\n.*1,3\n2,4', message = "x being coerced from class: matrix to data.table")
 test(1658.39, fwrite(matrix(1:3, nrow=3, ncol=1), quote = TRUE), output = '"V1"\n.*1\n2\n3', message = "x being coerced from class: matrix to data.table")
 test(1658.40, fwrite(matrix(1:4, nrow=2, ncol=2, dimnames = list(c("ra","rb"),c("ca","cb"))), quote = TRUE), output = '"ca","cb"\n.*1,3\n2,4', message = "x being coerced from class: matrix to data.table")
+# keep row.names for matrix input #5315
+M = matrix(1:4, nrow=2)
+coercion_msg = "x being coerced from class: matrix to data.table"
+test(1658.401, fwrite(M, row.names=TRUE), output='"",V1,V2\n1,1,3\n2,2,4', message=coercion_msg)
+test(1658.402, fwrite(M, row.names=FALSE), output='V1,V2\n1,3\n2,4', message=coercion_msg)
+rownames(M) = c("a","b")
+test(1658.403, fwrite(M, row.names=TRUE), output='"",V1,V2\na,1,3\nb,2,4', message=coercion_msg)
+test(1658.404, fwrite(M, row.names=FALSE), output='V1,V2\n1,3\n2,4', message=coercion_msg)
 
 # fwrite compress
 if (!haszlib()) {

man/print.data.table.Rd

Lines changed: 11 additions & 0 deletions
@@ -13,6 +13,8 @@
   Key enhancements include automatic output compression of many observations and concise column-wise \code{class} summary.
 
   \code{format_col} and \code{format_list_item} generics provide flexibility for end-users to define custom printing methods for generic classes.
+
+  Note also the option \code{datatable.prettyprint.char}; character column entries exceeding this limit will be truncated, with \code{...} indicating the truncation.
 }
 \usage{
   \method{print}{data.table}(x,
@@ -98,6 +100,15 @@
   print(DT, trunc.cols=TRUE)
   options(old_width)
 
+  # `char.trunc` will truncate the strings,
+  # if their lengths exceed the given limit: `datatable.prettyprint.char`
+  # For example:
+
+  old = options(datatable.prettyprint.char=5L)
+  DT = data.table(x=1:2, y=c("abcdefghij", "klmnopqrstuv"))
+  DT
+  options(old)
+
   # Formatting customization
   format_col.complex = function(x, ...) sprintf('(\%.1f, \%.1fi)', Re(x), Im(x))
   x = data.table(z = c(1 + 3i, 2 - 1i, pi + 2.718i))
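
Review note: the documented rule (entries longer than the limit are cut and marked with `...`) can be illustrated without data.table. The function below is a hypothetical stand-in written for this note; it is not data.table's internal truncation code, and the real printing method may differ in details such as how width is measured for multibyte characters.

```r
# Hypothetical illustration of the truncation rule described in the Rd text.
trunc_char_sketch <- function(x, limit) {
  too_long <- !is.na(x) & nchar(x) > limit
  x[too_long] <- paste0(substr(x[too_long], 1L, limit), "...")  # keep `limit` chars, then mark
  x
}

trunc_char_sketch(c("abcdefghij", "klmnopqrstuv", "abc"), limit = 5L)
# → "abcde..." "klmno..." "abc"
```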

man/transpose.Rd

Lines changed: 11 additions & 1 deletion
@@ -6,14 +6,16 @@
 }
 
 \usage{
-transpose(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL)
+transpose(l, fill=NA, ignore.empty=FALSE, keep.names=NULL,
+          make.names=NULL, list.cols=FALSE)
 }
 \arguments{
   \item{l}{ A list, data.frame or data.table. }
   \item{fill}{ Default is \code{NA}. It is used to fill shorter list elements so as to return each element of the transposed result of equal lengths. }
   \item{ignore.empty}{Default is \code{FALSE}. \code{TRUE} will ignore length-0 list elements.}
   \item{keep.names}{The name of the first column in the result containing the names of the input; e.g. \code{keep.names="rn"}. By default \code{NULL} and the names of the input are discarded.}
   \item{make.names}{The name or number of a column in the input to use as names of the output; e.g. \code{make.names="rn"}. By default \code{NULL} and default names are given to the output columns.}
+  \item{list.cols}{Default is \code{FALSE}. \code{TRUE} will avoid promoting types and return columns of type \code{list} instead. \code{factor} will always be cast to \code{character}.}
 }
 \details{
   The list elements (or columns of \code{data.frame}/\code{data.table}) should be all \code{atomic}. If list elements are of unequal lengths, the value provided in \code{fill} will be used so that the resulting list always has all elements of identical lengths. The class of input object is also preserved in the transposed result.
@@ -38,6 +40,14 @@ setDT(transpose(ll, fill=0))[]
 DT = data.table(x=1:5, y=6:10)
 transpose(DT)
 
+DT = data.table(x=1:3, y=c("a","b","c"))
+transpose(DT, list.cols=TRUE)
+
+# base R equivalent of transpose
+l = list(1:3, c("a", "b", "c"))
+lapply(seq(length(l[[1]])), function(x) lapply(l, `[[`, x))
+transpose(l, list.cols=TRUE)
+
 ll = list(nm=c('x', 'y'), 1:2, 3:4)
 transpose(ll, make.names="nm")
 }

src/data.table.h

Lines changed: 1 addition & 1 deletion
@@ -307,7 +307,7 @@ SEXP lookup(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
 SEXP overlaps(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
 SEXP whichwrapper(SEXP, SEXP);
 SEXP shift(SEXP, SEXP, SEXP, SEXP);
-SEXP transpose(SEXP, SEXP, SEXP, SEXP);
+SEXP transpose(SEXP, SEXP, SEXP, SEXP, SEXP);
 SEXP anyNA(SEXP, SEXP);
 SEXP setlevels(SEXP, SEXP, SEXP);
 SEXP rleid(SEXP, SEXP);

src/transpose.c

Lines changed: 14 additions & 5 deletions
@@ -2,7 +2,7 @@
 #include <Rdefines.h>
 #include <time.h>
 
-SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
+SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg, SEXP listColsArg) {
 
   int nprotect=0;
   if (!isNewList(l))
@@ -18,23 +18,26 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
   if (length(fill) != 1)
     error(_("fill must be a length 1 vector, such as the default NA"));
   R_len_t ln = LENGTH(l);
+  if (!IS_TRUE_OR_FALSE(listColsArg))
+    error(_("list.cols should be logical TRUE/FALSE."));
+  bool listCol = LOGICAL(listColsArg)[0];
 
   // preprocessing
   int maxlen=0, zerolen=0;
   SEXPTYPE maxtype=0;
   for (int i=0; i<ln; ++i) {
     SEXP li = VECTOR_ELT(l, i);
-    if (!isVectorAtomic(li) && !isNull(li))
-      error(_("Item %d of list input is not an atomic vector"), i+1);
+    if (!isVectorAtomic(li) && !isNull(li) && !isNewList(li))
+      error(_("Item %d of list input is not either an atomic vector, or a list"), i+1);
     const int len = length(li);
     if (len>maxlen) maxlen=len;
     zerolen += (len==0);
     SEXPTYPE type = TYPEOF(li);
     if (isFactor(li)) type=STRSXP;
     if (type>maxtype) maxtype=type;
   }
+  if (listCol) maxtype=VECSXP; // need to keep preprocessing for zerolen
   fill = PROTECT(coerceVector(fill, maxtype)); nprotect++;
-
   SEXP ans = PROTECT(allocVector(VECSXP, maxlen+rn)); nprotect++;
   int anslen = (ignore) ? (ln - zerolen) : ln;
   if (rn) {
@@ -54,7 +57,7 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
     const int len = length(li);
     if (ignore && len==0) continue;
     if (TYPEOF(li) != maxtype) {
-      li = PROTECT(isFactor(li) ? asCharacterFactor(li) : coerceVector(li, maxtype));
+      li = PROTECT(isFactor(li) ? (listCol ? coerceVector(asCharacterFactor(li), VECSXP) : asCharacterFactor(li)) : coerceVector(li, maxtype));
     } else PROTECT(li); // extra PROTECT just to help rchk by avoiding two counter variables
     switch (maxtype) {
     case LGLSXP : {
@@ -84,6 +87,12 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg, SEXP keepNamesArg) {
         SET_STRING_ELT(ansp[j+rn], k, j<len ? STRING_ELT(li, j) : sfill);
       }
     } break;
+    case VECSXP : {
+      const SEXP vfill = VECTOR_ELT(fill, 0);
+      for (int j=0; j<maxlen; ++j) {
+        SET_VECTOR_ELT(ansp[j+rn], k, j<len ? VECTOR_ELT(li, j) : vfill);
+      }
+    } break;
     default :
       error(_("Unsupported column type '%s'"), type2char(maxtype));
     }
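
Review note: the C change hinges on `maxtype`. Without `list.cols`, every input element is coerced to the highest type seen, and with `list.cols` the target is forced to `VECSXP` so each item keeps its own type (factors still go through character). The base R lines below illustrate that difference in outcome only; they are an analogy using `unlist()` coercion, not a port of the C code.

```r
l <- list(1:3, c("a", "b", "c"))
n <- max(lengths(l))

# Analogue of list.cols=FALSE: each output element is one atomic vector,
# so integer and character items are promoted to a common type.
promoted <- lapply(seq_len(n), function(j) unlist(lapply(l, `[`, j)))
typeof(promoted[[1]])                 # → "character"

# Analogue of list.cols=TRUE: each output element is a list, items keep their types.
kept <- lapply(seq_len(n), function(j) lapply(l, `[[`, j))
vapply(kept[[1]], typeof, "")         # → "integer" "character"

# Factors are converted to character in both modes, per the NEWS entry.
as.character(factor(letters[1:3]))    # → "a" "b" "c"
```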

vignettes/datatable-faq.Rmd

Lines changed: 1 addition & 0 deletions
@@ -400,6 +400,7 @@ A key advantage of column vectors in R is that they are _ordered_, unlike SQL[^2
 - `check.names` is by default `TRUE` in `data.frame` but `FALSE` in data.table, for convenience.
 - `data.table` has always set `stringsAsFactors=FALSE` by default. In R 4.0.0 (Apr 2020), `data.frame`'s default was changed from `TRUE` to `FALSE` and there is no longer a difference in this regard; see [stringsAsFactors, Kurt Hornik, Feb 2020](https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/).
 - Atomic vectors in `list` columns are collapsed when printed using `", "` in `data.frame`, but `","` in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
+- Unlike data.frames a data.table cannot store rows with no columns, as rows are considered to be the children of columns: `nrow(DF[, 0])` returns the number of rows, while `nrow(DT[, 0])` always returns 0; but see issue [#2422](https://github.com/Rdatatable/data.table/issues/2422).
 
 In `[.data.frame` we very often set `drop = FALSE`. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column `data.frame`. In `[.data.table` we took the opportunity to make it consistent and dropped `drop`.
 
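
Review note: the new FAQ bullet about zero-column selections is easy to check interactively. The snippet below shows the `data.frame` side in base R; the data.table comparison is guarded so it only runs if the package happens to be installed, and per the bullet (and issue #2422) its result may change in future versions, so no output is asserted for it here.

```r
DF <- data.frame(a = 1:3, b = letters[1:3])

# A data.frame keeps its row count even after all columns are dropped.
nrow(DF[, 0])   # → 3
ncol(DF[, 0])   # → 0

# Optional comparison, only if data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::as.data.table(DF)
  print(nrow(DT[, 0]))
}
```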
