Commit ae2b815

MichaelChirico, ben-schwen, and jangorecki authored
[cbindlist/mergelist] Implement cbindlist (#6435)
* cbindlist add cbind by reference, timing R prototype of mergelist wording use lower overhead funs stick to int32 for now, correct R_alloc bmerge C refactor for codecov and one loop for speed address revealed codecov gaps refactor vecseq for codecov seqexp helper, some alloccol export on C bmerge codecov, types handled in R bmerge already better comment seqexp bmerge mult=error #655 multiple new C utils swap if branches explain new C utils comments mostly reduce conflicts to PR #4386 comment C code address multiple matches during update-on-join #3747 Revert "address multiple matches during update-on-join #3747" This reverts commit b64c0c3. merge.dt has temporarily mult arg, for testing minor changes to cbindlist c dev mergelist, for single pair now add quiet option to cc() mergelist tests add check for names to perhaps.dt rm mult from merge.dt method rework, clean, polish multer, fix righ and full joins make full join symmetric mergepair inner function to loop on extra check for symmetric mergelist manual ensure no df-dt passed where list expected comments and manual handle 0 cols tables more tests more tests and debugging move more logic closer to bmerge, simplify mergepair more tests revert not used changes reduce not needed checks, cleanup copy arg behavior, manual, no tests yet cbindlist manual, export both cleanup processing bmerge to dtmatch test function match order for easier preview vecseq gets short-circuit batch test allow browser big cleanup remmove unneeded stuff, reduce diff more cleanup, minor manual fixes add proper test scripts Merge branch 'master' into cbind-merge-list comment out not used code for coverage more tests, some nocopy opts rename sql test script, should fix codecov simplify dtmatch inner branch more precise copy, now copy only T or F unused arg not yet in api, wording comments and refer issues codecov hasindex coverage codecov gap tests for join using key, cols argument fix missing import forderv more tests, improve missing on 
handling more tests for order of inner and full join for long keys new allow.cartesian option, #4383, #914 reduce diff, improve codecov reduce diff, comments need more DT, not lists, mergelist 3+ tbls proper escape heavy check unit tests more tests, address overalloc failure mergelist and cbindlist retain index manual, examples fix manual minor clarify in manual retain keys, right outer join for snowflake schema joins duplicates in cbindlist recycling in cbindlist escape 0 input in copyCols empty input handling closing cbindlist vectorized _on_ and _join.many_ arg rename dtmatch to dtmerge vectorized args: how, mult push down input validation add support for cross join, semi join, anti join full join, reduce overhead for mult=error mult default value dynamic fix manual add "see details" to Rd mention shared on in arg description amend feedback from Michael semi and anti joins will not reorder x columns Merge branch 'master' into cbind-merge-list spelling, thx to @jan-glx check all new funs used and add comments bugfix, sort=T needed for now Merge branch 'master' into cbind-merge-list Update NEWS.md Merge branch 'master' into cbind-merge-list Merge branch 'master' into cbind-merge-list NEWS placement numbering ascArg->order Merge remote-tracking branch 'origin/cbind-merge-list' into cbind-merge-list attempt to restore from master Update to stopf() error style Need isFrame for now More quality checks: any(!x)->!all(x); use vapply_1{b,c,i} really restore from master try to PROTECT() before duplicate() update error message in test appease the rchk gods extraneous space missing ';' use catf simplify perhapsDataTableR move sqlite.Rraw.manual into other.Rraw simplify for loop Merge remote-tracking branch 'origin/cbind-merge-list' into cbind-merge-list
* restore ws change
* Apply Ben's suggested changes
  Co-authored-by: Benjamin Schwendinger <[email protected]>
* Test address overlap with original 'l' directly
* Use local() blocks to "seal off" local variables from other current+future tests
* whitespace style
* test that key is wiped from duplicates
* revert: next test already tests wiped keys; add comment
* rm redundant test
* no local() block needed anymore
* retain multiple keys
* refine description of copy= for now
* missing ')'
  Co-authored-by: Jan Gorecki <[email protected]>
* r->R
* fixed approach for reading addresses
* mis
* more updated error expectations
* disable new test for now
* move setDT behavior test into tests.Rraw
* split out setcbindlist
* missed setcbindlist in Rd
* Add as TODO
* attempt to fix tests
* fix setDT test
* one last ironing out
* ugh
* drop unreachable code
* Also unreachable by known code paths
* Annotate UNPROTECT
---------
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Jan Gorecki <[email protected]>
1 parent 96c3e6a commit ae2b815

9 files changed: +221 -0 lines changed

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ export(nafill)
 export(setnafill)
 export(.Last.updated)
 export(fcoalesce)
+export(cbindlist, setcbindlist)
 export(substitute2)
 #export(DT) # mtcars |> DT(i,j,by) #4872 #5472
 export(fctr)

R/mergelist.R

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+cbindlist_impl_ = function(l, copy) {
+  ans = .Call(Ccbindlist, l, copy)
+  if (anyDuplicated(names(ans))) { ## invalidate key and index
+    setattr(ans, "sorted", NULL)
+    setattr(ans, "index", NULL)
+  }
+  setDT(ans)
+  ans
+}
+
+cbindlist = function(l) cbindlist_impl_(l, copy=TRUE)
+setcbindlist = function(l) cbindlist_impl_(l, copy=FALSE)
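
The two exported wrappers differ only in the `copy` flag passed down to the C routine: `cbindlist` duplicates every column, while `setcbindlist` places the input columns into the result as they are. The following is a minimal standalone C sketch of that distinction, not the package's code. The `column` struct and `bind_columns` function are hypothetical names introduced for illustration, and plain `int` arrays stand in for R vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a column: an int vector with a known length. */
typedef struct {
    int *data;
    size_t n;
} column;

/* Fill dst[0..ncol) from src[0..ncol).
 * copy=true : every column gets its own freshly allocated storage.
 * copy=false: the destination points at the source storage (shared). */
void bind_columns(column *dst, const column *src, size_t ncol, bool copy) {
    for (size_t j = 0; j < ncol; ++j) {
        dst[j].n = src[j].n;
        if (copy) {
            /* allocate at least one element so malloc never sees size 0 */
            size_t len = src[j].n ? src[j].n : 1;
            dst[j].data = malloc(len * sizeof(int));
            assert(dst[j].data != NULL);
            memcpy(dst[j].data, src[j].data, src[j].n * sizeof(int));
        } else {
            dst[j].data = src[j].data;
        }
    }
}
```

With `copy=false`, a write through the result is visible in the source array, which corresponds to what the man page describes as "no copy is made". With `copy=true`, the caller owns the new storage and is responsible for freeing it.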

inst/tests/mergelist.Rraw

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+require(methods)
+
+if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) {
+  if ((tt<-compiler::enableJIT(-1))>0)
+    cat("This is dev mode and JIT is enabled (level ", tt, ") so there will be a brief pause around the first test.\n", sep="")
+} else {
+  require(data.table)
+  test = data.table:::test
+}
+
+# cbindlist, setcbindlist
+
+local({
+  l = list(
+    d1 = data.table(x=1:3, v1=1L),
+    d2 = data.table(y=3:1, v2=2L),
+    d3 = data.table(z=2:4, v3=3L)
+  )
+  ans = cbindlist(l)
+  expected = data.table(l$d1, l$d2, l$d3)
+  test(11.01, ans, expected)
+  test(11.02, intersect(vapply(ans, address, ""), unlist(lapply(l, vapply, address, ""))), character())
+  ans = setcbindlist(l)
+  expected = setDT(c(l$d1, l$d2, l$d3))
+  test(11.03, ans, expected)
+  test(11.04, length(intersect(vapply(ans, address, ""), unlist(lapply(l, vapply, address, "")))), ncol(expected))
+})
+
+test(11.05, cbindlist(list(data.table(a=1L), data.table(), data.table(d=2L), data.table(f=3L))), data.table(a=1L, d=2L, f=3L))
+## codecov
+test(12.01, cbindlist(data.frame(a=1L)), error="must be a list")
+test(12.02, cbindlist(TRUE), error="must be a list")
+test(12.03, cbindlist(list(data.table(a=1L), 1L)), error="is not a data.table")
+test(12.04, options = c(datatable.verbose=TRUE), cbindlist(list(data.table(a=1:2), data.table(b=1:2))), data.table(a=1:2, b=1:2), output="cbindlist.*took")
+test(12.05, cbindlist(list(data.table(), data.table(a=1:2), data.table(b=1:2))), data.table(a=1:2, b=1:2))
+test(12.06, cbindlist(list(data.table(), data.table(a=1:2), list(b=1:2))), data.table(a=1:2, b=1:2))
+test(12.07, cbindlist(list(data.table(a=integer()), list(b=integer()))), data.table(a=integer(), b=integer()))
+## duplicated names
+test(12.08, cbindlist(list(data.table(a=1L, b=2L), data.table(b=3L, d=4L))), data.table(a=1L, b=2L, b=3L, d=4L))
+local({
+  # also test that keys, indices are wiped
+  ans = cbindlist(list(setindexv(data.table(a=2:1, b=1:2), "a"), data.table(a=1:2, b=2:1, key="a"), data.table(a=2:1, b=1:2)))
+  test(12.09, ans, data.table(a=2:1, b=1:2, a=1:2, b=2:1, a=2:1, b=1:2))
+  test(12.10, indices(ans), NULL)
+})
+## recycling, first ensure cbind recycling that we want to match to
+test(12.11, cbind(data.table(x=integer()), data.table(a=1:2)), data.table(x=c(NA_integer_, NA), a=1:2))
+test(12.12, cbind(data.table(x=1L), data.table(a=1:2)), data.table(x=c(1L, 1L), a=1:2))
+test(12.13, cbindlist(list(data.table(a=integer()), data.table(b=1:2))), error="Recycling.*not yet implemented")
+test(12.14, cbindlist(list(data.table(a=1L), data.table(b=1:2))), error="Recycling.*not yet implemented")
+test(12.15, setcbindlist(list(data.table(a=integer()), data.table(b=1:2))), error="have to have the same number of rows")
+test(12.16, setcbindlist(list(data.table(a=1L), data.table(b=1:2))), error="have to have the same number of rows")
+
+## retain indices
+local({
+  l = list(
+    data.table(id1=1:5, id2=5:1, id3=1:5, v1=1:5),
+    data.table(id4=5:1, id5=1:5, v2=1:5),
+    data.table(id6=5:1, id7=1:5, v3=1:5),
+    data.table(id8=5:1, id9=5:1, v4=1:5)
+  )
+  setkeyv(l[[1L]], "id1"); setindexv(l[[1L]], list("id1", "id2", "id3", c("id1","id2","id3"))); setindexv(l[[3L]], list("id6", "id7")); setindexv(l[[4L]], "id9")
+  ii = lapply(l, indices)
+  ans = cbindlist(l)
+  test(13.1, key(ans), "id1")
+  test(13.2, indices(ans), c("id1", "id2", "id3", "id1__id2__id3", "id6", "id7", "id9"))
+  test(13.3, ii, lapply(l, indices)) ## this tests that original indices have not been touched, shallow_duplicate in mergeIndexAttrib
+})
+test(13.4, cbindlist(list(data.table(a=1:2), data.table(b=3:4, key="b"))), data.table(a=1:2, b=3:4, key="b"))
+# TODO(#7116): this could be supported
+# test(13.5, cbindlist(list(data.table(a=1:2, key="a"), data.table(b=3:4, key="b"))), data.table(a=1:2, b=3:4, key=c("a", "b")))
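
Tests 12.08 to 12.10 exercise the rule in the R wrapper that any duplicated column name in the result drops the key and the indices. A minimal C sketch of such a duplicate-name check follows. `has_duplicate_names` is a hypothetical helper, not a data.table function (the package itself uses R's `anyDuplicated` for this), and the quadratic scan is chosen for brevity rather than speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if any two of the n names compare equal.
 * O(n^2) pairwise comparison; adequate for a short list of column names. */
bool has_duplicate_names(const char *const *names, size_t n) {
    assert(names != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (strcmp(names[i], names[j]) == 0)
                return true;
    return false;
}
```

For the names in test 12.08 (`a`, `b`, `b`, `d`) the check reports a duplicate, which is the case where the wrapper resets the `sorted` and `index` attributes.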

inst/tests/tests.Rraw

Lines changed: 12 additions & 0 deletions
@@ -21291,3 +21291,15 @@ unlink(f)
 test(2325.2,
   fread('"foo","bar","baz"\n"a","b","c"', na.strings=c('"foo"', '"bar"', '"baz"'), header=FALSE),
   data.table(V1=c(NA, "a"), V2=c(NA, "b"), V3=c(NA, "c")))
+
+## ensure setDT will retain key and indices when it is called on the list (cbindlist assumes this)
+local({
+  d = data.table(x=1:2, y=2:1, z=2:1, v1=1:2)
+  setkeyv(d, "x"); setindexv(d, list("y", "z"))
+  a = attributes(d)
+  attributes(d) = a[!names(a) %in% c("class", ".internal.selfref", "row.names")]
+  test(2326.1, class(d), "list")
+  setDT(d)
+  test(2326.2, key(d), "x")
+  test(2326.3, indices(d), c("y", "z"))
+})

man/cbindlist.Rd

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+\name{cbindlist}
+\alias{cbindlist}
+\alias{setcbindlist}
+\alias{cbind}
+\alias{cbind.data.table}
+\title{Column bind multiple data.tables}
+\description{
+  Column bind multiple \code{data.table}s.
+}
+\usage{
+  cbindlist(l)
+  setcbindlist(l)
+}
+\arguments{
+  \item{l}{ \code{list} of \code{data.table}s to merge. }
+}
+\details{
+  Column bind only stacks input elements. Works like \code{\link{data.table}}, but takes \code{list} type on input. Zero-column tables in \code{l} are omitted. Tables in \code{l} should have matching row count; recycling of length-1 rows is not yet implemented. Indices of the input tables are transferred to the resulting table, as well as the \emph{key} of the first keyed table.
+}
+\value{
+  A new \code{data.table} based on the stacked objects.
+
+  For \code{setcbindlist}, columns in the output will be shared with the input, i.e., \emph{no copy is made}.
+}
+\note{
+  No attempt is made to deduplicate resulting names. If the result has any duplicate names, keys and indices are removed.
+}
+\seealso{
+  \code{\link{data.table}}, \code{\link{rbindlist}}, \code{\link{setDT}}
+}
+\examples{
+d1 = data.table(x=1:3, v1=1L, key="x")
+d2 = data.table(y=3:1, v2=2L, key="y")
+d3 = data.table(z=2:4, v3=3L)
+cbindlist(list(d1, d2, d3))
+cbindlist(list(d1, d1))
+d4 = setcbindlist(list(d1))
+d4[, v1:=2L]
+identical(d4, d1)
+}
+\keyword{ data }
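
The details section states two input rules: zero-column tables are skipped, and the remaining tables must agree on their row count because recycling is not yet implemented. A minimal C sketch of that validation pass is shown here under stated assumptions. `table_shape` and `count_output_columns` are hypothetical names, and returning -1 stands in for the error that the real routine raises.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one input table: its column and row counts. */
typedef struct {
    int ncol;
    int nrow;
} table_shape;

/* Return the total number of output columns, or -1 when two non-empty
 * tables disagree on the number of rows. Zero-column tables are skipped
 * and never take part in the row-count comparison. If nrow_out is not
 * NULL it receives the common row count (-1 if every table was empty). */
int count_output_columns(const table_shape *tables, size_t n, int *nrow_out) {
    assert(tables != NULL || n == 0);
    int total = 0, nr = -1;
    for (size_t i = 0; i < n; ++i) {
        if (tables[i].ncol == 0)
            continue;                 /* omitted from the result */
        if (nr < 0)
            nr = tables[i].nrow;      /* first non-empty table sets the row count */
        else if (nr != tables[i].nrow)
            return -1;                /* would require recycling */
        total += tables[i].ncol;
    }
    if (nrow_out != NULL)
        *nrow_out = nr;
    return total;
}
```

The shapes of test 11.05 (three one-column, one-row tables plus an empty one) give three output columns, while a one-row table next to a two-row table is rejected, as in tests 12.14 and 12.16.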

src/data.table.h

Lines changed: 3 additions & 0 deletions
@@ -298,6 +298,9 @@ SEXP substitute_call_arg_namesR(SEXP expr, SEXP env);
 //negate.c
 SEXP notchin(SEXP x, SEXP table);
 
+// mergelist.c
+SEXP cbindlist(SEXP x, SEXP copyArg);
+
 // functions called from R level .Call/.External and registered in init.c
 // these now live here to pass -Wstrict-prototypes, #5477
 // all arguments must be SEXP since they are called from R level

src/init.c

Lines changed: 1 addition & 0 deletions
@@ -149,6 +149,7 @@ R_CallMethodDef callMethods[] = {
 {"CstartsWithAny", (DL_FUNC)&startsWithAny, -1},
 {"CconvertDate", (DL_FUNC)&convertDate, -1},
 {"Cnotchin", (DL_FUNC)&notchin, -1},
+{"Ccbindlist", (DL_FUNC) &cbindlist, -1},
 {"Cwarn_matrix_column_r", (DL_FUNC)&warn_matrix_column_r, -1},
 {NULL, NULL, 0}
 };

src/mergelist.c

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+#include "data.table.h"
+
+void mergeIndexAttrib(SEXP to, SEXP from) {
+  if (!isInteger(to) || LENGTH(to)!=0)
+    internal_error(__func__, "'to' must be integer() already"); // # nocov
+  if (isNull(from))
+    return;
+  SEXP t = ATTRIB(to), f = ATTRIB(from);
+  if (isNull(t)) // target has no attributes -> overwrite
+    SET_ATTRIB(to, shallow_duplicate(f));
+  else {
+    for (t = ATTRIB(to); CDR(t) != R_NilValue; t = CDR(t)); // traverse to end of attributes list of to
+    SETCDR(t, shallow_duplicate(f));
+  }
+}
+
+SEXP cbindlist(SEXP x, SEXP copyArg) {
+  if (!isNewList(x) || isFrame(x))
+    error(_("'%s' must be a list"), "x");
+  bool copy = (bool)LOGICAL(copyArg)[0];
+  const bool verbose = GetVerbose();
+  double tic = 0;
+  if (verbose)
+    tic = omp_get_wtime();
+  int nx = length(x), nans = 0, nr = -1, *nnx = (int*)R_alloc(nx, sizeof(int));
+  bool recycle = false;
+  for (int i=0; i<nx; ++i) {
+    SEXP thisx = VECTOR_ELT(x, i);
+    if (!perhapsDataTable(thisx))
+      error(_("Element %d of 'l' list is not a data.table."), i+1);
+    nnx[i] = n_columns(thisx);
+    if (!nnx[i])
+      continue;
+    int thisnr = n_rows(thisx);
+    if (nr < 0) // first (non-zero length table) iteration
+      nr = thisnr;
+    else if (nr != thisnr) {
+      if (!copy)
+        error(_("For copy=FALSE all non-empty tables in 'l' have to have the same number of rows, but l[[%d]] has %d rows which differs from the previous non-zero number of rows (%d)."), i+1, thisnr, nr);
+      recycle = true;
+    }
+    nans += nnx[i];
+  }
+  if (recycle)
+    error(_("Recycling rows is not yet implemented.")); // dont we have a routines for that already somewhere?
+  SEXP ans = PROTECT(allocVector(VECSXP, nans));
+  SEXP index = PROTECT(allocVector(INTSXP, 0));
+  SEXP key = R_NilValue;
+  setAttrib(ans, sym_index, index);
+  SEXP names = PROTECT(allocVector(STRSXP, nans));
+  for (int i=0, ians=0; i<nx; ++i) {
+    int protecti =0;
+    SEXP thisx = VECTOR_ELT(x, i);
+    SEXP thisnames = PROTECT(getAttrib(thisx, R_NamesSymbol)); protecti++;
+    for (int j=0; j<nnx[i]; ++j, ++ians) {
+      SEXP thisxcol;
+      if (copy) {
+        thisxcol = PROTECT(duplicate(VECTOR_ELT(thisx, j))); protecti++;
+      } else {
+        thisxcol = VECTOR_ELT(thisx, j);
+      }
+      SET_VECTOR_ELT(ans, ians, thisxcol);
+      SET_STRING_ELT(names, ians, STRING_ELT(thisnames, j));
+    }
+    mergeIndexAttrib(index, getAttrib(thisx, sym_index));
+    if (isNull(key)) // first key is retained
+      key = getAttrib(thisx, sym_sorted);
+    UNPROTECT(protecti); // thisnames, thisxcol
+  }
+  if (isNull(ATTRIB(index)))
+    setAttrib(ans, sym_index, R_NilValue);
+  setAttrib(ans, R_NamesSymbol, names);
+  setAttrib(ans, sym_sorted, key);
+  if (verbose)
+    Rprintf(_("cbindlist: took %.3fs\n"), omp_get_wtime()-tic);
+  UNPROTECT(3); // ans, index, names
+  return ans;
+}
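
`mergeIndexAttrib` collects the indices of every input table on one `index` object by walking to the last cell of the target's attribute pairlist and attaching the source's attributes there. The same tail-append pattern can be sketched on a plain singly linked list. This is an illustration only: `node`, `append_list`, and `list_length` are hypothetical names, and unlike the real routine the sketch attaches `from` directly instead of a shallow duplicate of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cons cell standing in for one attribute in a pairlist. */
typedef struct node {
    const char *tag;
    struct node *next;
} node;

/* Append the list `from` to the end of `*to`.
 * An empty target simply becomes `from`; an empty source is a no-op. */
void append_list(node **to, node *from) {
    assert(to != NULL);
    if (from == NULL)
        return;
    if (*to == NULL) {          /* target has no cells yet -> overwrite */
        *to = from;
        return;
    }
    node *t = *to;
    while (t->next != NULL)     /* traverse to the last cell of the target */
        t = t->next;
    t->next = from;
}

/* Count the cells in a list. */
size_t list_length(const node *l) {
    size_t n = 0;
    for (; l != NULL; l = l->next)
        ++n;
    return n;
}
```

Because the sketch links the source cells in place, later changes to them would show through the target. The real code passes the source through `shallow_duplicate` first, which test 13.3 relies on when it checks that the original indices are untouched.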

tests/mergelist.R

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+require(data.table)
+test.data.table(script="mergelist.Rraw")
