Commit e1329dd

Merge branch 'master' into litedown
2 parents 5ed155d + c941399 commit e1329dd

18 files changed: 134 additions, 40 deletions

.github/CODE_OF_CONDUCT.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -24,7 +24,7 @@ Project members with the Committer role or the CRAN Maintainer role are pledged
 
 Those who prefer to report in a way that is independent of the current Committers and Maintainer may instead contact the Community Engagement Coordinator by e-mailing [r.data.table\@gmail.com](mailto:r.data.table@gmail.com). Messages sent to this e-mail address will be visible only to the current Community Engagement Coordinator, a position always held by an individual who is not a Committer or CRAN Maintainer of the package.
 
-The current Committers are Toby Dylan Hocking (@tdhock), Matt Dowle (@mattdowle), Arun Srinivasan (@arunsrinivasan), Jan Gorecki (@jangorecki), Michael Chirico (@MichaelChirico), and Benjamin Schwendinger (@ben-schwen).
+The current Committers are Toby Dylan Hocking (@tdhock), Matt Dowle (@mattdowle), Arun Srinivasan (@arunsrinivasan), Jan Gorecki (@jangorecki), Michael Chirico (@MichaelChirico), Benjamin Schwendinger (@ben-schwen), and Ivan Krylov (@aitap).
 
 The current CRAN Maintainer is Tyson Barrett (@tysonstanley).
```

.github/workflows/R-CMD-check.yaml

Lines changed: 4 additions & 3 deletions
```diff
@@ -27,8 +27,9 @@ jobs:
 # GHA does run these jobs concurrently but even so reducing the load seems like a good idea.
 - {os: windows-latest, r: 'devel'}
 # - {os: macOS-latest, r: 'release'} # test-coverage.yaml uses macOS
-- {os: ubuntu-20.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"}
-# - {os: ubuntu-20.04, r: 'devel', rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest", http-user-agent: "R/4.1.0 (ubuntu-20.04) R (4.1.0 x86_64-pc-linux-gnu x86_64 linux-gnu) on GitHub Actions" }
+# TODO(remotes>2.5.0): Use 24.04[noble?]
+- {os: ubuntu-22.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest"}
+# - {os: ubuntu-22.04, r: 'devel', rspm: "https://packagemanager.rstudio.com/cran/__linux__/jammy/latest", http-user-agent: "R/4.1.0 (ubuntu-22.04) R (4.1.0 x86_64-pc-linux-gnu x86_64 linux-gnu) on GitHub Actions" }
 # GLCI covers R-devel; no need to delay contributors in dev due to changes in R-devel in recent days
 
 env:
@@ -64,7 +65,7 @@ jobs:
 while read -r cmd
 do
 eval sudo $cmd
-done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))')
+done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "22.04"))')
 
 - name: Install dependencies
 run: |
```

CODEOWNERS

Lines changed: 4 additions & 0 deletions
```diff
@@ -23,6 +23,10 @@
 /src/programming.c @jangorecki
 /vignettes/datatable-programming.Rmd @jangorecki
 
+# roll-up & setops
+/R/groupingsets.R @jangorecki
+/R/setops.R @jangorecki
+
 # GForce groupby
 /src/gsumm.c @ben-schwen
 # datetime classes
```

NEWS.md

Lines changed: 10 additions & 1 deletion
```diff
@@ -6,6 +6,10 @@
 
 1. New `sort_by()` method for data.tables, [#6662](https://github.com/Rdatatable/data.table/issues/6662). It uses `forder()` to improve upon the data.frame method and also match `DT[order(...)]` behavior with respect to locale. Thanks @rikivillalba for the suggestion and PR.
 
+2. `melt()` now supports using `patterns()` with `id.vars`, [#6867](https://github.com/Rdatatable/data.table/issues/6867). Thanks to Toby Dylan Hocking for the suggestion and PR.
+
+3. `print.data.table()` now shows column classes at the bottom of large tables when `class=TRUE` and `col.names="auto"` (default) for tables with more than 20 rows, [#6902](https://github.com/Rdatatable/data.table/issues/6902). This follows the same behavior as column names at the bottom, making it easier to see column types for large tables without scrolling back to the top. Thanks to @TimTaylor for the suggestion and @Mukulyadav2004 for the PR.
+
 ## BUG FIXES
 
 1. Custom binary operators from the `lubridate` package now work with objects of class `IDate` as with a `Date` subclass, [#6839](https://github.com/Rdatatable/data.table/issues/6839). Thanks @emallickhossain for the report and @aitap for the fix.
@@ -14,10 +18,15 @@
 
 3. `fread(keepLeadingZeros=TRUE)` now correctly parses dates with components with leading zeros as dates instead of strings, [#6851](https://github.com/Rdatatable/data.table/issues/6851). Thanks @TurnaevEvgeny for the report and @ben-schwen for the fix.
 
-4. `as.data.table()` now properly handles keys: specifying keys sets them, omitting keys preserves existing ones, and setting `key=NULL` clears them, [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.
+4. `as.data.table()` now properly handles keys: specifying keys sets them, omitting keys preserves existing ones, and setting `key=NULL` clears them. Additionally, `keep.rownames` is now consistently passed to `as.data.table(x, keep.rownames)`, [#6859](https://github.com/Rdatatable/data.table/issues/6859). Thanks @brookslogan for the report and @Mukulyadav2004 for the fix.
 
 5. `as.data.table()` on `x` avoids an infinite loop if the output of the corresponding `as.data.frame()` method has the same class as the input, [#6874](https://github.com/Rdatatable/data.table/issues/6874). Concretely, we had `class(x) = c('foo', 'data.frame')` and `class(as.data.frame(x)) = c('foo', 'data.frame')`, so `as.data.frame.foo` wound up getting called repeatedly. Thanks @matschmitz for the report and @ben-schwen for the fix.
 
+6. By-reference sub-assignments to factor columns now match the levels in UTF-8, preventing their duplication when the same level exists in different encodings, [#6886](https://github.com/Rdatatable/data.table/issues/6886). Thanks @iagogv3 for the report and @aitap for the fix.
+
+7. `fwrite()` now avoids a crash when translating strings into a different encoding, [#6883](https://github.com/Rdatatable/data.table/issues/6883). Thanks @filipemsc for the report and @aitap for the fix.
+
+
 ## NOTES
 
 1. Continued work to remove non-API C functions, [#6180](https://github.com/Rdatatable/data.table/issues/6180). Thanks Ivan Krylov for the PRs and for writing a clear and concise guide about the R API: https://aitap.codeberg.page/R-api/.
```

R/fmelt.R

Lines changed: 11 additions & 7 deletions
```diff
@@ -182,13 +182,17 @@ melt.data.table = function(data, id.vars, measure.vars, variable.name = "variabl
 value.name = "value", ..., na.rm = FALSE, variable.factor = TRUE, value.factor = FALSE,
 verbose = getOption("datatable.verbose")) {
 if (!is.data.table(data)) stopf("'data' must be a data.table")
-if (missing(id.vars)) id.vars=NULL
-if (missing(measure.vars)) measure.vars = NULL
-measure.sub = substitute(measure.vars)
-if (is.call(measure.sub)) {
-eval.result = eval_with_cols(measure.sub, names(data))
-if (!is.null(eval.result)) {
-measure.vars = eval.result
+for(type.vars in c("id.vars","measure.vars")){
+sub.lang <- substitute({
+if (missing(VAR)) VAR=NULL
+substitute(VAR)
+}, list(VAR=as.symbol(type.vars)))
+sub.result = eval(sub.lang)
+if (is.call(sub.result)) {
+eval.result = eval_with_cols(sub.result, names(data))
+if (!is.null(eval.result)) {
+assign(type.vars, eval.result)
+}
 }
 }
 if (is.list(measure.vars)) {
```
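The new loop in `melt.data.table()` applies one capture-and-resolve step to both `id.vars` and `measure.vars`: for each argument name it builds `{ if (missing(VAR)) VAR=NULL; substitute(VAR) }` with `VAR` replaced by that argument's symbol, evaluates it in the function's own frame to get the unevaluated argument, and, if that is a call such as `patterns("x")`, resolves it against the column names. The base-R sketch below reproduces the pattern in isolation. It is a minimal sketch under stated assumptions: `toy_melt_vars()` and `toy_eval_with_cols()` are hypothetical names, and the latter is only a stand-in for data.table's internal `eval_with_cols()`, assuming a `patterns()` that returns the positions of matching columns.

```r
# Hypothetical stand-in for data.table's internal eval_with_cols(): evaluates
# `call` in an environment where patterns() matches regexes against `cols`.
toy_eval_with_cols <- function(call, cols) {
  env <- list2env(list(
    patterns = function(...) {
      # one integer vector of matching column positions per pattern
      res <- lapply(c(...), grep, x = cols)
      if (length(res) == 1L) res[[1L]] else res
    }
  ), parent = baseenv())
  eval(call, env)
}

# Hypothetical function reusing the loop pattern from the fmelt.R diff.
toy_melt_vars <- function(data, id.vars, measure.vars) {
  for (type.vars in c("id.vars", "measure.vars")) {
    # Build `{ if (missing(id.vars)) id.vars = NULL; substitute(id.vars) }`
    # (and likewise for measure.vars) by splicing in the argument's symbol.
    sub.lang <- substitute({
      if (missing(VAR)) VAR = NULL
      substitute(VAR)
    }, list(VAR = as.symbol(type.vars)))
    # eval() defaults to the calling frame, so missing() and substitute()
    # see this function's arguments; the result is the unevaluated argument.
    sub.result <- eval(sub.lang)
    if (is.call(sub.result)) {
      eval.result <- toy_eval_with_cols(sub.result, names(data))
      # assign() writes into this frame, replacing the still-unforced promise
      if (!is.null(eval.result)) assign(type.vars, eval.result)
    }
  }
  list(id.vars = id.vars, measure.vars = measure.vars)
}

df <- data.frame(x_long = 0, x_short = 0, z = 0, y1 = 1, y2 = 2)
res <- toy_melt_vars(df, id.vars = patterns("x"), measure.vars = patterns("y"))
str(res)
```

With the column names used in test 1035.131, `id.vars = patterns("x")` resolves to positions 1 and 2 and `measure.vars = patterns("y")` to 4 and 5, while a plain character argument such as `id.vars = "z"` passes through untouched because it is not a call. Looping over the two argument names, instead of hard-coding the block for `measure.vars` as the removed lines did, is what lets `id.vars` accept `patterns()` without duplicating the logic.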

R/fwrite.R

Lines changed: 9 additions & 0 deletions
```diff
@@ -111,6 +111,15 @@ fwrite = function(x, file="", append=FALSE, quote="auto",
 }
 # nocov end
 file = enc2native(file) # CfwriteR cannot handle UTF-8 if that is not the native encoding, see #3078.
+# pre-encode any strings or factor levels to avoid translateChar trying to allocate from OpenMP threads
+if (encoding %chin% c("UTF-8", "native")) {
+enc = switch(encoding, "UTF-8" = enc2utf8, "native" = enc2native)
+x = lapply(x, function(x) {
+if (is.character(x)) x = enc(x)
+if (is.factor(x)) levels(x) = enc(levels(x))
+x
+})
+}
 .Call(CfwriteR, x, file, sep, sep2, eol, na, dec, quote, qmethod=="escape", append,
 row.names, col.names, logical01, scipen, dateTimeAs, buffMB, nThread,
 showProgress, is_gzip, compressLevel, bom, yaml, verbose, encoding)
```
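The new block in `fwrite()` re-encodes every character column and every factor's levels in R, before `.Call(CfwriteR, ...)`, so that the writer's OpenMP threads should no longer need `translateChar()` (and its allocations) for strings in a foreign encoding. The sketch below isolates that conversion in base R. `pre_encode()` is a hypothetical name, and `%in%` replaces data.table's `%chin%` so the snippet runs without the package.

```r
# Hypothetical helper mirroring fwrite()'s pre-encoding step: re-encode
# character columns and factor levels up front, in the main R thread.
pre_encode <- function(x, encoding = "") {
  if (!encoding %in% c("UTF-8", "native")) return(x)  # other values: no-op
  enc <- switch(encoding, "UTF-8" = enc2utf8, "native" = enc2native)
  lapply(x, function(col) {
    if (is.character(col)) col <- enc(col)
    if (is.factor(col)) levels(col) <- enc(levels(col))
    col  # non-string columns are returned unchanged
  })
}

# "\uf8" (o with stroke) as a latin1-marked string, as in test 2312
latin1 <- iconv("\uf8", from = "UTF-8", to = "latin1")
cols <- list(chr = rep(latin1, 2), fct = factor(latin1), num = 1.5)
out <- pre_encode(cols, encoding = "UTF-8")
Encoding(out$chr)
Encoding(levels(out$fct))
```

Like the `lapply()` in the diff, the sketch returns a plain list; column names are kept, but any data.table or data.frame class is dropped, which is harmless here because the result is only handed to the C writer.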

R/print.data.table.R

Lines changed: 5 additions & 1 deletion
```diff
@@ -142,7 +142,11 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
 if (nrow(toprint)>20L && col.names == "auto")
 # repeat colnames at the bottom if over 20 rows so you don't have to scroll up to see them
 # option to shut this off per request of Oleg Bondar on SO, #1482
-toprint = rbind(toprint, matrix(if (quote) old else colnames(toprint), nrow=1L)) # fixes bug #97
+toprint = rbind(
+toprint,
+matrix(if (quote) old else colnames(toprint), nrow=1L), # see #97
+if (isTRUE(class)) matrix(abbs, nrow=1L) # #6902
+)
 print_default(toprint)
 invisible(x)
 }
```
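The reworked `rbind()` call relies on two base-R behaviours: an `if` without `else` evaluates to `NULL` when its condition is false, and `rbind()` ignores `NULL` arguments, so the class row is appended only when `class=TRUE`. The sketch below reproduces just that step on a small character matrix. `add_footer()` is a hypothetical name; it skips the 20-row threshold and the `quote`/`old` branch of the real code, and it assumes `abbs` holds abbreviated class labels such as `<int>` (the label test 2314.1 looks for).

```r
# Hypothetical helper sketching the footer step of print.data.table():
# repeat the column names and, only when class = TRUE, the class labels.
add_footer <- function(toprint, abbs, class = FALSE) {
  rbind(
    toprint,
    matrix(colnames(toprint), nrow = 1L),       # column names at the bottom
    if (isTRUE(class)) matrix(abbs, nrow = 1L)  # NULL otherwise; rbind() drops it
  )
}

toprint <- matrix(as.character(1:3), ncol = 1L, dimnames = list(NULL, "id"))
with_classes <- add_footer(toprint, abbs = "<int>", class = TRUE)
without_classes <- add_footer(toprint, abbs = "<int>", class = FALSE)
print(with_classes, quote = FALSE)
```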

_pkgdown.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -2,6 +2,7 @@ url: https://rdatatable.gitlab.io/data.table
 
 template:
 bootstrap: 5
+light-switch: true
 
 development:
 version_tooltip: "Development version"
@@ -18,7 +19,7 @@ home:
 navbar:
 structure:
 left: [home, introduction, articles, news, benchmarks, presentations, communityarticles, reference]
-right: [github]
+right: [search, github, lightswitch]
 components:
 home:
 icon: fas fa-home fa-lg
```

inst/tests/tests.Rraw

Lines changed: 35 additions & 3 deletions
```diff
@@ -2832,9 +2832,8 @@ test(944.1, DT[, foo:=NULL], DT, warning="Tried to assign NULL to column 'foo',
 test(944.2, DT[,a:=1L], data.table(a=1L)) # can now add columns to an empty data.table from v1.12.2
 test(944.3, DT[,aa:=NULL], data.table(a=1L), warning="Tried to assign NULL to column 'aa', but this column does not exist to remove")
 test(944.4, DT[,a:=NULL], data.table(NULL))
-if (base::getRversion() >= "3.4.0") {
-test(944.5, typeof(structure(NULL, class=c("data.table","data.frame"))), 'list', warning="deprecated, as NULL cannot have attributes") # R warns which is good and we like
-}
+# 944.5 used to test base R behaviour regarding structure(NULL, ...), which changed from warning to error in 4.6.0 and isn't used in data.table.
+
 DT = data.table(a=numeric())
 test(945, DT[,b:=a+1], data.table(a=numeric(),b=numeric()))
 
@@ -3221,6 +3220,10 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x))
 test(1035.12, attr(melt(DT, id.vars=1:2)$x, "foo"), "bla1")
 test(1035.13, attr(melt(DT, id.vars=1:2)$y, "bar"), 1:4)
 
+# issue #6867 - id.vars=patterns().
+DT=data.table(x_long=0, x_short=0, z=0, y1=1, y2=2)
+test(1035.131, melt(DT, measure.vars=patterns("y"), id.vars=patterns("x")), data.table(x_long=0, x_short=0, variable=factor(c("y1","y2")), value=c(1,2)))
+
 # bug #699 - melt segfaults when vars are not in dt; was test 1316
 x = data.table(a=c(1,2),b=c(2,3),c=c(3,4))
 test(1035.14, melt(x, id.vars="d"), error="One or more values")
@@ -21105,7 +21108,36 @@ test(2309.06, key(as.data.table(DT, key="a")), "a")
 test(2309.07, key(as.data.table(DT)), NULL)
 test(2309.08, key(as.data.table(DT, key=NULL)), NULL)
 
+# as.data.table(x, keep.rownames=TRUE) keeps rownames for class(x)==c("*", "data.frame")
+df = structure(list(i = 1:2), class = c("tbl", "data.frame"), row.names = c("a","b"))
+test(2309.09, as.data.table(df, keep.rownames=TRUE), data.table(rn = c("a","b"), i=1:2))
+
 # as.data.frame(x) does not reset class(x) to "data.frame" #6874
 as.data.frame.no.reset = function(x) x
 DF = structure(list(a = 1:2), class = c("data.frame", "no.reset"), row.names = c(NA, -2L))
 test(2310.01, as.data.table(DF), data.table(a=1:2))
+
+# memrecycle() did not consider string encodings for factor levels #6886
+DT = data.table(factor(rep("\uf8", 3)))
+# identical() to V1's only level but stored in a different CHARSXP
+samelevel = iconv(levels(DT$V1), from = "UTF-8", to = "latin1")
+DT[1, V1 := samelevel]
+test(2311.1, nlevels(DT$V1), 1L) # used to be 2
+DT[1, V1 := factor("a", levels = c("a", samelevel))]
+test(2311.2, nlevels(DT$V1), 2L) # used to be 3
+
+# avoid translateChar*() in OpenMP threads, #6883
+DF = list(rep(iconv("\uf8", from = "UTF-8", to = "latin1"), 2e6))
+test(2312, fwrite(DF, nullfile(), encoding = "UTF-8", nThread = 2L), NULL)
+
+# avoid memcpy of 0-length inputs
+test(2313,
+melt(data.table(a=numeric(), b=numeric(), c=numeric()), id.vars=c('a', 'b')),
+data.table(a=numeric(), b=numeric(), variable=factor(levels='c'), value=numeric()))
+
+# Testing column footer display with col.names options in print.data.table #6902
+dt = data.table(id = 1:25)
+# Test with class=TRUE shows classes at bottom with default col.names="auto"
+test(2314.1, any(grepl("<int>", tail(capture.output(print(dt, class = TRUE)), 2))), TRUE)
+# Test that class=TRUE with col.names="top" doesn't show classes at bottom
+test(2314.2, !any(grepl("<int>", tail(capture.output(print(dt, class = TRUE, col.names = "top")), 2))), TRUE)
```

man/melt.data.table.Rd

Lines changed: 2 additions & 1 deletion
```diff
@@ -19,7 +19,7 @@ multiple columns simultaneously.
 \arguments{
 \item{data}{ A \code{data.table} object to melt.}
 \item{id.vars}{vector of id variables. Can be integer (corresponding id
-column numbers) or character (id column names) vector. If missing, all
+column numbers) or character (id column names) vector, perhaps created using \code{patterns()}. If missing, all
 non-measure columns will be assigned to it. If integer, must be positive; see Details. }
 \item{measure.vars}{Measure variables for \code{melt}ing. Can be missing, vector, list, or pattern-based.
@@ -131,6 +131,7 @@ melt(DT, id.vars=1, measure.vars=c("c_1", "c_2"), na.rm=TRUE) # remove NA
 # melt "f_1,f_2" and "d_1,d_2" simultaneously, retain 'factor' attribute
 # convenient way using internal function patterns()
 melt(DT, id.vars=1:2, measure.vars=patterns("^f_", "^d_"), value.factor=TRUE)
+melt(DT, id.vars=patterns("[in]"), measure.vars=patterns("^f_", "^d_"), value.factor=TRUE)
 # same as above, but provide list of columns directly by column names or indices
 melt(DT, id.vars=1:2, measure.vars=list(3:4, c("d_1", "d_2")), value.factor=TRUE)
 # same as above, but provide names directly:
```
