Commit 826ab8c

Merge branch 'master' into fix_fwrite_length
2 parents 5c57eba + b48649a commit 826ab8c

42 files changed: +588 -288 lines changed

.github/ISSUE_TEMPLATE/issue_template.md

Lines changed: 7 additions & 2 deletions
@@ -5,13 +5,18 @@ about: Report a bug or describe a new requested feature

 Click preview tab ^^^ above!

-By continuing to file this new issue / feature request, I confirm I have :
+By continuing to file this new issue / feature request, I confirm I have:
 1. searched the [live NEWS file](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) to see if it has been fixed in dev already. If so, I tried the [latest dev version](https://github.com/Rdatatable/data.table/wiki/Installation#windows).
 2. looked at the titles of all the issues in the [current milestones](https://github.com/Rdatatable/data.table/milestones) and am aware of those. (Adding new information to existing issues is very helpful and appreciated.)
 3. [searched all issues](https://github.com/Rdatatable/data.table/issues) (i.e. not in a milestone yet) for similar issues to mine and will include links to them explaining why mine is different.
 4. searched on [Stack Overflow's data.table tag](http://stackoverflow.com/questions/tagged/data.table) and there is nothing similar there.
 5. read the [Support](https://github.com/Rdatatable/data.table/wiki/Support) and [Contributing](https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md) guides.
-6. please don't tag your issue with text in the title; project members will add the appropriate tags later.
+
+Some general advice on the title and description fields for your PR
+
+- Please don't tag your issue with text in the title like '[Joins]'; project members will add the appropriate tags later.
+- Don't write text like 'Closes #xxx' in the PR title either; GitHub does not recognize this text, whereas GitHub auto-links issues in the description, [see docs](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword).
+- Title and Description fields should try and be self-contained as much as possible. The title answers "what is this change" and the description provides necessary details/thought processes/things tried but abandoned. Imagine visiting your PR in 5 years' time and trying to glean what it's all about quickly and without needing to open 10 new tabs.

 #### Thanks! Please remove the text above and include the two items below.

.gitlab-ci.yml

Lines changed: 1 addition & 0 deletions
@@ -194,6 +194,7 @@ test-lin-dev-clang-cran:
 - echo 'CFLAGS=-g -O2 -fno-common -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars
 - echo 'CXXFLAGS=-g -O2 -fno-common -Wall -Wvla -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars
 - *install-deps
+- clang-tidy -extra-arg=-I/usr/local/lib/R/include -checks='readability-inconsistent-declaration-parameter' src/*.c -- -std=c99
 - R CMD check --as-cran $(ls -1t data.table_*.tar.gz | head -n 1)
 - (! grep "warning:" data.table.Rcheck/00install.out)
 - >-

DESCRIPTION

Lines changed: 3 additions & 1 deletion
@@ -98,5 +98,7 @@ Authors@R: c(
 person("Christian", "Wia", role="ctb"),
 person("Elise", "Maigné", role="ctb"),
 person("Vincent", "Rocher", role="ctb"),
-person("Vijay", "Lulla", role="ctb")
+person("Vijay", "Lulla", role="ctb"),
+person("Aljaž", "Sluga", role="ctb"),
+person("Bill", "Evans", role="ctb")
 )

NAMESPACE

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,7 @@ importFrom(methods, "S3Part<-", slotNames)
 exportClasses(data.table, IDate, ITime)
 ##

-export(data.table, tables, setkey, setkeyv, key, "key<-", haskey, CJ, SJ, copy)
+export(data.table, tables, setkey, setkeyv, key, haskey, CJ, SJ, copy)
 export(rowwiseDT)
 export(setindex, setindexv, indices)
 export(as.data.table,is.data.table,test.data.table)
@@ -104,6 +104,7 @@ if (getRversion() >= "4.0.0") {
 # version of R (and that is checked in .onLoad with error if not).
 export(.rbind.data.table) # only export in R<4.0.0 where it is still used; R-devel now detects it is missing doc, #5600
 }
+if (getRversion() >= "3.6.0") S3method(knitr::knit_print, data.table) # else manual delayed registration from the onLoad hook
 S3method(dim, data.table)
 S3method(dimnames, data.table)
 S3method("dimnames<-", data.table)
@@ -201,6 +202,7 @@ S3method(format_col, POSIXct)
 S3method(format_col, expression)
 export(format_list_item)
 S3method(format_list_item, default)
+S3method(format_list_item, data.frame)

 export(fdroplevels, setdroplevels)
 S3method(droplevels, data.table)
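
The comment on the added `knit_print` line refers to "manual delayed registration from the onLoad hook" for R versions before 3.6.0. The following is a minimal sketch of that general pattern only, not data.table's actual code: `register_s3_delayed` and `toy_class` are hypothetical names introduced for illustration, and the demo targets a generic from {utils} rather than {knitr} so that it runs without extra packages.

```r
# Hypothetical helper (not part of data.table): register an S3 method for a
# generic owned by another package. If that package's namespace is already
# loaded, register immediately; otherwise defer until it gets loaded.
register_s3_delayed = function(pkg, generic, class, method) {
  do_register = function(...) {
    # '...' absorbs the arguments that load hooks are called with
    registerS3method(generic, class, method, envir = asNamespace(pkg))
  }
  if (isNamespaceLoaded(pkg)) {
    do_register()
  } else {
    setHook(packageEvent(pkg, "onLoad"), do_register)
  }
  invisible(NULL)
}

# Toy usage: 'utils' is normally loaded already, so registration is immediate
register_s3_delayed("utils", "head", "toy_class", function(x, ...) "toy head")
res = head(structure(list(), class = "toy_class"))
```

On R >= 3.6.0 the `S3method(pkg::generic, class)` directive shown in the diff lets R handle the deferral itself, which is why the manual route is only the fallback.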

NEWS.md

Lines changed: 43 additions & 9 deletions
@@ -65,9 +65,13 @@ rowwiseDT(

 4. `patterns()` in `melt()` combines correctly with user-defined `cols=`, which can be useful to specify a subset of columns to reshape without having to use a regex, for example `patterns("2", cols=c("y1", "y2"))` will only give `y2` even if there are other columns in the input matching `2`, [#6498](https://github.com/Rdatatable/data.table/issues/6498). Thanks to @hongyuanjia for the report, and to @tdhock for the PR.

-5. `fwrite()` with `compress="gzip"` produces compatible gz files when composed of multiple independent chunks owing to parallelization, [#6356](https://github.com/Rdatatable/data.table/issues/6356). Earlier `fwrite()` versions could have issues with HTTP upload using `Content-Encoding: gzip` and `Transfer-Encoding: chunked`. Thanks to @oliverfoster for report and @philippechataignon for the fix.
+5. `setcolorder()` gains `skip_absent` to ignore unrecognized columns (i.e. columns included in `neworder` but not present in the data), [#6044, #6068](https://github.com/Rdatatable/data.table/pull/6044). Default behavior (`skip_absent=FALSE`) remains unchanged, i.e. unrecognized columns result in an error. Thanks to @sluga for the suggestion and @sluga & @Nj221102 for the PRs.

-6. `fwrite()` gains a new parameter `compressLevel` to control compression level for gzip, [#5506](https://github.com/Rdatatable/data.table/issues/5506). This parameter balances compression speed and total compression, and corresponds directly to the analogous command-line parameter, e.g. `compressLevel=4` corresponds to passing `-4`; the default, `6`, matches the command-line default, i.e. equivalent to passing `-6`. Thanks @mgarbuzov for the request and @philippechataignon for implementing.
+6. `fread()` gains `logicalYN` argument to read columns consisting only of strings `Y`, `N` as `logical` (as opposed to character), [#4563](https://github.com/Rdatatable/data.table/issues/4563). The default is controlled by option `datatable.logicalYN`, itself defaulting to `FALSE`, for back-compatibility -- some smaller tables (especially sharded tables) might inadvertently read a "true" string column as `logical` and cause bugs. This is particularly important for tables with a column named `y` or `n` -- automatic header detection under `logicalYN=TRUE` will see these values in the first row as being "data" as opposed to column names. A parallel option was not included for `fwrite()` at this time -- users looking for a compact representation of logical columns can still use `fwrite(logical01=TRUE)`. We also opted for now to check only `Y`, `N` and not `Yes`/`No`/`YES`/`NO`.
+
+7. `fwrite()` with `compress="gzip"` produces compatible gz files when composed of multiple independent chunks owing to parallelization, [#6356](https://github.com/Rdatatable/data.table/issues/6356). Earlier `fwrite()` versions could have issues with HTTP upload using `Content-Encoding: gzip` and `Transfer-Encoding: chunked`. Thanks to @oliverfoster for report and @philippechataignon for the fix.
+
+8. `fwrite()` gains a new parameter `compressLevel` to control compression level for gzip, [#5506](https://github.com/Rdatatable/data.table/issues/5506). This parameter balances compression speed and total compression, and corresponds directly to the analogous command-line parameter, e.g. `compressLevel=4` corresponds to passing `-4`; the default, `6`, matches the command-line default, i.e. equivalent to passing `-6`. Thanks @mgarbuzov for the request and @philippechataignon for implementing.

 ## BUG FIXES

@@ -95,7 +99,7 @@ rowwiseDT(
 # [1] "V1" "b" "c"
 ```

-4. Queries like `DT[, min(x):max(x)]` now work as expected, i.e. the same as `DT[, seq(min(x), max(x))]` or `with(DT, min(x):max(x))`, [#2069](https://github.com/Rdatatable/data.table/issues/2069). Shorthand like `DT[, a:b]` meaning "select from columns `a` through `b`" still works. Thanks to @franknarf1 for reporting, @jangorecki for the fix, and @MichaelChirico for a follow-up ensuring back-compatibility.
+4. Queries like `DT[, min(x):max(x)]` now work as expected, i.e. the same as `DT[, seq(min(x), max(x))]` or `with(DT, min(x):max(x))`, [#2069](https://github.com/Rdatatable/data.table/issues/2069). Shorthand like `DT[, a:b]` meaning "select from columns `a` through `b`" still works. Thanks to @franknarf1 for reporting, @jangorecki for the fix, and @MichaelChirico for follow-ups ensuring back-compatibility.

 5. `fread()` performance improves when specifying `Date` among `colClasses`, [#6105](https://github.com/Rdatatable/data.table/issues/6105). One implication of the change is that the column will be an `IDate` (which also inherits from `Date`), which may affect code strongly relying on the column class to be `Date` exactly; computations with `IDate` and `Date` columns should otherwise be the same. If you strongly prefer the `Date` class, run `as.Date()` explicitly following `fread()`. Thanks @scipima for the report and @MichaelChirico for the fix.

@@ -111,19 +115,49 @@ rowwiseDT(

 11. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.

-12. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.
+12. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
+
+13. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
+
+14. Auto-printing gets some substantial improvements
+- Suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). The old way was fragile and wound up broken by some implementation changes in {knitr}. Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
+- `print()` methods for S3 subclasses of data.table (e.g. an object of class `c("my.table", "data.table", "data.frame")`) no longer print where plain data.tables wouldn't, e.g. `myDT[, y := 2]`, [#3029](https://github.com/Rdatatable/data.table/issues/3029). The improved detection of auto-printing scenarios has the added benefit of _allowing_ print in highly explicit statements like `print(DT[, y := 2])`, obviating our recommendation since v1.9.6 to append `[]` to signal "please print me".
+
+15. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
+
+17. `DT[order(...)]` better matches `base::order()` behavior by (1) recognizing the `method=` argument (and erroring since this is not supported) and (2) accepting a vector of `TRUE`/`FALSE` in `decreasing=` as an alternative to using `-a` to convey "sort `a` decreasing", [#4456](https://github.com/Rdatatable/data.table/issues/4456). Thanks @jangorecki for the FR and @MichaelChirico for the PR.
+
+17. Assignment with `:=` to an S4 slot of an under-allocated data.table now works, [#6704](https://github.com/Rdatatable/data.table/issues/6704). Thanks @MichaelChirico for the report and fix.

 ## NOTES

-1. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix.
+1. There is a new vignette on joins! See `vignette("datatable-joins")`. Thanks to Angel Feliz for authoring it! Feedback welcome. This vignette has been highly requested since 2017: [#2181](https://github.com/Rdatatable/data.table/issues/2181).
+
+2. Tests run again when some Suggests packages are missing, [#6411](https://github.com/Rdatatable/data.table/issues/6411). Thanks @aadler for the note and @MichaelChirico for the fix.
+
+3. Some grouping operations run much faster under `verbose=TRUE`, [#6286](https://github.com/Rdatatable/data.table/issues/6286). Thanks @joshhwuu for the report and fix. This overhead was not present on Windows. As a rule, users should expect `verbose=TRUE` operations to run more slowly, as extra statistics might be calculated as part of the report; here was a case where the overhead was particularly high and the fix was particularly easy.

-2. Some grouping operations run much faster under `verbose=TRUE`, [#6286](https://github.com/Rdatatable/data.table/issues/6286). Thanks @joshhwuu for the report and fix. This overhead was not present on Windows. As a rule, users should expect `verbose=TRUE` operations to run more slowly, as extra statistics might be calculated as part of the report; here was a case where the overhead was particularly high and the fix was particularly easy.
+4. `set()` and `:=` now provide some extra guidance for common incorrect approaches to assigning `NULL` to some rows of a list column. The correct way is to put `list(list(NULL))` on the RHS of `:=` (or `.(.(NULL))` for short). Thanks to @MichaelChirico for the suggestion and @Nj221102 for the implementation.

-3. `set()` and `:=` now provide some extra guidance for common incorrect approaches to assigning `NULL` to some rows of a list column. The correct way is to put `list(list(NULL))` on the RHS of `:=` (or `.(.(NULL))` for short). Thanks to @MichaelChirico for the suggestion and @Nj221102 for the implementation.
+5. Improved the error message when trying to write code like `DT[, ":="(a := b, c := d)]` (which should be `DT[, ":="(a = b, c = d)]`), [#5296](https://github.com/Rdatatable/data.table/issues/5296). Thanks @MichaelChirico for the suggestion & fix.

-4. Improved the error message when trying to write code like `DT[, ":="(a := b, c := d)]` (which should be `DT[, ":="(a = b, c = d)]`), [#5296](https://github.com/Rdatatable/data.table/issues/5296). Thanks @MichaelChirico for the suggestion & fix.
+6. `measurev()` was implemented and documented in v1.15.0, for use within `melt()`, and it is now exported (dependent packages can now use without a NOTE from CRAN check).
+
+7. The `dcast()` and `melt()` generics no longer attempt to redirect to {reshape2} methods when passed non-`data.table`s. If you're still using {reshape2}, you must use namespace-qualification: `reshape2::dcast()`, `reshape2::melt()`. We have been warning about the deprecation since v1.12.4 (2019). Please note that {reshape2} is retired.
+
+8. `showProgress` in `[` is disabled for "trivial" grouping (`.NGRP==1L`), [#6668](https://github.com/Rdatatable/data.table/issues/6668). Thanks @MichaelChirico for the request and @joshhwuu for the PR.
+
+9. `key<-`, marked as deprecated since 2012 and unusable since v1.15.0, has been fully removed.
+
+10. Deprecation of `logicalAsInt` argument to `fwrite()` has been upgraded from a warning (since v1.15.0) to an error. It will be removed in the next release.
+
+11. Deprecation of `fread(autostart=)` has been upgraded to an error. It has been warning since v1.11.0 (6 years ago). The argument will be removed in the next release.
+
+# data.table [v1.16.4](https://github.com/Rdatatable/data.table/milestone/36) 4 December 2024
+
+## BUG FIXES

-5. `measurev()` was implemented and documented in v1.15.0, for use within `melt()`, and it is now exported (dependent packages can now use without a NOTE from CRAN check).
+1. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.

 # data.table [v1.16.2](https://github.com/Rdatatable/data.table/milestone/35) (9 October 2024)

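Two of the NOTES entries above describe usage that can be tried directly: assigning `NULL` to selected rows of a list column with `list(list(NULL))`, and the functional form `":="(a = b, c = d)`. The sketch below illustrates both on a toy table; the table and its column names (`id`, `payload`, `a`, `b`) are made up for illustration, and the code assumes an installed {data.table}.

```r
library(data.table)

# Toy table with a list column; all names here are illustrative only
DT = data.table(id = 1:3, payload = list(1:2, "a", 3.5))

# Set the list-column entry of row 2 to NULL: per the NEWS note, wrap NULL as
# list(list(NULL)) on the RHS of := so it is stored as an element rather than
# being read as "delete this column"
DT[2L, payload := list(list(NULL))]

# Functional form of := takes name = value pairs (using `=`, not `:=`)
DT[, ":="(a = id * 2L, b = id + 1L)]
```

After these calls `DT` still has three rows, `DT$payload[[2L]]` is `NULL`, and the columns `a` and `b` have been added by reference.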
R/IDateTime.R

Lines changed: 3 additions & 3 deletions
@@ -99,9 +99,9 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
 # TODO: investigate Ops.IDate method a la Ops.difftime
 if (inherits(e1, "difftime") || inherits(e2, "difftime"))
 internal_error("difftime objects may not be added to IDate, but Ops dispatch should have intervened to prevent this") # nocov
-if (isReallyReal(e1) || isReallyReal(e2)) {
+# IDate doesn't support fractional days; revert to base Date
+if ((is.double(e1) && !fitsInInt32(e1)) || (is.double(e2) && !fitsInInt32(e2))) {
 return(`+.Date`(e1, e2))
-# IDate doesn't support fractional days; revert to base Date
 }
 if (inherits(e1, "Date") && inherits(e2, "Date"))
 stopf("binary + is not defined for \"IDate\" objects")
@@ -120,7 +120,7 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
 if (inherits(e2, "difftime"))
 internal_error("difftime objects may not be subtracted from IDate, but Ops dispatch should have intervened to prevent this") # nocov

-if ( isReallyReal(e2) ) {
+if ( is.double(e2) && !fitsInInt32(e2) ) {
 # IDate deliberately doesn't support fractional days so revert to base Date
 return(base::`-.Date`(as.Date(e1), e2))
 # can't call base::.Date directly (last line of base::`-.Date`) as tried in PR#3168 because
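
The comments in this hunk say that `IDate` does not support fractional days and that such arithmetic reverts to base `Date`. A small sketch of the user-visible effect follows, assuming an installed {data.table}; the variable names are illustrative, and the example is expected to behave the same before and after this change because only the internal check (`isReallyReal()` versus `is.double()` plus `fitsInInt32()`) was swapped.

```r
library(data.table)

d = as.IDate("2024-12-04")

plus_int  = d + 1L    # integer offset: stays IDate
plus_dbl  = d + 1     # whole-valued double: still representable as IDate
plus_frac = d + 0.5   # fractional day: falls back to base Date arithmetic
minus_frac = d - 0.5  # the same fallback applies to subtraction
```

Checking `class()` on each result shows which branch was taken: the first two keep the `IDate` class, while the fractional cases are plain `Date`.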
