Skip to content

Commit 766e852

Browse files
authored
Merge branch 'master' into hindi_j
2 parents a285934 + 6ad0524 commit 766e852

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

41 files changed

+492
-274
lines changed
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
FROM docker.io/rhub/r-minimal:devel
2+
3+
RUN apk update \
4+
&& apk add --no-cache \
5+
gcc git musl-dev openmp pkgconf tzdata zlib-dev \
6+
&& echo 'options("repos"="https://cloud.r-project.org")' >> /usr/local/lib/R/etc/Rprofile.site
7+
8+
ENV TZDIR=/usr/share/zoneinfo
9+
10+
COPY DESCRIPTION .
11+
12+
RUN Rscript -e ' \
13+
read.dcf("DESCRIPTION", c("Imports", "Suggests")) |> \
14+
tools:::.split_dependencies() |> \
15+
names() |> \
16+
setdiff(tools:::.get_standard_package_names()$base) |> \
17+
install.packages(repos="https://cloud.r-project.org") \
18+
'
19+
20+
# setup cc()
21+
WORKDIR /root
22+
COPY .devcontainer/.Rprofile .
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
{
2+
"build": { "dockerfile": "Dockerfile", "context": "../.." },
3+
"customizations": { "vscode": {
4+
"extensions": [
5+
"REditorSupport.r",
6+
"ms-vscode.cpptools-extension-pack"
7+
]
8+
}}
9+
}

.github/ISSUE_TEMPLATE/issue_template.md

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,18 @@ about: Report a bug or describe a new requested feature
55

66
Click preview tab ^^^ above!
77

8-
By continuing to file this new issue / feature request, I confirm I have :
8+
By continuing to file this new issue / feature request, I confirm I have:
99
1. searched the [live NEWS file](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) to see if it has been fixed in dev already. If so, I tried the [latest dev version](https://github.com/Rdatatable/data.table/wiki/Installation#windows).
1010
2. looked at the titles of all the issues in the [current milestones](https://github.com/Rdatatable/data.table/milestones) and am aware of those. (Adding new information to existing issues is very helpful and appreciated.)
1111
3. [searched all issues](https://github.com/Rdatatable/data.table/issues) (i.e. not in a milestone yet) for similar issues to mine and will include links to them explaining why mine is different.
1212
4. searched on [Stack Overflow's data.table tag](http://stackoverflow.com/questions/tagged/data.table) and there is nothing similar there.
1313
5. read the [Support](https://github.com/Rdatatable/data.table/wiki/Support) and [Contributing](https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md) guides.
14-
6. please don't tag your issue with text in the title; project members will add the appropriate tags later.
14+
15+
Some general advice on the title and description fields for your PR
16+
17+
- Please don't tag your issue with text in the title like '[Joins]'; project members will add the appropriate tags later.
18+
- Don't write text like 'Closes #xxx' in the PR title either; GitHub does not recognize this text, whereas GitHub auto-links issues in the description, [see docs](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword).
19+
- Title and Description fields should try and be self-contained as much as possible. The title answers "what is this change" and the description provides necessary details/thought processes/things tried but abandoned. Imagine visiting your PR in 5 years' time and trying to glean what it's all about quickly and without needing to open 10 new tabs.
1520

1621
#### Thanks! Please remove the text above and include the two items below.
1722

NAMESPACE

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ importFrom(methods, "S3Part<-", slotNames)
55
exportClasses(data.table, IDate, ITime)
66
##
77

8-
export(data.table, tables, setkey, setkeyv, key, "key<-", haskey, CJ, SJ, copy)
8+
export(data.table, tables, setkey, setkeyv, key, haskey, CJ, SJ, copy)
99
export(rowwiseDT)
1010
export(setindex, setindexv, indices)
1111
export(as.data.table,is.data.table,test.data.table)

NEWS.md

Lines changed: 31 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,8 @@ rowwiseDT(
6767

6868
5. `setcolorder()` gains `skip_absent` to ignore unrecognized columns (i.e. columns included in `neworder` but not present in the data), [#6044, #6068](https://github.com/Rdatatable/data.table/pull/6044). Default behavior (`skip_absent=FALSE`) remains unchanged, i.e. unrecognized columns result in an error. Thanks to @sluga for the suggestion and @sluga & @Nj221102 for the PRs.
6969

70+
6. `fread()` gains `logicalYN` argument to read columns consisting only of strings `Y`, `N` as `logical` (as opposed to character), [#4563](https://github.com/Rdatatable/data.table/issues/4563). The default is controlled by option `datatable.logicalYN`, itself defaulting to `FALSE`, for back-compatibility -- some smaller tables (especially sharded tables) might inadvertently read a "true" string column as `logical` and cause bugs. This is particularly important for tables with a column named `y` or `n` -- automatic header detection under `logicalYN=TRUE` will see these values in the first row as being "data" as opposed to column names. A parallel option was not included for `fwrite()` at this time -- users looking for a compact representation of logical columns can still use `fwrite(logical01=TRUE)`. We also opted for now to check only `Y`, `N` and not `Yes`/`No`/`YES`/`NO`.
71+
7072
## BUG FIXES
7173

7274
1. `fwrite()` respects `dec=','` for timestamp columns (`POSIXct` or `nanotime`) with sub-second accuracy, [#6446](https://github.com/Rdatatable/data.table/issues/6446). Thanks @kav2k for pointing out the inconsistency and @MichaelChirico for the PR.
@@ -93,7 +95,7 @@ rowwiseDT(
9395
# [1] "V1" "b" "c"
9496
```
9597

96-
4. Queries like `DT[, min(x):max(x)]` now work as expected, i.e. the same as `DT[, seq(min(x), max(x))]` or `with(DT, min(x):max(x))`, [#2069](https://github.com/Rdatatable/data.table/issues/2069). Shorthand like `DT[, a:b]` meaning "select from columns `a` through `b`" still works. Thanks to @franknarf1 for reporting, @jangorecki for the fix, and @MichaelChirico for a follow-up ensuring back-compatibility.
98+
4. Queries like `DT[, min(x):max(x)]` now work as expected, i.e. the same as `DT[, seq(min(x), max(x))]` or `with(DT, min(x):max(x))`, [#2069](https://github.com/Rdatatable/data.table/issues/2069). Shorthand like `DT[, a:b]` meaning "select from columns `a` through `b`" still works. Thanks to @franknarf1 for reporting, @jangorecki for the fix, and @MichaelChirico for follow-ups ensuring back-compatibility.
9799

98100
5. `fread()` performance improves when specifying `Date` among `colClasses`, [#6105](https://github.com/Rdatatable/data.table/issues/6105). One implication of the change is that the column will be an `IDate` (which also inherits from `Date`), which may affect code strongly relying on the column class to be `Date` exactly; computations with `IDate` and `Date` columns should otherwise be the same. If you strongly prefer the `Date` class, run `as.Date()` explicitly following `fread()`. Thanks @scipima for the report and @MichaelChirico for the fix.
99101

@@ -109,13 +111,19 @@ rowwiseDT(
109111
110112
11. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.
111113
112-
12. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.
114+
12. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
115+
116+
13. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
113117
114-
13. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
118+
14. Auto-printing gets some substantial improvements
119+
- Suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). The old way was fragile and wound up broken by some implementation changes in {knitr}. Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
120+
- `print()` methods for S3 subclasses of data.table (e.g. an object of class `c("my.table", "data.table", "data.frame")`) no longer print where plain data.tables wouldn't, e.g. `myDT[, y := 2]`, [#3029](https://github.com/Rdatatable/data.table/issues/3029). The improved detection of auto-printing scenarios has the added benefit of _allowing_ print in highly explicit statements like `print(DT[, y := 2])`, obviating our recommendation since v1.9.6 to append `[]` to signal "please print me".
115121

116-
14. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
122+
15. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
117123

118-
15. The auto-printing suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
124+
16. `DT[order(...)]` better matches `base::order()` behavior by (1) recognizing the `method=` argument (and erroring since this is not supported) and (2) accepting a vector of `TRUE`/`FALSE` in `decreasing=` as an alternative to using `-a` to convey "sort `a` decreasing", [#4456](https://github.com/Rdatatable/data.table/issues/4456). Thanks @jangorecki for the FR and @MichaelChirico for the PR.
125+
126+
17. Assignment with `:=` to an S4 slot of an under-allocated data.table now works, [#6704](https://github.com/Rdatatable/data.table/issues/6704). Thanks @MichaelChirico for the report and fix.
119127

120128
## NOTES
121129

@@ -131,6 +139,24 @@ rowwiseDT(
131139

132140
6. `measurev()` was implemented and documented in v1.15.0, for use within `melt()`, and it is now exported (dependent packages can now use without a NOTE from CRAN check).
133141

142+
7. The `dcast()` and `melt()` generics no longer attempt to redirect to {reshape2} methods when passed non-`data.table`s. If you're still using {reshape2}, you must use namespace-qualification: `reshape2::dcast()`, `reshape2::melt()`. We have been warning about the deprecation since v1.12.4 (2019). Please note that {reshape2} is retired.
143+
144+
8. `showProgress` in `[` is disabled for "trivial" grouping (`.NGRP==1L`), [#6668](https://github.com/Rdatatable/data.table/issues/6668). Thanks @MichaelChirico for the request and @joshhwuu for the PR.
145+
146+
9. `key<-`, marked as deprecated since 2012 and unusable since v1.15.0, has been fully removed.
147+
148+
10. Deprecation of `logicalAsInt` argument to `fwrite()` has been upgraded from a warning (since v1.15.0) to an error. It will be removed in the next release.
149+
150+
11. Deprecation of `fread(autostart=)` has been upgraded to an error. It has been warning since v1.11.0 (6 years ago). The argument will be removed in the next release.
151+
152+
12. Deprecation of `droplevels(in.place=TRUE)` (warning since v1.16.0) has been upgraded from warning to error. The argument will be removed in the next release.
153+
154+
# data.table [v1.16.4](https://github.com/Rdatatable/data.table/milestone/36) 4 December 2024
155+
156+
## BUG FIXES
157+
158+
1. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.
159+
134160
# data.table [v1.16.2](https://github.com/Rdatatable/data.table/milestone/35) (9 October 2024)
135161
136162
## BUG FIXES

R/IDateTime.R

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -99,9 +99,9 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
9999
# TODO: investigate Ops.IDate method a la Ops.difftime
100100
if (inherits(e1, "difftime") || inherits(e2, "difftime"))
101101
internal_error("difftime objects may not be added to IDate, but Ops dispatch should have intervened to prevent this") # nocov
102-
if (isReallyReal(e1) || isReallyReal(e2)) {
102+
# IDate doesn't support fractional days; revert to base Date
103+
if ((is.double(e1) && !fitsInInt32(e1)) || (is.double(e2) && !fitsInInt32(e2))) {
103104
return(`+.Date`(e1, e2))
104-
# IDate doesn't support fractional days; revert to base Date
105105
}
106106
if (inherits(e1, "Date") && inherits(e2, "Date"))
107107
stopf("binary + is not defined for \"IDate\" objects")
@@ -120,7 +120,7 @@ round.IDate = function(x, digits=c("weeks", "months", "quarters", "years"), ...)
120120
if (inherits(e2, "difftime"))
121121
internal_error("difftime objects may not be subtracted from IDate, but Ops dispatch should have intervened to prevent this") # nocov
122122

123-
if ( isReallyReal(e2) ) {
123+
if ( is.double(e2) && !fitsInInt32(e2) ) {
124124
# IDate deliberately doesn't support fractional days so revert to base Date
125125
return(base::`-.Date`(as.Date(e1), e2))
126126
# can't call base::.Date directly (last line of base::`-.Date`) as tried in PR#3168 because

R/bmerge.R

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ mergeType = function(x) {
44
ans = typeof(x)
55
if (ans=="integer") { if (is.factor(x)) ans = "factor" }
66
else if (ans=="double") { if (inherits(x, "integer64")) ans = "integer64" }
7-
# do not call isReallyReal(x) yet because i) if both types are double we don't need to coerce even if one or both sides
7+
# do not call fitsInInt*(x) yet because i) if both types are double we don't need to coerce even if one or both sides
88
# are int-as-double, and ii) to save calling it until we really need it
99
ans
1010
}
@@ -103,23 +103,23 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
103103
if (x_merge_type=="integer64" || i_merge_type=="integer64") {
104104
nm = c(iname, xname)
105105
if (x_merge_type=="integer64") { w=i; wc=icol; wclass=i_merge_type; } else { w=x; wc=xcol; wclass=x_merge_type; nm=rev(nm) } # w is which to coerce
106-
if (wclass=="integer" || (wclass=="double" && !isReallyReal(w[[wc]]))) {
107-
if (verbose) catf("Coercing %s column %s%s to type integer64 to match type of %s.\n", wclass, nm[1L], if (wclass=="double") " (which contains no fractions)" else "", nm[2L])
106+
if (wclass=="integer" || (wclass=="double" && fitsInInt64(w[[wc]]))) {
107+
if (verbose) catf("Coercing %s column %s%s to type integer64 to match type of %s.\n", wclass, nm[1L], if (wclass=="double") " (which has integer64 representation, e.g. no fractions)" else "", nm[2L])
108108
set(w, j=wc, value=bit64::as.integer64(w[[wc]]))
109-
} else stopf("Incompatible join types: %s is type integer64 but %s is type double and contains fractions", nm[2L], nm[1L])
109+
} else stopf("Incompatible join types: %s is type integer64 but %s is type double and cannot be coerced to integer64 (e.g. has fractions)", nm[2L], nm[1L])
110110
} else {
111111
# just integer and double left
112112
ic_idx = which(icol == icols) # check if on is joined on multiple conditions, #6602
113113
if (i_merge_type=="double") {
114114
coerce_x = FALSE
115-
if (!isReallyReal(i[[icol]])) {
115+
if (fitsInInt32(i[[icol]])) {
116116
coerce_x = TRUE
117117
# common case of ad hoc user-typed integers missing L postfix joining to correct integer keys
118118
# we've always coerced to int and returned int, for convenience.
119119
if (length(ic_idx)>1L) {
120120
xc_idx = xcols[ic_idx]
121121
for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "double")]) {
122-
if (isReallyReal(x[[xb]])) {
122+
if (!fitsInInt32(x[[xb]])) {
123123
coerce_x = FALSE
124124
break
125125
}

0 commit comments

Comments
 (0)