Slow="fd24a3105953f7785ea7414678ed8e04524e6955", # Parent of the merge commit (https://github.com/Rdatatable/data.table/commit/ed72e398df76a0fcfd134a4ad92356690e4210ea) of the PR (https://github.com/Rdatatable/data.table/pull/5054) that fixes the issue
- Fast="ed72e398df76a0fcfd134a4ad92356690e4210ea"), # Merge commit of the PR (https://github.com/Rdatatable/data.table/pull/5054) that fixes the issue
+ Fast="ed72e398df76a0fcfd134a4ad92356690e4210ea"), # Merge commit of the PR (https://github.com/Rdatatable/data.table/pull/5054) that fixes the issue # Test case created directly using the atime code below (not adapted from any other benchmark), based on the issue/fix PR https://github.com/Rdatatable/data.table/pull/5054#issue-930603663 "melt should be more efficient when there are missing input columns."
+
+ # Test case created from @tdhock's comment https://github.com/Rdatatable/data.table/pull/6393#issuecomment-2327396833, in turn adapted from @philippechataignon's comment https://github.com/Rdatatable/data.table/pull/6393#issuecomment-2326714012
Before="f339aa64c426a9cd7cf2fcb13d91fc4ed353cd31", # Parent of the first commit https://github.com/Rdatatable/data.table/commit/fcc10d73a20837d0f1ad3278ee9168473afa5ff1 in the PR https://github.com/Rdatatable/data.table/pull/6393/commits with major change to fwrite with gzip.
+ PR="3630413ae493a5a61b06c50e80d166924d2ef89a"), # Close-to-last merge commit in the PR.
.github/ISSUE_TEMPLATE/issue_template.md (7 additions & 2 deletions)
@@ -5,13 +5,18 @@ about: Report a bug or describe a new requested feature
Click preview tab ^^^ above!
- By continuing to file this new issue / feature request, I confirm I have:
+ By continuing to file this new issue / feature request, I confirm I have:
1. searched the [live NEWS file](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) to see if it has been fixed in dev already. If so, I tried the [latest dev version](https://github.com/Rdatatable/data.table/wiki/Installation#windows).
2. looked at the titles of all the issues in the [current milestones](https://github.com/Rdatatable/data.table/milestones) and am aware of those. (Adding new information to existing issues is very helpful and appreciated.)
3. [searched all issues](https://github.com/Rdatatable/data.table/issues) (i.e. not in a milestone yet) for similar issues to mine and will include links to them explaining why mine is different.
4. searched on [Stack Overflow's data.table tag](http://stackoverflow.com/questions/tagged/data.table) and there is nothing similar there.
5. read the [Support](https://github.com/Rdatatable/data.table/wiki/Support) and [Contributing](https://github.com/Rdatatable/data.table/blob/master/.github/CONTRIBUTING.md) guides.
- 6. please don't tag your issue with text in the title; project members will add the appropriate tags later.
+
+ Some general advice on the title and description fields for your PR
+
+ - Please don't tag your issue with text in the title like '[Joins]'; project members will add the appropriate tags later.
+ - Don't write text like 'Closes #xxx' in the PR title either; GitHub does not recognize this text, whereas GitHub auto-links issues in the description, [see docs](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword).
+ - Title and Description fields should try and be self-contained as much as possible. The title answers "what is this change" and the description provides necessary details/thought processes/things tried but abandoned. Imagine visiting your PR in 5 years' time and trying to glean what it's all about quickly and without needing to open 10 new tabs.
#### Thanks! Please remove the text above and include the two items below.
NEWS.md (25 additions & 9 deletions)
@@ -69,6 +69,10 @@ rowwiseDT(
6. `fread()` gains `logicalYN` argument to read columns consisting only of strings `Y`, `N` as `logical` (as opposed to character), [#4563](https://github.com/Rdatatable/data.table/issues/4563). The default is controlled by option `datatable.logicalYN`, itself defaulting to `FALSE`, for back-compatibility -- some smaller tables (especially sharded tables) might inadvertently read a "true" string column as `logical` and cause bugs. This is particularly important for tables with a column named `y` or `n` -- automatic header detection under `logicalYN=TRUE` will see these values in the first row as being "data" as opposed to column names. A parallel option was not included for `fwrite()` at this time -- users looking for a compact representation of logical columns can still use `fwrite(logical01=TRUE)`. We also opted for now to check only `Y`, `N` and not `Yes`/`No`/`YES`/`NO`.
+ 7. `fwrite()` with `compress="gzip"` produces compatible gz files when composed of multiple independent chunks owing to parallelization, [#6356](https://github.com/Rdatatable/data.table/issues/6356). Earlier `fwrite()` versions could have issues with HTTP upload using `Content-Encoding: gzip` and `Transfer-Encoding: chunked`. Thanks to @oliverfoster for report and @philippechataignon for the fix.
+
+ 8. `fwrite()` gains a new parameter `compressLevel` to control compression level for gzip, [#5506](https://github.com/Rdatatable/data.table/issues/5506). This parameter balances compression speed and total compression, and corresponds directly to the analogous command-line parameter, e.g. `compressLevel=4` corresponds to passing `-4`; the default, `6`, matches the command-line default, i.e. equivalent to passing `-6`. Thanks @mgarbuzov for the request and @philippechataignon for implementing.
+
## BUG FIXES
1. `fwrite()` respects `dec=','` for timestamp columns (`POSIXct` or `nanotime`) with sub-second accuracy, [#6446](https://github.com/Rdatatable/data.table/issues/6446). Thanks @kav2k for pointing out the inconsistency and @MichaelChirico for the PR.
@@ -95,7 +99,7 @@ rowwiseDT(
# [1] "V1" "b" "c"
```
- 4. Queries like `DT[, min(x):max(x)]` now work as expected, i.e. the same as `DT[, seq(min(x), max(x))]` or `with(DT, min(x):max(x))`, [#2069](https://github.com/Rdatatable/data.table/issues/2069). Shorthand like `DT[, a:b]` meaning "select from columns `a` through `b`" still works. Thanks to @franknarf1 for reporting, @jangorecki for the fix, and @MichaelChirico for a follow-up ensuring back-compatibility.
+ 4. Queries like `DT[, min(x):max(x)]` now work as expected, i.e. the same as `DT[, seq(min(x), max(x))]` or `with(DT, min(x):max(x))`, [#2069](https://github.com/Rdatatable/data.table/issues/2069). Shorthand like `DT[, a:b]` meaning "select from columns `a` through `b`" still works. Thanks to @franknarf1 for reporting, @jangorecki for the fix, and @MichaelChirico for follow-ups ensuring back-compatibility.
5. `fread()` performance improves when specifying `Date` among `colClasses`, [#6105](https://github.com/Rdatatable/data.table/issues/6105). One implication of the change is that the column will be an `IDate` (which also inherits from `Date`), which may affect code strongly relying on the column class to be `Date` exactly; computations with `IDate` and `Date` columns should otherwise be the same. If you strongly prefer the `Date` class, run `as.Date()` explicitly following `fread()`. Thanks @scipima for the report and @MichaelChirico for the fix.
@@ -111,17 +115,21 @@ rowwiseDT(
11. `tables()` now returns the correct size for data.tables over 2GiB, [#6607](https://github.com/Rdatatable/data.table/issues/6607). Thanks to @vlulla for the report and the PR.
- 12. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.
-
- 13. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
+ 12. `rbindlist(l, use.names=TRUE)` can now handle different encodings for the column names in different entries of `l`, [#5452](https://github.com/Rdatatable/data.table/issues/5452). Thanks to @MEO265 for the report, and Benjamin Schwendinger for the fix.
- 14. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
+ 13. Added a `data.frame` method for `format_list_item()` to fix error printing data.tables with columns containing 1-column data.frames, [#6592](https://github.com/Rdatatable/data.table/issues/6592). Thanks to @r2evans for the bug report and fix.
- 15. Auto-printing gets some substantial improvements
+ 14. Auto-printing gets some substantial improvements
    - Suppression in `knitr` documents is now done by implementing a method for `knit_print` instead of looking up the call stack, [#6589](https://github.com/Rdatatable/data.table/pull/6589). The old way was fragile and wound up broken by some implementation changes in {knitr}. Thanks to @jangorecki for the report [#6509](https://github.com/Rdatatable/data.table/issues/6509) and @aitap for the fix.
    - `print()` methods for S3 subclasses of data.table (e.g. an object of class `c("my.table", "data.table", "data.frame")`) no longer print where plain data.tables wouldn't, e.g. `myDT[, y := 2]`, [#3029](https://github.com/Rdatatable/data.table/issues/3029). The improved detection of auto-printing scenarios has the added benefit of _allowing_ print in highly explicit statements like `print(DT[, y := 2])`, obviating our recommendation since v1.9.6 to append `[]` to signal "please print me".
- 16. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
+ 15. Joins of `integer64` and `double` columns succeed when the `double` column has lossless `integer64` representation, [#4167](https://github.com/Rdatatable/data.table/issues/4167) and [#6625](https://github.com/Rdatatable/data.table/issues/6625). Previously, this only worked when the double column had lossless _32-bit_ integer representation. Thanks @MichaelChirico for the reports and fix.
+
+ 16. `DT[order(...)]` better matches `base::order()` behavior by (1) recognizing the `method=` argument (and erroring since this is not supported) and (2) accepting a vector of `TRUE`/`FALSE` in `decreasing=` as an alternative to using `-a` to convey "sort `a` decreasing", [#4456](https://github.com/Rdatatable/data.table/issues/4456). Thanks @jangorecki for the FR and @MichaelChirico for the PR.
+
+ 17. Assignment with `:=` to an S4 slot of an under-allocated data.table now works, [#6704](https://github.com/Rdatatable/data.table/issues/6704). Thanks @MichaelChirico for the report and fix.
+
+ 18. `as.data.table()` method for `data.frame`s (especially those with extended classes) is more consistent with `as.data.frame()` with respect to retention of attributes, [#5699](https://github.com/Rdatatable/data.table/issues/5699). Thanks @jangorecki for the report and fix.
17. Grouped queries on keyed tables no longer return an incorrectly keyed result if the _ad hoc_ `by=` list has some function call (in particular, a function which happens to return a strictly decreasing function of the keys), e.g. `by=.(a = rev(a))`, [#5583](https://github.com/Rdatatable/data.table/issues/5583). Thanks @AbrJA for the report and @MichaelChirico for the fix.
@@ -149,6 +157,14 @@ rowwiseDT(
11. Deprecation of `fread(autostart=)` has been upgraded to an error. It has been warning since v1.11.0 (6 years ago). The argument will be removed in the next release.
+ 12. Deprecation of `droplevels(in.place=TRUE)` (warning since v1.16.0) has been upgraded from warning to error. The argument will be removed in the next release.
+
+ # data.table [v1.16.4](https://github.com/Rdatatable/data.table/milestone/36) 4 December 2024
+
+ ## BUG FIXES
+
+ 1. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.
+
# data.table [v1.16.2](https://github.com/Rdatatable/data.table/milestone/35) (9 October 2024)
## BUG FIXES
@@ -296,7 +312,7 @@ rowwiseDT(
5. Input files are now kept open during `mmap()` when running under Emscripten, [emscripten-core/emscripten#20459](https://github.com/emscripten-core/emscripten/issues/20459). This avoids an error in `fread()` when running in WebAssembly, [#5969](https://github.com/Rdatatable/data.table/issues/5969). Thanks to @maek-ies for the report and @georgestagg for the PR.
a. This now triggers a warning, not a message, since relying on this default often signals unexpected duplicates in the data, [#5386](https://github.com/Rdatatable/data.table/issues/5386). The warning is classed as `dt_missing_fun_aggregate_warning`, allowing for more targeted handling in user code. Thanks @MichaelChirico for the suggestion and @Nj221102 for the fix.
@@ -1011,7 +1027,7 @@ rowwiseDT(
14. The options `datatable.print.class` and `datatable.print.keys` are now `TRUE` by default. They have been available since v1.9.8 (Nov 2016) and v1.11.0 (May 2018) respectively.
- 15. Thanks to @ssh352, Václav Tlapák, Cole Miller, András Svraka and Toby Dylan Hocking for reporting and bisecting a significant performance regression in dev. This was fixed before release thanks to a PR by Jan Gorecki, [#5463](https://github.com/Rdatatable/data.table/pull/5463).
+ 15. Thanks to @ssh352, Václav Tlapák, Cole Miller, András Svraka and Toby Dylan Hocking for reporting and bisecting a significant performance regression in dev. This was fixed before release thanks to a PR by Jan Gorecki, [#5463](https://github.com/Rdatatable/data.table/pull/5463).
16. `key(x) <- value` is now fully deprecated (from warning to error). Use `setkey()` to set a table's key. We started warning not to use this approach in 2012, with a stronger warning starting in 2019 (1.12.2). This function will be removed in the next release.