NEWS.md
+5 -1 (5 additions & 1 deletion)
@@ -340,7 +340,9 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T

 20. `forderv` could segfault on keys with long runs of identical bytes (e.g., many duplicate columns) because the single-group branch tail-recursed radix-by-radix until the C stack ran out, [#4300](https://github.com/Rdatatable/data.table/issues/4300). This is a major problem since sorting is extensively used in `data.table`. Thanks @quantitative-technologies for the report and @ben-schwen for the fix.

-21. `setDTthreads(percent=)` and `setDTthreads(threads=)` now respect `OMP_NUM_THREADS` and `omp_get_max_threads()`, ensuring consistency with `setDTthreads()` (no arguments) when OpenMP environment variables are set, [#7165](https://github.com/Rdatatable/data.table/issues/7165). Previously, explicitly setting a thread count or percentage would ignore these OpenMP limits, potentially exceeding the user's intended thread cap. Thanks to @bastistician for the report and @ben-schwen for the fix.
+21. `[` now preserves existing key(s) when new columns are added before them, instead of incorrectly setting a new column as key, [#7364](https://github.com/Rdatatable/data.table/issues/7364). Thanks @czeildi for the bug report and the fix.
+
+22. `setDTthreads(percent=)` and `setDTthreads(threads=)` now respect `OMP_NUM_THREADS` and `omp_get_max_threads()`, ensuring consistency with `setDTthreads()` (no arguments) when OpenMP environment variables are set, [#7165](https://github.com/Rdatatable/data.table/issues/7165). Previously, explicitly setting a thread count or percentage would ignore these OpenMP limits, potentially exceeding the user's intended thread cap. Thanks to @bastistician for the report and @ben-schwen for the fix.

 ### NOTES

@@ -538,6 +540,8 @@ rowwiseDT(

 21. `setDT(get0('var'))` now correctly modifies `var` by reference, consistent with the long-standing behavior of `setDT(get('var'))`, [#6864](https://github.com/Rdatatable/data.table/issues/6864). Thanks to @rikivillalba for the report and @venom1204 for the fix.

+22. `fread()` could fail to read Mac CSV files (with `\r` line endings) if the file contained any `\n` character, such as a final `\r\n`. This was fixed by detecting the predominant line ending in a sample of the file, [#4186](https://github.com/Rdatatable/data.table/issues/4186). Thanks to @MPagel for the report and @ben-schwen for the fix.
src/fread.c
+27 -4 (27 additions & 4 deletions)
@@ -1628,12 +1628,35 @@ int freadMain(freadMainArgs _args)
   if (verbose) DTPRINT(_("[04] Arrange mmap to be \\0 terminated\n"));

   // First, set 'eol_one_r' for use by eol() to know if \r-only line ending is allowed, #2371
+  // Count different line ending types to handle mixed endings (e.g. Mac CSV with mostly \r and final \r\n) #4186
+  int count_r_only = 0; // \r not followed by \n
+  int count_with_n = 0; // \n with or without \r
   ch = sof;
-  while (ch < eof && *ch != '\n') ch++;
-  eol_one_r = (ch == eof);
+  const char *sample_end = eof;
+  if ((size_t)(eof - sof) > 100000) sample_end = sof + 100000; // Sample first 100KB or whole file if smaller
+  while (ch < sample_end) {
+    if (*ch == '\r') {
+      // Skip consecutive \r to avoid miscounting \r\r\n as multiple line endings
+      while (ch < sample_end && *ch == '\r') ch++;
+      if (ch < sample_end && *ch == '\n') {
+        count_with_n++;
+        ch++;
+      } else {
+        count_r_only++;
+      }
+    } else if (*ch == '\n') {
+      count_with_n++;
+      ch++;
+    } else {
+      ch++;
+    }
+  }
+  // If file has mostly \r-only line endings, treat \r as line ending
+  eol_one_r = (count_r_only > count_with_n);
   if (verbose) DTPRINT(eol_one_r ?
-    _(" No \\n exists in the file at all, so single \\r (if any) will be taken as one line ending. This is unusual but will happen normally when there is no \\r either; e.g. a single line missing its end of line.\n") :
-    _(" \\n has been found in the input and different lines can end with different line endings (e.g. mixed \\n and \\r\\n in one file). This is common and ideal.\n"));
+    _(" An \\r by itself will be taken as one line ending (counts: %d \\r by themselves vs %d [\\r]*\\n). This happens with old Mac CSV or when there is no \\r at all.\n") :
+    _(" \\n has been found in the input (counts: %d \\r by themselves vs %d [\\r]*\\n) and different lines can end with different line endings (e.g. mixed \\n and \\r\\n in one file). This is common and ideal.\n"),
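To exercise the counting logic outside of `freadMain`, the loop in the `src/fread.c` hunk can be lifted into a standalone function. The following is only a sketch under assumptions: the name `detect_eol_one_r` and the `sample_limit` parameter are hypothetical and not part of the patch, which writes to the existing `eol_one_r` variable and hard-codes a 100000-byte sample.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper (not in fread.c) mirroring the patch's counting loop.
// Scans at most `sample_limit` bytes of [sof, eof) and returns true when
// runs of \r not followed by \n outnumber line endings that contain \n.
static bool detect_eol_one_r(const char *sof, const char *eof, size_t sample_limit)
{
  int count_r_only = 0; // runs of \r not followed by \n
  int count_with_n = 0; // \n, with or without preceding \r
  const char *sample_end = eof;
  if ((size_t)(eof - sof) > sample_limit) sample_end = sof + sample_limit;

  const char *ch = sof;
  while (ch < sample_end) {
    if (*ch == '\r') {
      // Collapse consecutive \r so that \r\r\n counts as one ending
      while (ch < sample_end && *ch == '\r') ch++;
      if (ch < sample_end && *ch == '\n') {
        count_with_n++;
        ch++;
      } else {
        count_r_only++;
      }
    } else if (*ch == '\n') {
      count_with_n++;
      ch++;
    } else {
      ch++;
    }
  }
  return count_r_only > count_with_n;
}
```

For the input `"a,b\r1,2\r3,4\r\n"` this counts two lone `\r` endings against one ending containing `\n`, so it returns `true`, which is the Mac CSV case from #4186. With no line endings at all both counts are 0 and the sketch returns `false`.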