Commit eda3b09

fix fread(..., nrows=0) infinite loop (#5869)
* fix fread infinite loop nrows=0
1 parent d5090f9 commit eda3b09

3 files changed: +10 -4 lines changed

NEWS.md

Lines changed: 1 addition & 1 deletion
@@ -299,7 +299,7 @@
 
 2. `print(DT, trunc.cols=TRUE)` and the corresponding `datatable.print.trunc.cols` option (new feature 3 in v1.13.0) could incorrectly display an extra column, [#4266](https://github.com/Rdatatable/data.table/issues/4266). Thanks to @tdhock for the bug report and @MichaelChirico for the PR.
 
-3. `fread(..., nrows=0L)` now works as intended and the same as `nrows=0`; i.e. returning the column names and typed empty columns determined by the large sample, [#4686](https://github.com/Rdatatable/data.table/issues/4686), [#4029](https://github.com/Rdatatable/data.table/issues/4029). Thanks to @hongyuanjia and @michaelpaulhirsch for reporting, and Benjamin Schwendinger for the PR.
+3. `fread(..., nrows=0L)` now works as intended and the same as `nrows=0`; i.e. returning the column names and typed empty columns determined by the large sample, [#4686](https://github.com/Rdatatable/data.table/issues/4686), [#4029](https://github.com/Rdatatable/data.table/issues/4029). Thanks to @hongyuanjia and @michaelpaulhirsch for reporting, and Benjamin Schwendinger for the PR. Also thanks to @HughParsonage for testing dev and reporting a bug which was fixed before release.
 
 4. Passing `.SD` to `frankv()` with `ties.method='random'` or with `na.last=NA` failed with `.SD is locked`, [#4429](https://github.com/Rdatatable/data.table/issues/4429). Thanks @smarches for the report.
 
inst/tests/tests.Rraw

Lines changed: 6 additions & 0 deletions
@@ -13284,6 +13284,12 @@ test(1958.19, fread(text=txt, nrows=0L, verbose=TRUE), data.table(A=integer(), B
 txt = paste(c("A,B\n1,2\n", rep("3,4\n",5100), "3,4.1\n", rep("5,6\n",4900)), collapse="")
 test(1958.20, fread(text=txt, nrows=0L, verbose=TRUE), data.table(A=integer(), B=integer()),
      output="Sampled 1049 rows.*at 11 jump points")
+# invalid head position for nrows=0 #5868
+# following example has to be big enough to trigger a jump and enough blanks so we get an invalid header position after a jump (at least n=599101 for following two cols)
+test(1958.21, fread(sprintf('%s%s\n', 'a,b', paste0('\n', sample(c('', '3'), 6e5L, TRUE), sample(c(',a',',b',',c'), 6e5L, TRUE), collapse="")), nrows=0L),
+     data.table(a=integer(), b=character()))
+test(1958.22, fread(sprintf('%s%s\n', 'a,b', paste0('\n', sample(c('', '3'), 6e5L, TRUE), sample(c(',a',',b',',c'), 6e5L, TRUE), collapse="")), nrows=0L, nThread=1L),
+     data.table(a=integer(), b=character()))
 
 # Skip should work with all types of newlines #3006
 eols = c("\n", "\r\n", "\r", "\n\r")

src/fread.c

Lines changed: 3 additions & 3 deletions
@@ -2431,7 +2431,7 @@ int freadMain(freadMainArgs _args) {
 if (stopTeam) { // A previous thread stopped while I was waiting my turn to enter ordered
   myNrow = 0; // # nocov; discard my buffer
 }
-else if (headPos!=thisJumpStart) {
+else if (headPos!=thisJumpStart && nrowLimit>0) { // do not care for dirty jumps since we do not read data and only want to know types
   // # nocov start
   snprintf(internalErr, internalErrSize, _("Internal error: invalid head position. jump=%d, headPos=%p, thisJumpStart=%p, sof=%p"), jump, (void*)headPos, (void*)thisJumpStart, (void*)sof);
   stopTeam = true;
@@ -2503,7 +2503,7 @@ int freadMain(freadMainArgs _args) {
 }
 stopTeam = false;
 
-if (extraAllocRows) {
+if (extraAllocRows && nrowLimit>0) { // no allocating needed for nrows=0
   allocnrow += extraAllocRows;
   if (allocnrow > nrowLimit) allocnrow = nrowLimit;
   if (verbose) DTPRINT(_(" Too few rows allocated. Allocating additional %"PRIu64" rows (now nrows=%"PRIu64") and continue reading from jump %d\n"),
@@ -2512,7 +2512,7 @@ int freadMain(freadMainArgs _args) {
   extraAllocRows = 0;
   goto read;
 }
-if (restartTeam) {
+if (restartTeam && nrowLimit>0) { // no restarting needed for nrows=0 since we discard read data anyway
   if (verbose) DTPRINT(_(" Restarting team from jump %d. nSwept==%d quoteRule==%d\n"), jump0, nSwept, quoteRule);
   ASSERT(nSwept>0 || quoteRuleBumpedCh!=NULL, "Internal error: team restart but nSwept==%d and quoteRuleBumpedCh==%p", nSwept, quoteRuleBumpedCh); // # nocov
   goto read;
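
The `nrowLimit>0` guards on the two `goto read` branches can be illustrated with a small toy model. This is only a sketch of one way such a retry loop might fail to terminate, not the actual `freadMain()` logic: the function `count_read_passes`, the helper `rows_short`, and the parameters `rowsParsed`, `guarded` and `maxPasses` are hypothetical, and the model assumes that a pass keeps reporting a row shortfall when the allocation is clamped back to a zero `nrowLimit`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only, NOT the real fread.c code. Only the names nrowLimit,
   allocnrow and extraAllocRows are taken from the diff; everything else
   is hypothetical and exists purely for illustration. */

/* Rows a pass parsed beyond what is currently allocated (0 if it all fits). */
static uint64_t rows_short(uint64_t rowsParsed, uint64_t allocnrow)
{
  return rowsParsed > allocnrow ? rowsParsed - allocnrow : 0;
}

/* Returns how many read passes ran before the loop exited, or -1 if
   maxPasses was reached first (a stand-in for "would loop forever").
   guarded=true applies the `nrowLimit > 0` condition added in the diff. */
int count_read_passes(uint64_t nrowLimit, uint64_t rowsParsed,
                      bool guarded, int maxPasses)
{
  assert(maxPasses > 0);
  uint64_t allocnrow = 0;  /* assumption: nothing allocated up front */
  for (int pass = 1; pass <= maxPasses; pass++) {
    uint64_t extraAllocRows = rows_short(rowsParsed, allocnrow);
    bool retry = guarded ? (extraAllocRows != 0 && nrowLimit > 0)
                         : (extraAllocRows != 0);
    if (!retry) return pass;           /* no retry requested: done */
    allocnrow += extraAllocRows;
    if (allocnrow > nrowLimit) allocnrow = nrowLimit;  /* 0 when nrowLimit==0 */
    /* falling through to the next iteration models `goto read` */
  }
  return -1;
}
```

Under these assumptions, `count_read_passes(0, 10, false, 1000)` hits the pass cap because the clamp resets `allocnrow` to 0 on every pass, while the guarded call `count_read_passes(0, 10, true, 1000)` exits after a single pass. With a positive limit such as `count_read_passes(100, 10, true, 1000)`, the guard changes nothing and the loop finishes on its second pass.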
