
Commit a43f300

fread no longer skips on skip=0
1 parent 348aaf4 commit a43f300


3 files changed (+12, -3 lines)


NEWS.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -352,6 +352,8 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T
 
 26. Grouping by a factor with many groups is now fast again, fixing a timing regression introduced in [#6890](https://github.com/Rdatatable/data.table/pull/6890) where UTF-8 coercion and level remapping were performed unnecessarily, [#7404](https://github.com/Rdatatable/data.table/issues/7404). Thanks @ben-schwen for the report and fix.
 
+27. `fread()` with `skip=0` and `(header=TRUE|FALSE)` no longer skips the first row when it has fewer fields than subsequent rows, [#7463](https://github.com/Rdatatable/data.table/issues/7463). Thanks @emayerhofer for the report and @ben-schwen for the fix.
+
 ### NOTES
 
 1. The following in-progress deprecations have proceeded:
```

inst/tests/tests.Rraw

Lines changed: 7 additions & 0 deletions
```diff
@@ -21896,3 +21896,10 @@ DT = data.table(x = strings)
 setorder(DT, x)
 test(2350, DT[["x"]], sort.int(strings, method='radix'))
 rm(DT, strings)
+
+# fread dont skip on skip=0, #7463
+txt = 'a1;a2\nb1;b2;b3\nc1;c2;c3'
+test(2351.1, fread(txt, skip=0), data.table(V1 = c("b1", "c1"), a1 = c("b2", "c2"), a2 = c("b3", "c3")), warning="Added an extra default column name")
+test(2351.2, fread(txt, skip=0, header=TRUE), data.table(V1 = c("b1", "c1"), a1 = c("b2", "c2"), a2 = c("b3", "c3")), warning="Added an extra default column name")
+test(2351.3, fread(txt, skip=0, header=FALSE), data.table(V1=character(), V2=character(), V3=character()), warning="Consider fill=TRUE")
+test(2351.4, fread(txt, skip=0, fill=TRUE), data.table(V1 = c("a1", "b1", "c1"), V2 = c("a2", "b2", "c2"), V3 = c("", "b3", "c3")))
```
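The tests feed `fread()` a first line with two fields followed by rows with three. The sketch below counts fields per line for that input to show the shape that reaches the previous-row check in `src/fread.c`. `count_fields()`, `fields_per_line()` and `auto_first_col_name()` are hypothetical helpers written for illustration; they are not data.table's `countfields()` and they ignore quoting, separator detection and everything else a real reader handles.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not data.table's countfields(): counts the
   sep-separated fields on the line starting at p. Quoting is ignored. */
static int count_fields(const char *p, char sep) {
  assert(p != NULL);
  int n = 1;
  for (; *p != '\0' && *p != '\n'; p++)
    if (*p == sep) n++;
  return n;
}

/* Records the field count of each line of txt in counts[] and returns
   how many lines were recorded (at most max_lines). */
static int fields_per_line(const char *txt, char sep, int *counts, int max_lines) {
  int n = 0;
  const char *p = txt;
  while (*p != '\0' && n < max_lines) {
    counts[n++] = count_fields(p, sep);
    const char *eol = strchr(p, '\n');
    if (eol == NULL) break;
    p = eol + 1;
  }
  return n;
}

/* Mirrors autoFirstColName = (ncol - tt == 1) from the diff: a previous
   row exactly one field short gets a default name for the first column. */
static int auto_first_col_name(int tt, int ncol) {
  return ncol - tt == 1;
}
```

For `"a1;a2\nb1;b2;b3\nc1;c2;c3"`, `fields_per_line()` records 2, 3 and 3, so the previous row has `tt = 2` against `ncol = 3` and `auto_first_col_name(2, 3)` is true. That is consistent with tests 2351.1 and 2351.2 expecting the column names `V1`, `a1`, `a2` together with the "Added an extra default column name" warning.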

src/fread.c

Lines changed: 3 additions & 3 deletions
```diff
@@ -2190,15 +2190,15 @@ int freadMain(freadMainArgs _args)
 }
 }
 
-if (args.header == NA_BOOL8 && prevStart != NULL) {
+if (prevStart != NULL && (args.header == NA_BOOL8 || args.skipNrow >= 0)) {
 // The first data row matches types in the row after that, and user didn't override default auto detection.
 // Maybe previous line (if there is one, prevStart!=NULL) contains column names but there are too few (which is why it didn't become the first data row).
 ch = prevStart;
 int tt = countfields(&ch);
 if (tt == ncol) INTERNAL_STOP("row before first data row has the same number of fields but we're not using it"); // # nocov
 if (ch != pos) INTERNAL_STOP("ch!=pos after counting fields in the line before the first data row"); // # nocov
 if (verbose) DTPRINT(_("Types in 1st data row match types in 2nd data row but previous row has %d fields. Taking previous row as column names."), tt);
-if (tt < ncol) {
+if (tt < ncol && args.header != false) {
 autoFirstColName = (ncol - tt == 1);
 if (autoFirstColName) {
 DTWARN(_("Detected %d column names but the data has %d columns (i.e. invalid file). Added an extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.\n"),
@@ -2216,7 +2216,7 @@ int freadMain(freadMainArgs _args)
 for (int j = ncol; j < tt; j++) { tmpType[j] = type[j] = type0; }
 ncol = tt;
 }
-args.header = true;
+if (args.header == NA_BOOL8) args.header = true;
 pos = prevStart;
 row1line--;
 }
```
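The change in `src/fread.c` touches three things: the condition for revisiting the row before the first data row, the guard on the too-few-names branch, and the unconditional `args.header = true`. The sketch below models only those visible lines as two pure functions, one per side of the diff, so the cases from tests 2351.x can be compared. The `HDR_*` names and the `Decision` struct are hypothetical. It assumes that a negative `skip_nrow` means "no explicit `skip`", which the `>= 0` test suggests but the diff does not state. The elided lines 2205 to 2215 and all later parsing (for example the "Consider fill=TRUE" warning expected by test 2351.3) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for fread's tri-state header argument. The NA
   value is assumed to be a sentinel that is neither 0 nor 1. */
#define HDR_FALSE ((int8_t)0)
#define HDR_TRUE  ((int8_t)1)
#define HDR_NA    ((int8_t)INT8_MIN)

typedef struct {
  bool rewind;      /* block entered: pos = prevStart; row1line-- */
  bool short_names; /* tt < ncol branch taken: too few names on previous row */
  bool auto_first;  /* ...and ncol - tt == 1: default name added for column 1 */
  int8_t header;    /* value of args.header after the block */
} Decision;

/* Condition and assignments as they were before commit a43f300. */
static Decision decide_before(int8_t header, int skip_nrow, bool has_prev, int tt, int ncol) {
  assert(tt >= 0 && ncol > 0);
  (void)skip_nrow; /* not consulted before the commit */
  Decision d = { false, false, false, header };
  if (header == HDR_NA && has_prev) {
    d.rewind = true;
    d.short_names = tt < ncol;
    d.auto_first = d.short_names && ncol - tt == 1;
    d.header = HDR_TRUE; /* unconditional before the commit */
  }
  return d;
}

/* Condition and assignments after commit a43f300. */
static Decision decide_after(int8_t header, int skip_nrow, bool has_prev, int tt, int ncol) {
  assert(tt >= 0 && ncol > 0);
  Decision d = { false, false, false, header };
  if (has_prev && (header == HDR_NA || skip_nrow >= 0)) {
    d.rewind = true;
    /* NA also passes this guard: it is not equal to false */
    d.short_names = tt < ncol && header != HDR_FALSE;
    d.auto_first = d.short_names && ncol - tt == 1;
    if (header == HDR_NA) d.header = HDR_TRUE; /* explicit TRUE/FALSE kept */
  }
  return d;
}
```

With `skip_nrow = 0`, a previous row present, `tt = 2` and `ncol = 3` (the shape of the tests' input), `decide_before()` never rewinds for `HDR_TRUE` or `HDR_FALSE`, which is the skipped first row reported in #7463. `decide_after()` rewinds in both cases and leaves an explicit header value untouched, so with `header=FALSE` the short first row is read as data rather than as names. For `HDR_NA` both versions agree.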
