fread omitting rows without warning #7463

@emayerhofer

When reading in datasets whose rows have different numbers of columns, with the first row(s) having fewer columns than the other rows, e.g.

fread("row1_1 row1_2 row1_3\nrow2_1 row2_2 row2_3 row2_4")

V1 row1_1 row1_2 row1_3
row2_1 row2_2 row2_3 row2_4
Warning message:
In fread("row1_1\trow1_2\trow1_3\nrow2_1\trow2_2\trow2_3\trow2_4", :
Detected 3 column names but the data has 4 columns (i.e. invalid file). Added an extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.

Without any arguments, this emits a warning and attempts to fix the header, which I think is the intended behavior.

However, when supplying header=FALSE, the first row is silently omitted:
fread("row1_1 row1_2 row1_3\nrow2_1 row2_2 row2_3 row2_4", header = FALSE)

V1 V2 V3 V4
row2_1 row2_2 row2_3 row2_4

When supplying header=TRUE, the first row is also omitted and the second row is interpreted as the header:
fread("row1_1 row1_2 row1_3\nrow2_1 row2_2 row2_3 row2_4", header = TRUE)

Empty data.table (0 rows and 4 cols): row2_1,row2_2,row2_3,row2_4

I think that header=TRUE should yield the same result as not supplying the header argument, and that header=FALSE should emit a warning or error, or else treat incomplete rows as such.

fread() also drops the first row if the second row has fewer entries:
fread("row1_1 row1_2 row1_3 row1_4\nrow2_1 row2_2 row2_3", header = FALSE)
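
Since rows are dropped without a warning in the cases above, the mismatch can be caught before calling fread(). The following is only a minimal sketch, assuming base R's utils::count.fields() and a space separator; field_counts() is a hypothetical helper name and not part of data.table.

```r
# Hypothetical helper (not part of data.table): count the fields on each
# line of a text input so ragged rows can be detected before fread().
field_counts <- function(txt, sep = " ") {
  # Split on newlines explicitly so each element is exactly one line.
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  con <- textConnection(lines)
  on.exit(close(con))
  # Disable quote and comment handling so every character is counted as data.
  utils::count.fields(con, sep = sep, quote = "", comment.char = "")
}

input <- "row1_1 row1_2 row1_3\nrow2_1 row2_2 row2_3 row2_4"
counts <- field_counts(input)

# TRUE when at least two lines disagree on the number of fields.
is_ragged <- length(unique(counts)) > 1L
if (is_ragged) {
  message("rows have differing field counts: ", paste(counts, collapse = ", "))
}
```

For the first example in this report, counts holds 3 and 4, so is_ragged is TRUE and the caller can decide how to handle the short row instead of losing it silently.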
