Commit fd626b5

Allow row names when header is detected by fread() (#3455)
It appears that `fread()` already allowed row names, but only when the separator was a space. This PR allows row names for any separator, documents this behavior, and adds a test. Closes #3453
1 parent bf06212 commit fd626b5

3 files changed: +17 -2 lines changed

docs/api/dt/fread.rst

Lines changed: 3 additions & 0 deletions
@@ -88,6 +88,9 @@
 If `True` then the first line of the CSV file contains the header.
 If `False` then there is no header. By default the presence of the
 header is heuristically determined from the contents of the file.
+When the number of column names in the header is one less than
+the actual number of columns, the first column is assumed
+to contain row names and gets the name "index".
 
 na_strings: List[str]
 The list of strings that were used in the input file to represent
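
The documented rule can be illustrated without datatable. The following is a minimal sketch using only the standard `csv` module; `read_with_row_names` is a hypothetical helper written for illustration and is not part of the datatable API, and it does not reproduce `fread()`'s header heuristics or type detection.

```python
import csv
import io


def read_with_row_names(text, sep=","):
    """Hypothetical helper (not datatable): apply the row-names rule to a header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    # Number of data columns, taken from the widest data row.
    ncols = max(len(r) for r in body) if body else len(header)
    # One name fewer than data columns: the first column holds row names.
    if len(header) == ncols - 1:
        header = ["index"] + header
    return header, body


names, data = read_with_row_names("digit1,digit2\none,1,0\ntwo,2,0")
print(names)  # ['index', 'digit1', 'digit2']
```

The same rule applies regardless of the separator passed in `sep`, which mirrors the intent of this change.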

src/core/csv/reader_fread.cc

Lines changed: 5 additions & 2 deletions
@@ -920,7 +920,7 @@ void FreadReader::parse_column_names(dt::read::ParseContext& ctx) {
       if (ilen > 0) {
         preframe.column(i).set_name(std::string(start, start + ilen));
       }
-      // Skip the separator, handling special case of sep=' ' (multiple spaces are
+      // Skip the separator, handling special case of sep == ' ' (multiple spaces are
       // treated as a single separator, and spaces at the beginning/end of line
       // are ignored).
       if (ch < eof && sep == ' ' && *ch == ' ') {
@@ -939,7 +939,10 @@ void FreadReader::parse_column_names(dt::read::ParseContext& ctx) {
     }
   }
 
-  if (sep == ' ' && ncols_found == ncols - 1) {
+  // When the number of column names in the header is one less than
+  // the actual number of columns, the first column is assumed
+  // to contain row names and gets the name "index".
+  if (ncols_found == ncols - 1) {
     for (size_t j = ncols - 1; j > 0; j--){
       preframe.column(j).swap_names(preframe.column(j-1));
     }
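
The effect of the swap loop in the hunk above can be traced with a small Python sketch. The function name `shift_names_for_row_names` is hypothetical, and the final assignment of "index" to the first column is an assumption based on the new comment, since that part of the C++ code lies outside the hunk shown here.

```python
def shift_names_for_row_names(names, ncols_found):
    """Hypothetical model of the name shift; not the datatable implementation."""
    names = list(names)  # work on a copy
    ncols = len(names)
    if ncols_found == ncols - 1:
        # Pairwise swaps from the last column down to the first move every
        # parsed name one position right, leaving the empty name in slot 0.
        for j in range(ncols - 1, 0, -1):
            names[j], names[j - 1] = names[j - 1], names[j]
        # Assumption: the first column is then named "index", per the comment.
        names[0] = "index"
    return names


# Two names were parsed for three columns; the last slot is still unnamed.
print(shift_names_for_row_names(["digit1", "digit2", ""], ncols_found=2))
# ['index', 'digit1', 'digit2']
```

Before this change the condition also required `sep == ' '`, so the shift was skipped for comma-separated and other input.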

tests/fread/test-fread-api.py

Lines changed: 9 additions & 0 deletions
@@ -951,6 +951,15 @@ def test_fread_header():
     assert d1.to_list() == [["A", "1"], ["B", "2"]]
 
 
+def test_fread_header_rownames():
+    inp = 'digit1,digit2\none,1,0\ntwo,2,0'
+    d0 = dt.fread(inp, header=None)
+    d1 = dt.fread(inp, header=True)
+    assert_equals(d0, d1)
+    assert d0.to_list() == [["one", "two"], [1, 2], [0, 0]]
+    assert d0.names == ("index", "digit1", "digit2")
+
+
 
 #-------------------------------------------------------------------------------
 # `skip_to_line/string`
