fread() allergic to \x1a bytes #7407

@aitap

# Minimal reproducible example

For some reason, an `\x1a` byte (octal `\032`, decimal 26, ASCII SUB) anywhere in the input can make `fread()` crash.

When the byte is at the very end of the input, it causes an out-of-bounds read:

```r
data.table::fread(text = paste0("foo\n", strrep("a", 4096*100), "\x1a"))
#  *** caught segfault ***
# address 0x5574bcafc000, cause 'memory not mapped'
# 
# Traceback:
#  1: data.table::fread(text = paste0("foo\n", strrep("a", 4096 * 100),     "\032"))
# An irrecoverable exception occurred. R is aborting now ...
# Segmentation fault
```
The same crash under gdb:

```
Program received signal SIGSEGV, Segmentation fault.
end_of_field (ch=<optimized out>) at fread.c:584
584         while(!end_of_field(ch)) ch++;  // sep, \r, \n or eof will end
(gdb) bt
#0  end_of_field (ch=<optimized out>) at fread.c:584
#1  Field (ctx=0x7fffffffa7b0) at fread.c:584
#2  0x00007ffff42c7ccd in countfields (pch=pch@entry=0x7fffffffa978) at fread.c:413
#3  0x00007ffff42cb118 in freadMain (_args=...) at fread.c:1882
#4  0x00007ffff42d1dcb in freadR (inputArg=<optimized out>, isFileNameArg=<optimized out>, sepArg=<optimized out>, decArg=<optimized out>, quoteArg=<optimized out>,
    headerArg=0x5555559c7360, nrowLimitArg=0x555557f151e8, skipArg=0x555557f7f648, NAstringsArg=0x555557f150d0, stripWhiteArg=0x555557f14ea0,
    skipEmptyLinesArg=0x555557f14e30, commentCharArg=0x555557f14df8, fillArg=0x555557f80410, showProgressArg=0x5555559c7398, nThreadArg=0x555557f80090,
    verboseArg=0x555557f712c8, warnings2errorsArg=0x5555559c7398, logical01Arg=0x555557f148f0, logicalYNArg=0x555557f14880, selectArg=0x5555559c54b0, dropArg=0x5555559c54b0,
    colClassesArg=0x5555559c54b0, integer64Arg=0x555557f14f48, encodingArg=0x555557f14ed8, keepLeadingZerosArgs=0x555557f14810, noTZasUTC=0x555557f7f5a0) at freadR.c:229
(gdb) frame 1
#1  Field (ctx=0x7fffffffa7b0) at fread.c:584
584         while(!end_of_field(ch)) ch++;  // sep, \r, \n or eof will end
(gdb) p ch
$1 = 0x555558d64fff ""
```

I have also seen memory corruption (crashes inside `malloc`) when the byte is not at the end of the field, but those cases are harder to reproduce. Hopefully they stem from the same problem: the field size is overestimated, so when `fread()` corrects embedded NUL bytes in the field, it writes past the field's end and overwrites allocator metadata.
