# Minimal reproducible example
Somehow, an \x1a (octal \032, decimal 26, ASCII SUB) somewhere in the input may cause fread() to crash.
At the end of the input it causes an out-of-bounds read:
data.table::fread(text = paste0("foo\n", strrep("a", 4096*100), "\x1a"))
# *** caught segfault ***
# address 0x5574bcafc000, cause 'memory not mapped'
#
# Traceback:
# 1: data.table::fread(text = paste0("foo\n", strrep("a", 4096 * 100), "\032"))
# An irrecoverable exception occurred. R is aborting now ...
# Segmentation fault
Program received signal SIGSEGV, Segmentation fault.
end_of_field (ch=<optimized out>) at fread.c:584
584 while(!end_of_field(ch)) ch++; // sep, \r, \n or eof will end
(gdb) bt
#0 end_of_field (ch=<optimized out>) at fread.c:584
#1 Field (ctx=0x7fffffffa7b0) at fread.c:584
#2 0x00007ffff42c7ccd in countfields (pch=pch@entry=0x7fffffffa978) at fread.c:413
#3 0x00007ffff42cb118 in freadMain (_args=...) at fread.c:1882
#4 0x00007ffff42d1dcb in freadR (inputArg=<optimized out>, isFileNameArg=<optimized out>, sepArg=<optimized out>, decArg=<optimized out>, quoteArg=<optimized out>,
headerArg=0x5555559c7360, nrowLimitArg=0x555557f151e8, skipArg=0x555557f7f648, NAstringsArg=0x555557f150d0, stripWhiteArg=0x555557f14ea0,
skipEmptyLinesArg=0x555557f14e30, commentCharArg=0x555557f14df8, fillArg=0x555557f80410, showProgressArg=0x5555559c7398, nThreadArg=0x555557f80090,
verboseArg=0x555557f712c8, warnings2errorsArg=0x5555559c7398, logical01Arg=0x555557f148f0, logicalYNArg=0x555557f14880, selectArg=0x5555559c54b0, dropArg=0x5555559c54b0,
colClassesArg=0x5555559c54b0, integer64Arg=0x555557f14f48, encodingArg=0x555557f14ed8, keepLeadingZerosArgs=0x555557f14810, noTZasUTC=0x555557f7f5a0) at freadR.c:229
(gdb) frame 1
#1 Field (ctx=0x7fffffffa7b0) at fread.c:584
584 while(!end_of_field(ch)) ch++; // sep, \r, \n or eof will end
(gdb) p ch
$1 = 0x555558d64fff ""
I've also seen memory corruption (crashes in malloc) when the \x1a is not at the end of the field, but those cases are harder to reproduce. Hopefully they stem from the same problem: the field size is overestimated, so fread() corrects embedded NUL bytes beyond the real data and overwrites allocator metadata.
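To illustrate the suspected failure mode, here is a minimal sketch of a field scan that is bounded by an explicit end pointer, so it cannot run off the buffer even when the last byte is \x1a rather than a separator or newline. This is not data.table's code: `at_field_end()` and `field_length()` are hypothetical helpers written for this illustration, and the assumption that the real scan relies on finding a terminator is only my guess from the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not fread.c's end_of_field): true when `ch` has
 * reached `eof` or points at a separator, '\n' or '\r'. The bounds check
 * comes first, so `*ch` is never read at or past `eof`. */
static bool at_field_end(const char *ch, const char *eof, char sep) {
    return ch >= eof || *ch == sep || *ch == '\n' || *ch == '\r';
}

/* Hypothetical helper: length of the field starting at `ch`. The scan
 * stops at `eof` even if no terminator byte is ever found, e.g. when the
 * input ends in "\x1a" with nothing after it. */
static size_t field_length(const char *ch, const char *eof, char sep) {
    assert(ch <= eof);  /* caller must pass a pointer inside the buffer */
    const char *start = ch;
    while (!at_field_end(ch, eof, sep)) ch++;
    return (size_t)(ch - start);
}
```

For a buffer holding `foo\naaa\x1a` with no terminator after it, `field_length()` on the second line returns 4 (the three `a` bytes plus the \x1a) and stops there, whereas a scan that only looks for sep, \r, \n or a terminating NUL would keep reading past the allocation.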