Skip to content

Conversation

@aitap
Copy link
Contributor

@aitap aitap commented Nov 3, 2025

Since the limit is up to 500 characters, make sure to allocate 501 bytes, including the terminator at the end, avoiding the stack overflow introduced in #7005:

data.table::fread(text = paste0(
 strrep("mary had a little lamb\n", 100),
 strrep("a", 500), "\n", "a")
)
==5358==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd0f20c7e4 at pc 0x7f06826bb311 bp 0x7ffd0f20ad90 sp 0x7ffd0f20ad88
WRITE of size 1 at 0x7ffd0f20c7e4 thread T0
    #0 0x7f06826bb310 in strlim /tmp/RtmpvbbbcM/R.INSTALLd607e4c6707/data.table/src/fread.c:235
    #1 0x7f06826bb310 in freadMain /tmp/RtmpvbbbcM/R.INSTALLd607e4c6707/data.table/src/fread.c:2947
    #2 0x7f06826c2d62 in freadR /tmp/RtmpvbbbcM/R.INSTALLd607e4c6707/data.table/src/freadR.c:229

Will this need a test? By itself the overflow is probably harmless, as it only writes a single zero into a place on the stack that doesn't seem to be used by anything, but it trips up sanitizers before other crashes can be reproduced, for example:

options(warn=-1)
repeat
 sample(setdiff(1:32, 26), 1024, replace = TRUE) |>
 as.raw() |>
 rawToChar() |>
 data.table::fread(text = _) |>
 try()

...eventually:

free(): invalid next size (fast)

Program received signal SIGABRT, Aborted.
WRITE of size 4 at 0x6020001de644 thread T0                                                                                                                                        #0 0x7fae27ea04f2 in Field /tmp/Rtmpd8OIyO/R.INSTALL20227793d77d/data.table/src/fread.c:592                                                                                
    #1 0x7fae27eb4711 in freadMain /tmp/Rtmpd8OIyO/R.INSTALL20227793d77d/data.table/src/fread.c:2309
    #2 0x7fae27ec2d42 in freadR /tmp/Rtmpd8OIyO/R.INSTALL20227793d77d/data.table/src/freadR.c:229                                                                              

0x6020001de644 is located 4 bytes to the right of 16-byte region [0x6020001de630,0x6020001de640)    
allocated by thread T0 here:                                                                                                                                                   
    #0 0x7fae332b83b7 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77                                                                       
    #1 0x7fae27eb445e in freadMain /tmp/Rtmpd8OIyO/R.INSTALL20227793d77d/data.table/src/fread.c:2295                                                                           
    #2 0x7fae27ec2d42 in freadR /tmp/Rtmpd8OIyO/R.INSTALL20227793d77d/data.table/src/freadR.c:229                                                                              

Since the limit is up to 500 characters, make sure to allocate 501
bytes, including the terminator at the end, avoiding the stack overflow:

data.table::fread(text = paste0(
 strrep("mary had a little lamb\n", 100),
 strrep("a", 500), "\n", "a")
)
==5358==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd0f20c7e4 at pc 0x7f06826bb311 bp 0x7ffd0f20ad90 sp 0x7ffd0f20ad88
WRITE of size 1 at 0x7ffd0f20c7e4 thread T0
    #0 0x7f06826bb310 in strlim /tmp/RtmpvbbbcM/R.INSTALLd607e4c6707/data.table/src/fread.c:235
    #1 0x7f06826bb310 in freadMain /tmp/RtmpvbbbcM/R.INSTALLd607e4c6707/data.table/src/fread.c:2947
    #2 0x7f06826c2d62 in freadR /tmp/RtmpvbbbcM/R.INSTALLd607e4c6707/data.table/src/freadR.c:229
@codecov
Copy link

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.13%. Comparing base (1685a3b) to head (53ccb07).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7408   +/-   ##
=======================================
  Coverage   99.13%   99.13%           
=======================================
  Files          85       85           
  Lines       16610    16618    +8     
=======================================
+ Hits        16466    16474    +8     
  Misses        144      144           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

github-actions bot commented Nov 3, 2025

No obvious timing issues in HEAD=strlim-overflow
Comparison Plot

Generated via commit 53ccb07

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 5 minutes and 13 seconds
Installing different package versions 11 minutes and 0 seconds
Running and plotting the test cases 2 minutes and 44 seconds

@MichaelChirico
Copy link
Member

Would the simple example you shared be useful in the test suite?

data.table::fread(text = paste0(
 strrep("mary had a little lamb\n", 100),
 strrep("a", 500), "\n", "a")
)

@ben-schwen
Copy link
Member

Would the simple example you shared be useful in the test suite?

data.table::fread(text = paste0(
 strrep("mary had a little lamb\n", 100),
 strrep("a", 500), "\n", "a")
)

I guess not since the error only surfaces with ASAN IINM

@aitap
Copy link
Contributor Author

aitap commented Nov 4, 2025

We have AddressSanitizer in GitLab CI. I don't mind adding a test if it is warranted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants