Faster commit for fread test by tdhock · Pull Request #7022 · Rdatatable/data.table

tdhock · 2025-05-26T13:11:57Z

@MichaelChirico suggested adding a performance test for the improvement in #6925 (comment).
The test cases already existed; I added a Faster commit.

codecov · 2025-05-26T13:18:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.69%. Comparing base (8647d44) to head (f11c022).
Report is 4 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7022   +/-   ##
=======================================
  Coverage   98.69%   98.69%           
=======================================
  Files          79       79           
  Lines       14677    14678    +1     
=======================================
+ Hits        14486    14487    +1     
  Misses        191      191


github-actions · 2025-05-26T13:27:03Z

Generated via commit f11c022

Download link for the artifact containing the test results: ↓ atime-results.zip

Task	Duration
R setup and installing dependencies	4 minutes and 23 seconds
Installing different package versions	9 minutes and 6 seconds
Running and plotting the test cases	2 minutes and 7 seconds

tdhock · 2025-05-26T13:35:47Z

I don't see any difference on CI

I do see it locally though (see below), so I guess it is highly system-dependent (disk response time).

Code:

git2r::checkout("~/R/data.table", "fread-file-perf-test-version-Faster")
tinfo=atime::atime_pkg_test_info("~/R/data.table")
tres=eval(tinfo$test.call[["fread(colClasses='Date') improved in #6107"]])
plot(tres)

MichaelChirico · 2025-05-26T22:47:24Z

.ci/atime/tests.R

      tmp_csv = tempfile()
      fwrite(DT, tmp_csv)
    },
+    Faster = "60a01fa65191c44d7997de1843e9a1dfe5be9f72", # First commit of the PR (https://github.com/Rdatatable/data.table/pull/6925/commits) that reduced time usage


maybe a name like FasterFS/FasterDiskIO to convey the circumstances under which this might be faster? WDYT about a targeted benchmark that might draw out the difference a bit more, e.g. "read a sharded file", something like

setwd(tempdir())
dir.create(td<-tempfile())
setwd(td)
for (ii in 1:100) fwrite(iris, ii)
lapply(list.files(), fread)

tdhock (reply): interesting idea
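The sharded-file idea above can be sketched as a self-contained script. This is only an illustration, not the test that was committed: the helper name write_shards, the shard count, and the file naming scheme are assumptions made here, and explicit character file paths are used instead of changing the working directory.

```r
library(data.table)

# Hypothetical helper (not from the PR): write n_shards small CSV files into
# a fresh temporary directory and return their full paths.
write_shards <- function(n_shards, DT = as.data.table(iris)) {
  shard_dir <- tempfile("shards_")
  dir.create(shard_dir)
  shard_files <- file.path(shard_dir, sprintf("shard_%03d.csv", seq_len(n_shards)))
  for (f in shard_files) fwrite(DT, f)
  shard_files
}

shard_files <- write_shards(100L)

# One fread() call per shard, so any fixed per-file overhead is paid 100 times.
elapsed <- system.time(shards <- lapply(shard_files, fread))[["elapsed"]]
combined <- rbindlist(shards)

# Remove the temporary shard directory once the data is in memory.
unlink(dirname(shard_files[1]), recursive = TRUE)
```

Because each shard is tiny, the elapsed time should be dominated by per-call overhead rather than parsing, which is the circumstance the suggested FasterFS/FasterDiskIO name is meant to convey. The absolute number will vary with the disk and file system.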

tdhock · 2025-05-26T23:07:47Z

This is the result I get on my system; we can see a constant-factor difference.

Code:

git2r::checkout("~/R/data.table", "fread-file-perf-test-version-Faster")
tinfo=atime::atime_pkg_test_info("~/R/data.table")
tres=eval(tinfo$test.call[["fread disk overhead improved in #6925"]])
plot(tres)

tdhock · 2025-05-26T23:31:28Z

still difficult to see on CI

MichaelChirico

Great, this looks more faithful as a test -- scaling the number of fread() calls --> scale the number of file.info() calls. Even if this doesn't catch the specific improvement we were after, having something like this will still be a useful performance regression test.
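The per-call file.info() cost mentioned above can be looked at in isolation with base R. The sketch below assumes that the user and group name lookups are what the extra_cols argument of file.info() controls; whether #6925 made exactly this change inside fread is an assumption here, and the timings are rough and system-dependent.

```r
# Create a small throwaway file to query.
tmp_file <- tempfile(fileext = ".csv")
writeLines("a,b\n1,2", tmp_file)

# Default call: on Unix-alikes this also returns uid, gid, uname and grname.
full_info <- file.info(tmp_file)
# extra_cols = FALSE returns only the first six columns.
lean_info <- file.info(tmp_file, extra_cols = FALSE)
print(names(lean_info))

# Rough timing of many repeated calls, mimicking one call per file read.
n_calls <- 1000L
t_full <- system.time(
  for (i in seq_len(n_calls)) file.info(tmp_file)
)[["elapsed"]]
t_lean <- system.time(
  for (i in seq_len(n_calls)) file.info(tmp_file, extra_cols = FALSE)
)[["elapsed"]]
print(c(full = t_full, lean = t_lean))

unlink(tmp_file)
```

Any gap between the two timings depends on how the system resolves user and group names, which is consistent with the difference being visible locally but hard to see on CI.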

Faster commit for fread test

9992387

tdhock requested a review from Anirban166 as a code owner May 26, 2025 13:11

tdhock mentioned this pull request May 26, 2025

Updating fread's file.info argument to avoid uname and udomain lookup. #6925

Merged

MichaelChirico reviewed May 26, 2025

MichaelChirico reviewed May 26, 2025

tdhock added 3 commits May 27, 2025 01:02

replace fread test

0111830

FasterIO

61b067e

reduce N

c4ba58a

test case comments for context

f11c022

MichaelChirico approved these changes May 26, 2025

MichaelChirico merged commit db834d4 into master May 26, 2025
11 checks passed

jangorecki deleted the fread-file-perf-test-version-Faster branch September 27, 2025 09:17

MichaelChirico merged 5 commits into master from fread-file-perf-test-version-Faster
