
Faster commit for fread test #7022

Merged

MichaelChirico merged 5 commits into master from fread-file-perf-test-version-Faster on May 26, 2025

tdhock (Member) commented May 26, 2025

@MichaelChirico suggested adding a performance test for the improvement in #6925 (comment). The test cases already existed; I added a Faster commit.

codecov bot commented May 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.69%. Comparing base (8647d44) to head (f11c022).
Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7022   +/-   ##
=======================================
  Coverage   98.69%   98.69%           
=======================================
  Files          79       79           
  Lines       14677    14678    +1     
=======================================
+ Hits        14486    14487    +1     
  Misses        191      191           


github-actions bot commented May 26, 2025

Comparison Plot

Generated via commit f11c022

Download link for the artifact containing the test results: ↓ atime-results.zip

| Task | Duration |
| --- | --- |
| R setup and installing dependencies | 4 minutes and 23 seconds |
| Installing different package versions | 9 minutes and 6 seconds |
| Running and plotting the test cases | 2 minutes and 7 seconds |

tdhock (Member Author) commented May 26, 2025

I don't see any difference on CI:
[image]

I do see it locally, though (see below), so I guess it is highly system-dependent (disk response time).
[image]
Code:

git2r::checkout("~/R/data.table", "fread-file-perf-test-version-Faster")
tinfo=atime::atime_pkg_test_info("~/R/data.table")
tres=eval(tinfo$test.call[["fread(colClasses='Date') improved in #6107"]])
plot(tres)

tmp_csv = tempfile()
fwrite(DT, tmp_csv)
},
Faster = "60a01fa65191c44d7997de1843e9a1dfe5be9f72", # First commit of the PR (https://github.com/Rdatatable/data.table/pull/6925/commits) that reduced time usage
Member

Maybe a name like FasterFS/FasterDiskIO, to convey the circumstances under which this might be faster? WDYT about a targeted benchmark that might draw out the difference a bit more, e.g. "read a sharded file"? Something like:

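library(data.table)
dir.create(td <- tempfile())  # fresh scratch directory for the shards
setwd(td)
for (ii in 1:100) fwrite(iris, paste0(ii, ".csv"))  # 100 small CSV shards
lapply(list.files(), fread)  # one fread() call per shard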

Member Author

interesting idea

tdhock (Member Author) commented May 26, 2025

This is the result I get on my system; we can see a constant-factor difference.
[image]
Code:

git2r::checkout("~/R/data.table", "fread-file-perf-test-version-Faster")
tinfo=atime::atime_pkg_test_info("~/R/data.table")
tres=eval(tinfo$test.call[["fread disk overhead improved in #6925"]])
plot(tres)

tdhock (Member Author) commented May 26, 2025

Still difficult to see on CI:
[image]

MichaelChirico (Member) left a comment

Great, this looks more faithful as a test: scaling the number of fread() calls scales the number of file.info() calls. Even if it doesn't catch the specific improvement we were after, it will still be a useful performance regression test.
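A minimal sketch of the scaling idea discussed here, assuming only data.table and base R are installed: the scaling variable N is the number of small files read (not their size), so any fixed per-call overhead in fread() grows with N. This is not the atime test merged in this PR; the helpers write_shards() and time_shards() are hypothetical, and a single system.time() measurement stands in for atime's repeated timings.

```r
# Sketch only: time one fread() per small "shard" file, for several N.
# Assumes data.table is installed; helper names are hypothetical.
library(data.table)

# Write n copies of iris as separate CSV files in a fresh temp directory
# and return their paths.
write_shards <- function(n) {
  dir <- tempfile("shards")
  dir.create(dir)
  paths <- file.path(dir, paste0(seq_len(n), ".csv"))
  for (p in paths) fwrite(iris, p)
  paths
}

# Return the elapsed seconds needed to fread() every shard once.
time_shards <- function(n) {
  paths <- write_shards(n)
  on.exit(unlink(dirname(paths[1L]), recursive = TRUE))  # clean up shards
  elapsed <- system.time(res <- lapply(paths, fread))[["elapsed"]]
  # Sanity check: every shard was read back with all 150 iris rows.
  stopifnot(length(res) == n, all(vapply(res, nrow, integer(1)) == 150L))
  elapsed
}

N <- c(10L, 100L, 1000L)
timings <- data.frame(N = N, seconds = vapply(N, time_shards, numeric(1)))
print(timings)
```

If per-call overhead dominates, seconds should grow roughly linearly in N; how large that slope is will depend on the machine and its disk, which is consistent with the difference being hard to see on CI.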

MichaelChirico merged commit db834d4 into master on May 26, 2025
11 checks passed
jangorecki deleted the fread-file-perf-test-version-Faster branch on September 27, 2025 09:17