
Conversation

@Drvi (Collaborator) commented Sep 16, 2025

@btime (try runtests("test", failures_first=true, name="aaa") catch end) # in RAICode, 7,1 threads
 master: 175.964 ms (2881429 allocations: 226.19 MiB)
 PR:     101.575 ms (2823433 allocations: 219.54 MiB)

@Drvi marked this pull request as draft September 16, 2025 10:28
@Drvi marked this pull request as ready for review September 16, 2025 11:46
@nickrobinson251 (Member) left a comment

nice

end
end

walkdir_channel = Channel{Tuple{String, FileNode}}(1024)
@nickrobinson251 (Member)

why 1024? (Please add a comment)

@Drvi (Collaborator, Author)

No reason it has to be 1024 exactly. Usually one would use Inf (i.e. typemax(Int) = 9223372036854775807) as the size, but realistically you don't want an effectively unbounded channel: buffering that many elements would surely OOM us. 1024 seemed like a more reasonable limit.
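For context, a minimal standalone sketch (not code from this PR) of why a finite bound matters: put! on a bounded Channel blocks once the buffer is full, so a fast producer gets backpressure instead of buffering an unbounded number of items ahead of the consumers.

using Base.Threads: @spawn

ch = Channel{Int}(1024)  # buffer at most 1024 items, like walkdir_channel
producer = @spawn begin
    for i in 1:10_000
        put!(ch, i)      # blocks whenever 1024 items are already buffered
    end
    close(ch)            # lets consumer iteration terminate
end
total = sum(ch)          # iterating the channel take!s until closed and drained
wait(producer)
@assert total == sum(1:10_000)

This works even single-threaded, because channel operations yield to the scheduler whenever they block.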

@spawn walkdir_task(
$walkdir_channel, $project_root, $root_node, $ti_filter, $paths, $projectfile, $report, $verbose_results
)
for _ in 1:clamp(2*(nthreads()-(nthreads() == 1)), 1, 16) # 1 to 16 tasks, 1 if single-threaded
@nickrobinson251 (Member)

Why this formula? Why 2x nthreads? Since this is no longer a function of the number of files, I guess we need a cap... but shouldn't the cap be a function of the number of threads (rather than fixed at 16)?

Also, what's the worst-case scenario for this new formula? How would it compare to the old formula of a task per file?

We should document somewhere (as a comment here?) that we spawn N include_tasks for perf reasons, and then explain how the formula was determined.

@Drvi (Collaborator, Author)

Right, this is really tricky and this formula probably isn't optimal... My intuition is that I usually get better performance when I have more tasks than cores (hence the 2x). But in this case, unfortunately, each task is going to make a lot of dynamic allocations (parsing allocates, and eval-ing allocates a lot), which means the GC is likely to run; and when the GC runs, all threads end up waiting. So there is a break-even point where adding more tasks stops helping performance because it makes GC pauses worse.

I didn't really experiment with the formula as it seemed "good enough".
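To make the formula concrete, here is a minimal sketch (not code from the PR; ntasks is a hypothetical helper) tabulating the task count for a few thread counts. The nthreads() == 1 term subtracts one so that a single-threaded session spawns exactly one include task:

# Hypothetical helper mirroring the loop bound in the snippet above.
ntasks(nthreads) = clamp(2 * (nthreads - (nthreads == 1)), 1, 16)

for n in (1, 2, 4, 8, 16)
    println("nthreads = ", n, " => ", ntasks(n), " include tasks")
end
# nthreads = 1  => 1   (2*(1-1) == 0, clamped up to 1)
# nthreads = 2  => 4
# nthreads = 4  => 8
# nthreads = 8  => 16
# nthreads = 16 => 16  (capped at 16 to bound GC pressure)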

