Atomic accumulators #1137
base: breaking
Conversation
* Make InitContext work with OnlyAccsVarInfo
* Do not convert NamedTuple to Dict
* remove logging
* Enable InitFromPrior and InitFromUniform too
* Fix `infer_nested_eltype` invocation
Benchmark Report for Commit aceec5d (Computer Information, Benchmark Results)
That's also assuming that people know of and use the opt-out. I bet a lot of people wouldn't realise it's there. I wonder if the overheads for atomic accs might be mostly because of …
Warning
This PR is not really meant for merging; it's mainly for discussion.
This PR replaces ThreadSafeVarInfo (which is essentially a vector of accumulators) with atomic accumulators, that is,
`mutable struct` accumulators whose fields can be added to atomically. (This idea was previously discussed in #1078, and quite a few of the ideas there will reappear here.)
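Concretely, the idea is something like the following sketch (illustrative only; the type and function names are made up and this is not the PR's actual accumulator code):

```julia
# Illustrative sketch only (names made up, not the PR's actual code): a mutable
# struct accumulator whose log-probability field is declared @atomic, so that
# concurrent tasks can add to it without a lock or per-thread copies.
mutable struct AtomicLogProbAccumulator
    @atomic logp::Float64
end

# Atomically add an increment to the accumulator's log-probability.
function accumulate_logp!(acc::AtomicLogProbAccumulator, increment::Real)
    @atomic acc.logp += increment
    return acc
end

acc = AtomicLogProbAccumulator(0.0)
Threads.@threads for i in 1:1_000
    accumulate_logp!(acc, log(0.5))
end
@atomic acc.logp  # ≈ 1_000 * log(0.5), regardless of thread count
```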
Why?
Central to #1132 is the idea of removing Metadata. With metadata gone, there is, at least for (Fast)LDF purposes, no longer anything blocking threaded assume tilde-statements.*
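(For concreteness, a threaded assume tilde-statement means something along the lines of the hypothetical sketch below; this is not the benchmark snippet referred to later, and the model name and size are made up.)

```julia
using DynamicPPL, Distributions

# Hypothetical model with assume (~) statements inside a threaded loop.
# Not the PR's benchmark snippet; the name and size are made up.
@model function threaded_assume_demo(N=64)
    x = Vector{Float64}(undef, N)
    Threads.@threads for i in 1:N
        x[i] ~ Normal()   # each iteration is an assume tilde-statement
    end
    return x
end
```

Under the status quo such a model is evaluated through ThreadSafeVarInfo when multiple threads are available; the question here is whether atomic accumulators can do the same job with less overhead.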
Here is a proof of principle. I want to be super clear in saying that the code changes in this PR are not needed to enable threaded assume. ThreadSafeVarInfo already works correctly, and the snippet below can be run on either #1132 or this PR. Benchmarks are provided below for both.
* There is one blocker, which is that to generate a FastLDF, you need a metadata to read the ranges and linked statuses. But you can't generate the metadata if there are threaded assume statements. The snippet here works around this by manually constructing the ranges. Solving this properly would require some surgery on `get_range_and_linked`, but should be doable (and indeed I believe that we should ultimately decouple `get_range_and_linked` from a metadata).

Performance
(This code is very poorly written, btw; it would be much better if `expected_logpdf` and `get_ranges_and_linked` dispatched on the model function. And yes, the trivial model has a bogus `y = nothing` just to make those functions work.)

TSVI code is run with #1132, atomic accs code with this PR. Julia 1.11.7. Number of threads as shown below.
TSVI or atomic accs?
Based on the above benchmarks, it looks like TSVI is faster when something threadsafe needs to be done (especially for gradients). But when there's no threadsafe evaluation needed (trivial model), TSVI introduces way more overhead.
It's not shown above, but I also tried making the trivial model much larger, with fifty `Normal()` variables (in serial). TSVI again introduced way more overhead for evaluation (3.5x more than atomic accs), but oddly enough the TSVI gradient was a slight bit faster.

Here's a table with some pros and cons:
* FastLDF
* `threadid` indexing: TSVI relies on indexing by `threadid`, which is only safe under assumptions about how `Threads.@threads` schedules tasks (see the sketch below), and we're quite lucky to not have run into correctness issues so far with TSVI. But that might come back to bite us one day.
* The `convert_eltype` issue
* The `float_type_with_fallback(eltype(params))` thing
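To make the `threadid` point concrete, here is a generic-Julia illustration (not DynamicPPL code; the function name is made up) of the per-thread-slot pattern that TSVI-style accumulation relies on; the atomic-accumulator alternative is the `@atomic`-field pattern sketched near the top of this description.

```julia
# Generic illustration (not DynamicPPL code): one partial result per thread,
# selected via Threads.threadid(). This is only correct if each iteration's
# body stays on the thread it started on, which Threads.@threads does not
# guarantee in general, because tasks may migrate between threads if they yield.
function sum_per_threadid(values)
    partials = zeros(Threads.nthreads())
    Threads.@threads for i in eachindex(values)
        partials[Threads.threadid()] += values[i]
    end
    return sum(partials)
end
```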