-
Notifications
You must be signed in to change notification settings - Fork 6
Fix for Distributed use of WrappedRNG #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for Distributed use of WrappedRNG #31
Conversation
Use reinterpret instead of unsafe_wrap for WrappedRNG field vals. Correctly provides reinterpreted view into WrappedRNG field cache on Distributed workers.
Test Distributed smallcrushJulia which internally uses pmap
|
I think test failures are because this PR introduces tests using Options here:
Update: went with option 3, checking |
On Julia versions <= v1.6, Xoshiro is not defined. In this case, use MersenneTwister instead.
| end | ||
| cache = Vector{T}(undef, cache_size) | ||
| fillcache(WrappedRNG{T, RNG}(rng, cache, fillarray, | ||
| unsafe_wrap(Array, convert(Ptr{UInt32}, pointer(cache)), sizeof(cache)÷sizeof(UInt32)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me why this pointer would become invalid. It is used right away so I don't see where a pointer would be transferred between processes. Is there something I'm missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem arises when the cache needs to be refilled. Presumably, the intention is for vals to be a reinterpreted view of cache with the same underlying data (so when cache is updated, vals also refers to this updated cache).
I don't have a deep understanding of Julia internals, but here's my best guess of what's happening.
I suspect using unsafe_wrap with pointers creates a "normal" Vector but with its backing memory assigned to cache. This works locally. However, when WrappedRNG is serialized to remote workers, I suspect vals is copied normally and the "view" relationship to cache is lost. At this point vals is a reinterpreted copy (not view) of cache and the problem won't immediately occur. But when the cache is refilled, vals will be unchanged and not represent the data in cache.
The fix here is using reinterpret(UInt32, cache) which creates a special type. When that type is serialized to remote workers, the reinterpreted view into cache is preserved, rather than just copying the underlying data.
An alternative fix might be to reassign vals using unsafe_wrap in fillcache. I'm happy to try that fix if you prefer. However I think the explicit use of reinterpret (and the type ReinterpretArray) avoids Julia internals, is more efficient and better documents the semantic intention.
Here's some test code that shows the problem.
using Distributed
addprocs(1)
@everywhere begin
using Random
using RNGTest
function test(wrng::RNGTest.WrappedRNG)
t1 = reinterpret(UInt32, wrng.cache) == wrng.vals
v1 = copy(wrng.vals)
RNGTest.fillcache(wrng)
t2 = reinterpret(UInt32, wrng.cache) == wrng.vals
t3 = wrng.vals != v1
return t1, t2, t3
end
end
rng = Xoshiro()
wrng = RNGTest.wrap(rng, UInt64)
test(wrng)
remotecall_fetch(test, 2, wrng)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. So the vals array should always alias the cache array but that aliasing breaks when moving between processes. Indeed, using reinterpret seems like the right solution.
| cache::Vector{T} | ||
| fillarray::Bool | ||
| vals::Vector{UInt32} | ||
| vals::Union{Vector{UInt32}, Base.ReinterpretArray{UInt32, 1, T, Vector{T}, false}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be made a ReinterpretArray unconditionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping so too, but when the cache itself is a Vector{UInt32}, reinterpreting it just returns the cache.
cache = zeros(UInt32, 100)
reinterpret(UInt32, cache) === cache
I'm not sure how to make this cleaner.
Issue #30
Use
reinterpretinstead ofunsafe_wrapwhen initializingvalsfield ofWrappedRNGI think
unsafe_wrapuses pointers which become invalid on workers?In any case, it seems to work and be clearer, at least to me.