Skip to content

Conversation

@GregPlowman
Copy link
Contributor

Issue #30

Use reinterpret instead of unsafe_wrap when initializing vals field of WrappedRNG
I think unsafe_wrap uses pointers which become invalid on workers?
In any case, it seems to work and be clearer, at least to me.

Use reinterpret instead of unsafe_wrap for WrappedRNG field vals.
Correctly provides reinterpreted view into WrappedRNG field cache on Distributed workers.
Test Distributed smallcrushJulia which internally uses pmap
@GregPlowman
Copy link
Contributor Author

GregPlowman commented Jan 16, 2025

I think test failures are because this PR introduces tests using Xoshiro, but some tests run with Julia 1.6.

Options here:

  • Update testing framework to LTS Julia v1.10
  • Update test to use MersenneTwister instead of Xoshiro
  • Conditionally use RNG based on Julia version

Update: went with option 3, checking isdefined(Random, :Xoshiro)

On Julia versions <= v1.6, Xoshiro is not defined.
In this case, use MersenneTwister instead.
end
cache = Vector{T}(undef, cache_size)
fillcache(WrappedRNG{T, RNG}(rng, cache, fillarray,
unsafe_wrap(Array, convert(Ptr{UInt32}, pointer(cache)), sizeof(cache)÷sizeof(UInt32)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this pointer would become invalid. It is used right away so I don't see where a pointer would be transferred between processes. Is there something I'm missing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem arises when the cache needs to be refilled. Presumably, the intention is for vals to be a reinterpreted view of cache with the same underlying data (so when cache is updated, vals also refers to this updated cache).

I don't have a deep understanding of Julia internals, but here's my best guess of what's happening.

I suspect using unsafe_wrap with pointers creates a "normal" Vector but with its backing memory assigned to cache. This works locally. However, when WrappedRNG is serialized to remote workers, I suspect vals is copied normally and the "view" relationship to cache is lost. At this point vals is a reinterpreted copy (not view) of cache and the problem won't immediately occur. But when the cache is refilled, vals will be unchanged and not represent the data in cache.

The fix here is using reinterpret(UInt32, cache) which creates a special type. When that type is serialized to remote workers, the reinterpreted view into cache is preserved, rather than just copying the underlying data.

An alternative fix might be to reassign vals using unsafe_wrap in fillcache. I'm happy to try that fix if you prefer. However I think the explicit use of reinterpret (and the type ReinterpretArray) avoids Julia internals, is more efficient and better documents the semantic intention.

Here's some test code that shows the problem.

using Distributed
addprocs(1)

@everywhere begin
    using Random
    using RNGTest

    function test(wrng::RNGTest.WrappedRNG)
        t1 = reinterpret(UInt32, wrng.cache) == wrng.vals
        v1 = copy(wrng.vals)
        RNGTest.fillcache(wrng)
        t2 = reinterpret(UInt32, wrng.cache) == wrng.vals
        t3 = wrng.vals != v1
        return t1, t2, t3
    end
end

rng = Xoshiro()
wrng = RNGTest.wrap(rng, UInt64)

test(wrng)
remotecall_fetch(test, 2, wrng)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. So the vals array should always alias the cache array but that aliasing breaks when moving between processes. Indeed, using reinterpret seems like the right solution.

cache::Vector{T}
fillarray::Bool
vals::Vector{UInt32}
vals::Union{Vector{UInt32}, Base.ReinterpretArray{UInt32, 1, T, Vector{T}, false}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be made a ReinterpretArray unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping so too, but when the cache itself is a Vector{UInt32}, reinterpreting it just returns the cache.

cache = zeros(UInt32, 100)
reinterpret(UInt32, cache) === cache

I'm not sure how to make this cleaner.

@andreasnoack andreasnoack merged commit 6bbdad1 into JuliaRandom:master Jan 23, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants