The current implementation of flowMC does not use multiple GPUs. The chains and the network are created on the default device, and I am not sure how you are sharding the data onto multiple devices. You can see this from the error message:

jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Failed to allocate request for 156.27GiB (167788937216B) on device ordinal 0

JAX is trying to allocate about 156 GiB of memory in a single request on your first GPU (device ordinal 0), which is more than it can handle. Your computation seems to be too big for a single device, so I would advise looking into reducing its memory footprint for now.
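To get a feel for the numbers, here is a rough back-of-the-envelope sketch in plain Python. It converts the byte count from the error message into GiB and estimates how large a dense array would be for a given shape. The helper names (bytes_to_gib, array_bytes) and the example shape are made up for illustration; this is not how flowMC actually lays out its buffers, so treat it only as a way to sanity-check sizes before running.

```python
from math import prod

GIB = 2**30  # bytes in one gibibyte


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a raw byte count to GiB."""
    return n_bytes / GIB


def array_bytes(shape: tuple, itemsize: int) -> int:
    """Bytes needed for one dense array of the given shape and element size."""
    return prod(shape) * itemsize


# The request reported in the error message.
requested = 167788937216
print(f"requested: {bytes_to_gib(requested):.2f} GiB")

# Hypothetical example: (n_chains, n_steps, n_dim) stored as float64 (8 bytes).
# These numbers are assumptions for illustration, not values from the question.
example_shape = (1000, 5000, 10)
print(f"example buffer: {bytes_to_gib(array_bytes(example_shape, 8)):.3f} GiB")
```

If an estimate like this already comes close to the memory of one GPU, then fewer chains, fewer stored steps, or a smaller batch size are the likely places to start cutting.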

@kazewong
Answer selected by Qazalbash
Category
Q&A