Asynchronous dispatch in MPMDs? #6678

epignatelli · 2021-05-07T08:49:20Z

epignatelli
May 7, 2021

One usual application of asynchronous computation in reinforcement learning is asynchronous methods, e.g. A3C and IMPALA.
Multiple actors are collecting experience in multiple environments concurrently using the GPU, and data is often gathered either on the cpu or on separate, master GPU.
I was trying to implement these using MPMD in jax, but I am not sure what I am writing work as I thought.

What rules does the asynch dispatch in jax follow for MPMD?
In particular, if I have two python calls, each of which makes two xla calls to two separate devices, will they be executed asynchronously?

I tried to run some tests using the code below, and it seems like the second GPU is waiting for the first GPU?

Here's the colab equivalent:
https://colab.research.google.com/drive/1TFPGndv6UaGsL_S0s7h1Bh5lfZYhwtB5?usp=sharing

import jax
import jax.numpy as jnp

def a(x):
    x = jax.lax.fori_loop(0, 100, lambda i, x: jnp.sqrt(jnp.log(x * x)), x)
    return x

def c(x1, x2):
    return x1

devices = jax.devices()
a_jit_dev_0 = jax.jit(a, device=devices[0])
a_jit_dev_1 = jax.jit(a, device=devices[1])
c_jit_dev_2 = jax.jit(c, device=devices[2])

x = jax.random.normal(jax.random.PRNGKey(0), (2000, 2000))
x_dev_0 = jax.device_put(x, devices[0])
x_dev_1 = jax.device_put(x, devices[1])

Control

%%timeit
ar = a_jit_dev_0(x_dev_0).block_until_ready()
br = a_jit_dev_1(x_dev_1).block_until_ready()

ar = a_jit_dev_0(x_dev_0)
br = a_jit_dev_1(x_dev_1)

%%timeit 
cr = c_jit_dev_2(ar, br).block_until_ready()

Multiple programs, single GPU, twice, then reduce on separate GPU

%%timeit
ar = a_jit_dev_0(x_dev_0)
br = a_jit_dev_0(x_dev_0)
cr = c_jit_dev_2(ar, br).block_until_ready()

Multiple programs, multiple GPUs, then reduce on a separate GPU

%%timeit
ar = a_jit_dev_0(x_dev_0)
br = a_jit_dev_1(x_dev_1)
cr = c_jit_dev_2(ar, br).block_until_ready()

Will the call to a_jit_dev_0 and a_jit_dev_1 be executed asynchronously?
From a quick tests it seems like the computation flickers between the two cpus.

epignatelli · 2021-06-09T10:21:58Z

epignatelli
Jun 9, 2021
Author

This does not answer the question directly, but maybe adds more information on the original issue of implementing actor-learner architectures in RL.

https://jax.readthedocs.io/en/latest/multi_process.html
(Jax docs are amazing)

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Asynchronous dispatch in MPMDs? #6678

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Asynchronous dispatch in MPMDs? #6678

Uh oh!

Uh oh!

epignatelli May 7, 2021

Control

Multiple programs, single GPU, twice, then reduce on separate GPU

Multiple programs, multiple GPUs, then reduce on a separate GPU

Replies: 1 comment

Uh oh!

epignatelli Jun 9, 2021 Author

epignatelli
May 7, 2021

epignatelli
Jun 9, 2021
Author