Two-stage grads - implementing DETR #7446
Unanswered
sholtodouglas asked this question in Q&A
I'm implementing DETR, and a key part of the loss function is a linear sum assignment between the model's outputs and the labels (effectively sorting the labels to best match the generated outputs, since the labels have no inherent order). This matching is non-differentiable and data-dependent, so it would need to be retraced every step. When training on TPUs, the best solution I arrived at (which could well be blindly wrong) was to run the forward pass on the TPU cores, move the outputs back to the CPU and do the matching there (distributed across 8 CPUs with Ray), and then run the loss calculation back on the TPU.
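Concretely, the matching stage looks roughly like this (a simplified, toy-data sketch: `match_batch` is the part that gets farmed out to the Ray workers, and I'm assuming the cost matrices are padded to square so every prediction gets a target):

```python
import numpy as np
import jax
import jax.numpy as jnp
from scipy.optimize import linear_sum_assignment

def match_batch(cost_batch):
    # cost_batch: [batch, num_queries, num_targets] cost matrices, pulled back to host.
    # This per-example loop is what gets distributed over the 8 CPUs with Ray.
    matches = []
    for cost in np.asarray(cost_batch):
        _, col_idx = linear_sum_assignment(cost)      # Hungarian matching, host only
        matches.append(col_idx.astype(np.int32))
    return np.stack(matches)

# Toy example: 2 images, 4 queries, 4 (padded) targets.
# In the real pipeline the cost matrix comes from the pmapped forward pass.
cost = np.random.rand(2, 4, 4)
col_indices = match_batch(cost)                       # [2, 4]

# Back on device: re-order the labels to line up with the predictions,
# then feed the sorted labels into the loss on the TPU.
labels = jnp.arange(2 * 4 * 3).reshape(2, 4, 3)       # dummy [batch, targets, label_dim]
sorted_labels = jnp.take_along_axis(labels, jnp.asarray(col_indices)[..., None], axis=1)
```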
Now, this doesn't work on its own, because it only gives the gradients w.r.t. the outputs, not the original parameters! However, we can't wrap the entire function in a grad call and pmap it either, because the matching process won't run on device (nor does it seem to work to wrap the matching in a no-jit call).
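In principle the output-gradients could be pulled back to the parameters by keeping the forward pass's vjp around, something like the toy, self-contained sketch below (the real model and DETR loss would slot in where `forward` and the L2 stand-in are), but that means keeping the forward residuals alive across the host round-trip, which is awkward under pmap:

```python
import jax
import jax.numpy as jnp

# Toy stand-ins so the structure runs end to end: a linear "model" and an L2 "loss".
def forward(params, images):
    return images @ params['w'] + params['b']          # [batch, num_queries, dim]

def detr_loss(outputs, sorted_labels):
    return jnp.mean((outputs - sorted_labels) ** 2)

params = {'w': jnp.ones((8, 4)), 'b': jnp.zeros(4)}
images = jnp.ones((2, 5, 8))                           # [batch, num_queries, feat]

# Stage 1: forward pass, keeping the pullback so gradients w.r.t. the outputs
# can later be mapped back onto the parameters.
outputs, pullback = jax.vjp(lambda p: forward(p, images), params)

# Stage 2: host-side matching happens here, producing the sorted labels.
sorted_labels = jnp.zeros_like(outputs)                # pretend these came from matching

# Stage 3: gradient of the loss w.r.t. the outputs only...
out_grads = jax.grad(detr_loss)(outputs, sorted_labels)

# ...then chained through the pullback to get gradients w.r.t. the params.
(param_grads,) = pullback(out_grads)
```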
At the moment, the best solution appears to be re-running the forward pass and the loss function together with the now-sorted labels (and the same dropout seed), but this whole process is starting to feel very inefficient, so I thought I'd sense-check myself here. Is there a better way to do this? (The original PyTorch implementation just calls .backward() on the loss, which propagates back through the original forward pass.)
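In other words, something like this toy sketch (again with stand-ins for the real model and loss; the key point is passing the same dropout rng to both passes so the matching stays consistent with the differentiated outputs):

```python
import jax
import jax.numpy as jnp

def forward(params, images, rng):
    drop = jax.random.bernoulli(rng, 0.9, images.shape)   # toy "dropout"
    return (images * drop) @ params['w']

def loss_fn(params, images, sorted_labels, rng):
    outputs = forward(params, images, rng)                 # same rng as the matching pass
    return jnp.mean((outputs - sorted_labels) ** 2)        # stand-in for the DETR loss

params = {'w': jnp.ones((8, 4))}
images = jnp.ones((2, 5, 8))
rng = jax.random.PRNGKey(0)

# Pass 1 (no grad): outputs -> host-side matching -> sorted labels.
outputs = forward(params, images, rng)
sorted_labels = jnp.zeros_like(outputs)                    # pretend these came from matching

# Pass 2: full forward + loss under one grad call, so gradients reach the params.
grads = jax.grad(loss_fn)(params, images, sorted_labels, rng)
```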
Thanks :)