-
The following code is an example from https://jax.readthedocs.io/en/latest/jax-101/03-vectorization.html:

import jax
import jax.numpy as jnp

x = jnp.arange(5)
w = jnp.array([2., 3., 4.])

def convolve(x, w):
    output = []
    for i in range(1, len(x)-1):
        output.append(jnp.dot(x[i-1:i+2], w))
    return jnp.array(output)

convolve(x, w)
# DeviceArray([11., 20., 29.], dtype=float32)
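(Editor's aside, not in the original post: a minimal sketch continuing from the snippet above.) Jitting this function works, and keeps working for new input values, because JAX treats only shapes as static; compare this with the traced PyTorch version below:

jit_convolve = jax.jit(convolve)
jit_convolve(x, w)
# DeviceArray([11., 20., 29.], dtype=float32)
jit_convolve(jnp.arange(10., 15.), w)
# DeviceArray([101., 110., 119.], dtype=float32)  <- correct: values are never baked in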
If I write the equivalent in PyTorch:

import torch

x = torch.arange(5, dtype=torch.float32)
w = torch.tensor([2., 3., 4.])

def convolve(x, w):
    output = []
    for i in range(1, len(x)-1):
        output.append(torch.dot(x[i-1:i+2], w))
    return torch.tensor(output)

convolve(x, w)
# tensor([11., 20., 29.])

and then trace it:

module = torch.jit.trace(convolve, (x, w))
'''
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:8: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:10: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
# Remove the CWD from sys.path while we load stuff.
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:10: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
# Remove the CWD from sys.path while we load stuff.
'''
module(x, w)
# tensor([11., 20., 29.])

x = torch.arange(10, 15, dtype=torch.float32)
module(x, w)
# tensor([11., 20., 29.])  <- wrong output: the correct result for this x would be
# tensor([101., 110., 119.]), but the trace registered the original outputs as
# constants (see the TracerWarnings above)
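(Editor's sketch, not from the original post.) One way around this in PyTorch is torch.jit.script, which compiles the Python control flow itself rather than recording one concrete execution, so it generalizes to new inputs:

import torch

w = torch.tensor([2., 3., 4.])

@torch.jit.script
def convolve_scripted(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # The loop is compiled rather than unrolled for one fixed input,
    # so x.shape[0] stays dynamic.
    output = []
    for i in range(1, x.shape[0] - 1):
        output.append(torch.dot(x[i-1:i+2], w))
    return torch.stack(output)

convolve_scripted(torch.arange(10, 15, dtype=torch.float32), w)
# tensor([101., 110., 119.])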
The TensorFlow version:

import tensorflow as tf

x = tf.range(5, dtype=tf.float32)
w = tf.Variable([2., 3., 4.])

@tf.function(jit_compile=True)
def convolve(x, w):
    output = []
    for i in range(1, len(x)-1):
        output.append(tf.tensordot(x[i-1:i+2], w, 1))
    return tf.convert_to_tensor(output)

convolve(x, w)
# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([11., 20., 29.], dtype=float32)>

TensorFlow, like JAX, also recommends against Python side-effects, and if I call the function repeatedly with inputs of different shapes I get:

WARNING:tensorflow:6 out of the last 7 calls to <function convolve at 0x7f1221141cb0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.

Does JAX also retrace, similar to TF, when the input shape changes? I am referring to https://jax.readthedocs.io/en/latest/notebooks/thinking_in_jax.html#to-jit-or-not-to-jit
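(Editor's sketch, not part of the original question, assuming only default tf.function behavior.) The retracing described in the warning above can be observed directly, since a Python print runs only while tracing:

import tensorflow as tf

@tf.function
def f(x):
    print("tracing for shape", x.shape)  # Python side-effect: runs only during tracing
    return x * 2

f(tf.constant([1., 2.]))      # prints: tracing for shape (2,)
f(tf.constant([3., 4.]))      # same shape and dtype: cached, nothing printed
f(tf.constant([1., 2., 3.]))  # new shape triggers a retrace: prints again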
-
Thanks for the question! JAX's tracing behavior here is perhaps a bit confusing, but essentially it will evaluate and flatten any Python control flow based on static quantities like array shapes. In the case of your convolve function, you can see this by calling jax.make_jaxpr, which prints the jaxpr for the function:

from jax import make_jaxpr
make_jaxpr(convolve)(x, w)

You'll see there is no loop in the resulting jaxpr: the Python for loop has been unrolled into a fixed sequence of dot operations for the traced input shape. Based on this, I think the answers to your other questions may be more clear:
Typically in functional programming, "side-effects" refer to changes to global values accessed outside the function. Here the entire life-cycle of the list output is contained within the function, so appending to it is not a side-effect in that sense.
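(Editor's sketch, not part of the original reply.) A genuine side-effect, by contrast, interacts badly with jit because it executes only during tracing:

import jax
import jax.numpy as jnp

log = []  # global state: a genuine side-effect

@jax.jit
def f(x):
    log.append("called")  # runs only while tracing, not on every call
    return x * 2

f(jnp.arange(3))
f(jnp.arange(3))
print(len(log))  # 1, not 2: the second call reuses the cached trace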
Yes, jitted functions in JAX will be re-traced when faced with inputs of a new shape: this is true regardless of the content of the function.
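(Editor's sketch illustrating this, assuming only standard jax.jit behavior.) A print statement inside a jitted function runs once per trace, so it shows exactly when retracing happens:

import jax
import jax.numpy as jnp

@jax.jit
def g(x):
    print("tracing for shape", x.shape)
    return x * 2

g(jnp.arange(3))  # prints: tracing for shape (3,)
g(jnp.arange(3))  # cached trace reused: nothing printed
g(jnp.arange(5))  # new shape, so re-traced: prints again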
"Static" in this context refers to the staticness of a quantity within a single function call. The shape of An example of a non-static shape would be something like this: @jit
def broken(x):
return jnp.arange(x[0]) This attemps to return an array whose shape depends on the first value of Does that answer your question? |
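(Editor's addendum, a sketch of one common workaround; the function name fixed is hypothetical.) Marking the value the shape depends on as a static argument makes the output shape known at trace time, at the cost of one trace per distinct value:

from functools import partial
import jax
import jax.numpy as jnp

@partial(jax.jit, static_argnums=0)
def fixed(n):
    # n is a plain Python value here, so the output shape is known when tracing
    return jnp.arange(n)

fixed(4)
# DeviceArray([0, 1, 2, 3], dtype=int32)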