I'm trying to implement the Transformer architecture from scratch. I've found three issues while training.
I'll attach the code for reference:

```python
import equinox as eqx
import jax
import jax.numpy as jnp


class SelfAttention(eqx.Module):
    def __call__(self, query, key, value, mask):
        # Scaled dot-product attention over batched (batch, seq, d_model) inputs.
        scaled_dot_prod = query @ jnp.transpose(key, (0, 2, 1)) / jnp.sqrt(query.shape[-1])
        # Additive mask: 0 where attention is allowed, -inf where it is blocked.
        scaled_dot_prod = mask + scaled_dot_prod
        return jax.nn.softmax(scaled_dot_prod) @ value
```
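For context, the module can be exercised in isolation like this (the shapes are my assumption: `(batch, seq, d_model)` inputs with an additive mask broadcastable to the `(batch, seq, seq)` attention scores):

```python
import jax
import jax.numpy as jnp

rng = jax.random.PRNGKey(0)
q = k = v = jax.random.normal(rng, (2, 5, 8))   # (batch, seq, d_model)
mask = jnp.zeros((2, 1, 5))                     # 0 = attend, -inf = blocked
out = SelfAttention()(q, k, v, mask)            # -> (2, 5, 8)
```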
```python
def create_mask(arr):
    # Additive mask: -inf at padding positions (token id 0), 0 elsewhere.
    return jnp.where(arr == 0, -jnp.inf, 0)
```
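One thing worth illustrating: if a mask row is all `-inf` (e.g., a sequence that is entirely padding), `softmax` turns the whole row into NaN, because internally it computes `x - max(x)`, and `-inf - (-inf)` is NaN. A minimal repro of that failure mode:

```python
import jax
import jax.numpy as jnp

row = jnp.full((4,), -jnp.inf)   # a fully-masked attention row
print(jax.nn.softmax(row))       # [nan nan nan nan]
```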
```python
def loss(model, X, y, X_mask, y_mask, labels):
    y_pred = jnp.log(predict(model, X, y, X_mask, y_mask))
    # Zero out padding positions (label id 0) and gather per-token log-probs.
    y_pred = jnp.where(labels == 0, 0, jnp.take(y_pred, labels, axis=-1))
    count = jnp.count_nonzero(y_pred)
    # Mean negative log-likelihood over non-padding tokens.
    return -jnp.sum(y_pred) / count
```
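As an aside, `jnp.log(softmax(...))` can underflow to `-inf` on its own; a more stable formulation uses `jax.nn.log_softmax` and gathers per-token log-probabilities with `jnp.take_along_axis`. A minimal sketch of that alternative (my rewrite, assuming access to pre-softmax logits of shape `(batch, seq, vocab)`, whereas `predict` above appears to return probabilities):

```python
import jax
import jax.numpy as jnp

def masked_cross_entropy(logits, labels):
    """logits: (batch, seq, vocab); labels: (batch, seq), with 0 = padding."""
    log_probs = jax.nn.log_softmax(logits, axis=-1)  # numerically stable log-softmax
    # Pick out the log-probability of each target token.
    token_lp = jnp.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    mask = labels != 0                               # ignore padding positions
    return -jnp.sum(token_lp * mask) / jnp.maximum(mask.sum(), 1)
```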
```python
with jax.disable_jit():
    for e in range(EPOCHS):
        total_loss = 0
        num_batches = 0
        total_tokens = 0
        for i, (Xbt, ybt, labelbt) in enumerate(dataloader(Xtr, ytr, SEQ_LEN)):
            # Count non-padding target tokens in this batch.
            total_tokens += len([token for seq in labelbt for token in list(filter(lambda x: x != 0, seq))])
            Xbt, ybt, labelbt = [jnp.array(x) for x in (Xbt, ybt, labelbt)]
            Xmask, ymask = [create_mask(x) for x in (Xbt, ybt)]
            model, opt_state, batch_loss = step(model, opt_state, Xbt, ybt, Xmask, ymask, labelbt)
            total_loss += batch_loss
            num_batches += 1
            if num_batches % 20 == 0:
                print(f"Batches trained: {num_batches} | Avg. batch loss: {total_loss/num_batches}")
        epoch_loss = total_loss / num_batches
        print(f"Epoch {e} | loss: {epoch_loss}")
```

The error:

```
def _softmax_deprecated(
    478     x: ArrayLike,
    479     axis: Optional[Union[int, tuple[int, ...]]] = -1,
    480     where: Optional[ArrayLike] = None,
    481     initial: Optional[ArrayLike] = None) -> Array:
    482   x_max = jnp.max(x, axis, where=where, initial=initial, keepdims=True)
--> 483   unnormalized = jnp.exp(x - lax.stop_gradient(x_max))
    484   result = unnormalized / jnp.sum(unnormalized, axis, where=where, keepdims=True)
    485   if where is not None:

FloatingPointError: invalid value (nan) encountered in jit(sub)
```
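(For context: JAX only raises a `FloatingPointError` like this when NaN debugging is turned on; I'm assuming the run was configured along these lines, which together with `jax.disable_jit()` makes JAX stop at the first primitive that produces a NaN:)

```python
import jax

# With this flag set, JAX raises FloatingPointError at the first op that
# produces a NaN instead of silently propagating it.
jax.config.update("jax_debug_nans", True)
```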
Answered by svarunid, Dec 19, 2023
Hello - I think I answered your question on StackOverflow this morning here: https://stackoverflow.com/a/77680900/2937831. One question: could you say more about what leads you to believe that…
I resolved my issue. The error was occurring due to a bug in data preprocessing, so the forward pass produced invalid (NaN) values.
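For anyone hitting the same error: a cheap guard is to validate batches before training so that no sequence is entirely padding, since an all-padding sequence fully masks a softmax row (see the NaN repro above). A minimal sketch, assuming 0 is the padding id:

```python
import numpy as np

def check_batch(X, name="X"):
    # Every sequence must contain at least one non-padding token; otherwise
    # its attention-mask row is all -inf and softmax yields NaN.
    empty = ~np.any(np.asarray(X) != 0, axis=-1)
    assert not empty.any(), f"{name} has {empty.sum()} all-padding sequence(s)"
```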