Replies: 2 comments
-
Hi - thanks for the question. I looked through the code, and my suspicion is that the issue comes from the fact that your loss function relies on squeezing your predictions into a certain range, such that the gradient of the output with respect to the input is zero outside that range (i.e. a change to the input value does not affect the output value in this region). This is similar to the issue discussed here, where gradients are zero in the presence of hard clipping.

I'd suggest that rather than a hard step-function cutoff to keep your values in range, you might try a soft cutoff, perhaps using something like a sigmoid window. Then the loss function will be smooth and differentiable and still yield values in or near the desired range.

Another thought, though: if the non-clipped loss function is consistently yielding values outside a physically reasonable range, it might be that there is some sort of bug in the problem setup.
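For illustration, here's a minimal sketch of what a soft cutoff along those lines could look like; the `soft_clip` helper, its `sharpness` parameter, and the (0, 10] velocity range are illustrative choices, not code from the notebook:

```python
import jax
import jax.numpy as jnp

def soft_clip(x, lo, hi, sharpness=10.0):
    # Smoothly squash x into (lo, hi) with a rescaled sigmoid, so the
    # gradient shrinks outside the range but never becomes exactly zero
    # the way it does with a hard clip.
    midpoint = (lo + hi) / 2.0
    return lo + (hi - lo) * jax.nn.sigmoid(sharpness * (x - midpoint) / (hi - lo))

# A hard clip like jnp.clip(x, 0.0, 10.0) has gradient 0.0 at x = 12.0;
# the soft version still gives a small but nonzero gradient there.
print(jax.grad(lambda x: soft_clip(x, 0.0, 10.0))(12.0))
```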
-
Most existing works use tanh to ensure outputs stay in the given range.
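For example (a minimal sketch, with the range bounds from the question plugged in as assumptions):

```python
import jax.numpy as jnp

def to_range(raw, lo, hi):
    # Map an unconstrained final-layer output into (lo, hi) with tanh;
    # the mapping is smooth, so gradients flow for any raw value.
    return lo + (hi - lo) * 0.5 * (jnp.tanh(raw) + 1.0)

# Raw (unconstrained) outputs from the last layer, picked arbitrarily here.
raw_velocity, raw_angle = 2.3, -0.7
velocity = to_range(raw_velocity, 0.0, 10.0)    # lands in (0, 10)
angle = to_range(raw_angle, 1e-6, jnp.pi / 2)   # lands in (1e-6, pi/2)
```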
-
Hi there,
Currently, I'm trying to solve a physics/NN hybrid model problem using JAX.
N.b. here's a link to a Colab notebook containing the relevant code
Problem setup
I have physical equations that calculate the distance a projectile travels given initial velocity and angle, and what I'd like to do is train a prediction network to provide these control parameters given a target distance (as described here).
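For concreteness, the forward physics looks roughly like the standard flat-ground range formula below (the notebook version may differ in detail):

```python
import jax.numpy as jnp

G = 9.81  # gravitational acceleration, m/s^2

def projectile_range(velocity, angle):
    # Flat-ground range of a projectile launched at `velocity` (m/s)
    # and `angle` (radians above horizontal).
    return velocity ** 2 * jnp.sin(2.0 * angle) / G
```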
The Issue
I used a simple MLP with ReLU activation, except for the last layer which has no activation. I set velocities to be in the range (0,10] m/s and launch angles to be in the range [1e-6, pi/2] radians; however, my prediction network was outputting control parameters that were negative.
I tried adding ReLU as the activation on the last layer and got a 0 gradient; I then tried manually squeezing the predictions into this range inside the loss function, which also gave a 0 gradient.
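A stripped-down sketch of the second attempt (the real loss and network are more involved; the values here are placeholders):

```python
import jax
import jax.numpy as jnp

def clipped_loss(pred_angle, target_angle):
    # Squeeze the predicted angle into [1e-6, pi/2] before the loss;
    # if pred_angle is outside that range, the clipped value no longer
    # depends on it, so the gradient comes back as exactly 0.
    angle_in_range = jnp.clip(pred_angle, 1e-6, jnp.pi / 2)
    return (angle_in_range - target_angle) ** 2

print(jax.grad(clipped_loss)(-0.3, 0.5))  # 0.0: prediction below the range
print(jax.grad(clipped_loss)(0.4, 0.5))   # nonzero: prediction inside the range
```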
What I Tried
I understand that a 0 gradient can occur when the output of a function does not depend on the inputs. I have made sure that all dtypes are float32, but I cannot figure out why squeezing the values causes a 0 gradient.
If anybody understands the problem (or knows another way to ensure my prediction network only predicts within the given range), that would be greatly appreciated! Thank you!