Replies: 2 comments
-
Hi - thanks for the question. I looked through the code, and my suspicion is that the issue comes from the fact that your loss function relies on squeezing your predictions into a certain range, such that the gradient of the output with respect to the input is zero outside that range (i.e. a change to the input value does not affect the output value in this region). This is similar to the issue discussed here, where gradients are zero in the presence of hard clipping.

I'd suggest that rather than a hard step-function cutoff to keep your values in range, you might try a soft cutoff, perhaps using something like a sigmoid window. Then the loss function will be smooth and differentiable and still yield values in or near the desired range.

Another thought, though: if the non-clipped loss function is consistently yielding values outside a physically reasonable range, it might be that there is some sort of bug in the problem setup.
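For illustration, here's a minimal sketch of what a soft cutoff along those lines could look like; the `soft_clip` helper, its `sharpness` parameter, and the (0, 10] velocity range are illustrative choices, not code from the notebook:

```python
import jax
import jax.numpy as jnp

def soft_clip(x, lo, hi, sharpness=10.0):
    # Smoothly squash x into (lo, hi) with a rescaled sigmoid, so the
    # gradient shrinks outside the range but never becomes exactly zero
    # the way it does with a hard clip.
    midpoint = (lo + hi) / 2.0
    return lo + (hi - lo) * jax.nn.sigmoid(sharpness * (x - midpoint) / (hi - lo))

# A hard clip like jnp.clip(x, 0.0, 10.0) has gradient 0.0 at x = 12.0;
# the soft version still gives a small but nonzero gradient there.
print(jax.grad(lambda x: soft_clip(x, 0.0, 10.0))(12.0))
```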
-
Most existing works use tanh to ensure outputs stay in the given range.
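For example (a minimal sketch, with the range bounds from the question plugged in as assumptions):

```python
import jax.numpy as jnp

def to_range(raw, lo, hi):
    # Map an unconstrained final-layer output into (lo, hi) with tanh;
    # the mapping is smooth, so gradients flow for any raw value.
    return lo + (hi - lo) * 0.5 * (jnp.tanh(raw) + 1.0)

# Raw (unconstrained) outputs from the last layer, picked arbitrarily here.
raw_velocity, raw_angle = 2.3, -0.7
velocity = to_range(raw_velocity, 0.0, 10.0)    # lands in (0, 10)
angle = to_range(raw_angle, 1e-6, jnp.pi / 2)   # lands in (1e-6, pi/2)
```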
-
Hi there,
Currently, I'm trying to solve a physics/NN hybrid model problem using JAX.
N.b. here's a link to a Colab notebook containing the relevant code
Problem setup
I have physical equations that calculate the distance a projectile travels given initial velocity and angle, and what I'd like to do is train a prediction network to provide these control parameters given a target distance (as described here).
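For concreteness, the forward physics looks roughly like the standard flat-ground range formula below (the notebook version may differ in detail):

```python
import jax.numpy as jnp

G = 9.81  # gravitational acceleration, m/s^2

def projectile_range(velocity, angle):
    # Flat-ground range of a projectile launched at `velocity` (m/s)
    # and `angle` (radians above horizontal).
    return velocity ** 2 * jnp.sin(2.0 * angle) / G
```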
The Issue
I used a simple MLP with ReLU activation, except for the last layer which has no activation. I set velocities to be in the range (0,10] m/s and launch angles to be in the range [1e-6, pi/2] radians; however, my prediction network was outputting control parameters that were negative.
I tried adding ReLU as the activation on the last layer and got a 0 gradient; I then tried manually squeezing the predictions into this range inside the loss function, which also gave a 0 gradient.
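A stripped-down sketch of the second attempt (the real loss and network are more involved; the values here are placeholders):

```python
import jax
import jax.numpy as jnp

def clipped_loss(pred_angle, target_angle):
    # Squeeze the predicted angle into [1e-6, pi/2] before the loss;
    # if pred_angle is outside that range, the clipped value no longer
    # depends on it, so the gradient comes back as exactly 0.
    angle_in_range = jnp.clip(pred_angle, 1e-6, jnp.pi / 2)
    return (angle_in_range - target_angle) ** 2

print(jax.grad(clipped_loss)(-0.3, 0.5))  # 0.0: prediction below the range
print(jax.grad(clipped_loss)(0.4, 0.5))   # nonzero: prediction inside the range
```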
What I Tried
I understand that a 0 gradient can occur when the output of a function does not depend on the inputs. I have made sure that all dtypes are float32, but I cannot figure out why squeezing the values causes a 0 gradient.
If anybody understands the problem (or knows another way to ensure my prediction network only predicts within the given range), that would be greatly appreciated! Thank you!