By definition, the transpose rule cannot depend on the value x. Given that you're implementing a Heaviside function, I think there are some problems with the way you've defined the derivative rules. First of all, the gradient of a Heaviside function is zero everywhere. The only point where this might come into question is at x = 0, where one might argue that the gradient is infinite (jnp.inf) or perhaps undefined (jnp.nan); however, zero is a reasonable result here, for reasons discussed in "Why are gradients zero for functions based on sort order?". So your JVP rule might look like this:

def _heaviside_jvp(primals, tangents, *, alpha):
  x, = primals
  xt, = tangents
  primal_outs = heavisi…
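
The snippet above is cut off on this page, so the rest of the original rule is not recoverable here. As a rough illustration of the same idea (a Heaviside whose gradient is zero everywhere), here is a minimal sketch that uses jax.custom_jvp instead of the primitive-based rule from the thread. The name heaviside, the convention heaviside(0) = 1, and the omission of the alpha parameter are all assumptions made for this example, not the original poster's code:

```python
import jax
import jax.numpy as jnp


@jax.custom_jvp
def heaviside(x):
    # Assumed convention for this sketch: output is 1 where x >= 0, else 0.
    x = jnp.asarray(x)
    return jnp.where(x >= 0, 1.0, 0.0).astype(x.dtype)


@heaviside.defjvp
def _heaviside_jvp(primals, tangents):
    x, = primals
    xt, = tangents  # unused: the derivative is taken to be zero everywhere
    primal_out = heaviside(x)
    # The output tangent is identically zero, so it does not depend on x or xt.
    tangent_out = jnp.zeros_like(primal_out)
    return primal_out, tangent_out


# Usage: the gradient is zero on both sides of the step and at x = 0.
grads = [jax.grad(heaviside)(v) for v in (-1.0, 0.0, 2.0)]
```

Because the output tangent here is a constant zero, no separate transpose rule is needed in this sketch; a primitive-based implementation like the one in the thread would still need its own registered rules.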

Answer selected by chaoming0625