Great, your example makes things clearer.
In your example, $A$ represents the parameters of the network, $x$ represents a batch of inputs, $f(x)$ represents a batch of outputs. So $f(x)$ is the function mapping an input batch to an output batch. (I think it makes sense to assume $m=n$ here).
Essentially, the quantity you want to compute is the diagonal of the Jacobian matrix. In the general case, computing a Jacobian takes $\mathcal{O}(n)$ Jacobian-vector or vector-Jacobian products, each of which costs roughly the same order as evaluating the network on the batch. Since each batch has $n$ inputs, that comes to roughly $\mathcal{O}(n^2)$ work, measured in single-input network evaluations.
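
To make the general-case cost concrete, here is a minimal JAX sketch that builds the full Jacobian from $n$ Jacobian-vector products (one per standard basis vector) and then reads off the diagonal. The function `f`, the parameter values in `A`, and the helper `jacobian_column` are hypothetical stand-ins chosen for illustration, not your actual network; the toy `f` deliberately mixes the batch through a mean so that the Jacobian is dense.

```python
import jax
import jax.numpy as jnp


def f(A, x):
    # Hypothetical toy "network": x is a batch of n scalar inputs, A holds two parameters.
    # The mean term couples the batch elements, so the Jacobian df/dx is dense.
    return jnp.tanh(A[0] * x + A[1] * jnp.mean(x))


A = jnp.array([1.5, -0.5])       # illustrative parameter values
x = jnp.linspace(-1.0, 1.0, 4)   # batch of n = 4 inputs
n = x.shape[0]


def jacobian_column(e):
    # One JVP with a basis vector e returns J @ e, i.e. one column of the Jacobian.
    _, col = jax.jvp(lambda x_: f(A, x_), (x,), (e,))
    return col


# n JVPs, each about as expensive as one evaluation of f on the whole batch.
columns = jnp.stack([jacobian_column(e) for e in jnp.eye(n)])
J = columns.T                    # columns[i] is column i of J, so transpose
diag_from_jvps = jnp.diagonal(J)

# Same quantity via the built-in forward-mode Jacobian, as a cross-check.
diag_reference = jnp.diagonal(jax.jacfwd(f, argnums=1)(A, x))
```

The explicit Python loop is only there to show where the $n$ separate products come from; `jax.jacfwd` and `jax.jacrev` do the same work in batched form.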

The key point is that for a typical network without batc…

Answer selected by yoeldr