
Thanks so much for this question! I pulled @froystig in to think about this, and to teach me what Kaczmarz is. We wrote this comment together!

Let's change notation and clarify the problem statement. Say we have h = f ∘ g, where g: R^d → R^n is the prediction function, with d the dimension of the parameter and n the number of classes (suppressing the dependence on the input data x for convenience), and f: R^n → R is the loss function. (Note that f shouldn't take an input of dimension d, i.e. the weights; we think that's a typo in the OP!) Let w ∈ R^d be the current parameter. Then ∇f(g(w)) ∈ R^n and ∂g(w) ∈ R^{n×d}, where the latter is just notation for the Jacobian matrix of g at w.
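
To make the shapes concrete, here is a minimal sketch in plain Python (stdlib only). It assumes a hypothetical toy prediction function g(w) = tanh(A w) and a softmax cross-entropy loss f against a label y; the matrix A, the label y, and these particular choices are illustrative assumptions, not something from the original question. It computes ∇h(w) = ∂g(w)^T ∇f(g(w)) by the chain rule, using exactly the shapes above (∇f(g(w)) ∈ R^n, ∂g(w) ∈ R^{n×d}), and checks the result against central finite differences.

```python
import math

# Hypothetical toy setup (not from the original question): d = 3 parameters, n = 2 classes.
A = [[0.5, -1.0, 2.0],
     [1.5, 0.3, -0.7]]
y = 1  # hypothetical true class label


def g(w):
    """Prediction function g: R^d -> R^n, here g(w) = tanh(A w)."""
    return [math.tanh(sum(a_ij * w_j for a_ij, w_j in zip(row, w))) for row in A]


def f(z):
    """Loss f: R^n -> R, softmax cross-entropy of scores z against label y."""
    m = max(z)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z_i - m) for z_i in z))
    return log_sum_exp - z[y]


def grad_f(z):
    """Gradient of f at z, a vector in R^n: softmax(z) - one_hot(y)."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total - (1.0 if i == y else 0.0) for i, e in enumerate(exps)]


def jac_g(w):
    """Jacobian of g at w, an n x d matrix: entry (i, j) is (1 - g_i(w)^2) * A[i][j]."""
    out = g(w)
    return [[(1.0 - out_i ** 2) * a_ij for a_ij in row] for out_i, row in zip(out, A)]


def grad_h(w):
    """Chain rule for h(w) = f(g(w)): grad h(w) = J^T grad f(g(w)), a vector in R^d."""
    J = jac_g(w)          # n x d
    v = grad_f(g(w))      # length n
    return [sum(J[i][j] * v[i] for i in range(len(v))) for j in range(len(w))]


def grad_h_fd(w, eps=1e-6):
    """Central finite-difference estimate of grad h, used only as a sanity check."""
    grads = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        grads.append((f(g(w_plus)) - f(g(w_minus))) / (2 * eps))
    return grads


w = [0.1, -0.2, 0.3]
print(grad_h(w))
print(grad_h_fd(w))
```

The two printed vectors should agree to several decimal places. In JAX, `jax.grad(h)` would produce the same gradient by reverse mode, which applies ∂g(w)^T to ∇f(g(w)) as a vector-Jacobian product (see `jax.vjp`) rather than materializing the n×d Jacobian as this sketch does.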

To draw an…

Answer selected by yaroslavvb