Hi,
I emailed the questions below last Monday; perhaps your team has been very busy, so I hope you can answer them here. This is actually similar to issue #46.
Before asking my questions, I want to thank you and your team for your accomplishments, especially on model compression (SVD-LLM v2). It is a brilliant idea and a very useful model compression method, and I got a lot of ideas from your paper.
Here are my questions.
In your paper, the following is written:
Theorem 3.1. If Us, Ss, Vs are obtained by the SVD of X * X.T, and Uw', Sw', Vw' are obtained by the SVD of W * Us * sqrt(Ss), then the compressed weight matrix W' = Uw' * Trunc.(Sw') * Vw' * sqrt(Ss)^-1 * Us^-1 ensures the theoretical minimum truncation loss.
Proof. Since X * X.T is a symmetric matrix, suppose the singular vectors and values of the input activation X are Ux, Sx, Vx; then Us = Ux and sqrt(Ss) = Sx. Suppose S = Us * sqrt(Ss), thus S^-1 = sqrt(Ss)^-1 * Us^-1, and we have: ...
- My note: since X = Ux * Sx * Vx.T, we have X * X.T = (Ux * Sx * Vx.T) * (Vx * Sx.T * Ux.T) = Ux * Sx^2 * Ux.T (a toy sketch of how I read the theorem follows below).
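To show how I read Theorem 3.1, here is a minimal numerical sketch with random matrices. It is only my interpretation (the toy sizes, variable names, and the use of torch's V.T convention are my own assumptions), not your code:

```python
import torch

torch.manual_seed(0)
d, n, k = 8, 32, 4                             # toy sizes: hidden dim, tokens, kept rank (my choices)
W = torch.randn(d, d, dtype=torch.float64)
X = torch.randn(d, n, dtype=torch.float64)     # activation with tokens as columns, as I read the paper

# SVD of the symmetric matrix X X.T gives Us, Ss as in Theorem 3.1
Us, Ss, _ = torch.linalg.svd(X @ X.T)

# check of my note above: X X.T = Ux * Sx^2 * Ux.T
Ux, Sx, _ = torch.linalg.svd(X, full_matrices=False)
print(torch.allclose(X @ X.T, Ux @ torch.diag(Sx**2) @ Ux.T))   # True

# S = Us * sqrt(Ss); SVD of W * S, truncate, then map back, as I read the theorem
S = Us @ torch.diag(Ss.sqrt())
Uw, Sw, Vhw = torch.linalg.svd(W @ S)          # Vhw is V.T as returned by torch
Sw_trunc = torch.cat([Sw[:k], torch.zeros_like(Sw[k:])])
W_comp = Uw @ torch.diag(Sw_trunc) @ Vhw @ torch.linalg.inv(S)

print(torch.norm(W @ X - W_comp @ X))          # the truncation loss the theorem is about
```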
But in your implementation,

    def hook(module, input, output):
        inp = input[0].detach().float()
        if inp.dim() == 2:  # for opt
            inp = inp.unsqueeze(0)
        adds = torch.matmul(inp.transpose(1, 2), inp)

I believe torch.matmul(inp.transpose(1, 2), inp) computes X.T * X, which is Vx * Sx^2 * Vx.T.
Is that different from the X * X.T in the paper?
For matching the matrix shapes the implementation is reasonable, but it seems different from the paper's theorem (a small shape check of what I mean follows below).
Could you explain this, even briefly?
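Here is the shape check, assuming the hooked activation has shape (batch, seq_len, hidden), which is my guess at the layout:

```python
import torch

batch, seq_len, hidden = 1, 16, 8                  # toy sizes, my assumption about the layout
inp = torch.randn(batch, seq_len, hidden, dtype=torch.float64)

adds = torch.matmul(inp.transpose(1, 2), inp)      # what the hook accumulates: (batch, hidden, hidden)

x = inp[0]                                         # one batch element, shape (seq_len, hidden)
print(torch.allclose(adds[0], x.T @ x))            # True: this is X.T @ X
print((x @ x.T).shape)                             # torch.Size([16, 16]): the X @ X.T product instead
```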
- This is my most curious question. Certainly X * X.T = Ux * Sx^2 * Ux.T, but Theorem 3.1 defines S = Us * sqrt(Ss). I can understand that Us = Ux and Ss = Sx^2 (a quick numerical check of that part follows below), yet Ux.T disappears from S and a square root is applied to Ss. Is there any reason why Ux.T disappears and sqrt(Ss) appears?
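The check of the part I do understand, again just a toy example of my own and not your code:

```python
import torch

torch.manual_seed(0)
d, n = 6, 40
X = torch.randn(d, n, dtype=torch.float64)

Ux, Sx, _ = torch.linalg.svd(X, full_matrices=False)   # SVD of X
Us, Ss, _ = torch.linalg.svd(X @ X.T)                   # SVD of X X.T

print(torch.allclose(Ss, Sx**2))                        # True: Ss = Sx^2
# the singular vectors agree only up to a sign per column, so |Us.T @ Ux| should be the identity
print(torch.allclose((Us.T @ Ux).abs(), torch.eye(d, dtype=torch.float64), atol=1e-6))
```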
- When any matrix A is multiplied by an orthogonal matrix Q, the Frobenius norm satisfies ||A * Q||_F = ||A||_F (a quick sanity check of this follows below). But in your paper it is written that ||A * S^-1 * X||_F = ||S^-1 * X||_F. I think ||A * S^-1 * X||_F = ||A||_F is what should hold. It would be nice if you could explain this if I am wrong.
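The sanity check of the orthogonal invariance I am relying on (toy sizes of my own choosing):

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 7, dtype=torch.float64)
Q, _ = torch.linalg.qr(torch.randn(7, 7, dtype=torch.float64))   # a random orthogonal matrix

print(torch.norm(A @ Q))    # Frobenius norm of A @ Q ...
print(torch.norm(A))        # ... equals the Frobenius norm of A, since Q is orthogonal
```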
I could not get any help from others, so I would really appreciate it if you could answer.
I hope for your continued research success and good health.