
A few questions about the paper (similar to #46) #47

@ArtabanMind

Description


Hi,

I emailed the questions below last Monday; perhaps your team has been very busy.
I hope you can answer them. They are similar to issue #46.

Before asking my questions: I appreciate your and your team's accomplishments, especially on model compression (SVD-LLM V2).
It is a brilliant idea and a very useful model compression method, and I got a lot of ideas from your paper.

Here are my questions.

In your paper, it is written:

> Theorem 3.1. If U_s, S_s, V_s are obtained by the SVD of X X^T, and U_ws, S_ws, V_ws are obtained by the SVD of W × U_s × √S_s, then the compressed weight matrix W' = U_ws × Trunc.(S_ws) × V_ws × √S_s^{-1} × U_s^{-1} ensures the theoretical minimum truncation loss.
>
> Proof. Since X X^T is a symmetric matrix, suppose that the singular vectors and values of the input activation X are U_x, S_x, V_x; then U_s = U_x and √S_s = S_x. Suppose S = U_s × √S_s, thus S^{-1} = √S_s^{-1} × U_s^{-1}, and we have:
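As context for my questions, I checked the theorem's construction numerically. The following NumPy sketch is my own (not from your repo); it confirms that without truncation, undoing the whitening S = U_s × √S_s recovers W exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n = 6, 8, 32
X = rng.standard_normal((d_in, n))        # input activations (d_in x n), full row rank
W = rng.standard_normal((d_out, d_in))    # weight matrix

# SVD of the symmetric PSD matrix X X^T gives U_s and S_s
U_s, S_s, _ = np.linalg.svd(X @ X.T)
S = U_s @ np.diag(np.sqrt(S_s))           # S = U_s * sqrt(S_s), the whitening matrix

# SVD of W S; with no truncation, undoing the whitening recovers W exactly
U_ws, S_ws, V_wsh = np.linalg.svd(W @ S, full_matrices=False)
W_prime = U_ws @ np.diag(S_ws) @ V_wsh @ np.linalg.inv(S)

print(np.allclose(W_prime, W))            # True when nothing is truncated
```

Truncating S_ws to the top-k values would instead give a low-rank W', which is where the compression comes from.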

  1. S = X × X^T and X = U_x × S_x × V_x^T.
     Since (U_x × S_x × V_x^T) × (V_x × S_x × U_x^T) becomes U_x × S_x^2 × U_x^T.
     But in your implementation:

```python
def hook(module, input, output):
    inp = input[0].detach().float()
    if inp.dim() == 2:  # for opt
        inp = inp.unsqueeze(0)
    adds = torch.matmul(inp.transpose(1, 2), inp)
```

I believe `torch.matmul(inp.transpose(1, 2), inp)` computes X^T × X, which is V_x × S_x^2 × V_x^T.
Is this different from the X × X^T in the paper?
For matching matrix shapes the implementation is reasonable, but it differs from the paper's theorem.
Could you explain this, even briefly?
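To make the question above concrete, here is a small NumPy sketch (my own, not from your code) checking the two Gram matrices and the square-root relation:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 7))           # X is d x n, as in the paper

U_x, S_x, V_xh = np.linalg.svd(X, full_matrices=False)

# X X^T = U_x S_x^2 U_x^T -- the paper's Gram matrix
print(np.allclose(X @ X.T, U_x @ np.diag(S_x**2) @ U_x.T))    # True

# X^T X = V_x S_x^2 V_x^T -- what torch.matmul(inp.transpose(1, 2), inp) computes
print(np.allclose(X.T @ X, V_xh.T @ np.diag(S_x**2) @ V_xh))  # True

# The singular values of X X^T are the squares of those of X,
# i.e. sqrt(S_s) = S_x in the theorem's notation
S_s = np.linalg.svd(X @ X.T, compute_uv=False)
print(np.allclose(np.sqrt(S_s), S_x))                          # True
```

So the two products share their nonzero eigenvalues but have different eigenvectors (U_x vs. V_x), which is exactly what my questions are about.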

  2. This is the question I am most curious about.
     Clearly X × X^T = U_x × S_x^2 × U_x^T, but in Theorem 3.1, S = U_s × √S_s.
     I can understand that U_s = U_x and S_s = S_x^2, but U_x^T has disappeared from S, and a square root is applied to S_s.
     Is there a reason why U_x^T disappeared and √S_s appeared?

  3. When any matrix A is multiplied by an orthogonal matrix Q, the Frobenius norm satisfies ||A × Q||_F = ||A||_F.
     But in your paper it is written that ||A × S^{-1} × X||_F = ||S^{-1} × X||_F.
     I believe ||A × S^{-1} × X||_F = ||A||_F is right.
     It would be nice if you could give me an explanation, in case I am wrong.
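For reference, the norm identity I am relying on above can be checked numerically (again my own NumPy sketch, not from your repo):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))

# Build an orthogonal Q via QR decomposition of a random matrix
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# Right-multiplication by an orthogonal matrix preserves the Frobenius norm
print(np.allclose(np.linalg.norm(A @ Q), np.linalg.norm(A)))  # True: ||AQ||_F == ||A||_F
```

This holds because ||AQ||_F^2 = tr(A Q Q^T A^T) = tr(A A^T) = ||A||_F^2 whenever Q Q^T = I.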

I could not get help from anyone else, so I would really appreciate it if you could answer me.

I hope for your research success and good health.
