
A few questions about the paper (similar to #46) #47

@ArtabanMind

Description


Hi,

I emailed the questions below last Monday; perhaps your team has been very busy.
I hope you can answer them. They are similar to issue #46.

Before asking my questions: I appreciate your and your team's accomplishments, especially on model compression (SVD-LLM V2).
It is a brilliant idea and a very useful model compression method, and I got a lot of ideas from your paper.

Here are my questions.

In your paper, it is written:

> Theorem 3.1. If U_s, S_s, V_s are obtained by the SVD of X X^T, and U_ws, S_ws, V_ws are obtained by the SVD of W × U_s × √S_s, then the compressed weight matrix W' = U_ws × Trunc.(S_ws) × V_ws × √S_s^{-1} × U_s^{-1} ensures the theoretical minimum truncation loss.
>
> Proof. Since X X^T is a symmetric matrix, suppose that the singular vectors and values of the input activation X are U_x, S_x, V_x; then U_s = U_x and √S_s = S_x. Suppose S = U_s × √S_s, thus S^{-1} = √S_s^{-1} × U_s^{-1}, and we have:
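As context for my questions, I checked the theorem's construction numerically. The following NumPy sketch is my own (not from your repo); it confirms that without truncation, undoing the whitening S = U_s × √S_s recovers W exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n = 6, 8, 32
X = rng.standard_normal((d_in, n))        # input activations (d_in x n), full row rank
W = rng.standard_normal((d_out, d_in))    # weight matrix

# SVD of the symmetric PSD matrix X X^T gives U_s and S_s
U_s, S_s, _ = np.linalg.svd(X @ X.T)
S = U_s @ np.diag(np.sqrt(S_s))           # S = U_s * sqrt(S_s), the whitening matrix

# SVD of W S; with no truncation, undoing the whitening recovers W exactly
U_ws, S_ws, V_wsh = np.linalg.svd(W @ S, full_matrices=False)
W_prime = U_ws @ np.diag(S_ws) @ V_wsh @ np.linalg.inv(S)

print(np.allclose(W_prime, W))            # True when nothing is truncated
```

Truncating S_ws to the top-k values would instead give a low-rank W', which is where the compression comes from.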

  1. S = X × X^T and X = U_x × S_x × V_x^T.
     Since (U_x × S_x × V_x^T) × (V_x × S_x × U_x^T) becomes U_x × S_x^2 × U_x^T.
     But in your implementation:

```python
def hook(module, input, output):
    inp = input[0].detach().float()
    if inp.dim() == 2:  # for opt
        inp = inp.unsqueeze(0)
    adds = torch.matmul(inp.transpose(1, 2), inp)
```

I believe `torch.matmul(inp.transpose(1, 2), inp)` computes X^T × X, which is V_x × S_x^2 × V_x^T.
Is this different from the X × X^T in the paper?
For matching matrix shapes the implementation is reasonable, but it differs from the paper's theorem.
Could you explain this, even briefly?
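To make the question above concrete, here is a small NumPy sketch (my own, not from your code) checking the two Gram matrices and the square-root relation:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 7))           # X is d x n, as in the paper

U_x, S_x, V_xh = np.linalg.svd(X, full_matrices=False)

# X X^T = U_x S_x^2 U_x^T -- the paper's Gram matrix
print(np.allclose(X @ X.T, U_x @ np.diag(S_x**2) @ U_x.T))    # True

# X^T X = V_x S_x^2 V_x^T -- what torch.matmul(inp.transpose(1, 2), inp) computes
print(np.allclose(X.T @ X, V_xh.T @ np.diag(S_x**2) @ V_xh))  # True

# The singular values of X X^T are the squares of those of X,
# i.e. sqrt(S_s) = S_x in the theorem's notation
S_s = np.linalg.svd(X @ X.T, compute_uv=False)
print(np.allclose(np.sqrt(S_s), S_x))                          # True
```

So the two products share their nonzero eigenvalues but have different eigenvectors (U_x vs. V_x), which is exactly what my questions are about.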

  2. This is the question I am most curious about.
     Clearly X × X^T = U_x × S_x^2 × U_x^T, but in Theorem 3.1, S = U_s × √S_s.
     I can understand that U_s = U_x and S_s = S_x^2, but U_x^T has disappeared from S, and a square root is applied to S_s.
     Is there a reason why U_x^T disappeared and √S_s appeared?

  3. When any matrix A is multiplied by an orthogonal matrix Q, the Frobenius norm satisfies ||A × Q||_F = ||A||_F.
     But in your paper it is written that ||A × S^{-1} × X||_F = ||S^{-1} × X||_F.
     I believe ||A × S^{-1} × X||_F = ||A||_F is right.
     It would be nice if you could give me an explanation, in case I am wrong.
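For reference, the norm identity I am relying on above can be checked numerically (again my own NumPy sketch, not from your repo):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))

# Build an orthogonal Q via QR decomposition of a random matrix
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# Right-multiplication by an orthogonal matrix preserves the Frobenius norm
print(np.allclose(np.linalg.norm(A @ Q), np.linalg.norm(A)))  # True: ||AQ||_F == ||A||_F
```

This holds because ||AQ||_F^2 = tr(A Q Q^T A^T) = tr(A A^T) = ||A||_F^2 whenever Q Q^T = I.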

I could not get help from anyone else, so I would really appreciate it if you could answer me.

I hope for your research success and good health.
