Suggest: Add Bayesian optimization support for ratio search #104

Open
trotsky1997 wants to merge 2 commits into mit-han-lab:main from trotsky1997:zhangdi
Conversation

@trotsky1997

No description provided.

trotsky1997@qq.com added 2 commits October 26, 2023 20:40
@casper-hansen
Contributor

Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?

@trotsky1997
Author

> Hi @trotsky1997, this looks very interesting! Have you conducted any experiments to measure perplexity after using Bayesian optimization?

You can check my results at
https://trotsky1997.notion.site/f49dcb79ab6245a7b689beed086e4c7b?pvs=4

@casper-hansen
Contributor

@trotsky1997 does this code include different alpha values for X and W? You observed better perplexity with that.

@trotsky1997
Author

> @trotsky1997 does this code include different alpha values for X and W? You observed better perplexity with that.

That's very easy to modify: add a new parameter called ratio_b to the get_loss function, replace 1 - ratio with ratio_b, then define ratio_b with its bounds in the parameter-space definition.

@trotsky1997
Author

```python
@scheduler.serial
def get_loss(ratio, ratio_b):
    nonlocal best_error, best_ratio, best_scales
    # ratio and ratio_b are sampled directly from uniform(0, 1), so the old
    # grid-search rescaling (ratio = ratio * 1 / n_grid) is no longer needed.
    scales = (x_max.pow(ratio) / w_max.pow(ratio_b)).clamp(min=1e-4).view(-1)
    scales = scales / (scales.max() * scales.min()).sqrt()
    for fc in linears2scale:
        fc.weight.mul_(scales.view(1, -1).to(fc.weight.device))
        fc.weight.data = w_quantize_func(fc.weight.data) / scales.view(1, -1)
    out = block(x, **kwargs)
    if isinstance(out, tuple):
        out = out[0]

    loss = (org_out - out).float().pow(2).mean().item()  # float prevents overflow
    history.append(loss)
    if loss < best_error:
        best_error = loss
        best_ratio = (ratio, ratio_b)  # track both exponents, not just ratio
        best_scales = scales
    block.load_state_dict(org_sd)  # restore original weights before the next trial
    return loss

param_space = dict(ratio=uniform(0, 1), ratio_b=uniform(0, 1))
```
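The objective above is designed to be handed to a Bayesian optimizer over `param_space` (the `@scheduler.serial` decorator and the `uniform(0, 1)` space suggest the ARM-mango tuner). Here is a minimal self-contained sketch of that interface, with plain random search standing in for the Bayesian sampler and a toy quadratic standing in for the real block-reconstruction loss (`get_loss`, `x_max`, `w_max`, etc. come from the surrounding AWQ code and are not reproduced here):

```python
import random

random.seed(0)

# Toy stand-in for get_loss: the real objective measures the block's output MSE
# after scaling weights by x_max**ratio / w_max**ratio_b. The optimum at
# (0.5, 0.3) is purely illustrative.
def get_loss(ratio, ratio_b):
    return (ratio - 0.5) ** 2 + (ratio_b - 0.3) ** 2

# Search over the same space as param_space = dict(ratio=uniform(0, 1),
# ratio_b=uniform(0, 1)); a Bayesian tuner would replace this sampling loop
# with model-guided proposals.
best_error, best_params = float("inf"), None
for _ in range(200):
    params = dict(ratio=random.random(), ratio_b=random.random())
    loss = get_loss(**params)
    if loss < best_error:
        best_error, best_params = loss, params

print(best_error, best_params)
```

The point of the sketch is only the contract: the objective takes the sampled parameters by name, returns a scalar loss, and the caller keeps the best trial, exactly as the PR's `get_loss` does with `best_error`/`best_scales`.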

@trotsky1997
Author

> @trotsky1997 does this code include different alpha values for X and W? You observed better perplexity with that.

I have talked with Dr. Tang; it performs a little better than grid search on Vicuna, but about the same as grid search on LLaMA-2-7B.
