Conversation
@sfc-gh-zhwang sfc-gh-zhwang commented Oct 1, 2023

Get rid of the `qkv_buf_tmp_` -> `qkv_buf_` copy done via the `repeat_kv` hack in Llama context decoding, to save GPU memory.
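For context: in Llama-style grouped-query attention, `repeat_kv` tiles each K/V head so the K/V head count matches the query head count, and materializing that expansion into a separate temporary buffer (the `qkv_buf_tmp_` this PR removes) costs extra GPU memory proportional to the repeat factor. A minimal NumPy sketch of the expansion, where the function name, shapes, and arguments are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (batch, num_kv_heads, seq_len, head_dim) to
    (batch, num_kv_heads * n_rep, seq_len, head_dim) by repeating
    each KV head n_rep times (hypothetical sketch of the idea)."""
    if n_rep == 1:
        return kv
    b, h_kv, s, d = kv.shape
    # Insert a repeat axis, broadcast across it, then fold it into the head axis.
    expanded = np.broadcast_to(kv[:, :, None, :, :], (b, h_kv, n_rep, s, d))
    # This copy is exactly the temporary allocation the PR tries to avoid.
    return expanded.reshape(b, h_kv * n_rep, s, d)

kv = np.arange(2 * 2 * 3 * 4, dtype=np.float32).reshape(2, 2, 3, 4)
out = repeat_kv(kv, n_rep=4)
print(out.shape)  # (2, 8, 3, 4)
```

Avoiding the intermediate copy (e.g. by indexing the original KV heads inside the attention kernel instead of materializing the repeated tensor) keeps memory at the un-expanded size.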

@sfc-gh-zhwang sfc-gh-zhwang changed the title Zhwang/more mem More gpu memory saving for llama Oct 1, 2023
