Commit ebcd34b
Update on "Fix CUDA out of memory issue for eager runner"

This PR updates the eager runner to disable grad and reduce memory usage. It also updates the prompt format to not include the BOS token. Differential Revision: [D65962743](https://our.internmc.facebook.com/intern/diff/D65962743/) [ghstack-poisoned]
2 parents 3d89512 + 7106f4b commit ebcd34b

File tree: 1 file changed (+1, -1 lines)
examples/models/llama/runner/generation.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -199,7 +199,7 @@ def chat_completion(
             temperature=temperature,
             top_p=top_p,
             echo=True,
-            pos_base=len(tokens),
+            pos_base=len(tokens) - 1 if len(tokens) > 0 else 0
         )
         tokens.extend(new_tokens)
         prompt = input("Me: ")
```
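The one-line change above can be sketched in isolation. This is a hypothetical helper (not part of the actual runner) illustrating the position arithmetic: on the first turn the token history is empty and generation starts at position 0, while on later turns it resumes at the index of the last existing token rather than one past it.

```python
# Illustrative sketch of the pos_base expression from the diff; the helper
# name `next_pos_base` is an assumption, not an identifier from the runner.
def next_pos_base(tokens):
    """Return the starting position for the next generation step."""
    return len(tokens) - 1 if len(tokens) > 0 else 0

print(next_pos_base([]))         # empty history: start at position 0
print(next_pos_base([1, 2, 3]))  # resume at the last token's index: 2
```

Under this reading, the old `pos_base=len(tokens)` would have pointed one position past the existing history on multi-turn calls.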
