Thanks for your great work!

I looked into the prompt_reuse script. It basically first feeds the `INITIAL_PROMPT` through the model:
```python
from transformers import DynamicCache

# Pre-fill the cache with the shared prefix
prompt_cache = DynamicCache()
inputs = tokenizer(INITIAL_PROMPT, return_tensors="pt").to("cuda")
with torch.no_grad():
    prompt_cache = model(**inputs, past_key_values=prompt_cache).past_key_values
```
Then, to use this cache with another prompt suffix, they use:
```python
import copy

prompt = "Why are french people obsessed with french?"
new_inputs = tokenizer(INITIAL_PROMPT + prompt, return_tensors="pt").to("cuda")
# Copy the cache so the original can be reused for other suffixes
past_key_values = copy.deepcopy(prompt_cache)
outputs = model.generate(**new_inputs, past_key_values=past_key_values, max_new_tokens=20)
```
However, I am trying to understand whether the `INITIAL_PROMPT` tokens carry any meaning beyond serving as `position_ids` placeholders. Say we changed `INITIAL_PROMPT` to a random prompt with the same token length: would we expect different results? I assume that since the KV for these tokens is taken from the cache, they act only as placeholders.
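The experiment described above can be sketched with a toy single-head attention layer (all weights, dimensions, and prefixes below are made up for illustration, not the repo's code). The cached K/V tensors are projections of the actual prefix tokens, so two same-length prefixes generally yield different attention outputs for the same suffix query:

```python
import torch

torch.manual_seed(0)
d = 8
# Hypothetical projection weights of one attention head
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def build_cache(prefix_x):
    # Analogue of pre-filling past_key_values with the prefix
    return (prefix_x @ Wk, prefix_x @ Wv)

def attend(query_x, cache):
    K, V = cache
    # Append the new token's K/V, as generation with a cache does
    K = torch.cat([K, query_x @ Wk])
    V = torch.cat([V, query_x @ Wv])
    attn = torch.softmax((query_x @ Wq) @ K.T / d**0.5, dim=-1)
    return attn @ V

prefix_a = torch.randn(5, d)
prefix_b = torch.randn(5, d)   # same length, different "tokens"
query = torch.randn(1, d)

out_a = attend(query, build_cache(prefix_a))
out_b = attend(query, build_cache(prefix_b))
print(torch.allclose(out_a, out_b))  # False: cached content matters
```

In this toy setting the cache is not just a positional placeholder; swapping the prefix changes the attended values and hence the output.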
Thanks!