Closed
Description
🚀 The feature, motivation and pitch
Instead of exposing input_pos in the generate_from_pos() API, we should redesign the API so that input_pos is tracked as internal state rather than passed as an argument.
We should support these features:
- Generate with an input prompt: uses the current context, creates the response, adds it to the context, and adjusts the KV cache start position internally.
- Add context: hydrates the KV cache when loading historical chat, and adjusts the start position internally so that a later generate call continues from it.
- Clear context: removes prefilled tokens and resets the start position.
To be more specific,
- Add a private field `pos_` and manage it in all APIs.
- Keep the `generate()` API, but instead of assuming a start pos of 0, use the `pos_` field.
- Add a `prefill()` API to be able to take chat history.
- Add a `reset()` API to reset `pos_` to 0.
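
The proposal above could look roughly like the sketch below. It is only an illustration of the position bookkeeping, not the actual runner: `ChatRunner`, `max_context_len_`, `next_token_stub()`, and the read-only `pos()` accessor are hypothetical names, token IDs stand in for a real tokenizer, and the model forward pass is stubbed out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical runner that keeps the KV cache start position private.
class ChatRunner {
 public:
  explicit ChatRunner(int64_t max_context_len)
      : max_context_len_(max_context_len) {}

  // Add context: hydrate the KV cache (e.g. with chat history) and advance
  // pos_. Returns false, leaving pos_ unchanged, if the tokens do not fit.
  bool prefill(const std::vector<int32_t>& tokens) {
    const int64_t n = static_cast<int64_t>(tokens.size());
    if (pos_ + n > max_context_len_) {
      return false;
    }
    // A real runner would run the model over `tokens` starting at pos_.
    pos_ += n;
    return true;
  }

  // Generate: starts from pos_ instead of assuming position 0, and leaves
  // pos_ pointing past the response so the next call continues from there.
  std::vector<int32_t> generate(const std::vector<int32_t>& prompt,
                                int64_t max_new_tokens) {
    std::vector<int32_t> out;
    if (!prefill(prompt)) {
      return out;
    }
    while (static_cast<int64_t>(out.size()) < max_new_tokens &&
           pos_ < max_context_len_) {
      out.push_back(next_token_stub(pos_));
      ++pos_;
    }
    assert(pos_ <= max_context_len_);
    return out;
  }

  // Clear context: drop prefilled tokens and start again from position 0.
  void reset() { pos_ = 0; }

  // Read-only view, included only so the sketch can be inspected.
  int64_t pos() const { return pos_; }

 private:
  // Placeholder for a model forward pass plus sampling (assumption).
  static int32_t next_token_stub(int64_t pos) {
    return static_cast<int32_t>(pos);
  }

  int64_t max_context_len_;
  int64_t pos_ = 0;
};
```

With this shape, callers never pass a position: `prefill()` followed by `generate()` continues from wherever the history ended, and `reset()` is the only way to go back to 0.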
Alternatives
No response
Additional context
No response
RFC (Optional)
No response