Android example long prompt cache? #1077
scsonic
started this conversation in
New features / APIs
Replies: 2 comments
@aciddelgado's latest PR adds a 'rewind' feature, so you can effectively cache the prompt by rewinding the generator back to the prompt position on every iteration. It was just checked in, so it's not in a release yet.
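To make the rewind idea concrete, here is a rough, self-contained sketch of the pattern. It does not call the onnxruntime-genai library: `ToyGenerator`, `appendTokens`, `rewindTo`, `length`, and `tokensProcessed` are hypothetical names made up for this illustration, and the real API may look different. The toy only counts how many tokens are "processed", assuming the cost is per token and that cached entries for the prompt survive a rewind.

```java
import java.util.ArrayList;
import java.util.List;

public class PromptCacheSketch {

    /** Toy stand-in for a generator with a per-token cache (hypothetical, not the real library). */
    public static class ToyGenerator {
        // One cache entry per token already processed.
        private final List<Integer> cache = new ArrayList<>();
        // Counts the expensive per-token work done so far.
        private int tokensProcessed = 0;

        /** Processes new tokens and stores them in the cache. */
        public void appendTokens(int[] tokens) {
            for (int token : tokens) {
                cache.add(token);
                tokensProcessed++;
            }
        }

        /** Drops every cache entry after newLength, keeping the prefix untouched. */
        public void rewindTo(int newLength) {
            if (newLength < 0 || newLength > cache.size()) {
                throw new IllegalArgumentException("newLength out of range: " + newLength);
            }
            cache.subList(newLength, cache.size()).clear();
        }

        public int length() {
            return cache.size();
        }

        public int tokensProcessed() {
            return tokensProcessed;
        }
    }

    public static void main(String[] args) {
        ToyGenerator generator = new ToyGenerator();

        // Pay for the long system prompt once.
        int[] systemPrompt = {1, 2, 3, 4, 5, 6, 7, 8};
        generator.appendTokens(systemPrompt);
        int promptLength = generator.length();

        // Each user turn is appended after the cached prompt, then rewound away.
        int[][] userTurns = {{10, 11}, {20, 21, 22}};
        for (int[] turn : userTurns) {
            generator.appendTokens(turn);
            generator.rewindTo(promptLength);
        }

        System.out.println("Tokens processed: " + generator.tokensProcessed());
    }
}
```

In this toy run the 8 prompt tokens are processed once and the two turns add 2 and 3 more, for 13 in total; reprocessing the prompt for each turn would have cost 21. Whether the real generator behaves the same way on Android is something to confirm once the feature is released.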
It looks like the function I need.
I am currently using the Android Phi3 example.
If my system prompt is very long, it takes more than 90 seconds to process.
I’m wondering if there’s a way to cache the result of this 90-second processing.
When using llama.cpp on mobile, it remembers the result, so the next time there’s no need to wait another 90 seconds.
Or can I do something with the classes in the Java API? import ai.onnxruntime.genai.[GeneratorParams, tokenizer, Model, Sequences]
Is the KV cache working on Android?