Visualize prompt caching performance in the trace view #4858
CNSeniorious000
started this conversation in Ideas
Replies: 1 comment 1 reply
This is a great suggestion. Why would you go for computing the cached context based on the observations of the last 5 minutes instead of using the "cached_tokens" value included in the LLM response?
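As context for the reply above: some providers report the cached portion of the prompt directly in the response usage metadata. The sketch below assumes the OpenAI-style response shape, where the count appears under `usage.prompt_tokens_details.cached_tokens`; field names vary by provider, and the helper name is illustrative only.

```python
def get_cached_tokens(response: dict) -> int:
    """Read the provider-reported cached token count from a chat
    completion response dict (OpenAI-style layout assumed)."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)

# Example response fragment with the usage block a provider might return.
example_response = {
    "usage": {
        "prompt_tokens": 2048,
        "prompt_tokens_details": {"cached_tokens": 1920},
        "completion_tokens": 120,
    }
}
print(get_cached_tokens(example_response))  # → 1920
```

Relying on this value avoids re-deriving the cached prefix client-side, but it only gives a count, not the location of the boundary inside the rendered prompt.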
Describe the feature or potential improvement
We need a way to visually debug prompt caching performance. Specifically, we want the content prefix to remain consistent across requests, but because the final prompt is rendered through a complex templating process, it is hard to spot differences between requests. I propose adding a "show cached prefix" toggle in the trace and playground views. When activated, the cached segments would be dimmed, and a counter would show the number of tokens in the cached part. The cached prefix could be calculated by comparing across all "generation" observations from the last 5 minutes.
Additional information
The token count doesn't need to be as accurate as the value reported by the LLM provider. I just need a quick view of which part of the prompt started changing in a new generation request.
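The prefix comparison described above could be sketched as a longest-common-prefix pass over the rendered prompts of recent generation observations. This is only a rough illustration of the proposed heuristic, not Langfuse code; the function name and the character-level comparison (rather than token-level) are my assumptions.

```python
def shared_prefix_length(prompts: list[str]) -> int:
    """Character length of the prefix shared by every prompt in the list."""
    if not prompts:
        return 0
    shortest = min(prompts, key=len)
    for i, ch in enumerate(shortest):
        if any(p[i] != ch for p in prompts):
            return i
    return len(shortest)

# Rendered prompts from two recent "generation" observations (made-up data).
recent = [
    "System: You are a helpful assistant.\nUser: What is 2+2?",
    "System: You are a helpful assistant.\nUser: Summarize this text.",
]
n = shared_prefix_length(recent)
print(repr(recent[0][:n]))  # the segment the UI would dim as "likely cached"
```

For the approximate counter, the character length could be converted to a rough token estimate (for example, dividing by ~4 for English text) or passed through a tokenizer; as noted above, exactness is not required here.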