Visualize prompt caching performance in the trace view #4858
CNSeniorious000
started this conversation in Ideas
Replies: 1 comment 1 reply
This is a great suggestion. Why would you go for computing the cached context based on the observations of the last 5 minutes instead of using the "cached_tokens" value included in the LLM response?
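As context for the reply above: some providers report the cached portion of the prompt directly in the response usage metadata. The sketch below assumes the OpenAI-style response shape, where the count appears under `usage.prompt_tokens_details.cached_tokens`; field names vary by provider, and the helper name is illustrative only.

```python
def get_cached_tokens(response: dict) -> int:
    """Read the provider-reported cached token count from a chat
    completion response dict (OpenAI-style layout assumed)."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)

# Example response fragment with the usage block a provider might return.
example_response = {
    "usage": {
        "prompt_tokens": 2048,
        "prompt_tokens_details": {"cached_tokens": 1920},
        "completion_tokens": 120,
    }
}
print(get_cached_tokens(example_response))  # → 1920
```

Relying on this value avoids re-deriving the cached prefix client-side, but it only gives a count, not the location of the boundary inside the rendered prompt.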
Describe the feature or potential improvement
We need a way to visually debug prompt caching performance. Specifically, we want the content prefix to remain consistent across requests, but because the final prompt is rendered through a complex templating process, it is hard to spot differences between requests. I propose adding a "show cached prefix" toggle in the trace and playground views. When activated, the cached segments would be dimmed, and a counter would show the number of tokens in the cached part. The cached prefix could be calculated by comparing across all "generation" observations from the last 5 minutes.
Additional information
The token count doesn't need to be as accurate as the value reported by the LLM provider. I just need a quick view of which part of the prompt started changing in a new generation request.
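The prefix comparison described above could be sketched as a longest-common-prefix pass over the rendered prompts of recent generation observations. This is only a rough illustration of the proposed heuristic, not Langfuse code; the function name and the character-level comparison (rather than token-level) are my assumptions.

```python
def shared_prefix_length(prompts: list[str]) -> int:
    """Character length of the prefix shared by every prompt in the list."""
    if not prompts:
        return 0
    shortest = min(prompts, key=len)
    for i, ch in enumerate(shortest):
        if any(p[i] != ch for p in prompts):
            return i
    return len(shortest)

# Rendered prompts from two recent "generation" observations (made-up data).
recent = [
    "System: You are a helpful assistant.\nUser: What is 2+2?",
    "System: You are a helpful assistant.\nUser: Summarize this text.",
]
n = shared_prefix_length(recent)
print(repr(recent[0][:n]))  # the segment the UI would dim as "likely cached"
```

For the approximate counter, the character length could be converted to a rough token estimate (for example, dividing by ~4 for English text) or passed through a tokenizer; as noted above, exactness is not required here.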