Best practices for tracing an app with history and tool usage #3764
Hi, we've developed a chat assistant using the OpenAI Assistant API that maintains memory and has access to multiple tools. Users can start a new thread and interact with the assistant, and when needed the assistant executes specific tools via Function Calling. We're now exploring how to monitor and test this application using Langfuse, but I haven't found clear best practices for organizing traces in an application that involves both history and tool usage. I'm considering the following approach. For each user message, we create a trace that includes:
With this setup, every user message generates a trace containing all the information the assistant considered when producing its response. We can then add the relevant traces to datasets and, without any modification, replay them against the application in test mode to get accurate evaluations. Additionally, having the complete data for each interaction could help us build more effective evaluation configurations and achieve more reliable scores. I'm not very experienced with tracing, but based on what I've learned so far, this seems like the best approach. Are there any best practices for tracing an app with history and tool usage? I'd greatly appreciate any feedback or suggestions!
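To make the idea concrete, here is a minimal sketch of that per-message trace as a plain data structure. The interfaces and field names below are illustrative assumptions, not the actual Langfuse schema or SDK API:

```typescript
// Illustrative shape of one trace per user message; field names are
// assumptions for discussion, not the real Langfuse data model.
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface ToolCallRecord {
  name: string;
  args: Record<string, unknown>;
  result: unknown;
}

interface MessageTrace {
  name: string;
  sessionId: string; // groups all traces belonging to one assistant thread
  input: { history: ChatMessage[]; userMessage: string };
  toolCalls: ToolCallRecord[];
  output: string; // the final assistant reply
}

function buildMessageTrace(
  threadId: string,
  history: ChatMessage[],
  userMessage: string,
): MessageTrace {
  return {
    name: "assistant-message",
    sessionId: threadId,
    input: { history, userMessage },
    toolCalls: [],
    output: "",
  };
}
```

Because each trace carries the full history the assistant saw, the same payload could be added to a dataset and replayed against the application in test mode without modification.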
Hey @machulav! I'm here to help you with any bugs, questions, or contributions you have. Let's dive into the issue together! To effectively trace an app with history and tool usage, here are some best practices you can follow:
Implementing these practices involves defining goals, instrumenting your code for tracking, inspecting and debugging traces through a visual UI, and analyzing metrics like cost, latency, and quality over time. These steps help you understand user interactions and improve the application's performance. Additionally, tags let you filter and categorize traces, which makes trace management and analysis more effective. Langfuse provides observability, analytics, and feedback-collection capabilities that are particularly useful for these tasks: you can monitor usage, track latency, assess quality based on user feedback, and identify errors [1][2][3].
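As a small illustration of the tagging point, a sketch of filtering trace records by tag, using plain objects rather than the Langfuse query API (the record shape here is hypothetical):

```typescript
// Hypothetical trace records carrying Langfuse-style string tags.
interface TraceRecord {
  id: string;
  tags: string[];
}

// Return only the traces carrying a given tag, e.g. to isolate
// the turns where the assistant invoked a tool.
function filterByTag(traces: TraceRecord[], tag: string): TraceRecord[] {
  return traces.filter((t) => t.tags.includes(tag));
}
```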
This is the right approach, and we have written about best practices here: https://langfuse.com/faq/all/openai-assistant-api
Hi @marcklingen, Thank you for sharing the page! I did take a look at it, but unfortunately, it doesn’t seem to provide the specific information I’m looking for. While we’re primarily using TypeScript, I’ve been exploring the Python documentation to gain a clearer understanding.
It would be incredibly helpful to have a guide or cookbook, with practical examples, on how to monitor and test assistants that use memory, functions, and RAG. Thank you again, and I appreciate any additional insights you might have!
Langfuse supports ChatML for rendering messages in generation input/output, which should make the history more readable.
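To take advantage of that rendering, thread messages can be flattened into ChatML-style `{role, content}` pairs before being set as generation input. A sketch, assuming a minimal shape for messages fetched from an Assistant API thread (the `ThreadMessage` interface below is a simplified assumption):

```typescript
// Assumed minimal shape of a message fetched from an assistant thread.
interface ThreadMessage {
  role: "user" | "assistant";
  content: { type: string; text?: { value: string } }[];
}

interface ChatMlMessage {
  role: string;
  content: string;
}

// Flatten the text parts of each message into a ChatML-style pair
// so Langfuse can render the history as a readable conversation.
function toChatMl(messages: ThreadMessage[]): ChatMlMessage[] {
  return messages.map((m) => ({
    role: m.role,
    content: m.content
      .filter((p) => p.type === "text" && p.text)
      .map((p) => p.text!.value)
      .join("\n"),
  }));
}
```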
I'd structure the traces using nested observations; you can learn more about this here: https://langfuse.com/docs/tracing
In the Python example we fetch the message history separately and add it to the trace, as this is relevant for debugging. I'd suggest doing the same in JS.
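Putting those two suggestions together, here is a hedged sketch of the per-turn observation tree in TypeScript. It uses plain objects as a stand-in for SDK calls; the structure and names are illustrative, not the Langfuse client API:

```typescript
// Simplified stand-in for a nested observation; in the real SDK these
// would be created via the client rather than as plain objects.
interface Observation {
  name: string;
  type: "span" | "generation";
  input?: unknown;
  output?: unknown;
  children: Observation[];
}

// One trace per user turn: the separately fetched history is attached
// as input, with nested observations for each tool call and the final
// model generation.
function buildTurnTrace(
  history: { role: string; content: string }[],
  userMessage: string,
  toolNames: string[],
): Observation {
  const root: Observation = {
    name: "handle-user-message",
    type: "span",
    input: { history, userMessage }, // fetched history kept for debugging
    children: [],
  };
  for (const tool of toolNames) {
    root.children.push({ name: `tool:${tool}`, type: "span", children: [] });
  }
  root.children.push({
    name: "assistant-response",
    type: "generation",
    children: [],
  });
  return root;
}
```

Nesting the tool calls and the generation under one root span keeps each turn self-contained, which is what makes the trace usable later as a dataset item.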