My team and I have been having a hard time choosing an observability framework for our projects. Langfuse is a top contender, but a few problems are still hampering our adoption, so I thought I would ask here to see whether they are already resolved or planned to be resolved, and generally to find out whether it is worth waiting for some fixes or moving on for now. I have examined both self-hosted and cloud Langfuse, and my issues are mostly with LlamaIndex support, most notably LlamaIndex Workflows, but not exclusively. The issues are:

LlamaIndex Workflow support
In our code we use LlamaIndex Workflows, and as much as I tried, I didn't manage to get this working completely in Langfuse. Some generations or spans seem to be dropped, I must explicitly create a span id for the generation to work, and it feels very unstable. I know there is an open discussion here https://github.com/orgs/langfuse/discussions/3717 and an open issue here #3736, but I would like to know whether this is something you're actively working on or a more difficult issue to solve.

An additional issue I found with Workflows is that even when tracing did work, it didn't work properly. For example, given the code below:

```python
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent
from llama_index.llms.bedrock_converse import BedrockConverse
from langfuse.llama_index import LlamaIndexInstrumentor

instrumentor = LlamaIndexInstrumentor()
instrumentor.start()

class MW(Workflow):
    @step
    async def entrypoint(self, ev: StartEvent) -> StopEvent:
        query = ev.query
        MODEL = "anthropic.claude-3-5-sonnet-20240620-v1:0"
        llm = BedrockConverse(model=MODEL)
        resp = llm.complete(query)
        return StopEvent(result=resp)

mw = MW()
await mw.run(query="What are you?")
```

Consecutive runs of this code (in a notebook) are all collected into the same single trace (called MW).

LlamaIndex Bedrock usage not reported correctly
We're using AWS Bedrock as a foundation model supplier, with the BedrockConverse class as our LLMs. When doing so, the input and output tokens are reported incorrectly (usually some number appears as the input token count and 0 as the output), even though the API returns usage and the token count is captured there. I'm not sure whether this is a Langfuse issue or a llama-index upstream issue, but it is definitely an inconvenience at the very least.

Prompt management and prompt usage tracking
One of the most compelling reasons for us to go with Langfuse is the very nice prompt management you offer, and especially the ability to link generations to prompts. As the documentation states, this is currently not supported with LlamaIndex, but there is no indication of whether you intend to implement it in the future. So the question is simple: do you intend to introduce a mechanism to track prompt usage with LlamaIndex? Or is there some reason why this is very difficult or not feasible?

Any help and experiences from others using Bedrock / LlamaIndex Workflows with Langfuse would be greatly appreciated!
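One workaround sketch for the single-trace behavior described above: generate a fresh trace id for each run and set it via the `observe()` context manager that Langfuse documents on `LlamaIndexInstrumentor`. Everything else here (the `new_trace_id` helper, the guard around the import) is illustrative, not part of either library.

```python
import uuid

# Guarded import so the id logic below is shown even without langfuse installed.
try:
    from langfuse.llama_index import LlamaIndexInstrumentor
    HAVE_LANGFUSE = True
except ImportError:
    HAVE_LANGFUSE = False

def new_trace_id() -> str:
    """Hypothetical helper: a unique id so each run becomes its own trace."""
    return str(uuid.uuid4())

if HAVE_LANGFUSE:
    instrumentor = LlamaIndexInstrumentor()
    # observe() sets trace attributes for everything traced inside the block
    with instrumentor.observe(trace_id=new_trace_id()):
        pass  # run the workflow here, e.g. await mw.run(query="What are you?")

# Each call yields a distinct id:
print(new_trace_id() != new_trace_id())
# → True
```

Whether this also fixes the dropped spans is unclear; it only separates consecutive runs into distinct traces.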
Replies: 2 comments 1 reply
I found a relevant discussion and issue that might address some of your concerns:
Regarding your specific questions:
These issues are actively discussed, and it might be worth monitoring the progress on the open issue and discussion for updates.
Hi Omri, thanks a lot for the detailed writeup!
LlamaIndex Workflows
As you noticed, we have an open issue for better Workflow support. Improvements are prioritized and should land in the coming releases. Link to issue
LlamaIndex Bedrock Usage
I have created a separate issue that will be addressed together with the one above. Link to issue
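Until that issue is resolved, one stopgap is to read the usage block that the Bedrock Converse API does return and attach it to a generation manually. This is purely a sketch: the helper name and wiring are hypothetical, not part of Langfuse or LlamaIndex. The Converse API reports usage as `inputTokens`/`outputTokens`/`totalTokens`, while Langfuse generations accept an `input`/`output`/`total` dict:

```python
def converse_usage_to_langfuse(raw_response: dict) -> dict:
    """Hypothetical helper: map the `usage` block of an AWS Bedrock Converse
    API response onto the usage dict shape accepted by Langfuse generations."""
    usage = raw_response.get("usage", {})
    return {
        "input": usage.get("inputTokens", 0),
        "output": usage.get("outputTokens", 0),
        "total": usage.get("totalTokens", 0),
        "unit": "TOKENS",
    }

# Example with a faked Converse response:
fake = {"usage": {"inputTokens": 12, "outputTokens": 34, "totalTokens": 46}}
print(converse_usage_to_langfuse(fake))
# → {'input': 12, 'output': 34, 'total': 46, 'unit': 'TOKENS'}
```

The mapped dict could then be passed when updating a generation through the low-level Langfuse SDK, pending the proper fix in the integration.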
Prompt management and prompt usage tracking
LlamaIndex's instrumentation module traces arbitrary executions with potentially multiple generations and nested execution steps, so linking a prompt to one specific generation within a generic execution is not easy to achieve. What would be possible is to allow passing a prompt via
instrumento…