My team and I have been having a hard time choosing an observability framework for our projects. Langfuse is a top contender, but a few problems are still hampering our adoption, so I thought I would ask here to see whether they are already resolved or planned to be resolved, and generally to find out whether it is worth waiting for some fixes or moving on for now. I have examined both self-hosted and cloud Langfuse, and my issues are mostly with LlamaIndex support, most notably LlamaIndex Workflows, but not exclusively. The issues are:

LlamaIndex Workflow support
In our code we use LlamaIndex Workflows, and as much as I tried, I didn't manage to get this working completely in Langfuse. Some generations or spans seem to be dropped, I must explicitly create a span id for the generation to work, and it feels very unstable. I know there is an open discussion here https://github.com/orgs/langfuse/discussions/3717 and an open issue here #3736, but I would like to know whether this is something you're actively working on or a more difficult issue to solve.

An additional issue I found with Workflows is that even when tracing did work, it didn't work properly. For example, given the code below:

```python
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent
from llama_index.llms.bedrock_converse import BedrockConverse
from langfuse.llama_index import LlamaIndexInstrumentor

instrumentor = LlamaIndexInstrumentor()
instrumentor.start()

class MW(Workflow):
    @step
    async def entrypoint(self, ev: StartEvent) -> StopEvent:
        query = ev.query
        MODEL = "anthropic.claude-3-5-sonnet-20240620-v1:0"
        llm = BedrockConverse(model=MODEL)
        resp = llm.complete(query)
        return StopEvent(result=resp)

mw = MW()
await mw.run(query="What are you?")
```

Consecutive runs of this code (in a notebook) are all collected into the same single trace (called MW).

LlamaIndex Bedrock usage not reported correctly
We're using AWS Bedrock as a foundation model supplier, with the BedrockConverse class as our LLMs. When doing so, the input and output tokens are reported incorrectly (usually some number appears as the input token count and 0 as the output), even though the API returns usage and the token count is captured there. I'm not sure whether this is a Langfuse issue or a llama-index upstream issue, but it is definitely an inconvenience at the very least.

Prompt management and prompt usage tracking
One of the most compelling reasons for us to go with Langfuse is the very nice prompt management you offer, and especially the ability to link generations to prompts. As the documentation states, this is currently not supported with LlamaIndex, but there is no indication of whether you intend to implement it in the future. So the question is simple: do you intend to introduce a mechanism to track prompt usage with LlamaIndex? Or is there some reason why this is very difficult or not feasible?

Any help and experiences from others using Bedrock / LlamaIndex Workflows with Langfuse would be greatly appreciated!
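One workaround sketch for the single-trace behavior described above: generate a fresh trace id for each run and set it via the `observe()` context manager that Langfuse documents on `LlamaIndexInstrumentor`. Everything else here (the `new_trace_id` helper, the guard around the import) is illustrative, not part of either library.

```python
import uuid

# Guarded import so the id logic below is shown even without langfuse installed.
try:
    from langfuse.llama_index import LlamaIndexInstrumentor
    HAVE_LANGFUSE = True
except ImportError:
    HAVE_LANGFUSE = False

def new_trace_id() -> str:
    """Hypothetical helper: a unique id so each run becomes its own trace."""
    return str(uuid.uuid4())

if HAVE_LANGFUSE:
    instrumentor = LlamaIndexInstrumentor()
    # observe() sets trace attributes for everything traced inside the block
    with instrumentor.observe(trace_id=new_trace_id()):
        pass  # run the workflow here, e.g. await mw.run(query="What are you?")

# Each call yields a distinct id:
print(new_trace_id() != new_trace_id())
# → True
```

Whether this also fixes the dropped spans is unclear; it only separates consecutive runs into distinct traces.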
Replies: 2 comments 1 reply
I found a relevant discussion and issue that might address some of your concerns:
Regarding your specific questions:
These issues are actively discussed, and it might be worth monitoring the progress on the open issue and discussion for updates.
Hi Omri, thanks a lot for the detailed writeup!
LlamaIndex Workflows
As you noticed, we have an open issue for better Workflow support. Improvements are prioritized and should land in the coming releases. Link to issue
LlamaIndex Bedrock Usage
I have created a separate issue that will be addressed together with the one above. Link to issue
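Until that issue is resolved, one stopgap is to read the usage block that the Bedrock Converse API does return and attach it to a generation manually. This is purely a sketch: the helper name and wiring are hypothetical, not part of Langfuse or LlamaIndex. The Converse API reports usage as `inputTokens`/`outputTokens`/`totalTokens`, while Langfuse generations accept an `input`/`output`/`total` dict:

```python
def converse_usage_to_langfuse(raw_response: dict) -> dict:
    """Hypothetical helper: map the `usage` block of an AWS Bedrock Converse
    API response onto the usage dict shape accepted by Langfuse generations."""
    usage = raw_response.get("usage", {})
    return {
        "input": usage.get("inputTokens", 0),
        "output": usage.get("outputTokens", 0),
        "total": usage.get("totalTokens", 0),
        "unit": "TOKENS",
    }

# Example with a faked Converse response:
fake = {"usage": {"inputTokens": 12, "outputTokens": 34, "totalTokens": 46}}
print(converse_usage_to_langfuse(fake))
# → {'input': 12, 'output': 34, 'total': 46, 'unit': 'TOKENS'}
```

The mapped dict could then be passed when updating a generation through the low-level Langfuse SDK, pending the proper fix in the integration.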
Prompt management and prompt usage tracking
LlamaIndex's instrumentation module traces arbitrary executions with potentially multiple generations and nested execution steps, so linking a prompt to one specific generation within a generic execution is not easy to achieve. What would be possible is to allow passing a prompt via
instrumento…