feat(langfuse): add cost and usage support for more generators and generally for embedders #2472

**Is your feature request related to a problem? Please describe.**
Langfuse supports cost and usage details only for generations and embeddings. [tracer.py](https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/langfuse/src/haystack_integrations/tracing/langfuse/tracer.py) maps a whitelist of generators to the Langfuse observation type "generation", which makes usage and cost tracking work. Here is the whitelist:

    _SUPPORTED_GENERATORS = [
        "AzureOpenAIGenerator",
        "OpenAIGenerator",
        "AnthropicGenerator",
        "HuggingFaceAPIGenerator",
        "HuggingFaceLocalGenerator",
        "CohereGenerator",
        "OllamaGenerator",
    ]
    _SUPPORTED_CHAT_GENERATORS = [
        "AmazonBedrockChatGenerator",
        "AzureOpenAIChatGenerator",
        "OpenAIChatGenerator",
        "AnthropicChatGenerator",
        "HuggingFaceAPIChatGenerator",
        "HuggingFaceLocalChatGenerator",
        "CohereChatGenerator",
        "OllamaChatGenerator",
        "GoogleGenAIChatGenerator",
    ]

---
However, generators such as Mistral are missing from the list, so their spans are not created as generations:

    elif context.component_type in _ALL_SUPPORTED_GENERATORS:
        return LangfuseSpan(self.tracer.start_as_current_observation(name=context.name, as_type="generation"))


Embedders are also always created as type "span", so their cost and usage are ignored.
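To illustrate the behaviour I am asking for, here is a minimal sketch of the type selection. It is not the code in tracer.py: the set names and `observation_type_for` are hypothetical, the lists are illustrative subsets, and I am assuming that Langfuse accepts "embedding" as an observation type.

```python
# Illustrative subsets only; the real whitelists live in tracer.py.
SUPPORTED_GENERATORS = {"OpenAIGenerator", "OpenAIChatGenerator", "MistralChatGenerator"}
# Hypothetical list: tracer.py has no embedder whitelist today.
SUPPORTED_EMBEDDERS = {"MistralTextEmbedder", "MistralDocumentEmbedder"}


def observation_type_for(component_type: str) -> str:
    """Pick the Langfuse observation type for a Haystack component type."""
    if component_type in SUPPORTED_GENERATORS:
        return "generation"
    if component_type in SUPPORTED_EMBEDDERS:
        # Assumption: "embedding" is a valid Langfuse observation type.
        return "embedding"
    # Everything else stays a plain span, without cost and usage.
    return "span"
```

The returned string would then be passed as `as_type` where the tracer currently hard-codes "generation".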


**Describe the solution you'd like**
Add the compatible generators and embedders. I have already done this for Mistral models: #2463

**Describe alternatives you've considered**
I tried openinference-ai, as Langfuse [recommends](https://langfuse.com/integrations/frameworks/haystack), but its instrumentor suffers from the same problems. So let's solve it in Haystack!

Another working approach would be to use the Langfuse Python SDK to add model definitions to Langfuse. Using a regex to match the model name, Langfuse then calculates the cost from the usage, which is extracted from the metadata. This means Langfuse calculates the cost for you. A pitfall here is custom pricing models that don't rely on tokens, e.g. per page (OCR) or per request (Data API).
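As a rough sketch of how I understand this mechanism (not Langfuse's actual implementation or schema): a model definition carries a regex and per-token prices, and the cost is derived from the reported usage. `ModelDefinition`, `estimate_cost`, the usage keys, and the prices are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelDefinition:
    # Hypothetical structure, not the Langfuse schema.
    match_pattern: str   # regex tested against the reported model name
    input_price: float   # price per input token (placeholder value)
    output_price: float  # price per output token (placeholder value)


def estimate_cost(model: str, usage: dict, definitions: list) -> Optional[float]:
    """Return the token-based cost, or None if no definition or token counts apply."""
    for definition in definitions:
        if re.match(definition.match_pattern, model):
            if "input" not in usage or "output" not in usage:
                # Non-token pricing (pages, requests) cannot be derived here.
                return None
            return (usage["input"] * definition.input_price
                    + usage["output"] * definition.output_price)
    return None


definitions = [ModelDefinition(r"(?i)^mistral-small.*$", input_price=1e-6, output_price=3e-6)]
token_cost = estimate_cost("mistral-small-latest", {"input": 1000, "output": 200}, definitions)
page_cost = estimate_cost("mistral-small-latest", {"pages": 12}, definitions)  # None
```

The second call shows the pitfall: a component billed per page has no token counts, so a token-based definition yields no cost.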

**Additional context**
For my pipeline I need cost and usage tracking for Mistral models. I solved it locally, and it needs only small code changes. The challenging part for a PR is supporting all kinds of embedders from different providers. Make sure to extract the usage from the metadata and put it into the fields Langfuse expects. Be aware that Langfuse does many things in the background, which can lead to problems while debugging. Double-check the results.
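A minimal sketch of that extraction step, under assumptions: the provider reports OpenAI-style (`prompt_tokens`, `completion_tokens`) or `input_tokens`/`output_tokens` counts under a `usage` key in the metadata, and Langfuse expects `input`, `output` and `total` keys in its usage details. `to_usage_details` is a hypothetical helper, not part of tracer.py.

```python
def to_usage_details(meta: dict) -> dict:
    """Map provider usage metadata to the keys assumed for Langfuse usage details."""
    usage = meta.get("usage") or {}
    # Assumption: providers use one of these two naming schemes.
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens"))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens"))

    details = {}
    if input_tokens is not None:
        details["input"] = input_tokens
    if output_tokens is not None:
        details["output"] = output_tokens

    # Prefer the reported total; otherwise sum what is available.
    total = usage.get("total_tokens")
    if total is None and details:
        total = sum(details.values())
    if total is not None:
        details["total"] = total
    return details
```

Embedders typically report only input tokens, so the helper returns whatever subset is present instead of failing on missing keys.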

In addition, my pipelines have more components with cost and usage that I need to track. Langfuse does not support this. The best solution so far has been to deliberately mislabel such components as generators and, once this issue is resolved, possibly as embedders. Interestingly, Langfuse offers more observation types than generation and embedding (#2473). However, those types drop attributes such as cost_details and usage_details. If Langfuse changes that, tracer.py may need another update.
