
πŸ› Bug Report: TypeError: cannot pickle '_thread.RLock' object during on_tool_start in langchain instrumentorΒ #3518

@le-codeur-rapide

Description


Which component is this bug for?

Langchain Instrumentation

📜 Description

Hello, and first of all, thank you for the work on the LangChain instrumentor! I have been using it a lot and it is very useful!

I have an issue with the LangchainInstrumentor under certain conditions.
It looks to me like the LangChain chat model can end up holding a lock attribute when it is used in some ways (in the example below, the model instance is also stored in the agent state). When `on_tool_start` of the LangchainInstrumentor is triggered, it reports an error: `TypeError: cannot pickle '_thread.RLock' object`.

👟 Reproduction steps

Here is a minimal example to reproduce the bug:

```python
import asyncio
import logging
import operator
import os
from typing import Annotated, Any, List

from langchain.agents import AgentState, create_agent
from langchain.messages import HumanMessage
from langchain.tools import ToolRuntime, tool
from langchain_openai import AzureChatOpenAI
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from pydantic import BaseModel, Field

logging.basicConfig(level=logging.INFO, force=True)


async def test_func():
    # Custom agent state that also stores the chat model instance itself
    class State(AgentState):
        relevant_documents: Annotated[List[dict[str, Any]], operator.add]
        llm: AzureChatOpenAI

    LangchainInstrumentor().instrument()
    llm = AzureChatOpenAI(
        model="gpt-4.1",
        api_version=os.environ["AZURE_API_VERSION"],
        azure_endpoint=os.environ["AZURE_FOUNDRY_AI_ENDPOINT"],
        api_key=os.environ["FOUNDRY_AI_API_KEY"],
    )

    class CalendarEventsInput(BaseModel):
        calendar_options: str = Field(
            description="Options for filtering calendar events.",
        )

    @tool(args_schema=CalendarEventsInput)
    def async_get_calendar(runtime: ToolRuntime, calendar_options: str):
        """Use this tool to search for calendar events."""
        logging.info("tool called")
        return "asyncc"

    agent_executor = create_agent(
        model=llm,
        tools=[async_get_calendar],
        system_prompt="call async calendar",
        state_schema=State,
    )
    async for event in agent_executor.astream(
        input={
            "messages": [HumanMessage(content="call async calendar for event in last 7 days")],
            "relevant_documents": [],
            "llm": llm,
        },
        stream_mode=["debug"],
    ):
        logging.info(event)


# In a notebook (where top-level await is available), `await test_func()` works too.
asyncio.run(test_func())
```

I get the error:

```
Traceback (most recent call last):
  File "/home/pvezia/lynxai/LynxeoAI%20API/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 811, in __del__
    if self.is_closed:
       ^^^^^^^^^^^^^^
  File "/home/pvezia/lynxai/LynxeoAI%20API/.venv/lib/python3.12/site-packages/httpx/_client.py", line 228, in is_closed
    return self._state == ClientState.CLOSED
           ^^^^^^^^^^^
AttributeError: 'SyncHttpxClientWrapper' object has no attribute '_state'
```

When I look at the instrumentor's debug logs, I see:

```
INFO:opentelemetry.instrumentation.langchain.callback_handler:OpenLLMetry failed to trace in on_tool_start, error: Traceback (most recent call last):
  File "/home/pvezia/lynxai/LynxeoAI%20API/.venv/lib/python3.12/site-packages/opentelemetry/instrumentation/langchain/utils.py", line 68, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/pvezia/lynxai/LynxeoAI%20API/.venv/lib/python3.12/site-packages/opentelemetry/instrumentation/langchain/callback_handler.py", line 680, in on_tool_start
    json.dumps(
  [...]
  File "/home/pvezia/lynxai/LynxeoAI%20API/.venv/lib/python3.12/site-packages/opentelemetry/instrumentation/langchain/utils.py", line 30, in default
    return dataclasses.asdict(o)
           ^^^^^^^^^^^^^^^^^^^^^
  [...]
  File "/usr/lib/python3.12/copy.py", line 151, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle '_thread.RLock' object
```
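The chain in this traceback (`json.dumps` → encoder `default` → `dataclasses.asdict` → `copy.deepcopy`) can be reproduced without LangChain. This is only a sketch under stated assumptions: `AsdictEncoder` is a hypothetical stand-in that imitates the dataclass branch of `CallbackFilteredJSONEncoder` quoted later in this issue, and `FakeClient` / `FakeRuntime` are invented classes (a client-like object holding an `RLock`, and a dataclass whose state contains it); none of them is the real LangChain or OpenLLMetry code.

```python
import dataclasses
import json
import threading


class AsdictEncoder(json.JSONEncoder):
    """Hypothetical encoder mimicking the dataclass branch quoted in this issue."""

    def default(self, o):
        if dataclasses.is_dataclass(o):
            # asdict() deep-copies every field value that is not a dataclass/container
            return dataclasses.asdict(o)
        return str(o)


class FakeClient:
    """Hypothetical stand-in for an HTTP client that guards its state with a lock."""

    def __init__(self):
        self._lock = threading.RLock()


@dataclasses.dataclass
class FakeRuntime:
    """Hypothetical dataclass whose state ends up holding the client."""

    state: dict


def try_dump(obj):
    # Return the JSON string, or the TypeError message if serialization fails
    try:
        return json.dumps(obj, cls=AsdictEncoder)
    except TypeError as exc:
        return f"TypeError: {exc}"


result = try_dump({"runtime": FakeRuntime(state={"llm": FakeClient()})})
print(result)
```

The failure happens inside `copy.deepcopy`, when it reaches the lock stored on `FakeClient`, which matches the last frames of the log above.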

πŸ‘ Expected behavior

Tracing the tool call should not produce an error.

👎 Actual Behavior with Screenshots

See the error and debug log above.

🤖 Python Version

3.12

📃 Provide any additional context for the Bug.

When we look at the `default` method of `CallbackFilteredJSONEncoder` (in `opentelemetry/instrumentation/langchain/utils.py`), we can see that dataclasses are handled like this:

```python
if dataclasses.is_dataclass(o):
    return dataclasses.asdict(o)
```

The issue is that `dataclasses.asdict` recursively deep-copies the field values of the dataclass, which raises an error if any of them is, or contains, a lock (a lock cannot be deep-copied or pickled).
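A small standalone check of the point above, using only the standard library; `Box` and `Holder` are hypothetical classes made up for this example. It shows that `asdict` returns copies of the field values rather than the originals, and that the same deep copy fails as soon as a field holds an `RLock`.

```python
import copy
import dataclasses
import threading


class Box:
    """Hypothetical non-dataclass value stored in a field."""

    def __init__(self, value):
        self.value = value


@dataclasses.dataclass
class Holder:
    """Hypothetical dataclass used only to illustrate asdict() semantics."""

    items: list
    payload: object


plain = Holder(items=[1, 2], payload=Box([3]))
as_dict = dataclasses.asdict(plain)

# asdict() rebuilds containers and deep-copies every other value,
# so nothing in the result is shared with the original instance.
shares_items = as_dict["items"] is plain.items
shares_payload = as_dict["payload"] is plain.payload
shares_nested = as_dict["payload"].value is plain.payload.value

# A lock cannot be deep-copied ...
try:
    copy.deepcopy(threading.RLock())
    lock_error = None
except TypeError as exc:
    lock_error = str(exc)

# ... so asdict() fails as soon as a field holds one.
try:
    dataclasses.asdict(Holder(items=[], payload=threading.RLock()))
    asdict_error = None
except TypeError as exc:
    asdict_error = str(exc)

print(shares_items, shares_payload, shares_nested)
print(lock_error)
print(asdict_error)
```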

A possible workaround would be to replace those lines with:

```python
if dataclasses.is_dataclass(o):
    return {f.name: self.default(getattr(o, f.name)) for f in dataclasses.fields(o)}
```

This builds the dict directly from the dataclass fields without deep-copying them, so it is safer when a dataclass has attributes that cannot be deep-copied.
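A minimal, self-contained sketch of a shallow conversion in the same spirit, assuming a simplified encoder: `ShallowDataclassEncoder`, `FakeClient` and `FakeRuntime` are hypothetical names, not OpenLLMetry or LangChain classes, and the `str()` fallback is an assumption about how unserializable objects should be rendered. As a variant of the snippet above, it returns the raw field values instead of calling `self.default` on each one, and lets `json.JSONEncoder` call `default` again only for values it cannot serialize itself.

```python
import dataclasses
import json
import threading


class ShallowDataclassEncoder(json.JSONEncoder):
    """Hypothetical encoder illustrating a shallow dataclass conversion."""

    def default(self, o):
        # is_dataclass() is also True for dataclass *classes*; only convert instances
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            # Read each field without deep-copying it; the encoder will call
            # default() again for any field value it cannot serialize on its own.
            return {f.name: getattr(o, f.name) for f in dataclasses.fields(o)}
        # Assumed fallback for anything else that is not JSON-serializable
        return str(o)


class FakeClient:
    """Hypothetical client object that guards its state with a lock."""

    def __init__(self):
        self._lock = threading.RLock()

    def __str__(self):
        return "<FakeClient>"


@dataclasses.dataclass
class FakeRuntime:
    """Hypothetical dataclass whose state holds the client."""

    state: dict
    tags: list


runtime = FakeRuntime(state={"llm": FakeClient(), "n": 1}, tags=["a", "b"])
encoded = json.dumps({"runtime": runtime}, cls=ShallowDataclassEncoder)
print(encoded)
```

Returning the raw values keeps JSON-native data such as lists and numbers intact, whereas passing them through a `str()`-style fallback would turn `["a", "b"]` into a string. The real encoder's other branches decide what the lock-holding object finally becomes, so the output there may differ from this sketch.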

Maybe I am missing something? Do you have any ideas?

Thank you!

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

Are you willing to submit PR?

Yes I am willing to submit a PR!
