LangChain/LangGraph span input.value / output.value as JSON (not repr) #2827

@sallyannarize

Description

Summary

Span attributes input.value and output.value were being set to the string representation (Python repr/str) of LangChain message objects instead of JSON. Downstream UIs (e.g. Arize) and replay tooling expect parseable JSON.

Root cause

  • Where: openinference-instrumentation-langchain in _tracer.py: _update_span uses _as_input(_convert_io(run.inputs)) and _as_output(_convert_io(run.outputs)). _convert_io uses _json_dumps() for the payload.
  • Why: _json_dumps() uses _OpenInferenceJSONEncoder. The encoder handled Pydantic (model_dump) and dataclasses but did not explicitly handle LangChain BaseMessage. When encoding failed (e.g. Run state like {"messages": [HumanMessage(...)]}), the code fell back to safe_json_dumps(obj), which uses json.dumps(..., default=str), so message objects became str(message) (repr-like: content='...' additional_kwargs={} ...).
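The fallback behavior is reproducible without LangChain installed; a minimal sketch, using a hypothetical stand-in class in place of `HumanMessage` to show what `json.dumps(..., default=str)` does to a non-serializable message object:

```python
import json


class FakeMessage:
    """Stand-in for a LangChain HumanMessage (hypothetical, for illustration only)."""

    def __init__(self, content):
        self.content = content
        self.additional_kwargs = {}

    def __repr__(self):
        # Mimics the repr-like output seen in the bad span attributes.
        return f"content={self.content!r} additional_kwargs={self.additional_kwargs}"


state = {"messages": [FakeMessage("hi")]}

# safe_json_dumps falls back to json.dumps(..., default=str): any object the
# encoder cannot handle is stringified via str(), which here is its repr.
value = json.dumps(state, default=str)
print(value)  # the message lands as a repr-like string, not a JSON object
```

After `json.loads(value)`, `messages[0]` comes back as a plain string rather than a dict, which is exactly what breaks downstream JSON consumers.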

Fix (Suggested)

  • In _OpenInferenceJSONEncoder.default():
    • Before other branches: if isinstance(obj, BaseMessage), serialize to a JSON-serializable dict using (in order) message_to_dict(obj), model_dump(), model_dump_json() + json.loads, Pydantic v1 dict(), or to_json().
  • Optional import: from langchain_core.messages.base import message_to_dict with ImportError fallback for older langchain_core.
  • New test: test_convert_io_langchain_messages_json_not_repr (serialize state {"messages": [HumanMessage(...)]}, then assert json.loads(value) parses and the message content is present).
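A sketch of the encoder branch described above, assuming the fallback-chain order from the fix; the except-ImportError stand-ins are hypothetical and only substitute when langchain_core is unavailable, and the real implementation would live inside the existing _OpenInferenceJSONEncoder:

```python
import json

try:
    from langchain_core.messages import BaseMessage, HumanMessage
    from langchain_core.messages.base import message_to_dict
except ImportError:  # older langchain_core, or not installed: hypothetical stand-ins
    class BaseMessage:
        def __init__(self, content, type="human"):
            self.content = content
            self.type = type

    class HumanMessage(BaseMessage):
        pass

    def message_to_dict(msg):
        return {"type": msg.type, "data": {"content": msg.content}}


class _OpenInferenceJSONEncoder(json.JSONEncoder):
    """Sketch: handle LangChain messages before the other branches."""

    def default(self, obj):
        if isinstance(obj, BaseMessage):
            # Preferred converter: the langchain_core helper.
            try:
                return message_to_dict(obj)
            except Exception:
                pass
            # Fallbacks, in order: Pydantic v2 model_dump(), then v1 dict().
            for attr in ("model_dump", "dict"):
                fn = getattr(obj, attr, None)
                if callable(fn):
                    try:
                        return fn()
                    except Exception:
                        continue
            return str(obj)  # last resort: mirrors the old repr-like behavior
        return super().default(obj)


state = {"messages": [HumanMessage("hello")]}
value = json.dumps(state, cls=_OpenInferenceJSONEncoder)
```

With this branch in place, `json.loads(value)["messages"][0]` is a dict carrying the message content, so downstream UIs get parseable JSON instead of a repr string.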
