
Bug: TypeError: 'in <string>' requires string as left operand, not list in _prepare_summarization_result with gemini #135

@moise-g

Description

The RunningSummary.summary field is typed as str, but it is assigned from summary_response.content, whose type is Union[str, list[Union[str, dict]]] (BaseMessage.content). The code in _prepare_summarization_result assumes the summary is always a string when performing in membership checks, so a TypeError is raised when a model returns list-based content.

Error Details

Error Message:

TypeError: 'in <string>' requires string as left operand, not list

Location:
langmem/short_term/summarization.py, line 306 in _prepare_summarization_result

Failing Code:

and existing_summary.summary
in preprocessed_messages.existing_system_message.content

Root Cause

  1. RunningSummary.summary is set from summary_response.content (line 644 in summarization.py)
  2. BaseMessage.content is typed as Union[str, list[Union[str, dict]]] (from langchain_core.messages.base)
  3. Some language models (e.g., Google Gemini) return content as lists instead of strings
  4. The comparison operation at line 306 assumes existing_summary.summary is a string
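The chain above reduces to a two-line Python fact: the in operator with a string on the right requires a string on the left. A minimal standalone sketch (the values are illustrative, not taken from a real Gemini response):

```python
# Gemini-style list content, as assigned to RunningSummary.summary
summary = [{"type": "text", "text": "Earlier conversation recap."}]
# The existing system message content is a plain string
system_content = "You are a helpful assistant.\n\nEarlier conversation recap."

try:
    # Mirrors `existing_summary.summary in ...existing_system_message.content`
    summary in system_content
except TypeError as exc:
    print(exc)  # 'in <string>' requires string as left operand, not list
```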

Reproduction

from langchain_core.messages import SystemMessage, HumanMessage
from langmem.short_term.summarization import asummarize_messages, RunningSummary
from langchain.chat_models import init_chat_model

# Use a model that returns list-based content (e.g., Gemini)
llm = init_chat_model("gemini-2.5-flash", location="global", include_thoughts=True)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hello"),
    # ... more messages to trigger summarization
]

# Initial summarization works (run the snippet inside an async function)
result = await asummarize_messages(
    messages,
    model=llm,
    max_tokens=10000,
    max_tokens_before_summary=8000,
)

# On a subsequent call with the running_summary, if the model returned
# list-based content, this will fail
new_messages = messages + [HumanMessage(content="What's the weather today?")]
result2 = await asummarize_messages(
    new_messages,
    running_summary=result.running_summary,  # summary field is a list
    model=llm,
    max_tokens=10000,
    max_tokens_before_summary=8000,
)
# TypeError: 'in <string>' requires string as left operand, not list

Stack Trace

File "langmem/short_term/summarization.py", line 651, in asummarize_messages
    return _prepare_summarization_result(
        preprocessed_messages=preprocessed_messages,
        existing_summary=running_summary,
        summary_response=summary_response,
        final_prompt=final_prompt,
    )
File "langmem/short_term/summarization.py", line 306, in _prepare_summarization_result
    and existing_summary.summary
        ^^^^^^^^^^^^^^^^^^^^^^^^
    in preprocessed_messages.existing_system_message.content
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'in <string>' requires string as left operand, not list

Proposed Solution

Option 1: Normalize content in RunningSummary

Add a helper function to normalize content to string when creating RunningSummary:

def _normalize_content_to_string(content: str | list[str | dict]) -> str:
    """Normalize message content to string format."""
    if isinstance(content, str):
        return content
    
    if isinstance(content, list):
        text_parts = []
        for item in content:
            if isinstance(item, str):
                text_parts.append(item)
            elif isinstance(item, dict) and item.get("type") == "text" and "text" in item:
                text_parts.append(str(item["text"]))
        return "".join(text_parts)
    
    return str(content)

# Update line 643-649 in summarization.py
running_summary = RunningSummary(
    summary=_normalize_content_to_string(summary_response.content),  # <-- normalize here
    summarized_message_ids=summarized_message_ids,
    last_summarized_message_id=preprocessed_messages.messages_to_summarize[-1].id,
)

Option 2: Fix the comparison at line 306

Handle both string and list types in the comparison:

# Before (line 305-307)
and existing_summary.summary
in preprocessed_messages.existing_system_message.content

# After
and _normalize_content_to_string(existing_summary.summary)
in _normalize_content_to_string(preprocessed_messages.existing_system_message.content)

Option 3: Strictly type RunningSummary.summary as str

Update the RunningSummary dataclass to explicitly type summary as str and enforce normalization at creation time.

Impact

This bug affects any usage of asummarize_messages with:

  • Language models that return structured content (Gemini, Claude with tool use, etc.)
  • Multiple summarization rounds (when running_summary is passed from previous calls)

Environment

  • langmem: 0.0.30
  • langchain: 0.3.27
  • langchain-google-vertexai: 2.1.2
  • Python version: 3.13
  • Affected models: Google Gemini, potentially others that return list-based content

Workaround

Users can normalize the running_summary before passing it back to asummarize_messages:

def normalize_running_summary(running_summary: RunningSummary | None) -> RunningSummary | None:
    if running_summary is None:
        return None
    
    # Normalize to string
    if isinstance(running_summary.summary, str):
        normalized_summary = running_summary.summary
    elif isinstance(running_summary.summary, list):
        text_parts = []
        for item in running_summary.summary:
            if isinstance(item, str):
                text_parts.append(item)
            elif isinstance(item, dict) and item.get("type") == "text" and "text" in item:
                text_parts.append(str(item["text"]))
        normalized_summary = "".join(text_parts)
    else:
        normalized_summary = str(running_summary.summary)
    
    return RunningSummary(
        summary=normalized_summary,
        summarized_message_ids=running_summary.summarized_message_ids,
        last_summarized_message_id=running_summary.last_summarized_message_id,
    )

# Use the normalized version
normalized_summary = normalize_running_summary(result.running_summary)
result2 = await asummarize_messages(
    new_messages,
    running_summary=normalized_summary,
    ...
)

Additional Notes

A consistent normalization strategy throughout the langmem library would prevent similar issues in other operations.
