
Conversation

@ccurme (Collaborator) commented Nov 4, 2025

Here we use model profiles to allow summarization behavior to vary with model context window sizes.

We make these changes:

When is summarization triggered?

We add a trigger parameter which subsumes max_tokens_before_summary.

  • trigger=("tokens", 1000) is equivalent to max_tokens_before_summary=1000
  • We also support trigger=("messages", X) and trigger=("fraction", X). The latter will use a fraction of a model's context window if model profiles are available.
  • trigger=[("tokens", 1000), ("messages", 50)] will trigger summarization if either condition is met.

We retain runtime support for max_tokens_before_summary, although it will emit a deprecation warning and generate a typing error.

trigger=None (or max_tokens_before_summary=None) disables summarization, consistent with how max_tokens_before_summary was documented (resolving #33701).
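
For concreteness, a minimal usage sketch (the SummarizationMiddleware class name and the model argument are assumptions about the surrounding API; the trigger values follow the description above):

from langchain.agents.middleware import SummarizationMiddleware

# Equivalent to the old max_tokens_before_summary=1000.
by_tokens = SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("tokens", 1000),
)

# Summarize once 80% of the model's context window is used
# (the window size is resolved via model profiles, when available).
by_fraction = SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("fraction", 0.8),
)

# Summarize when either condition is met.
by_either = SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=[("tokens", 1000), ("messages", 50)],
)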

What context is summarized?

We add a keep parameter which subsumes messages_to_keep.

  • keep=("messages", 20) is equivalent to messages_to_keep=20.
  • We also support keep=("tokens", X) and keep=("fraction", X).

We default to ("messages", 20) as before.

Note: we also make it possible to send all of this context to the LLM that performs the summarization.

  • Current behavior is to always trim to 4000 tokens.
  • We parametrize this number, allowing it to be None, in which case we don't trim before passing to the LLM.
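
A sketch of the keep side, under the same assumptions (and the same assumed import) as the sketch above; the trim parameter name, trim_tokens_to_summarize, is taken from a later comment in this thread:

# Equivalent to the old messages_to_keep=20 (still the default).
keep_messages = SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("fraction", 0.8),
    keep=("messages", 20),
)

# Keep roughly 30% of the context window instead, and skip trimming the
# summarized history before it is passed to the summarization LLM.
keep_fraction = SummarizationMiddleware(
    model="openai:gpt-4o-mini",
    trigger=("fraction", 0.8),
    keep=("fraction", 0.3),
    trim_tokens_to_summarize=None,
)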

@sydney-runkle (Collaborator) left a comment:

This looks awesome! Super excited about this improvement

total_tokens + max_output_tokens + self.buffer_tokens > max_input_tokens

Have we checked with the applied AI team whether it makes sense for self.buffer_tokens to default to 0? Related: do we want to max out at max_input_tokens? Is there a downside to filling the window to the brim?
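
To make the roles of the three terms explicit, here is the quoted condition restated as a hypothetical predicate (the method name and attribute names are assumptions):

def _would_overflow(self, total_tokens: int, max_input_tokens: int, max_output_tokens: int) -> bool:
    # Summarize when the conversation so far, plus room for the model's reply,
    # plus an optional safety buffer, no longer fits in the context window.
    return total_tokens + max_output_tokens + self.buffer_tokens > max_input_tokens

With buffer_tokens defaulting to 0, the check fills the window right up to max_input_tokens minus the reply budget.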

token_counter: TokenCounter = count_tokens_approximately,
summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
summary_prefix: str = SUMMARY_PREFIX,
*,

Collaborator:

let's move this up higher

self.summary_prompt = summary_prompt
self.summary_prefix = summary_prefix
self.buffer_tokens = buffer_tokens
self.trim_token_limit = trim_token_limit

Contributor:

I'm not sure how necessary trim_token_limit is.

It is a bit confusing as an argument (all questions I had):

  • Does this mean the maximum number of tokens that we can trim?
  • Does this mean the maximum number of tokens left after trimming?
  • When does trimming even occur, and why? Are we trimming instead of summarizing?

If someone runs into an error while summarizing, shouldn't they just lower the max_tokens_before_summary?

Contributor:

Pass None to skip trimming entirely (risking summary model overflows if the history is too long).

IMO the best solution here is just to "compact earlier" instead of "trimming at compaction time". The summarization call adds maybe an extra few thousand tokens at most?

Author (Collaborator):

I renamed this parameter to trim_tokens_to_summarize. The purpose of adding it as a parameter is to let us circumvent the trimming entirely. The current behavior is to always trim what is sent to the LLM for summarization to 4000 tokens:

def _create_summary(self, messages_to_summarize: list[AnyMessage]) -> str:
    """Generate summary for the given messages."""
    if not messages_to_summarize:
        return "No previous conversation history."

    trimmed_messages = self._trim_messages_for_summary(messages_to_summarize)
    if not trimmed_messages:
        return "Previous conversation was too long to summarize."

    try:
        response = self.model.invoke(self.summary_prompt.format(messages=trimmed_messages))
        return cast("str", response.content).strip()
    except Exception as e:  # noqa: BLE001
        return f"Error generating summary: {e!s}"

def _trim_messages_for_summary(self, messages: list[AnyMessage]) -> list[AnyMessage]:
    """Trim messages to fit within summary generation limits."""
    try:
        return trim_messages(
            messages,
            max_tokens=_DEFAULT_TRIM_TOKEN_LIMIT,
            token_counter=self.token_counter,
            start_on="human",
            strategy="last",
            allow_partial=True,
            include_system=True,
        )
    except Exception:  # noqa: BLE001
        return messages[-_DEFAULT_FALLBACK_MESSAGE_COUNT:]

So this was a minimal, non-breaking change that lets us disable this. Lmk if that makes sense or if I misunderstood you.
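
For illustration, a minimal sketch of how the helper above could honor the new parameter (not the actual diff; the fallback constant and other attributes are as in the snippet above):

def _trim_messages_for_summary(self, messages: list[AnyMessage]) -> list[AnyMessage]:
    """Trim messages to fit within summary generation limits."""
    if self.trim_tokens_to_summarize is None:
        # Trimming disabled: pass the full history to the summarization LLM.
        return messages
    try:
        return trim_messages(
            messages,
            max_tokens=self.trim_tokens_to_summarize,
            token_counter=self.token_counter,
            start_on="human",
            strategy="last",
            allow_partial=True,
            include_system=True,
        )
    except Exception:  # noqa: BLE001
        return messages[-_DEFAULT_FALLBACK_MESSAGE_COUNT:]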

tokens_to_keep: float | None = None,
token_counter: TokenCounter = count_tokens_approximately,
summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
summary_prefix: str = SUMMARY_PREFIX,

Author (Collaborator):

This parameter is unused.
