feat(langchain): (SummarizationMiddleware) support use of model context windows when triggering summarization #33825
Conversation
This looks awesome! Super excited about this improvement
total_tokens + max_output_tokens + self.buffer_tokens > max_input_tokens
Have we checked with the applied AI team whether it makes sense to have self.buffer_tokens default to 0? Related: do we want to max out at max_input_tokens? Is there a downside to filling the window to the brim?
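For reference, a minimal sketch of the check being discussed, assuming the limits come from the model's profile; the function and parameter names here are illustrative, not the middleware's actual attributes:

```python
from langchain_core.messages import AnyMessage
from langchain_core.messages.utils import count_tokens_approximately


def would_overflow(
    messages: list[AnyMessage],
    max_input_tokens: int,
    max_output_tokens: int,
    buffer_tokens: int = 0,  # a 0 default fills the window to the brim
) -> bool:
    """Return True if the next model call would likely exceed the context window."""
    total_tokens = count_tokens_approximately(messages)
    # Reserve room for the model's reply plus an optional safety buffer.
    return total_tokens + max_output_tokens + buffer_tokens > max_input_tokens
```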
        token_counter: TokenCounter = count_tokens_approximately,
        summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
        summary_prefix: str = SUMMARY_PREFIX,
        *,
let's move this up higher
        self.summary_prompt = summary_prompt
        self.summary_prefix = summary_prefix
        self.buffer_tokens = buffer_tokens
        self.trim_token_limit = trim_token_limit
I'm not sure how necessary trim_token_limit is
It's a bit confusing as an argument (a few thoughts I had):
- does this mean the maximum number of tokens that we can trim?
- does this mean the max number of tokens left after trimming?
- when does trimming even occur, and why? Are we trimming instead of summarizing?
If someone runs into an error while summarizing, shouldn't they just lower max_tokens_before_summary?
Pass None to skip trimming entirely (risking summary model overflows if the history is too long).
Imo the best solution here is just to "compact earlier" instead of "trimming at compaction time". The summarization call is maybe an extra few thousand tokens at most?
I renamed this parameter to trim_tokens_to_summarize. The purpose of adding it as a parameter is to let us bypass the trimming entirely. The current behavior is to always trim what is sent to the LLM for summarization to 4000 tokens:
langchain/libs/langchain_v1/langchain/agents/middleware/summarization.py
Lines 221 to 249 in 915c446
    def _create_summary(self, messages_to_summarize: list[AnyMessage]) -> str:
        """Generate summary for the given messages."""
        if not messages_to_summarize:
            return "No previous conversation history."

        trimmed_messages = self._trim_messages_for_summary(messages_to_summarize)
        if not trimmed_messages:
            return "Previous conversation was too long to summarize."

        try:
            response = self.model.invoke(self.summary_prompt.format(messages=trimmed_messages))
            return cast("str", response.content).strip()
        except Exception as e:  # noqa: BLE001
            return f"Error generating summary: {e!s}"

    def _trim_messages_for_summary(self, messages: list[AnyMessage]) -> list[AnyMessage]:
        """Trim messages to fit within summary generation limits."""
        try:
            return trim_messages(
                messages,
                max_tokens=_DEFAULT_TRIM_TOKEN_LIMIT,
                token_counter=self.token_counter,
                start_on="human",
                strategy="last",
                allow_partial=True,
                include_system=True,
            )
        except Exception:  # noqa: BLE001
            return messages[-_DEFAULT_FALLBACK_MESSAGE_COUNT:]
So this was a minimal, non-breaking change that lets us disable that behavior. Let me know if that makes sense, or if I've misunderstood you.
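For illustration, a rough sketch of what opting out could look like with the renamed parameter; the model name is a placeholder and the exact constructor signature is whatever lands in this PR:

```python
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain.chat_models import init_chat_model

middleware = SummarizationMiddleware(
    model=init_chat_model("openai:gpt-4o-mini"),  # placeholder model
    max_tokens_before_summary=4000,  # existing trigger threshold
    # None skips the pre-summary trim entirely instead of capping at 4000 tokens.
    trim_tokens_to_summarize=None,
)
```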
        tokens_to_keep: float | None = None,
        token_counter: TokenCounter = count_tokens_approximately,
        summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
        summary_prefix: str = SUMMARY_PREFIX,
this parameter is unused
Here we use model profiles to allow summarization behavior to vary with model context window sizes.
We make these changes:
When is summarization triggered?
We add a `trigger` parameter which subsumes `max_tokens_before_summary` (usage is sketched below):
- `trigger=("tokens", 1000)` is equivalent to `max_tokens_before_summary=1000`.
- We also support `trigger=("messages", X)` and `trigger=("fraction", X)`. The latter will use a fraction of a model's context window if model profiles are available.
- `trigger=[("tokens", 1000), ("messages", 50)]` will trigger summarization if either condition is met.
- We retain runtime support for `max_tokens_before_summary`, although it will emit a deprecation warning and generate a typing error.
- `trigger=None` (or `max_tokens_before_summary=None`) disables summarization, consistent with how `max_tokens_before_summary` was documented (and resolving #33701).
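A rough sketch of how these trigger forms would be passed, assuming the constructor described in this PR; the model name and threshold values are placeholders:

```python
from langchain.agents.middleware.summarization import SummarizationMiddleware
from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-4o-mini")  # placeholder model

# Equivalent to the old max_tokens_before_summary=1000.
by_tokens = SummarizationMiddleware(model=model, trigger=("tokens", 1000))

# Summarize once ~80% of the model's context window is used
# (requires a model profile that reports the window size).
by_fraction = SummarizationMiddleware(model=model, trigger=("fraction", 0.8))

# Summarize when either condition is met.
by_either = SummarizationMiddleware(
    model=model,
    trigger=[("tokens", 1000), ("messages", 50)],
)
```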
What context is summarized?
We add a `keep` parameter which subsumes `messages_to_keep` (examples below):
- `keep=("messages", 20)` is equivalent to `messages_to_keep=20`.
- We also support `keep=("tokens", X)` and `keep=("fraction", X)`.
- We default to `("messages", 20)` as before.

Note: we also enable all of this context to be sent to the LLM for summarization.