What is summarization / is it working? #2183

fkohrt · 2024-03-23T20:43:47Z


I am using the Azure OpenAI models and have enabled summarization at the endpoint level. Now that I have enabled it, however, I am not sure where exactly anything is being summarized. At first I thought this referred to the automatic generation of chat titles, but all my chats are titled "New Chat". Where can I find the summarization feature, and how can I tell whether it works?

danny-avila · 2024-03-23T22:00:28Z

Maintainer

We are using an adaptation of the "ConversationSummaryBufferMemory" strategy to summarize messages.

To learn more about this, see this article: https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/

To summarize (lol), the summarization is triggered when the following conditions are met:

- You enable summarization
- There are messages that couldn't fit within half of the current model's token limit.
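To make the trigger concrete, here is a minimal TypeScript sketch of the two conditions above, in the spirit of ConversationSummaryBufferMemory. It is not LibreChat's actual code: the `Message` shape, the `tokenCount` field, and `splitForSummary` are hypothetical, and it assumes token counts were already computed. It walks backwards from the newest message, keeps whatever fits in half of the context window, and marks the older overflow for summarization.

```typescript
// Hypothetical message shape; assumes token counts are computed elsewhere.
interface Message {
  role: "user" | "assistant";
  text: string;
  tokenCount: number;
}

interface SplitResult {
  toSummarize: Message[]; // older messages that overflow the budget
  kept: Message[];        // recent messages sent to the model verbatim
}

// Summarization only happens when it is enabled AND some messages
// do not fit within half of the model's token limit.
function splitForSummary(
  messages: Message[],
  contextLimit: number,
  summarizeEnabled: boolean,
): SplitResult {
  if (!summarizeEnabled) {
    return { toSummarize: [], kept: messages };
  }
  const budget = Math.floor(contextLimit / 2);
  let used = 0;
  let cut = messages.length; // index of the oldest message that still fits
  // Walk from newest to oldest, keeping messages while they fit the budget.
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokenCount > budget) break;
    used += messages[i].tokenCount;
    cut = i;
  }
  return { toSummarize: messages.slice(0, cut), kept: messages.slice(cut) };
}

// Usage: with a 4k model the budget is 2048 tokens, so only the newest
// message fits and the two older ones would be summarized.
const chat: Message[] = [
  { role: "user", text: "first", tokenCount: 1500 },
  { role: "assistant", text: "second", tokenCount: 1500 },
  { role: "user", text: "third", tokenCount: 1000 },
];
const split = splitForSummary(chat, 4096, true);
console.log(split.toSummarize.length, split.kept.length); // prints "2 1"
```

If everything fits, `toSummarize` is empty and nothing is summarized, which matches the behaviour described for long-context models.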

This worked well when it was first implemented, in the age of models with 4-8k context, operating within the "efficient" range shown in the article.

However, this needs to be revisited soon, as we are now in the age of ever-increasing context windows (gpt-4-turbo with 128k, Anthropic with 200k+).

That means a conversation needs to reach around 60-100k tokens before summarization kicks in. While this may reduce the cost of using the full context, it's sub-optimal.
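The 60-100k figure is simply half of each context window. A small TypeScript sketch of that arithmetic, assuming the half-of-limit rule described above and rounded context sizes (`summaryThreshold` is a hypothetical helper, not an existing function):

```typescript
// Half-of-context threshold: history must exceed this many tokens
// before any message is summarized.
function summaryThreshold(contextLimit: number): number {
  return Math.floor(contextLimit / 2);
}

// Context sizes mentioned in this thread (rounded figures).
const windows: Array<[string, number]> = [
  ["4k model", 4096],
  ["8k model", 8192],
  ["gpt-4-turbo (128k)", 128000],
  ["Anthropic (200k)", 200000],
];

for (const [label, limit] of windows) {
  console.log(`${label}: summarization starts beyond ~${summaryThreshold(limit)} tokens`);
}
```

With 4-8k models the threshold sits at 2-4k tokens, close to the "efficient" range from the article; with 128k and 200k windows it jumps to 64k and 100k.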

I would also like to add an option, through the config file, that lets users decide what the summary context window should be, first at the endpoint level and then even at the model level.
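One way such an option could resolve, sketched in TypeScript: a model-level value wins, then the endpoint-level value, then the current default of half the model's limit. The config shape and the `summaryContextWindow` field are hypothetical, used only for illustration; they are not an existing LibreChat setting.

```typescript
// Hypothetical config shape for the proposed option (illustrative only).
interface EndpointSummaryConfig {
  summarize: boolean;
  summaryContextWindow?: number; // endpoint-wide override, in tokens
  models?: Record<string, { summaryContextWindow?: number }>; // per-model overrides
}

// Resolve the token budget for kept history:
// model-level override > endpoint-level override > half of the model's limit.
function resolveSummaryWindow(
  config: EndpointSummaryConfig,
  model: string,
  modelContextLimit: number,
): number {
  const perModel = config.models?.[model]?.summaryContextWindow;
  if (perModel !== undefined) return perModel;
  if (config.summaryContextWindow !== undefined) return config.summaryContextWindow;
  return Math.floor(modelContextLimit / 2);
}

// Usage with made-up values: a tight 3k window for one model,
// a 16k default for everything else on the endpoint.
const azureConfig: EndpointSummaryConfig = {
  summarize: true,
  summaryContextWindow: 16000,
  models: { "gpt-4-turbo": { summaryContextWindow: 3000 } },
};
console.log(resolveSummaryWindow(azureConfig, "gpt-4-turbo", 128000)); // 3000
console.log(resolveSummaryWindow(azureConfig, "other-model", 32000)); // 16000
```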

2 replies

danny-avila Mar 23, 2024
Maintainer

The article highlights the 2-3k range as optimal, but obviously this depends on your task.

I'm also adding a summary context window as a frontend option, so it can be enabled as a feature through presets or per conversation.

fkohrt Mar 24, 2024
Author

Thanks, that was a very helpful explanation!

tcpipuk · 2025-03-09T21:25:14Z


Claude Code has a feature where you can call /compact and it'll summarise everything up to this point and wipe the history.

It's a bit rough (when it's running rampant through files, you need to run it often or you'll be paying dollars a minute), but it could be a useful feature here.

For example, you could spawn a thread based on the current chat, where the starting message is an assistant-written summary of everything that has happened so far, so the user can continue with their next request.
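The thread-spawning idea could look roughly like this TypeScript sketch. All names (`ChatMessage`, `Thread`, `Summarizer`, `compactIntoThread`) are hypothetical, and the summarizer is synchronous here for brevity; a real one would call the model asynchronously.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

interface Thread {
  parentId: string;       // the chat this thread was spawned from
  messages: ChatMessage[];
}

// Hypothetical summarizer; in practice this would be an async model call.
type Summarizer = (messages: ChatMessage[]) => string;

// Spawn a new thread whose first message is an assistant-written summary
// of the source chat, so the user continues with a short history.
function compactIntoThread(
  parentId: string,
  messages: ChatMessage[],
  summarize: Summarizer,
): Thread {
  const summary = summarize(messages);
  return {
    parentId,
    messages: [{ role: "assistant", text: summary }],
  };
}

// Usage with a stub summarizer that just counts the folded-in messages.
const stub: Summarizer = (msgs) => `Summary of ${msgs.length} earlier messages.`;
const thread = compactIntoThread("chat-123", [
  { role: "user", text: "Refactor the parser" },
  { role: "assistant", text: "Done; tests pass." },
], stub);
console.log(thread.messages[0].text); // "Summary of 2 earlier messages."
```

Keeping the original chat as `parentId` leaves the full history available, so compaction doesn't have to be destructive the way wiping history is.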

In practice, you'd probably want to be able to customise the summarisation prompt per agent (or let the user provide instructions at the point it happens) so it knows what to focus on. How long should the summary be? If we're summarising a second time, how much should it care about the content of the last summary versus what's happened since? Either way, having compact/summarise as a thread option might be a better UX than trying to summarise mysteriously in the background.
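A rough TypeScript sketch of how such a customisable prompt might be assembled. The option names and wording are hypothetical; the rule of keeping key decisions from an earlier summary while prioritising newer content is just one possible answer to the question above, not an established policy.

```typescript
interface SummaryPromptOptions {
  agentInstructions?: string; // per-agent default focus (hypothetical)
  userInstructions?: string;  // supplied by the user at compaction time
  maxWords?: number;          // rough length target for the summary
  previousSummary?: string;   // summary from an earlier compaction, if any
}

// Build the summarisation prompt from the transcript plus optional guidance.
function buildSummaryPrompt(transcript: string, opts: SummaryPromptOptions = {}): string {
  const parts: string[] = [
    "Summarise the conversation below so it can be continued in a new thread.",
  ];
  if (opts.maxWords !== undefined) parts.push(`Keep the summary under ${opts.maxWords} words.`);
  if (opts.agentInstructions) parts.push(`Focus: ${opts.agentInstructions}`);
  if (opts.userInstructions) parts.push(`User instructions: ${opts.userInstructions}`);
  if (opts.previousSummary) {
    // One possible policy for repeated compaction: carry forward key
    // decisions, but weight recent activity more heavily.
    parts.push(
      "An earlier summary is included; keep its key decisions but prioritise what happened since.",
      `Earlier summary:\n${opts.previousSummary}`,
    );
  }
  parts.push(`Conversation:\n${transcript}`);
  return parts.join("\n\n");
}

// Usage: an agent that should track touched files and open TODOs.
const prompt = buildSummaryPrompt("user: fix the failing test\nassistant: patched the parser", {
  agentInstructions: "files touched and open TODOs",
  maxWords: 200,
});
console.log(prompt);
```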

0 replies
