Better map_reduce summaries by specifying summary length in prompt #6438
shkarlsson announced in Ideas
Replies: 1 comment
-
Hi Shkarlsson, we also came across the same problem when dealing with large-document summarization. To address it, we developed a method that automatically determines the optimal chunk size for large documents. In our experience, this approach gives much better summarization quality in practice. For an introduction to the method, please visit our blog. We'd appreciate any insights or suggestions you might have.
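To give a rough flavour of the idea (a simplified illustration, not the exact method from the blog; the `context_window` and `prompt_overhead` values are assumptions):

```python
import math

def pick_chunk_size(total_tokens: int, context_window: int, prompt_overhead: int = 500) -> int:
    """Split a document into the fewest chunks that each fit the model's
    context window, then size the chunks evenly across the document.
    Simplified illustration only; the full method is described in the blog."""
    per_call_budget = context_window - prompt_overhead    # tokens usable per map call
    n_chunks = math.ceil(total_tokens / per_call_budget)  # fewest chunks that fit
    return math.ceil(total_tokens / n_chunks)             # even chunk size

# e.g. a 15k-token document with a 4k-token context window -> 5 chunks of 3000 tokens
print(pick_chunk_size(15_000, 4_000))
```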
-
In my case, the summaries from
`langchain.chains.summarize.load_summarize_chain(chain_type="map_reduce", ...)`
are not that good, and I believe it is because the first-level summaries (the ones that the "summary of summaries" is based on) are very short. Some 15k tokens were split into 5 documents of about 3k tokens each. Each of these was summarized to about 200 tokens, which meant that the final summary (the "summary of summaries") was based on only 1000 tokens. If the LLM had been told to summarize each of the 5 documents into an 800-token summary, the final summary could have been based on 4000 tokens, which should result in a much better end summary.
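Back-of-envelope, the budget looks like this (numbers from my run; the 4k combine-step context window is illustrative):

```python
n_chunks = 5
combine_context = 4_000                         # tokens the final combine call can ingest (illustrative)

today = n_chunks * 200                          # only ~1000 tokens reach the combine step
per_chunk_budget = combine_context // n_chunks  # -> 800 tokens per first-level summary
proposed = n_chunks * per_chunk_budget          # -> 4000 tokens of material for the final summary
print(today, proposed)
```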
Therefore, I suggest adding an optional `max_token_length` parameter to `load_summarize_chain` so that, down the line, the map prompt can request a specific number of tokens ("write a summary of the text below using at most [x] tokens"). This should cause the intermediate summaries to "fill up" the available token space for the final summary.
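The closest workaround I see today is passing a custom `map_prompt` with a hard-coded budget (a sketch, assuming the current `load_summarize_chain` signature; the 800-token value is exactly what the proposed parameter would compute automatically):

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.prompts import PromptTemplate

MAX_SUMMARY_TOKENS = 800  # hand-picked here; the proposed max_token_length would derive it

map_prompt = PromptTemplate.from_template(
    "Write a summary of the following text using at most {max_tokens} tokens:"
    "\n\n{text}\n\nSUMMARY:"
).partial(max_tokens=str(MAX_SUMMARY_TOKENS))

chain = load_summarize_chain(
    llm,                    # any LangChain LLM instance
    chain_type="map_reduce",
    map_prompt=map_prompt,  # controls the first-level ("map") summaries
)
summary = chain.run(docs)   # docs: the pre-split list of Documents
```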
Before diving into this, I would like to hear your thoughts, and maybe some tips on how to implement it (especially where to find the function that partitions longer documents into smaller ones).
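For reference, in my experiment the partitioning happened before the chain, via the standard text splitter (chunk size matches my numbers above), so I suspect this is one place a token budget could plug in:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Token-based splitting via tiktoken; ~3k-token chunks as in my example
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=3_000,
    chunk_overlap=100,
)
docs = splitter.create_documents([long_text])  # long_text: the ~15k-token input
```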