-
Can I enable recursive summarization when using `load_summarize_chain`? I was hoping that I could set `token_max`. Do I need to build my own chain to do that from `MapReduceChain`, `ReduceDocumentsChain`, and `MapReduceDocumentsChain`?
-
🤖
Hello,
Yes, you can enable recursive summarization with `load_summarize_chain` using `chain_type="map_reduce"` and setting `token_max`. You don't need to build your own chain from `MapReduceChain`, `ReduceDocumentsChain`, and `MapReduceDocumentsChain`.
The `load_summarize_chain` function creates a `MapReduceDocumentsChain` that includes a `ReduceDocumentsChain`. The `ReduceDocumentsChain` uses the `token_max` parameter to cap the number of tokens in the summarized document; it keeps collapsing the intermediate summaries until the total is below the `token_max` limit, which is what makes the summarization recursive.
Here is the relevant code:

```python
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=collapse_chain,
    token_max=token_max,
    verbose=verbose,
    callbacks=callbacks,
)
return MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name=map_reduce_document_variable_name,
    verbose=verbose,
    callbacks=callbacks,
    **kwargs,
)
```

You can call `load_summarize_chain` with `chain_type="map_reduce"` and provide a `token_max` value to enable recursive summarization:

```python
load_summarize_chain(llm, chain_type="map_reduce", token_max=1000)
```

The `MapReduceDocumentsChain` class in `map_reduce.py` confirms this behavior. Its `combine_docs` and `acombine_docs` methods apply the `llm_chain` to each document and then use the `reduce_documents_chain` to combine the results. If the `token_max` parameter is provided, it is passed to the `reduce_documents_chain`'s `combine_docs` or `acombine_docs` method, which recursively reduces the documents until the total number of tokens is below the `token_max` limit.
The `token_max` parameter of `load_summarize_chain` is used in `_load_map_reduce_chain`, one of the possible chains that `load_summarize_chain` can load, where it is passed to the `ReduceDocumentsChain` constructor. It sets the maximum number of tokens that one summarization pass can handle; text that exceeds this limit is collapsed further to fit within it.
I hope this helps! Let me know if you have any other questions.
Sources
- summarization <#2192>
- reduce with long text <#5829>
- libs/langchain/langchain/chains/summarize/__init__.py <https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/summarize/__init__.py>
- libs/langchain/langchain/chains/combine_documents/map_reduce.py <https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/map_reduce.py>
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
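To make the recursive reduction concrete, here is a minimal, dependency-free sketch of the collapse loop that the map-reduce summarizer performs. The `summarize` and `count_tokens` functions below are illustrative stand-ins (a word count and a truncating "summarizer"), not LangChain APIs:

```python
# Minimal sketch of the recursive collapse performed during map-reduce
# summarization. `count_tokens` and `summarize` are illustrative stand-ins,
# NOT LangChain APIs.

def count_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word.
    return len(text.split())

def summarize(texts: list[str]) -> str:
    # Stand-in for an LLM call: keep the first five words of each text,
    # so each "summary" is strictly shorter than its input.
    return " ".join(" ".join(t.split()[:5]) for t in texts)

def reduce_documents(docs: list[str], token_max: int) -> str:
    # Keep collapsing groups of documents until the whole set fits in a
    # single final combine call under token_max.
    while sum(count_tokens(d) for d in docs) > token_max and len(docs) > 1:
        groups, current, current_tokens = [], [], 0
        for doc in docs:
            n = count_tokens(doc)
            if current and current_tokens + n > token_max:
                groups.append(current)
                current, current_tokens = [], 0
            current.append(doc)
            current_tokens += n
        groups.append(current)
        # Collapse each group into a shorter summary (the "collapse" step).
        docs = [summarize(g) for g in groups]
    # Final combine over the now-small set of documents.
    return summarize(docs)
```

The loop mirrors the behavior described above: documents are grouped as large as `token_max` allows, each group is collapsed, and the process repeats until one final combine call fits under the limit.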
-
@dosu-bot
When I create the chain with `load_summarize_chain` as you suggest and inspect the `ReduceDocumentsChain` it includes, there is no collapse chain. Without a collapse chain, the recursive summarization feature fails to operate. Is this a bug?
-
@dosu-bot
If I do not supply `token_max`, will recursive summarization still take place with a reasonable default value, or must I specify that parameter to even allow for the possibility of recursive summarization?
On Fri, Sep 22, 2023 at 1:25 PM dosu-beta[bot] wrote:
🤖
Hello,
Thank you for reaching out and providing a detailed explanation of your issue.
The `collapse_documents_chain` yielding `None` is not a bug, but a consequence of how the `ReduceDocumentsChain` class is designed. Its `_collapse_chain` property returns `collapse_documents_chain` if it is not `None`, and otherwise falls back to `combine_documents_chain`. This means that when `collapse_documents_chain` is `None`, the `combine_documents_chain` is used for both combining and collapsing documents.
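That fallback can be sketched in a few lines. This is a simplified illustration of the property described above, not the actual LangChain source:

```python
# Simplified sketch of the fallback described above: when no collapse
# chain is supplied, the combine chain does double duty. Not the actual
# LangChain implementation.
class ReduceDocumentsChainSketch:
    def __init__(self, combine_documents_chain, collapse_documents_chain=None):
        self.combine_documents_chain = combine_documents_chain
        self.collapse_documents_chain = collapse_documents_chain

    @property
    def _collapse_chain(self):
        # Prefer the dedicated collapse chain; fall back to the combine chain.
        if self.collapse_documents_chain is not None:
            return self.collapse_documents_chain
        return self.combine_documents_chain
```

So an inspected `collapse_documents_chain` of `None` still yields a working collapse step, just one that reuses the combine chain.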
The `collapse_documents_chain` is used when there are too many documents to pass to `combine_documents_chain` in one go. In that case, `collapse_documents_chain` is called recursively on groups of documents as large as the token limit allows. This is what drives the recursive summarization feature of the LangChain framework: when the documents exceed the maximum token limit (`token_max`), the `collapse_documents_chain` is used to reduce their size.
In your case, if you want to use a different chain for collapsing documents, you need to pass a `collapse_document_chain_config` when calling `load_summarize_chain` with `chain_type="map_reduce"`. Here is how you can do it:

```python
collapse_document_chain_config = {...}  # Your collapse document chain config
map_reduce_chain_tm = load_summarize_chain(
    LLM,
    chain_type="map_reduce",
    token_max=3000,
    collapse_document_chain_config=collapse_document_chain_config,
)
```

Please replace `{...}` with your actual `collapse_document_chain_config`.
I hope this helps! If you have any other questions, feel free to ask.
Best,
Dosu
Sources
- summarization <#2192>
- reduce with long text
<#5829>
- libs/langchain/langchain/chains/combine_documents/reduce.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/reduce.py>
- libs/langchain/langchain/chains/combine_documents/map_reduce.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/map_reduce.py>
- libs/langchain/langchain/chains/summarize/__init__.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/summarize/__init__.py>
- libs/langchain/langchain/chains/loading.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/loading.py>
🤖
Hello,
In the LangChain Python framework, the `token_max` parameter is not mandatory for enabling recursive summarization in the `load_summarize_chain` function. If this parameter is not provided, the system uses a default value of 3000. This is evident from the definition of the `_load_map_reduce_chain` function, where `token_max` is set to 3000 by default. In the `load_summarize_chain` function, `**kwargs` is passed to the loader function, which includes `_load_map_reduce_chain`.