-
Can I enable recursive summarization when using `load_summarize_chain`? I was hoping that I could set `token_max`. Do I need to build my own chain to do that from `MapReduceChain`, `ReduceDocumentsChain`, and `MapReduceDocumentsChain`?
-
🤖
Hello,
Yes, you can enable recursive summarization with `load_summarize_chain` using `chain_type="map_reduce"` and setting `token_max`. You don't need to build your own chain from `MapReduceChain`, `ReduceDocumentsChain`, and `MapReduceDocumentsChain`.
The `load_summarize_chain` function creates a `MapReduceDocumentsChain` that includes a `ReduceDocumentsChain`. The `ReduceDocumentsChain` uses the `token_max` parameter to cap the number of tokens in the summarized document; it keeps collapsing the intermediate summaries until the total is below the `token_max` limit, which is what makes the summarization recursive.
Here is the relevant code:

```python
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=collapse_chain,
    token_max=token_max,
    verbose=verbose,
    callbacks=callbacks,
)
return MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name=map_reduce_document_variable_name,
    verbose=verbose,
    callbacks=callbacks,
    **kwargs,
)
```

You can call `load_summarize_chain` with `chain_type="map_reduce"` and provide a `token_max` value to enable recursive summarization:

```python
load_summarize_chain(llm, chain_type="map_reduce", token_max=1000)
```

The `MapReduceDocumentsChain` class in `map_reduce.py` confirms this behavior. Its `combine_docs` and `acombine_docs` methods apply the `llm_chain` to each document and then use the `reduce_documents_chain` to combine the results. If the `token_max` parameter is provided, it is passed to the `reduce_documents_chain`'s `combine_docs` or `acombine_docs` method, which recursively reduces the documents until the total number of tokens is below the `token_max` limit.
The `token_max` parameter of `load_summarize_chain` is used in `_load_map_reduce_chain`, one of the possible chains that `load_summarize_chain` can load, where it is passed to the `ReduceDocumentsChain` constructor. It sets the maximum number of tokens that one summarization pass can handle; text that exceeds this limit is collapsed further to fit within it.
I hope this helps! Let me know if you have any other questions.
Sources
- summarization <#2192>
- reduce with long text <#5829>
- libs/langchain/langchain/chains/summarize/__init__.py <https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/summarize/__init__.py>
- libs/langchain/langchain/chains/combine_documents/map_reduce.py <https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/map_reduce.py>
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
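To make the recursive reduction concrete, here is a minimal, dependency-free sketch of the collapse loop that the map-reduce summarizer performs. The `summarize` and `count_tokens` functions below are illustrative stand-ins (a word count and a truncating "summarizer"), not LangChain APIs:

```python
# Minimal sketch of the recursive collapse performed during map-reduce
# summarization. `count_tokens` and `summarize` are illustrative stand-ins,
# NOT LangChain APIs.

def count_tokens(text: str) -> int:
    # Crude proxy: one token per whitespace-separated word.
    return len(text.split())

def summarize(texts: list[str]) -> str:
    # Stand-in for an LLM call: keep the first five words of each text,
    # so each "summary" is strictly shorter than its input.
    return " ".join(" ".join(t.split()[:5]) for t in texts)

def reduce_documents(docs: list[str], token_max: int) -> str:
    # Keep collapsing groups of documents until the whole set fits in a
    # single final combine call under token_max.
    while sum(count_tokens(d) for d in docs) > token_max and len(docs) > 1:
        groups, current, current_tokens = [], [], 0
        for doc in docs:
            n = count_tokens(doc)
            if current and current_tokens + n > token_max:
                groups.append(current)
                current, current_tokens = [], 0
            current.append(doc)
            current_tokens += n
        groups.append(current)
        # Collapse each group into a shorter summary (the "collapse" step).
        docs = [summarize(g) for g in groups]
    # Final combine over the now-small set of documents.
    return summarize(docs)
```

The loop mirrors the behavior described above: documents are grouped as large as `token_max` allows, each group is collapsed, and the process repeats until one final combine call fits under the limit.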
-
@dosu-bot
When I create the chain with `load_summarize_chain` as you suggest and inspect the `ReduceDocumentsChain` it includes, there is no collapse chain. Without a collapse chain, the recursive summarization feature fails to operate. Is this a bug?
-
@dosu-bot
If I do not supply `token_max`, will recursive summarization still take place with a reasonable default value, or must I specify that parameter to even allow for the possibility of recursive summarization?
On Fri, Sep 22, 2023 at 1:25 PM dosu-beta[bot] wrote:
🤖
Hello,
Thank you for reaching out and providing a detailed explanation of your issue.
The `collapse_documents_chain` yielding `None` is not a bug, but a consequence of how the `ReduceDocumentsChain` class is designed. Its `_collapse_chain` property returns `collapse_documents_chain` if it is not `None`, and otherwise falls back to `combine_documents_chain`. This means that when `collapse_documents_chain` is `None`, the `combine_documents_chain` is used for both combining and collapsing documents.
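That fallback can be sketched in a few lines. This is a simplified illustration of the property described above, not the actual LangChain source:

```python
# Simplified sketch of the fallback described above: when no collapse
# chain is supplied, the combine chain does double duty. Not the actual
# LangChain implementation.
class ReduceDocumentsChainSketch:
    def __init__(self, combine_documents_chain, collapse_documents_chain=None):
        self.combine_documents_chain = combine_documents_chain
        self.collapse_documents_chain = collapse_documents_chain

    @property
    def _collapse_chain(self):
        # Prefer the dedicated collapse chain; fall back to the combine chain.
        if self.collapse_documents_chain is not None:
            return self.collapse_documents_chain
        return self.combine_documents_chain
```

So an inspected `collapse_documents_chain` of `None` still yields a working collapse step, just one that reuses the combine chain.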
The `collapse_documents_chain` is used when there are too many documents to pass to `combine_documents_chain` in one go. In that case, `collapse_documents_chain` is called recursively on groups of documents as large as the token limit allows. This is what drives the recursive summarization feature of the LangChain framework: when the documents exceed the maximum token limit (`token_max`), the `collapse_documents_chain` is used to reduce their size.
In your case, if you want to use a different chain for collapsing documents, you need to pass a `collapse_document_chain_config` when calling `load_summarize_chain` with `chain_type="map_reduce"`. Here is how you can do it:

```python
collapse_document_chain_config = {...}  # Your collapse document chain config
map_reduce_chain_tm = load_summarize_chain(
    LLM,
    chain_type="map_reduce",
    token_max=3000,
    collapse_document_chain_config=collapse_document_chain_config,
)
```

Please replace `{...}` with your actual `collapse_document_chain_config`.
I hope this helps! If you have any other questions, feel free to ask.
Best,
Dosu
Sources
- summarization <#2192>
- reduce with long text
<#5829>
- libs/langchain/langchain/chains/combine_documents/reduce.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/reduce.py>
- libs/langchain/langchain/chains/combine_documents/map_reduce.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/combine_documents/map_reduce.py>
- libs/langchain/langchain/chains/summarize/__init__.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/summarize/__init__.py>
- libs/langchain/langchain/chains/loading.py
<https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/loading.py>
🤖
Hello,
In the LangChain Python framework, the `token_max` parameter is not mandatory for enabling recursive summarization in the `load_summarize_chain` function. If this parameter is not provided, the system uses a default value of 3000. This is evident from the definition of the `_load_map_reduce_chain` function, where `token_max` is set to 3000 by default. In the `load_summarize_chain` function, `**kwargs` is passed to the loader function, which includes `_load_map_reduce_chain`.