Best document question answering for LangChain for academic documents containing images (flowcharts, diagrams...) #10323

Abhijithm2447 · 2023-09-07T10:43:37Z

Looking for recommendations on effective document question-answering solutions for LangChain. I'm working with documents containing a combination of images (diagrams and flowcharts) and text data. Any suggestions?

I have tried several loaders (Unstructured Loader, AmazonTextractPDFLoader, OnlinePDFLoader, PyMuPDFLoader, PDFMinerLoader, PDFMinerPDFasHTMLLoader, PDFPlumberLoader, PyPDFLoader, PyPDFium2Loader, MathpixPDFLoader) as well as image captioning in LangChain, but I have not found a good method for question answering over documents that contain flowcharts and diagrams alongside text.

@dosu-bot · 2023-09-07T10:58:07Z

Hello,

To answer questions over academic documents that mix text with images such as diagrams and flowcharts, you can use the BaseQAWithSourcesChain class in LangChain. It uses the QUESTION_PROMPT template to extract the text relevant to the question from each document, and the COMBINE_PROMPT template to merge those extracts into a final answer with references.

Here is a code snippet for your reference:

class BaseQAWithSourcesChain(Chain, ABC):
    """Question answering chain with sources over documents."""

    combine_documents_chain: BaseCombineDocumentsChain
    """Chain to use to combine documents."""
    question_key: str = "question"  #: :meta private:
    input_docs_key: str = "docs"  #: :meta private:
    answer_key: str = "answer"  #: :meta private:
    sources_answer_key: str = "sources"  #: :meta private:
    return_source_documents: bool = False
    """Return the source documents."""

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        document_prompt: BasePromptTemplate = EXAMPLE_PROMPT,
        question_prompt: BasePromptTemplate = QUESTION_PROMPT,
        combine_prompt: BasePromptTemplate = COMBINE_PROMPT,
        **kwargs: Any,
    ) -> BaseQAWithSourcesChain:
        """Construct the chain from an LLM."""
        llm_question_chain = LLMChain(llm=llm, prompt=question_prompt)
        llm_combine_chain = LLMChain(llm=llm, prompt=combine_prompt)
        combine_results_chain = StuffDocumentsChain(
            llm_chain=llm_combine_chain,
            document_prompt=document_prompt,
            document_variable_name="summaries",
        )
        reduce_documents_chain = ReduceDocumentsChain(
            combine_documents_chain=combine_results_chain
        )
        combine_documents_chain = MapReduceDocumentsChain(
            llm_chain=llm_question_chain,
            reduce_documents_chain=reduce_documents_chain,
            document_variable_name="context",
        )
        return cls(
            combine_documents_chain=combine_documents_chain,
            **kwargs,
        )
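
To make the data flow of that chain easier to follow, here is a minimal plain-Python sketch of the same map-then-combine pattern. It does not use LangChain: `Doc`, `map_reduce_qa`, and `echo_llm` are hypothetical names made up for this illustration, the prompts are simplified stand-ins for QUESTION_PROMPT and COMBINE_PROMPT, and `llm` is assumed to be any function that takes a prompt string and returns a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document chunk with a source label."""
    text: str
    source: str


def map_reduce_qa(question: str, docs: List[Doc], llm: Callable[[str], str]) -> Dict[str, object]:
    """Illustrative map-reduce QA: one model call per document, then one combining call."""
    summaries = []
    # Map step: ask the model for the part of each document relevant to the question.
    for doc in docs:
        extracted = llm(f"Extract text relevant to: {question}\n\n{doc.text}")
        summaries.append(f"Content: {extracted}\nSource: {doc.source}")
    # Reduce step: combine all extracts into a single answer that can cite sources.
    combined = "\n\n".join(summaries)
    answer = llm(f"Answer with sources.\nQuestion: {question}\n\n{combined}")
    return {"answer": answer, "sources": [doc.source for doc in docs]}


def echo_llm(prompt: str) -> str:
    """Toy model for demonstration only: returns the last line of the prompt."""
    return prompt.strip().splitlines()[-1]
```

With a real model in place of `echo_llm`, the map step costs one call per document, which is why this pattern suits long documents that do not fit in a single prompt.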

However, please note that this chain works on text only and does not process images. If diagrams or flowcharts carry information needed to answer the question, you would need to convert them into machine-readable text first, for example with an OCR (Optical Character Recognition) tool, before feeding that text into LangChain.
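
As a rough sketch of that preprocessing step, the text recovered from each image can be merged into the page text before indexing, so the QA chain sees both. Everything here is an assumption for illustration: `merge_page_content` is a hypothetical helper, and `image_to_text` is a placeholder for whichever OCR or captioning function you choose; it is not a LangChain API.

```python
from typing import Callable, Dict, List


def merge_page_content(
    page_text: str,
    images: List[bytes],
    image_to_text: Callable[[bytes], str],
    page_number: int,
) -> Dict[str, object]:
    """Build one text record per page from body text plus image-derived text."""
    parts = [page_text.strip()]
    for index, image in enumerate(images, start=1):
        # image_to_text is a hypothetical hook: plug in your OCR or captioning tool.
        extracted = image_to_text(image).strip()
        if extracted:  # skip images where nothing was recognized
            parts.append(f"[Figure {index} on page {page_number}] {extracted}")
    return {
        "page_content": "\n\n".join(part for part in parts if part),
        "metadata": {"page": page_number, "image_count": len(images)},
    }
```

Tagging each extract with its figure index and page number keeps the image-derived text traceable, so an answer can point back to the diagram it came from. Plain OCR may still lose the structure of a flowchart (arrows, branches), so the quality of the result depends heavily on the tool behind `image_to_text`.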

As for the loaders you mentioned, they all load PDF files, but each relies on a different library or service and has its own features and requirements. For example, the AmazonTextractPDFLoader uses the Amazon Textract service and requires the amazon-textract-caller and boto3 Python packages. The MathpixPDFLoader uses the Mathpix service. The PyPDFLoader uses the pypdf library and chunks by page, while the PyPDFium2Loader uses the pypdfium2 library and chunks at the character level. You can find more details about these loaders in the document loaders and parsers modules of LangChain.

I hope this helps! If you have any further questions, feel free to ask.

Best,
Dosu

