Retrieving the metadata of query from Pinecone #10063

Abe410 · 2023-08-31T18:07:01Z


Hi

I have a chatbot which uses gpt-3.5-turbo as an LLM and Pinecone as the vector store.

I have some documents with metadata attached (say, the article URL of each document).

I want my agent to return that URL from the metadata along with the answer.

Is this possible?

This is how I have set up my retriever:

from langchain.chains import RetrievalQA
from langchain.vectorstores import Pinecone

# embeddings (the embedding model) and tool_llm (the LLM used by the tool)
# are defined elsewhere in the application


def text_retrieval_chain():
    # main retrieval chain class
    class RetrievalChain:
        def __init__(self, llm, retriever):
            self.chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

        def run(self, prompt):
            response = self.chain(prompt)
            return response['result']

    # vectorstore
    index_name = "data"
    vectorstore = Pinecone.from_existing_index(
        index_name=index_name, namespace="textdata", embedding=embeddings
    )
    # retrieve the 4 most similar chunks per query
    retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
    return RetrievalChain(llm=tool_llm, retriever=retriever)

This returns the retrieval chain, which is then passed to the agent as a tool.

Moreover, if my metadata includes a UNIX timestamp, can the agent pass it in search_kwargs as a filter? I can set the filter manually, but what if the user asks "Give me the latest data"? Is there a way for the agent to use the LLM to fill in the metadata filter automatically?

@dosu-bot · 2023-08-31T18:14:22Z


🤖

Hi,

Yes, it is possible to retrieve metadata such as a URL from Pinecone along with the result. Metadata passed to Pinecone.from_texts through the metadatas argument is stored with each vector, and every Document returned by a search carries it in its metadata attribute. Here is an example:

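from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Pinecone

# In the original test, texts, index_name and namespace_name are fixtures,
# needs is the first entry of texts, and pinecone.init(...) has been called.
embedding_openai = OpenAIEmbeddings()
metadatas = [{"page": i} for i in range(len(texts))]
docsearch = Pinecone.from_texts(
    texts,
    embedding_openai,
    index_name=index_name,
    metadatas=metadatas,
    namespace=namespace_name,
)
output = docsearch.similarity_search(needs, k=1, namespace=namespace_name)
# Pinecone returns numeric metadata as floats, hence 0.0 rather than 0
assert output == [Document(page_content=needs, metadata={"page": 0.0})]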

You can find this in the test_from_texts_with_metadatas method in the test_pinecone.py file.
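Applied to your case, each chunk's metadata would hold the article URL and, for time-based filtering, a numeric UNIX timestamp. Pinecone metadata values must be strings, numbers, booleans, or lists of strings, so a datetime has to be converted first. A minimal sketch; the helper make_metadata and the keys "url" and "timestamp" are hypothetical names, not part of LangChain:

```python
from datetime import datetime, timezone
from typing import Dict, Union

Metadata = Dict[str, Union[str, float]]


def make_metadata(url: str, published: datetime) -> Metadata:
    """Metadata for one chunk: its source URL and a UNIX timestamp.

    Naive datetimes are treated as UTC so the stored number is unambiguous.
    """
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    # Store a plain float so Pinecone can apply $gt/$gte/$lt/$lte to it
    return {"url": url, "timestamp": published.timestamp()}
```

The resulting list would then go into from_texts, e.g. metadatas=[make_metadata(url, published) for url, published in records], where records is your own (hypothetical) list of URL/date pairs.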

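As for filtering on a UNIX timestamp stored as metadata: the PineconeTranslator class (pinecone.py in LangChain's self-query retriever module) maps a variety of comparison operators, including equality and inequality, less than and greater than, and their inclusive versions, onto Pinecone's $eq, $ne, $lt, $gt, $lte and $gte filter operators. Because a UNIX timestamp is a plain number, a range condition such as {"timestamp": {"$gte": cutoff}} (assuming the field is named timestamp) works on it, and you can pass it to the retriever as search_kwargs={"filter": ...}. To have the LLM write that filter from the user's question instead of setting it manually, note that PineconeTranslator is what LangChain's SelfQueryRetriever uses to turn the LLM's structured query into a Pinecone filter. How well this handles "the latest data" depends on how the comparison is expressed, since a range filter needs a concrete cutoff value.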
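A minimal sketch of two ways to handle "the latest data", assuming the timestamp was indexed as a number under a hypothetical "timestamp" key: recent_filter builds a Pinecone range filter for search_kwargs, and newest_first sorts already-retrieved documents client-side. Both helper names are illustrative, not LangChain APIs.

```python
import time
from typing import Any, Dict, List, Optional

SECONDS_PER_DAY = 86_400


def recent_filter(field: str, days: float, now: Optional[float] = None) -> Dict[str, Any]:
    """Pinecone metadata filter keeping documents whose UNIX timestamp in
    `field` is at most `days` days old (relative to `now`, default: current time)."""
    now = time.time() if now is None else now
    return {field: {"$gte": now - days * SECONDS_PER_DAY}}


def newest_first(docs: List[Any], field: str) -> List[Any]:
    """Sort retrieved documents (any objects with a .metadata dict) newest first.

    Documents missing the field get -inf as their key, so they sort last.
    """
    return sorted(
        docs,
        key=lambda doc: doc.metadata.get(field, float("-inf")),
        reverse=True,
    )
```

For example, vectorstore.as_retriever(search_kwargs={"k": 4, "filter": recent_filter("timestamp", 7)}) would restrict retrieval to roughly the last week. The filter needs a concrete cutoff, so "latest" only becomes a filter once your code, or an LLM that is told the current time, picks one; sorting the top-k results with newest_first avoids choosing a cutoff but only reorders what similarity search already returned.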

To make the RetrievalChain class return the metadata along with the result, overriding _get_docs and _aget_docs to return a (documents, metadata) tuple would break the chain: its _call passes whatever _get_docs returns straight to the combine-documents chain, which expects a list of Documents. RetrievalQA can instead return the retrieved documents itself when you set return_source_documents=True, and each of them carries its metadata. Here's how you can do it:

from langchain.chains import RetrievalQA


class RetrievalChain:
    def __init__(self, llm, retriever):
        # return_source_documents=True adds the retrieved Documents
        # (with their metadata) to the output under "source_documents"
        self.chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",
            retriever=retriever,
            return_source_documents=True,
        )

    def run(self, prompt):
        response = self.chain({"query": prompt})
        # "url" is an assumed metadata key; use the key you indexed with
        urls = {doc.metadata.get("url") for doc in response["source_documents"]}
        sources = ", ".join(sorted(url for url in urls if url))
        # return one string so the agent sees both the answer and its sources
        return f"{response['result']}\nSources: {sources}"

Every LangChain Document has a metadata dictionary, so whatever you stored at indexing time is available on each source document. Because run returns a single string, the agent can relay the URLs together with the answer. You can find more details in RetrievalQA (langchain/chains/retrieval_qa/base.py).

I hope this helps! Let me know if you have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
