[Question]: How can I utilize JSONQueryEngine with MongoDB documents ? #8410
Replies: 10 comments
-
🤖 Hello,

Thanks for your question. If you want to use JSONQueryEngine directly on MongoDB documents, you can use the […]. Here is a relevant code snippet:

def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    knn_beta: Dict[str, Any] = {
        "vector": query.query_embedding,
        "path": self._embedding_key,
        "k": query.similarity_top_k,
    }
    if query.filters:
        knn_beta["filter"] = _to_mongodb_filter(query.filters)
    pipeline = [
        {
            "$search": {
                "index": self._index_name,
                "knnBeta": knn_beta,
            }
        },
        {"$project": {"score": {"$meta": "searchScore"}, self._embedding_key: 0}},
    ]
    cursor = self._collection.aggregate(pipeline)
    ...
    return result

This method constructs a MongoDB aggregation pipeline that uses the […]. You can find more details in the source code of the […]. Please let me know if you have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
@dosu-bot this doesn't seem to address my case. I don't have a MongoDB vector database; I have JSON documents stored in MongoDB, and I need to query them using JSONQueryEngine.
-
🤖 Hello,

To use the JSONQueryEngine directly on MongoDB documents in the LlamaIndex framework, you would need to convert your MongoDB documents into JSON format and then pass them to the JSONQueryEngine. Here's how you can do it:

from typing import cast

from llama_index.indices.query.schema import QueryBundle
from llama_index.indices.service_context import ServiceContext
from llama_index.indices.struct_store.json_query import JSONQueryEngine, JSONType
from llama_index.response.schema import Response

# Assuming you have a MongoDB document as a dictionary
mongo_doc = {
    "field1": "value1",
    "field2": "value2",
    # ...
}
# cast() only informs the type checker; it does not convert any values
json_val = cast(JSONType, mongo_doc)
# Define your JSON schema
json_schema = cast(JSONType, {})
# Create a service context (from_defaults() fills in the default settings)
service_context = ServiceContext.from_defaults()
# Create a JSONQueryEngine instance
query_engine = JSONQueryEngine(
    json_value=json_val,
    json_schema=json_schema,
    service_context=service_context,
    verbose=True,
    synthesize_response=True,
)
# Use the query engine
response = cast(Response, query_engine.query(QueryBundle("your_query_here")))
# The response object now contains the result of the query
print(response.response)

The JSON schema (json_schema) should be defined according to your specific needs; the example above assumes a simple case where the schema is an empty dictionary. You can also pass your own output_processor function to process the output as needed. If no custom output_processor is provided, the JSONQueryEngine class falls back to its default_output_processor function, which executes the JSON Path query.

For more details, you can refer to the source code of the […]. I hope this helps! Let me know if you have any other questions.
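The snippet above leaves `json_schema` empty, which probably gives the LLM little structure to work from when it writes a JSON Path. As a rough sketch using only the standard library, documents could first be made JSON-safe and a minimal schema inferred from a sample. Note that `to_json_safe` and `infer_schema` are hypothetical helper names, not part of LlamaIndex, and the sample documents are made up for illustration.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def to_json_safe(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Round-trip a document through json so non-JSON values become strings.

    default=str is a simplification: values such as ObjectId or datetime
    are replaced by their str() form.
    """
    return json.loads(json.dumps(doc, default=str))


def infer_schema(value: Any) -> Dict[str, Any]:
    """Infer a minimal JSON-Schema-like description from one sample value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
        }
    if isinstance(value, list):
        # Assumption: list items are homogeneous, so the first one is representative.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, bool):  # bool must be checked before int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


# Hypothetical sample documents standing in for a MongoDB query result.
raw_docs: List[Dict[str, Any]] = [
    {"_id": 1, "name": "alice", "created": datetime(2023, 10, 1), "tags": ["a", "b"]},
    {"_id": 2, "name": "bob", "created": datetime(2023, 10, 2), "tags": []},
]
json_value = {"documents": [to_json_safe(d) for d in raw_docs]}
json_schema = infer_schema(json_value)
print(json.dumps(json_schema, indent=2))
```

The resulting `json_value` and `json_schema` could then be passed to `JSONQueryEngine` in place of the single document and the empty schema. A hand-written schema with field descriptions would likely work better than an inferred one.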
-
@dosu-beta the example you provided assumes I have a single JSON document. In my case it's a couple of thousand documents. Your example also doesn't really connect to the database and read the data. Show me how to connect to MongoDB and query a collection using JSONQueryEngine.
-
🤖 Sure, I can provide an example of how to connect to a MongoDB, read data from a specific collection, and use the JSONQueryEngine in the LlamaIndex framework to perform queries on these documents. Here's a Python script that does just that:

# Import necessary libraries
import logging
import sys
import json
from llama_index import SummaryIndex, SimpleMongoReader, ServiceContext
from llama_index.indices.struct_store.json_query import JSONQueryEngine
from IPython.display import Markdown, display
import os

# Set up logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Define MongoDB connection parameters
host = "<host>"
port = "<port>"  # replace with the integer port, e.g. 27017
db_name = "<db_name>"
collection_name = "<collection_name>"

# Define query parameters
query_dict = {}
field_names = ["text"]

# Create a MongoDB reader and load data
reader = SimpleMongoReader(host, port)
documents = reader.load_data(
    db_name, collection_name, field_names, query_dict=query_dict
)

# Create a SummaryIndex from the documents
index = SummaryIndex.from_documents(documents)

# Create a ServiceContext (from_defaults() fills in the default settings)
service_context = ServiceContext.from_defaults()

# Create a JSONQueryEngine and perform a query
# Note: json.loads expects a JSON string, but `documents` is a list of
# Document objects, so this line raises a TypeError as written
json_value = json.loads(documents)
json_schema = {}  # Define your JSON schema here
query_engine = JSONQueryEngine(json_value, json_schema, service_context)
response = query_engine.query("<query_text>")

# Display the response
display(Markdown(f"<b>{response}</b>"))

Please replace the placeholders (<host>, <port>, <db_name>, <collection_name>, <query_text>) with your actual values. As the snippet shows, the SimpleMongoReader class in LlamaIndex takes the connection parameters host and port, and its load_data method takes db_name, collection_name, field_names, and query_dict to read data from a collection.

For more details, you can refer to the MongoDemo notebook and the json_query.py file in the LlamaIndex repository.
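For a collection with thousands of documents, one possible approach is to read them with pymongo directly and hand the engine a single JSON value that wraps the list. The sketch below is only an outline under assumptions: `fetch_json_docs` and `build_engine` are hypothetical helper names, and the `JSONQueryEngine` and `ServiceContext` imports follow the module paths used earlier in this thread, which may differ in other LlamaIndex versions.

```python
import json
from typing import Any, Dict, List, Optional


def fetch_json_docs(
    collection: Any,
    query_filter: Optional[Dict[str, Any]] = None,
    limit: int = 0,
) -> List[Dict[str, Any]]:
    """Read documents from a pymongo-style collection as JSON-safe dicts.

    `collection` only needs a find(filter, limit=...) method, as
    pymongo.collection.Collection has; limit=0 means no limit there.
    """
    cursor = collection.find(query_filter or {}, limit=limit)
    # default=str turns ObjectId, datetime, etc. into plain strings.
    return [json.loads(json.dumps(doc, default=str)) for doc in cursor]


def build_engine(docs: List[Dict[str, Any]], json_schema: Dict[str, Any]) -> Any:
    """Wrap the documents in one JSON value and create a JSONQueryEngine."""
    # Imported lazily so fetch_json_docs is usable without LlamaIndex installed.
    from llama_index import ServiceContext
    from llama_index.indices.struct_store.json_query import JSONQueryEngine

    return JSONQueryEngine(
        json_value={"documents": docs},
        json_schema=json_schema,
        service_context=ServiceContext.from_defaults(),
    )
```

With pymongo installed, usage might look like `client = MongoClient("mongodb://<host>:<port>")`, then `docs = fetch_json_docs(client["<db_name>"]["<collection_name>"], limit=1000)`, and finally `build_engine(docs, json_schema).query("<query_text>")`. The schema passed in should describe the `documents` wrapper, since the generated JSON Path runs against that wrapped value.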
-
@dosu-bot I am now getting an error with
-
🤖 Hello,

The error you're encountering is due to the fact that the […]. To resolve this issue, you don't need to convert the documents into a JSON object. Instead, you can directly use the list of Document objects:

# Create a MongoDB reader and load data
reader = SimpleMongoReader(host, port)
documents = reader.load_data(
    db_name, collection_name, field_names, query_dict=query_dict
)
# Now, 'documents' is a list of Document objects. You can iterate over this list and access the data of each document.
for document in documents:
    print(document.text)  # or any other operation you want to perform on the document

I hope this helps! Let me know if you have any other questions.
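The underlying problem is that `json.loads` expects a string, while `load_data` returns a list of `Document` objects. If each document's text happens to hold JSON, one option is to parse the documents one by one. A minimal sketch, assuming only that each object exposes a `.text` attribute (`documents_to_json_value` is a hypothetical helper, not a LlamaIndex function):

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def documents_to_json_value(documents: Iterable[Any]) -> Dict[str, List[Any]]:
    """Parse each document's text as JSON, keeping the raw text if parsing fails."""
    parsed: List[Any] = []
    for doc in documents:
        try:
            parsed.append(json.loads(doc.text))
        except (TypeError, ValueError):
            # Not valid JSON (or not a string): keep the text as-is.
            parsed.append(doc.text)
    return {"documents": parsed}


# Stand-ins for Document objects; only the .text attribute matters here.
docs = [
    SimpleNamespace(text='{"name": "alice", "age": 30}'),
    SimpleNamespace(text="plain text, not JSON"),
]
json_value = documents_to_json_value(docs)
print(json_value)
```

The returned dictionary is a single JSON value, so it could be passed as `json_value` to `JSONQueryEngine` instead of calling `json.loads` on the whole list.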
-
@dosu-beta is it best practice to loop over all my documents? My goal is to use an LLM on my MongoDB JSON documents.
-
🤖 Hello,

The error you're encountering is due to the fact that […]. To use the large language model (LLM) on MongoDB JSON documents in the LlamaIndex framework, you should first convert the BSON documents into JSON. This can be done using the bson.json_util module. Here is an example of how you can modify the load_data method:

from typing import Dict, List, Optional

from bson import json_util
from llama_index import Document

def load_data(
    self,
    db_name: str,
    collection_name: str,
    field_names: List[str] = ["text"],
    query_dict: Optional[Dict] = None,
) -> List[Document]:
    documents = []
    db = self.client[db_name]
    if query_dict is None:
        cursor = db[collection_name].find()
    else:
        cursor = db[collection_name].find(query_dict)
    for item in cursor:
        text = ""
        for field_name in field_names:
            if field_name not in item:
                raise ValueError(
                    f"`{field_name}` field not found in Mongo document."
                )
            # Use bson.json_util.dumps instead of just appending the field
            text += json_util.dumps(item[field_name])
        # Pass the text by keyword so the Document is built consistently
        documents.append(Document(text=text))
    return documents

This modification uses json_util.dumps to serialize each field to a JSON string. As for your concern about efficiency, this approach should be reasonably efficient even for large numbers of documents.

I hope this helps! Let me know if you have any other questions.
-
@Exorcismus did you find a solution for this?
-
Question Validation
Question
I am using the code snippet below to read data from MongoDB. Because my data structure differs from what is expected, I had to combine multiple values into text.

documents = reader.load_data(db_name, collection_name, field_names, query_dict)
index = GPTVectorStoreIndex.from_documents(documents)
index.storage_context.persist()
query_engine = index.as_query_engine(
    similarity_top_k=5,
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    verbose=True,
)

I can see there is a JSON Query Engine, but I can't see an easy way to combine the two. Is there a best practice for using JSONQueryEngine on Mongo documents directly?