Hi,
I'm trying to load over 1,000 documents into my vector store, but the insert fails with a timeout error. The LangChain AstraDB documentation mentions an api_options parameter, but it appears to be available only in version 0.6.1, which hasn’t been released yet. I would appreciate help determining the best way to fix this. I’ve tried batching and async approaches without success so far.
This is my code:
async def update_vector_db(collection_name):
    loader = JSONLoader(
        file_path="docs.json",
        jq_schema=".docs[]",
        content_key="body_plain",
        text_content=False,
        metadata_func=metadata_func,
    )
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    all_splits = text_splitter.split_documents(docs)
    all_splits_ids = {generate_document_id(split): split for split in all_splits}
    ids, documents = zip(*all_splits_ids.items())
    vector_store = AstraDBVectorStore(
        collection_name=collection_name,
        embedding=OpenAIEmbeddings(model=config.EMBEDDING_MODEL),
        namespace=os.environ.get("ASTRA_DB_NAMESPACE"),
        api_endpoint=os.environ.get("ASTRA_DB_API_ENDPOINT"),
        token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
        bulk_insert_batch_concurrency=80,
        bulk_insert_overwrite_concurrency=10,
        pre_delete_collection=True,
        hybrid_search=True,
    )
    await vector_store.aadd_documents(
        documents=documents,
        ids=ids,
        batch_size=50,
    )


if __name__ == "__main__":
    asyncio.run(update_vector_db(config.COLLECTION_NAME))
Error:
astrapy.exceptions.data_api_exceptions.DataAPITimeoutException: timed out (timeout honoured: general_method_timeout_ms = 30000 ms)
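For reference, the batching approach mentioned above can be expressed as a small wrapper that sends one slice at a time and retries a slice that fails. This is only a minimal sketch: `chunked`, `insert_in_batches`, and all of their parameters are hypothetical names introduced here for illustration, not part of LangChain or astrapy, and the broad `except Exception` is an assumption that would normally be narrowed to the timeout exception shown above.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> list[Sequence[T]]:
    """Split items into consecutive slices of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i : i + size] for i in range(0, len(items), size)]


async def insert_in_batches(
    add_batch: Callable[[Sequence[T]], Awaitable[None]],
    items: Sequence[T],
    batch_size: int = 50,
    max_retries: int = 3,
    base_delay_s: float = 1.0,
) -> int:
    """Await `add_batch` on each slice in turn and return the item count.

    A slice that raises is retried up to `max_retries` times, sleeping
    base_delay_s * 2**attempt seconds between attempts. After the final
    failed attempt the exception is re-raised to the caller.
    """
    inserted = 0
    for batch in chunked(items, batch_size):
        for attempt in range(max_retries + 1):
            try:
                await add_batch(batch)
                break
            except Exception:  # assumption: narrow this to the timeout error in real use
                if attempt == max_retries:
                    raise
                await asyncio.sleep(base_delay_s * 2**attempt)
        inserted += len(batch)
    return inserted
```

With the code above, `add_batch` could be a small coroutine that receives a slice of `(id, document)` pairs and forwards them to `vector_store.aadd_documents(documents=..., ids=...)`. Sending slices sequentially keeps each request small, at the cost of giving up the concurrency configured on the store.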