API timeout when trying to load into VectorStore #370

@SCantergiani

Hi,

I'm trying to load over 1,000 documents into my vector store, but I keep hitting a timeout error. The LangChain AstraDB documentation mentions a parameter for setting api_options, but it appears to be available only in version 0.6.1, which hasn’t been released yet. I’ve tried batching and async approaches without success so far, so I would appreciate some help in determining the best way to fix this.

This is my code:

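# Imports inferred from the class names used below; exact package paths may differ by version.
import asyncio
import os

from langchain_astradb import AstraDBVectorStore
from langchain_community.document_loaders import JSONLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# config, metadata_func, and generate_document_id are project-specific and not shown here.


async def update_vector_db(collection_name):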
    loader = JSONLoader(
        file_path="docs.json",
        jq_schema=".docs[]",
        content_key="body_plain",
        text_content=False,
        metadata_func=metadata_func,
    )

    docs = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    all_splits = text_splitter.split_documents(docs)
    all_splits_ids = {generate_document_id(split): split for split in all_splits}
    ids, documents = zip(*all_splits_ids.items())

    vector_store = AstraDBVectorStore(
        collection_name=collection_name,
        embedding=OpenAIEmbeddings(model=config.EMBEDDING_MODEL),
        namespace=os.environ.get("ASTRA_DB_NAMESPACE"),
        api_endpoint=os.environ.get("ASTRA_DB_API_ENDPOINT"),
        token=os.environ.get("ASTRA_DB_APPLICATION_TOKEN"),
        bulk_insert_batch_concurrency=80,
        bulk_insert_overwrite_concurrency=10,
        pre_delete_collection=True,
        hybrid_search=True,
    )

    await vector_store.aadd_documents(
        documents=documents,
        ids=ids,
        batch_size=50,
    )


if __name__ == "__main__":
    asyncio.run(update_vector_db(config.COLLECTION_NAME))

Error:

astrapy.exceptions.data_api_exceptions.DataAPITimeoutException: timed out (timeout honoured: general_method_timeout_ms = 30000 ms)
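
One possible workaround, sketched under assumptions rather than confirmed against the library: if the 30,000 ms `general_method_timeout_ms` applies to each insert call as a whole, then splitting the documents into smaller groups and calling `aadd_documents` once per group may keep every call under the limit. The helper names `chunked` and `add_in_groups` and the `group_size` parameter are hypothetical and not part of LangChain or astrapy; the only store method used is `aadd_documents`, called with the same keyword arguments as in the code above.

```python
import asyncio
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def add_in_groups(
    store: Any,
    documents: Sequence[Any],
    ids: Sequence[str],
    group_size: int = 200,
    batch_size: int = 50,
) -> List[str]:
    """Insert documents group by group instead of in a single call.

    `store` is assumed to expose an async `aadd_documents(documents=..., ids=...,
    batch_size=...)` method that returns the inserted IDs, as in the issue's code.
    Groups are awaited one after another, so each call gets its own timeout budget
    (assuming the timeout is applied per call).
    """
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")

    inserted: List[str] = []
    for doc_group, id_group in zip(chunked(documents, group_size), chunked(ids, group_size)):
        result = await store.aadd_documents(
            documents=list(doc_group),
            ids=list(id_group),
            batch_size=batch_size,
        )
        inserted.extend(result)
    return inserted
```

In the code above, the single `await vector_store.aadd_documents(...)` call would become `await add_in_groups(vector_store, documents, ids)`. The right `group_size` is a guess that would need tuning, and this sketch does not change the client timeout itself.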
