issue creating a vector store from a collection using imported embeddings

Hi,
I have a question about declaring a vector store from a collection whose embeddings I computed beforehand (using OpenAI's `text-embedding-3-small`).

I created the collection as below:

```py
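from astrapy.constants import VectorMetric
from astrapy.info import CollectionDefinition, CollectionVectorOptions

# `database` is an astrapy Database object created earlier (not shown)
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        dimension=1536,  # text-embedding-3-small
        metric=VectorMetric.COSINE,
    ),
    indexing={"allow": ["$vector"]},
)
new_collection = database.create_collection(
    "test_collection_chunks_for_demo",
    definition=collection_definition,
)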
```

And I fill the collection as below:
```py
embedding = response.data[0].embedding
to_embed = {
    "chunk_id": f"doc_chunk_{i}_{table_name}",
    "catalog_name": catalog_name,  # metadata
    "schema_name": schema_name,    # metadata
    "table_name": table_name,      # metadata
    "$vector": embedding,          # precomputed embedding of the chunk
    "chunk": chunk,                # raw text
}
```

Later on during the retrieval, I want to use a vector store to make a `retriever` like this:

```py
vstore = AstraDBVectorStore(
    embedding=OpenAIEmbeddings(model="text-embedding-3-small", api_key=os.getenv("OpenAI_API_KEY")),
    collection_name="test_collection_chunks_for_demo",
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)
```

but got the following error message:

> ValueError: Astra DB collection 'test_collection_chunks_for_demo' is detected as having the following indexing policy: {"allow": ["$vector"]}. This is incompatible with the requested indexing policy for this object. Consider indexing anew on a fresh collection with the requested indexing policy, or alternatively align the requested indexing settings to the collection to keep using it.

I also tried creating the collection without the indexing restriction on `$vector`. In that case I can create a `retriever` (I'm following this example: [https://docs.datastax.com/en/ragstack/default-architecture/retrieval.html](https://docs.datastax.com/en/ragstack/default-architecture/retrieval.html)), but I then get the following error:

> In `langchain_astradb/utils/vector_store_codecs.py`, line 420:
>     417     logger.warning(invalid_doc_warning)
>     418     return None
>     419 return Document(
> --> 420     page_content=astra_document[self.content_field],
>     421     metadata=astra_document[DEFAULT_METADATA_FIELD_NAME],
>     422     id=astra_document["_id"],
>     423 )
> 
> KeyError: 'content'
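
To make this second failure concrete, here is a minimal pure-Python sketch of what I think is happening. It does not call `langchain_astradb` at all: `decode` only mimics the three lookups visible in the traceback, and `reshape` is a hypothetical helper of mine. The metadata field name `"metadata"` is my assumption for the value of `DEFAULT_METADATA_FIELD_NAME`, and the sample values are made up.

```py
# Shape of the documents I inserted (sample values, shortened for illustration).
my_document = {
    "_id": "some-id",
    "chunk_id": "doc_chunk_0_orders",
    "catalog_name": "main",
    "schema_name": "sales",
    "table_name": "orders",
    "$vector": [0.1, 0.2, 0.3],
    "chunk": "raw text of the chunk",
}

CONTENT_FIELD = "content"    # taken from the KeyError message
METADATA_FIELD = "metadata"  # ASSUMPTION: value of DEFAULT_METADATA_FIELD_NAME


def decode(astra_document: dict) -> dict:
    """Mimic the three lookups shown in the traceback (not the library code)."""
    return {
        "page_content": astra_document[CONTENT_FIELD],
        "metadata": astra_document[METADATA_FIELD],
        "id": astra_document["_id"],
    }


def reshape(doc: dict) -> dict:
    """Hypothetical helper: move my flat fields into the assumed nested layout."""
    reserved = {"_id", "$vector", "chunk"}
    return {
        "_id": doc["_id"],
        "$vector": doc["$vector"],
        CONTENT_FIELD: doc["chunk"],
        METADATA_FIELD: {k: v for k, v in doc.items() if k not in reserved},
    }


try:
    decode(my_document)
except KeyError as err:
    print(f"KeyError: {err}")  # my flat document has no "content" key

decoded = decode(reshape(my_document))
print(decoded["page_content"])
```

If this reading is right, my flat documents (text under `chunk`, metadata at the top level) simply do not match the layout the vector store expects when it decodes a result, which is why I am asking whether there is an option to tell it where the text and metadata live.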

Is this the wrong way to use imported embeddings?
Is there an option I need to pass to `AstraDBVectorStore` to use this specific indexing?

Thanks for any help.

Regards,
Jonathan

---
Python 3.12.10

| Package | Version |
| --- | --- |
| astrapy | 2.0.1 |
| langchain | 0.3.27 |
| langchain-astradb | 0.6.0 |
| langchain-community | 0.3.27 |
| langchain-core | 0.3.72 |
| langchain-mcp-adapters | 0.1.9 |
| langchain-openai | 0.3.28 |
| langchain-tavily | 0.2.0 |
| langchain-text-splitters | 0.3.9 |
| langgraph | 0.6.0 |
| langgraph-api | 0.2.108 |
| langgraph-checkpoint | 2.1.1 |
| langgraph-cli | 0.3.6 |
| langgraph-prebuilt | 0.6.0 |
| langgraph-runtime-inmem | 0.6.3 |
| langgraph-sdk | 0.2.0 |
| langsmith | 0.4.8 |
| openai | 1.97.1 |

issue creating a vector store from a collection using imported embeddings #369
