Description
Hi,
I have a question about declaring a vector store on a collection whose embeddings I computed beforehand (using OpenAI's text-embedding-3-small).
I created the collection as below:
collection_definition = CollectionDefinition(
    vector=CollectionVectorOptions(
        dimension=1536,  # openai embeddings small
        metric=VectorMetric.COSINE,
    ),
    indexing={"allow": ["$vector"]},
)
new_collection = database.create_collection(
    "test_collection_chunks_for_demo",
    definition=collection_definition,
)
And I fill the collection as below:
embedding = response.data[0].embedding
to_embed = {
    "chunk_id": f"doc_chunk_{i}_{table_name}",
    "catalog_name": catalog_name,  # metadata
    "schema_name": schema_name,  # metadata
    "table_name": table_name,  # metadata
    "$vector": embedding,  # precomputed embedding of the chunk
    "chunk": chunk,  # raw text
}
Later on, at retrieval time, I want to use a vector store to build a retriever, like this:
vstore = AstraDBVectorStore(
    embedding=OpenAIEmbeddings(
        model="text-embedding-3-small",
        api_key=os.getenv("OpenAI_API_KEY"),
    ),
    collection_name="test_collection_chunks_for_demo",  # closing quote was missing
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)
but got the following error message:
ValueError: Astra DB collection 'test_collection_chunks_for_demo' is detected as having the following indexing policy: {"allow": ["$vector"]}. This is incompatible with the requested indexing policy for this object. Consider indexing anew on a fresh collection with the requested indexing policy, or alternatively align the requested indexing settings to the collection to keep using it.
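The error message suggests aligning the requested indexing settings with the collection. Below is a minimal sketch of that idea. It assumes that `AstraDBVectorStore` accepts a `collection_indexing_policy` argument and that passing the collection's existing policy satisfies the check; neither has been verified here against langchain-astradb 0.6.0. `build_store_kwargs` is a hypothetical helper, not part of any library.

```python
import os

COLLECTION_NAME = "test_collection_chunks_for_demo"
# Policy reported in the error message for the existing collection.
EXISTING_POLICY = {"allow": ["$vector"]}


def build_store_kwargs(env=None):
    """Collect constructor arguments that mirror the collection's policy.

    Hypothetical helper: it only assembles a dict and contacts nothing.
    """
    env = os.environ if env is None else env
    return {
        "collection_name": COLLECTION_NAME,
        "token": env.get("ASTRA_DB_APPLICATION_TOKEN"),
        "api_endpoint": env.get("ASTRA_DB_API_ENDPOINT"),
        # Assumption: this keyword is how the requested policy is declared.
        "collection_indexing_policy": dict(EXISTING_POLICY),
    }


# Intended use (not executed here, needs credentials and the libraries):
# vstore = AstraDBVectorStore(embedding=my_embeddings, **build_store_kwargs())
```

Even if this removes the `ValueError`, a policy that indexes only `$vector` would presumably prevent filtering on the metadata fields, so it may be worth checking whether that trade-off is acceptable.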
I also tried creating the collection without the indexing option on $vector. In that case I can create a retriever (I'm following this example: https://docs.datastax.com/en/ragstack/default-architecture/retrieval.html), but I then got the following error message:
    417             logger.warning(invalid_doc_warning)
    418             return None
    419         return Document(
--> 420             page_content=astra_document[self.content_field],
    421             metadata=astra_document[DEFAULT_METADATA_FIELD_NAME],
    422             id=astra_document["_id"],
    423         )

KeyError: 'content'
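The traceback shows the store reading the text from a `content` field and the metadata from a single metadata field, whereas the documents above keep the text under `chunk` and the metadata as top-level keys. A minimal sketch of reshaping one document before insertion follows. It assumes the metadata field is named `metadata` (the value of `DEFAULT_METADATA_FIELD_NAME` is not shown in the traceback); `to_store_shape` and the sample values are hypothetical.

```python
from typing import Any


def to_store_shape(
    doc: dict[str, Any],
    content_key: str = "chunk",
    content_field: str = "content",
    metadata_field: str = "metadata",  # assumed name, see lead-in
) -> dict[str, Any]:
    """Move the raw text under `content_field` and nest the other keys as metadata."""
    reserved = {"_id", "$vector", content_key}
    metadata = {k: v for k, v in doc.items() if k not in reserved}
    shaped = {
        content_field: doc[content_key],
        metadata_field: metadata,
        "$vector": doc["$vector"],
    }
    if "_id" in doc:
        shaped["_id"] = doc["_id"]
    return shaped


# Sample document in the shape used above (values are made up).
example = {
    "chunk_id": "doc_chunk_0_orders",
    "catalog_name": "main",
    "schema_name": "sales",
    "table_name": "orders",
    "$vector": [0.1, 0.2, 0.3],
    "chunk": "raw text of the chunk",
}
shaped = to_store_shape(example)
```

The alternative direction is to leave the documents untouched and tell the store which field holds the text; `AstraDBVectorStore` appears to expose `content_field` and `autodetect_collection` options for this, but their behaviour on this collection has not been checked here.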
Is this the wrong way to use imported embeddings?
Is there an option I need to pass to AstraDBVectorStore to use this specific indexing?
Thanks for any help.
Regards,
Jonathan
python 3.12.10
Package Version
astrapy 2.0.1
langchain 0.3.27
langchain-astradb 0.6.0
langchain-community 0.3.27
langchain-core 0.3.72
langchain-mcp-adapters 0.1.9
langchain-openai 0.3.28
langchain-tavily 0.2.0
langchain-text-splitters 0.3.9
langgraph 0.6.0
langgraph-api 0.2.108
langgraph-checkpoint 2.1.1
langgraph-cli 0.3.6
langgraph-prebuilt 0.6.0
langgraph-runtime-inmem 0.6.3
langgraph-sdk 0.2.0
langsmith 0.4.8
openai 1.97.1