[BUG]: VectorDB Retriever seems to not use id_property_external #448

@msenechal

Description

Before You Report a Bug, Please Confirm You Have Done The Following...

  • I have updated to the latest version of the packages.
  • I have searched for both existing issues and closed issues and found none that matched my issue.

neo4j-graphrag-python's version

1.10.1

Python version

3.10.7

Operating System

Windows & macOS

Dependencies

Using the QdrantNeo4jRetriever, it looks like no matter what I put in id_property_external, it always uses the same ID from my vector store.
For example:

id_property_external="uid",
id_property_neo4j="uid",

with a top_k of 1, gives me:

{'match_params': [['5383723d-06fb-414e-af62-d81e03db2852', 0.7332783]], 'id_property': 'uid'}

Note the value and format of the uid 5383...

id_property_external="test-whatever", # but same if I use another real attribute
id_property_neo4j="uid",

(or any real metadata field that exists in my vector DB instead of "test-whatever")
also gives me:

{'match_params': [['5383723d-06fb-414e-af62-d81e03db2852', 0.7332783]], 'id_property': 'uid'}

Same result: the ID used is still 5383...

Looking at the code, I'm not sure where the issue is, since I can see that the parameter payload is updated with this external property ID: https://github.com/neo4j/neo4j-graphrag-python/blob/main/src/neo4j_graphrag/retrievers/external/qdrant/qdrant.py#L220
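To make the expectation concrete, here is a minimal, self-contained sketch of the ID resolution I would expect. It is not the library's code: `FakePoint`, `build_match_params`, the `doc_ref` payload field, and the `section-42` value are all hypothetical names and made-up sample data, and the fallback to the point ID is only an assumption about how a missing field might be handled.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakePoint:
    """Hypothetical stand-in for a scored point returned by a vector store."""

    id: str
    score: float
    payload: dict[str, Any] = field(default_factory=dict)


def build_match_params(
    points: list[FakePoint], id_property_external: str
) -> list[list[Any]]:
    """Pair each point's external ID with its score.

    Assumption: the ID is read from the payload field named by
    id_property_external, and the point ID is used only when that
    field is missing from the payload.
    """
    params = []
    for point in points:
        external_id = point.payload.get(id_property_external, point.id)
        params.append([str(external_id), point.score])
    return params


# Made-up sample data: one point whose payload carries a "doc_ref" field.
points = [
    FakePoint(
        id="5383723d-06fb-414e-af62-d81e03db2852",
        score=0.7332783,
        payload={"doc_ref": "section-42"},
    )
]

print(build_match_params(points, "doc_ref"))        # ID taken from the payload
print(build_match_params(points, "test-whatever"))  # field missing: falls back to point ID
```

With a fallback like this, a payload field that is missing (or never returned by the search) would silently yield the point ID for every value of id_property_external, which would look like the symptom described here. I have not confirmed that this is what the retriever actually does.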

More in slack:
https://neo4j.slack.com/archives/C092146TSR4/p1764265357649359

The code snippet is shared below in the reproducible example.

Reproducible example

import os
import logging
from qdrant_client import QdrantClient
from dotenv import load_dotenv
from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import QdrantNeo4jRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.embeddings import Embedder
from fastembed import TextEmbedding
from typing import Any

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class FastEmbedEmbedder(Embedder):
    def __init__(self, model_name: str = "nomic-ai/nomic-embed-text-v1", **kwargs: Any):
        super().__init__(**kwargs)
        self.model = TextEmbedding(model_name=model_name)
    
    def embed_query(self, text: str) -> list[float]:
        embeddings = list(self.model.embed([text]))
        result = embeddings[0].tolist()
        return result


def main():
    load_dotenv()
    
    QDRANT_URL = os.getenv('QDRANT_URL')
    QDRANT_API_KEY = os.getenv('QDRANT_API_KEY')
    QDRANT_COLLECTION_NAME = os.getenv("QDRANT_COLLECTION_NAME")
    QDRANT_PORT = os.getenv("QDRANT_PORT")
    NEO4J_URI = os.getenv("NEO4J_URI")
    NEO4J_USERNAME = os.getenv("NEO4J_USERNAME")
    NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    
    qdrant_client = QdrantClient(
        url=QDRANT_URL,
        api_key=QDRANT_API_KEY,
        port=QDRANT_PORT,
        verify=False,
    )
    
    driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
    
    embedder = FastEmbedEmbedder(model_name="nomic-ai/nomic-embed-text-v1")
    
    retriever = QdrantNeo4jRetriever(
        driver=driver,
        client=qdrant_client,
        collection_name=QDRANT_COLLECTION_NAME,
        using="nomic-ai/nomic-embed-text-v1",
        id_property_external="test-whatever",
        id_property_neo4j="uid",
        embedder=embedder,
        retrieval_query="MATCH (node:Section) WHERE node.uid=match_id_value RETURN node LIMIT 5"
    )
    
    query_text = "How was Gas detection tests performed?"
    ret = retriever.search(query_text=query_text, top_k=5)
    print(ret)


if __name__ == "__main__":
    main()

Relevant Log Output

2025-11-28 08:24:00,163 - httpx - INFO - HTTP Request: POST https://qdrant-poc.gliadatalake-poc.corp.aws.novonordisk.com/collections/rag_poc_hybrid/points/query "HTTP/1.1 200 OK"
2025-11-28 08:24:00,163 - httpcore.http11 - DEBUG - receive_response_body.started request=<Request [b'POST']>
2025-11-28 08:24:00,165 - httpcore.http11 - DEBUG - receive_response_body.complete
2025-11-28 08:24:00,166 - httpcore.http11 - DEBUG - response_closed.started
2025-11-28 08:24:00,166 - httpcore.http11 - DEBUG - response_closed.complete
2025-11-28 08:24:00,168 - neo4j_graphrag.retrievers.external.qdrant.qdrant - DEBUG - Qdrant Store Cypher parameters: {'match_params': [['5383723d-06fb-414e-af62-d81e03db2852', 0.7332783], ['70155aa7-522a-4b4c-be84-d823c26ba985', 0.71326256], ['81ea44be-61da-423e-a75c-5cdc0d62ada2', 0.70934486], ['51575ba0-ae90-4bfb-a3f0-0ce65198fa40', 0.7056217], ['5bf30796-3091-47ac-acd2-07e4faf57ab7', 0.7024784]], 'id_property': 'uid'}
2025-11-28 08:24:00,168 - neo4j_graphrag.retrievers.external.qdrant.qdrant - DEBUG - Qdrant Store Cypher query: UNWIND $match_params AS match_param WITH match_param[0] AS match_id_value, match_param[1] AS score MATCH (node) WHERE node[$id_property] = match_id_value MATCH (node:Section) WHERE node.uid=match_id_value RETURN node LIMIT 5
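For readers less familiar with Cypher, the logged query can be read roughly as the plain-Python loop below. This is only an illustrative sketch over in-memory dictionaries: `match_nodes` and the sample nodes are hypothetical, and it ignores the extra `:Section` filter and `LIMIT` from my retrieval_query.

```python
from typing import Any


def match_nodes(
    nodes: list[dict[str, Any]],
    match_params: list[list[Any]],
    id_property: str,
) -> list[tuple[dict[str, Any], float]]:
    """Mimic UNWIND $match_params + MATCH (node) WHERE node[$id_property] = match_id_value."""
    results = []
    for match_id_value, score in match_params:  # UNWIND ... AS match_param
        for node in nodes:  # MATCH (node)
            if node.get(id_property) == match_id_value:  # WHERE node[$id_property] = ...
                results.append((node, score))
    return results


# Made-up nodes: only the first one has a uid that appears in match_params.
nodes = [{"uid": "a", "title": "Intro"}, {"uid": "b", "title": "Methods"}]
print(match_nodes(nodes, [["a", 0.9], ["z", 0.5]], "uid"))
```

The point is that the lookup only succeeds when the IDs in match_params are values of the Neo4j property named by id_property, so IDs taken from the wrong external field would match nothing.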

Expected Result

The ID values in the generated query parameters below should change based on the id_property_external used:
{'match_params': [['5383723d-06fb-414e-af62-d81e03db2852', 0.7332783], ['70155aa7-522a-4b4c-be84-d823c26ba985', 0.71326256], ['81ea44be-61da-423e-a75c-5cdc0d62ada2', 0.70934486], ['51575ba0-ae90-4bfb-a3f0-0ce65198fa40', 0.7056217], ['5bf30796-3091-47ac-acd2-07e4faf57ab7', 0.7024784]], 'id_property': 'uid'}

What happened instead?

No matter what property I use in id_property_external, it always uses the same IDs.

Additional Info

No response
