
MONGODB BUG FIX: knnVector is deprecated #3970

@ranfysvalle02

Description


πŸ› Describe the bug

[Refactor] Upgrade MongoDB Vector Store from legacy knnBeta to stable vectorSearch

🚀 Summary

https://www.mongodb.com/docs/atlas/atlas-search/operators-collectors/knn-beta/#:~:text=The%20Atlas%20Search%20knnVector%20field,%2C%20images%2C%20and%20other%20content.

The current implementation of mem0/vector_stores/mongodb.py uses the deprecated knnVector field type and the legacy Atlas Search index definition structure. MongoDB Atlas Vector Search is now generally available (GA) and uses a different syntax for both index creation and search.

πŸ› The Problem

In the create_col method (lines 66-83), the code defines the index using the legacy mappings syntax:

# CURRENT CODE (Deprecated)
definition={
    "mappings": {
        "dynamic": False,
        "fields": {
            "embedding": {
                "type": "knnVector",  # <--- DEPRECATED
                "dimensions": self.embedding_model_dims,
                "similarity": self.SIMILARITY_METRIC,
            }
        },
    }
}

Issues:

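  1. Deprecation: knnVector is a legacy Atlas Search field type. Atlas Vector Search indexes use the vector type instead.
  2. Recall/Accuracy: The search method sets numCandidates equal to limit (line 144). For HNSW indexes, numCandidates should be significantly higher (10x-20x) than limit to improve recall, because the approximate search only considers numCandidates nearest-neighbor candidates before returning the top limit results.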

🛠 Proposed Solution

Update the MongoDB class to use the stable vectorSearch index type and the numDimensions field.

1. Update Index Creation (create_col)

Replace the mappings definition with the fields list format required for type="vectorSearch".

from pymongo.operations import SearchIndexModel

# NEW IMPLEMENTATION
search_index_model = SearchIndexModel(
    name=self.index_name,
    type="vectorSearch",  # Explicitly set index type
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": self.embedding_model_dims,  # Note: 'numDimensions' not 'dimensions'
                "similarity": self.SIMILARITY_METRIC,  # "euclidean", "cosine", or "dotProduct"
            }
        ]
    },
)
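The same definition can be built and sanity-checked without a live cluster. This is a minimal sketch: `build_vector_index_definition` is a hypothetical helper, not part of mem0 or PyMongo, and the accepted similarity values (`euclidean`, `cosine`, `dotProduct`) are the ones Atlas Vector Search documents for `vector` fields.

```python
from typing import Any

# Similarity functions accepted by Atlas Vector Search "vector" fields.
ALLOWED_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def build_vector_index_definition(
    num_dimensions: int, similarity: str, path: str = "embedding"
) -> dict[str, Any]:
    """Return a vectorSearch index definition (hypothetical helper, not mem0 API)."""
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "fields": [
            {
                "type": "vector",  # replaces the legacy "knnVector" type
                "path": path,
                "numDimensions": num_dimensions,  # replaces "dimensions"
                "similarity": similarity,
            }
        ]
    }
```

The returned dict can be passed as `definition=` to `SearchIndexModel(..., type="vectorSearch")`; validating the similarity up front surfaces a misconfigured `SIMILARITY_METRIC` before the index build is submitted to Atlas.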

2. Update Search Logic (search)

In the $vectorSearch pipeline (line 144), increase numCandidates to improve search accuracy.

# NEW IMPLEMENTATION
pipeline = [
    {
        "$vectorSearch": {
            "index": self.index_name,
            "path": "embedding",
            "queryVector": vectors,
            "limit": limit,
            "numCandidates": min(limit * 20, 10_000),  # Recommended: 10x-20x the limit; Atlas caps this at 10,000
        }
    },
    {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    {"$project": {"embedding": 0}},
]
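The numCandidates logic can be pulled into a small, testable builder that also respects the server's upper bound. This is a sketch under assumptions: `build_search_pipeline` and its `multiplier` parameter are hypothetical (not mem0 API), the field path stays `embedding`, and the 10,000 ceiling reflects the maximum Atlas documents for `numCandidates`.

```python
from typing import Any

# Atlas rejects numCandidates values above 10,000.
MAX_NUM_CANDIDATES = 10_000


def build_search_pipeline(
    index_name: str,
    query_vector: list[float],
    limit: int,
    multiplier: int = 20,
) -> list[dict[str, Any]]:
    """Build a $vectorSearch pipeline (hypothetical helper, not mem0 API)."""
    if not 1 <= limit <= MAX_NUM_CANDIDATES:
        raise ValueError(f"limit must be between 1 and {MAX_NUM_CANDIDATES}")
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    # Oversample candidates for better recall, but stay within the server cap.
    # Because limit <= cap and multiplier >= 1, this is always >= limit.
    num_candidates = min(limit * multiplier, MAX_NUM_CANDIDATES)
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": "embedding",
                "queryVector": query_vector,
                "limit": limit,
                "numCandidates": num_candidates,
            }
        },
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
        {"$project": {"embedding": 0}},
    ]
```

Keeping the multiplier as a parameter lets callers trade latency for recall without changing the pipeline shape.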

✅ Acceptance Criteria

  • Index creation uses type="vectorSearch".
  • Field definition uses numDimensions instead of dimensions.
  • Search pipeline explicitly sets numCandidates > limit.
  • knnVector references removed.
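The criteria above can be expressed as a quick check, for example inside a unit test. A minimal sketch: `check_acceptance` is a hypothetical helper that inspects plain dicts (the index type string, the index `definition`, and the aggregation pipeline); it is not part of mem0 or PyMongo.

```python
from typing import Any


def check_acceptance(
    index_type: str,
    definition: dict[str, Any],
    pipeline: list[dict[str, Any]],
) -> list[str]:
    """Return the acceptance criteria that are NOT met (hypothetical helper)."""
    failures: list[str] = []
    if index_type != "vectorSearch":
        failures.append('index type is not "vectorSearch"')
    # The GA format uses a top-level "fields" list; the legacy format nests under "mappings".
    vector_fields = [f for f in definition.get("fields", []) if f.get("type") == "vector"]
    if not vector_fields:
        failures.append('no field of type "vector"')
    elif any("numDimensions" not in f or "dimensions" in f for f in vector_fields):
        failures.append('vector fields must use "numDimensions", not "dimensions"')
    stage = next((s["$vectorSearch"] for s in pipeline if "$vectorSearch" in s), None)
    if stage is None:
        failures.append("pipeline has no $vectorSearch stage")
    elif stage.get("numCandidates", 0) <= stage.get("limit", 0):
        failures.append("numCandidates must be explicitly set and greater than limit")
    if "knnVector" in repr(definition):
        failures.append("definition still references knnVector")
    return failures
```

An empty list means every criterion is satisfied; run against the legacy `mappings` definition, the check reports the missing `vector` field and the leftover `knnVector` reference.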

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
