Bug Description
The @pleaseai/context-please-mcp v0.5.0 package ignores the VECTOR_DB_TYPE=milvus environment variable and keeps using FAISS for vector storage, even though Milvus is correctly configured and running.
Environment
- Package: @pleaseai/context-please-mcp@0.5.0
- Node.js: v20.17
- Milvus: v2.4.1 (self-hosted, running in Docker)
- Embedding Provider: OpenAI (using SambaNova API endpoint)
- Platform: Linux (DDEV development environment)
Configuration
{
  "context-please": {
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "@pleaseai/context-please-mcp@latest"],
    "env": {
      "VECTOR_DB_TYPE": "milvus",
      "FAISS_STORAGE_DIR": "/home/lucas/.context/faiss-indexes/",
      "EMBEDDING_PROVIDER": "OpenAI",
      "OPENAI_BASE_URL": "https://api.sambanova.ai/v1",
      "OPENAI_API_KEY": "${SAMBANOVA_API_KEY}",
      "EMBEDDING_MODEL": "E5-Mistral-7B-Instruct",
      "NODE_ENV": "production",
      "MILVUS_ADDRESS": "milvus:19530"
    }
  }
}
Steps to Reproduce
1. Set up Milvus:
# Milvus 2.4.1 running in Docker
# Health check: http://milvus:9091/healthz returns "OK"
# Accessible at: milvus:19530
2. Configure MCP with Milvus:
- Set VECTOR_DB_TYPE=milvus
- Set MILVUS_ADDRESS=milvus:19530
- Start the MCP server
3. Verify the environment variables of the running process:
cat /proc/<pid>/environ | tr '\0' '\n' | grep -E "(VECTOR|MILVUS)"
Output:
VECTOR_DB_TYPE=milvus
MILVUS_ADDRESS=milvus:19530
✅ Environment variables are correctly set
4. Index a codebase:
mcp-cli call context-please/index_codebase '{"path": "/path/to/code", "splitter": "ast"}'
5. Check where the data is stored:
# Check FAISS directory
ls -lh ~/.context/faiss-indexes/
# Output: contains index files (dense.index, sparse.json, documents.json)

# Check Milvus collections
python3 <<PYTHON
from pymilvus import connections, utility, Collection
connections.connect(host='milvus', port='19530')
for coll in utility.list_collections():
    c = Collection(coll)
    print(f"{coll}: {c.num_entities} entities")
PYTHON
# Output: collections exist but have 0 entities
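The storage checks above can be combined into one diagnostic script. This is a sketch, assuming the `milvus` hostname, ports 9091/19530, and the FAISS path from the configuration in this report; the helper names `milvus_healthy`, `summarize_faiss_dir`, `count_milvus_entities`, and `main` are hypothetical. One difference from the inline check: as far as I understand pymilvus, `Collection.num_entities` may not count rows that have not been flushed yet, so the sketch calls `flush()` before counting to rule out a false 0.

```python
import os
import urllib.request
from pathlib import Path


def milvus_healthy(url: str = "http://milvus:9091/healthz", timeout: float = 3.0) -> bool:
    """Return True if the Milvus health endpoint answers HTTP 200 with body 'OK'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except OSError:  # URLError, HTTPError, refused connections and timeouts all derive from OSError
        return False


def summarize_faiss_dir(root: str) -> dict[str, list[str]]:
    """Map each index subdirectory under root to the sorted names of the files it holds."""
    base = Path(root).expanduser()
    if not base.is_dir():
        return {}
    return {
        sub.name: sorted(f.name for f in sub.iterdir() if f.is_file())
        for sub in sorted(base.iterdir())
        if sub.is_dir()
    }


def count_milvus_entities(host: str = "milvus", port: str = "19530") -> dict[str, int]:
    """Flush each collection, then report its entity count."""
    # Imported lazily so the FAISS check still works where pymilvus is not installed.
    from pymilvus import Collection, connections, utility

    connections.connect(host=host, port=port)
    counts = {}
    for name in utility.list_collections():
        coll = Collection(name)
        coll.flush()  # persist pending inserts so num_entities includes them
        counts[name] = coll.num_entities
    return counts


def main() -> None:
    print("Milvus healthy:", milvus_healthy())
    faiss_dir = os.environ.get("FAISS_STORAGE_DIR", "~/.context/faiss-indexes/")
    for name, files in summarize_faiss_dir(faiss_dir).items():
        print(f"FAISS {name}: {files}")
    for name, count in count_milvus_entities().items():
        print(f"Milvus {name}: {count} entities")
```

Call `main()` from inside the DDEV environment, where the `milvus` hostname resolves; the Milvus part needs pymilvus installed (2.6.5 in this report). A non-empty FAISS listing next to Milvus counts that stay at 0 after flushing would match the behavior described here.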
Expected Behavior
When VECTOR_DB_TYPE=milvus is set:
- Data should be stored in Milvus collections
- FAISS should not be used
- Milvus collections should contain indexed data
- Search should query Milvus
Actual Behavior
Despite VECTOR_DB_TYPE=milvus:
- ❌ Data is stored in ~/.context/faiss-indexes/
- ❌ Milvus collections are created but remain empty (0 entities)
- ❌ Search fails with: "Collection exists but statistics could not be retrieved from the vector database"
- ✅ Indexing reports success: "541 files, 2637 chunks"
- ✅ Environment variables are correctly set in the process
Evidence
1. Process Environment Variables
$ cat /proc/28551/environ | tr '\0' '\n' | grep -E "(VECTOR|MILVUS)"
EMBEDDING_MODEL=E5-Mistral-7B-Instruct
EMBEDDING_PROVIDER=OpenAI
FAISS_STORAGE_DIR=/home/lucas/.context/faiss-indexes/
MILVUS_ADDRESS=milvus:19530
VECTOR_DB_TYPE=milvus ← CORRECTLY SET
2. Data Location
$ ls -lh ~/.context/faiss-indexes/hybrid_code_chunks_*/
total 1.2M
-rw-r--r-- 1 lucas lucas 1.1M dense.index ← Data in FAISS
-rw-r--r-- 1 lucas lucas 121K documents.json
-rw-r--r-- 1 lucas lucas 151 metadata.json
-rw-r--r-- 1 lucas lucas 55K sparse.json
3. Milvus Collections
# Collections created but empty
hybrid_code_chunks_40d1b2d8:
Entities: 0 ← NO DATA
Indexes:
- vector: AUTOINDEX
- sparse_vector: SPARSE_INVERTED_INDEX
4. Search Error
Error: Collection exists for '/var/www/html' but statistics could not be retrieved from the vector database.
Root Cause Analysis
The MCP server appears to:
- Read the VECTOR_DB_TYPE=milvus configuration ✅
- Create Milvus collections (with the correct schema) ✅
- But then use FAISS for actual data storage ❌
- Try to read statistics from Milvus (which is empty) during search ❌
This suggests a conditional logic bug where the vector DB type check isn't being applied consistently across:
- Collection creation (uses Milvus)
- Data insertion (uses FAISS)
- Data retrieval (tries Milvus)
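For illustration only, the consistency this analysis implies can be sketched in Python. context-please itself is a Node.js package whose internals are not shown in this report, so every class and function below is hypothetical. The idea is to resolve `VECTOR_DB_TYPE` once, reject unknown or missing values instead of silently falling back, and route collection creation, insertion, and search through the same backend object so the three paths cannot diverge.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass, field

# Hypothetical set of accepted values (both appear in this report).
SUPPORTED_BACKENDS = {"milvus", "faiss-local"}


@dataclass
class Backend:
    """Hypothetical stand-in for a vector store that records every call it receives."""
    name: str
    calls: list[str] = field(default_factory=list)

    def create_collection(self, coll: str) -> None:
        self.calls.append(f"create:{coll}")

    def insert(self, coll: str, rows: int) -> None:
        self.calls.append(f"insert:{coll}:{rows}")

    def search(self, coll: str) -> None:
        self.calls.append(f"search:{coll}")


def resolve_backend(env: Mapping[str, str] | None = None) -> Backend:
    """Read VECTOR_DB_TYPE exactly once; fail loudly rather than fall back silently."""
    env = os.environ if env is None else env
    kind = env.get("VECTOR_DB_TYPE", "").strip().lower()
    if kind not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported or missing VECTOR_DB_TYPE: {kind!r}")
    return Backend(kind)


class Indexer:
    """Hypothetical indexer: creation, insertion and search all share one backend."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def index(self, coll: str, rows: int) -> None:
        self.backend.create_collection(coll)
        self.backend.insert(coll, rows)

    def query(self, coll: str) -> None:
        self.backend.search(coll)
```

With this shape, all three operations land on whichever backend was resolved at startup. The split observed here (create and read on Milvus, insert on FAISS) would suggest that at least one code path constructs its own store instead of reusing the shared one.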
Additional Notes
Sparse Vector Index Issue (Solved)
When Milvus is properly used, there's a secondary issue: self-hosted Milvus 2.4.x requires manual creation of sparse vector indexes. I've created a workaround:
from pymilvus import connections, Collection, utility

connections.connect(host='milvus', port='19530')

# Add the missing sparse index to every collection, then load it for search
for coll_name in utility.list_collections():
    coll = Collection(coll_name)
    indexes = {idx.field_name for idx in coll.indexes}
    if 'sparse_vector' not in indexes:
        coll.create_index(
            field_name="sparse_vector",
            index_params={
                "metric_type": "IP",
                "index_type": "SPARSE_INVERTED_INDEX",
                "params": {"inverted_index_algo": "DAAT_WAND"}
            },
            index_name="sparse_inverted_index"
        )
    coll.load()

This is a separate issue but worth noting for self-hosted Milvus deployments.
Workaround
For now, using VECTOR_DB_TYPE=faiss-local works perfectly:
- Indexing succeeds
- Search works
- No Milvus dependency issues
Impact
- Severity: High - Core feature (Milvus integration) doesn't work
- Workaround: Use FAISS (works fine for smaller codebases)
- User Impact: Users cannot use production-grade Milvus for large codebases
System Information
$ npm view @pleaseai/context-please-mcp version
0.5.0
$ node --version
v20.17.0
$ python3 -c "from pymilvus import __version__; print(__version__)"
2.6.5
Related
- Milvus Documentation: https://milvus.io/docs/v2.4.x/sparse_vector.md
- MCP + Milvus: https://milvus.io/docs/milvus_and_mcp.md
Thank you for this excellent tool! Happy to provide more information or test fixes.