[Bug] VECTOR_DB_TYPE=milvus ignored, MCP uses FAISS despite correct configuration #62

@ProxiBlue

Bug Description

The @pleaseai/context-please-mcp v0.5.0 package ignores the VECTOR_DB_TYPE=milvus environment variable and continues to use FAISS for vector storage, despite having Milvus correctly configured and running.

Environment

  • Package: @pleaseai/context-please-mcp@0.5.0
  • Node.js: v20.17
  • Milvus: v2.4.1 (self-hosted, running in Docker)
  • Embedding Provider: OpenAI (using SambaNova API endpoint)
  • Platform: Linux (DDEV development environment)

Configuration

{
  "context-please": {
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "@pleaseai/context-please-mcp@latest"],
    "env": {
      "VECTOR_DB_TYPE": "milvus",
      "FAISS_STORAGE_DIR": "/home/lucas/.context/faiss-indexes/",
      "EMBEDDING_PROVIDER": "OpenAI",
      "OPENAI_BASE_URL": "https://api.sambanova.ai/v1",
      "OPENAI_API_KEY": "${SAMBANOVA_API_KEY}",
      "EMBEDDING_MODEL": "E5-Mistral-7B-Instruct",
      "NODE_ENV": "production",
      "MILVUS_ADDRESS": "milvus:19530"
    }
  }
}

Steps to Reproduce

  1. Set up Milvus:

    # Milvus 2.4.1 running in Docker
    # Health check: http://milvus:9091/healthz returns "OK"
    # Accessible at: milvus:19530
  2. Configure MCP with Milvus:

    • Set VECTOR_DB_TYPE=milvus
    • Set MILVUS_ADDRESS=milvus:19530
    • Start the MCP server
  3. Verify environment variables (checking running process):

    cat /proc/<pid>/environ | tr '\0' '\n' | grep -E "(VECTOR|MILVUS)"

    Output:

    VECTOR_DB_TYPE=milvus
    MILVUS_ADDRESS=milvus:19530
    

    ✅ Environment variables are correctly set

  4. Index a codebase:

    mcp-cli call context-please/index_codebase '{"path": "/path/to/code", "splitter": "ast"}'
  5. Check where data is stored:

    # Check FAISS directory
    ls -lh ~/.context/faiss-indexes/
    # Output: Contains index files (dense.index, sparse.json, documents.json)
    
    # Check Milvus collections
    python3 <<PYTHON
    from pymilvus import connections, utility, Collection
    connections.connect(host='milvus', port='19530')
    for coll in utility.list_collections():
        c = Collection(coll)
        print(f"{coll}: {c.num_entities} entities")
    PYTHON
    # Output: Collections exist but have 0 entities
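The two checks in step 5 can be combined into one small diagnostic. The following is a minimal sketch, not part of the package: the helper names `faiss_collections_with_data` and `classify_backend` are hypothetical, and the file names are taken from the directory listing in this report. The Milvus entity counts are passed in as a plain dict (for example, gathered with the pymilvus loop above), so the sketch itself needs only the standard library.

```python
from pathlib import Path

# File names as they appear in the FAISS directory listing in this report.
FAISS_FILES = ("dense.index", "sparse.json", "documents.json")


def faiss_collections_with_data(storage_dir):
    """Return {subdirectory name: total bytes} for subdirectories holding index files."""
    root = Path(storage_dir).expanduser()
    found = {}
    if not root.is_dir():
        return found
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        size = sum(
            (sub / name).stat().st_size
            for name in FAISS_FILES
            if (sub / name).is_file()
        )
        if size > 0:
            found[sub.name] = size
    return found


def classify_backend(faiss_sizes, milvus_entities):
    """Summarize which backend actually received data.

    milvus_entities maps collection name -> entity count.
    """
    in_faiss = bool(faiss_sizes)
    in_milvus = any(count > 0 for count in milvus_entities.values())
    if in_faiss and in_milvus:
        return "both"
    if in_faiss:
        return "faiss-only"
    if in_milvus:
        return "milvus-only"
    return "none"
```

With the state described in this report (index files under `~/.context/faiss-indexes/` and every Milvus collection at 0 entities), `classify_backend` would return `"faiss-only"`.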

Expected Behavior

When VECTOR_DB_TYPE=milvus is set:

  • Data should be stored in Milvus collections
  • FAISS should not be used
  • Milvus collections should contain indexed data
  • Search should query Milvus

Actual Behavior

Despite VECTOR_DB_TYPE=milvus:

  • ❌ Data is stored in ~/.context/faiss-indexes/
  • ❌ Milvus collections are created but remain empty (0 entities)
  • ❌ Search fails with: "Collection exists but statistics could not be retrieved from the vector database"
  • ✅ Indexing reports success: "541 files, 2637 chunks"
  • ✅ Environment variables are correctly set in the process

Evidence

1. Process Environment Variables

$ cat /proc/28551/environ | tr '\0' '\n' | grep -E "(VECTOR|MILVUS|FAISS|EMBEDDING)"
EMBEDDING_MODEL=E5-Mistral-7B-Instruct
EMBEDDING_PROVIDER=OpenAI
FAISS_STORAGE_DIR=/home/lucas/.context/faiss-indexes/
MILVUS_ADDRESS=milvus:19530
VECTOR_DB_TYPE=milvus  ← CORRECTLY SET

2. Data Location

$ ls -lh ~/.context/faiss-indexes/hybrid_code_chunks_*/
total 1.2M
-rw-r--r-- 1 lucas lucas 1.1M dense.index      ← Data in FAISS
-rw-r--r-- 1 lucas lucas 121K documents.json
-rw-r--r-- 1 lucas lucas  151 metadata.json
-rw-r--r-- 1 lucas lucas  55K sparse.json

3. Milvus Collections

# Collections created but empty
hybrid_code_chunks_40d1b2d8:
  Entities: 0  ← NO DATA
  Indexes:
    - vector: AUTOINDEX
    - sparse_vector: SPARSE_INVERTED_INDEX

4. Search Error

Error: Collection exists for '/var/www/html' but statistics could not be retrieved from the vector database.

Root Cause Analysis

The MCP server appears to:

  1. Read VECTOR_DB_TYPE=milvus configuration ✅
  2. Create Milvus collections (with correct schema) ✅
  3. Write the indexed data to FAISS instead of Milvus ❌
  4. Read statistics from the (empty) Milvus collection during search, which fails ❌

This suggests a conditional logic bug where the vector DB type check isn't being applied consistently across:

  • Collection creation (uses Milvus)
  • Data insertion (uses FAISS)
  • Data retrieval (tries Milvus)
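To illustrate the kind of consistency expected, here is a minimal sketch of resolving the backend once and routing creation, insertion, and retrieval through the same object. It is not the package's actual code: `InMemoryStore`, `resolve_store`, and `index_codebase` are hypothetical names, the store is an in-memory stand-in rather than a real Milvus or FAISS client, and the `faiss-local` default is an assumption.

```python
import os


class InMemoryStore:
    """Stand-in for a real vector store; it only records what it is asked to do."""

    def __init__(self, name):
        self.name = name
        self.collections = {}

    def create_collection(self, collection):
        self.collections.setdefault(collection, [])

    def insert(self, collection, chunks):
        self.collections[collection].extend(chunks)

    def count(self, collection):
        return len(self.collections[collection])


def resolve_store(env=None):
    """Pick exactly one backend from VECTOR_DB_TYPE and fail loudly on unknown values."""
    env = os.environ if env is None else env
    # Assumption: "faiss-local" is the fallback when the variable is unset.
    db_type = env.get("VECTOR_DB_TYPE", "faiss-local").strip().lower()
    if db_type not in ("milvus", "faiss-local"):
        raise ValueError(f"Unsupported VECTOR_DB_TYPE: {db_type!r}")
    return InMemoryStore(db_type)


def index_codebase(store, collection, chunks):
    """Create, insert, and read back through the same store object."""
    store.create_collection(collection)
    store.insert(collection, chunks)
    return store.count(collection)
```

Because every operation goes through the single object returned by `resolve_store`, a collection cannot be created in one backend while its data lands in another, which is the split this report describes.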

Additional Notes

Sparse Vector Index Issue (Solved)

When Milvus is properly used, there's a secondary issue: self-hosted Milvus 2.4.x requires manual creation of sparse vector indexes. I've created a workaround:

from pymilvus import connections, Collection, utility

connections.connect(host='milvus', port='19530')
for coll_name in utility.list_collections():
    coll = Collection(coll_name)
    indexes = {idx.field_name for idx in coll.indexes}

    if 'sparse_vector' not in indexes:
        coll.create_index(
            field_name="sparse_vector",
            index_params={
                "metric_type": "IP",
                "index_type": "SPARSE_INVERTED_INDEX",
                "params": {"inverted_index_algo": "DAAT_WAND"}
            },
            index_name="sparse_inverted_index"
        )
        coll.load()

This is a separate issue but worth noting for self-hosted Milvus deployments.

Workaround

For now, using VECTOR_DB_TYPE=faiss-local works perfectly:

  • Indexing succeeds
  • Search works
  • No Milvus dependency issues

Impact

  • Severity: High - Core feature (Milvus integration) doesn't work
  • Workaround: Use FAISS (works fine for smaller codebases)
  • User Impact: Users cannot use production-grade Milvus for large codebases

System Information

$ npm view @pleaseai/context-please-mcp version
0.5.0

$ node --version
v20.17.0

$ python3 -c "from pymilvus import __version__; print(__version__)"
2.6.5

Thank you for this excellent tool! Happy to provide more information or test fixes.
