@Jonathan-Improving Jonathan-Improving commented Dec 1, 2025

Summary

Changes

Introduces a simple MCP wrapper around Valkey's vector similarity search (VSS) capability, plus a full semantic search MCP for Valkey that builds on VSS and encapsulates vector embeddings, so that agents can interoperate using plain natural language and/or keywords.

User experience

Vector Similarity Search

Before this tool, users could not reach Valkey's vector search capabilities from an MCP client. They now can, and can retrieve documents stored in a Valkey instance by similarity to a vector embedding.

Caveat: The user must provide the vector embedding, and it must come from the same model that produced the embeddings of the documents stored in the Valkey instance (if applicable). This presumes that the end user has already seeded the Valkey instance with documents, each having a vector embedding field used for VSS.

Results

A basic test revealed that the agent struggled to obtain vector embeddings from an MCP that provides them and then pass them intact to this VSS MCP, which requires a list of floats (the vector embedding) as input. In one test the agent passed a completely fabricated vector embedding to the VSS MCP; in another, the agent hung for more than 10 minutes before it was terminated. This implies that a specifically configured agentic context is key for the VSS MCP to be useful.
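Some of the malformed-input cases above could be rejected before a query is issued. The following is a minimal sketch, assuming the index stores FLOAT32 vectors of a known dimension; `pack_embedding` and `expected_dim` are hypothetical names and are not taken from this PR. It catches vectors of the wrong length or with non-numeric or non-finite elements, but it cannot detect a fabricated vector that happens to have the right shape.

```python
import math
import struct
from typing import Sequence


def pack_embedding(vector: Sequence[float], expected_dim: int) -> bytes:
    """Validate a query embedding and pack it as little-endian FLOAT32 bytes.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    for i, value in enumerate(vector):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"element {i} is not a number: {value!r}")
        if not math.isfinite(value):
            raise ValueError(f"element {i} is not finite: {value!r}")
    # Four bytes per element, in the order given.
    return struct.pack(f"<{expected_dim}f", *vector)
```

A check like this fails fast with a readable error, which may be easier for an agent to recover from than an empty or misleading search result.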

Semantic Search

A data scientist first adds documents to the Valkey database with the add_documents MCP, under a specifically named collection, so that they can be found by semantic search. A researcher then uses the semantic_search MCP to look up documents in a named collection, or queries the list_collections MCP to find out which named collections are available for searching.

Checklist

If an item doesn't seem to apply to your change, please leave it unchecked.

  • I have reviewed the contributing guidelines
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Not a breaking change.

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@Jonathan-Improving Jonathan-Improving changed the title from "Semantic Search MCP for VAlkey" to "Semantic Search MCP for Valkey" on Dec 1, 2025
Example:
    provider = BedrockEmbeddings(
        region_name="us-east-1",
        model_id="amazon.titan-embed-text-v1"

@Jonathan-Improving Jonathan-Improving changed the title from "Semantic Search MCP for Valkey" to "VSS and Semantic Search MCPs for Valkey" on Dec 4, 2025
@Jonathan-Improving Jonathan-Improving changed the title from "VSS and Semantic Search MCPs for Valkey" to "feat(api): VSS and Semantic Search MCPs for Valkey" on Dec 4, 2025
@Jonathan-Improving Jonathan-Improving force-pushed the feature/semantic-search branch 3 times, most recently from 3042a57 to 26ecda1, on December 4, 2025 23:03
@Jonathan-Improving
Author

Latest enhancements include full FT.SEARCH support in the VSS MCP as well as structured return types for the APIs.

@Jonathan-Improving
Author

Removing draft status to push this forward. One thing remains: thoroughly testing the Bedrock integration.

@Jonathan-Improving Jonathan-Improving marked this pull request as ready for review December 8, 2025 18:19
@Jonathan-Improving Jonathan-Improving requested review from a team as code owners December 8, 2025 18:19
@Jonathan-Improving
Author

Ready for review

Contributor

@seaofawareness seaofawareness left a comment

Great to see this addition. Please address the comments.

'embedding', 'VECTOR', 'FLAT', '6',
'TYPE', 'FLOAT32',
'DIM', str(actual_dimensions),
'DISTANCE_METRIC', 'L2'
Contributor

  • Why not expose these settings to the user through environment variables? FLAT alone is limiting; HNSW with default settings should be supported in this PR.
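One way the suggestion above might look is sketched below. This is only an illustration under stated assumptions: the function name `build_vector_field_args` and all `VALKEY_VSS_*` environment variable names are hypothetical, and the default stays FLAT with L2 only so that an empty environment reproduces the arguments in the excerpt above. `M` and `EF_CONSTRUCTION` are passed through only when set, so HNSW otherwise runs with the server's defaults.

```python
import os
from typing import List, Mapping, Optional


def build_vector_field_args(dimensions: int,
                            env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the vector-field part of an FT.CREATE schema from env settings.

    Hypothetical helper; the environment variable names are illustrative.
    """
    env = os.environ if env is None else env
    algorithm = env.get("VALKEY_VSS_ALGORITHM", "FLAT").upper()
    metric = env.get("VALKEY_VSS_DISTANCE_METRIC", "L2").upper()
    if algorithm not in {"FLAT", "HNSW"}:
        raise ValueError(f"unsupported index algorithm: {algorithm}")
    if metric not in {"L2", "IP", "COSINE"}:
        raise ValueError(f"unsupported distance metric: {metric}")

    attrs = ["TYPE", "FLOAT32", "DIM", str(dimensions),
             "DISTANCE_METRIC", metric]
    if algorithm == "HNSW":
        # Only forward tuning knobs the user actually set.
        for env_name, attr in (("VALKEY_VSS_HNSW_M", "M"),
                               ("VALKEY_VSS_HNSW_EF_CONSTRUCTION",
                                "EF_CONSTRUCTION")):
            value = env.get(env_name)
            if value is not None:
                attrs += [attr, str(int(value))]

    # The count after the algorithm name is the number of attribute tokens.
    return ["embedding", "VECTOR", algorithm, str(len(attrs)), *attrs]
```

Computing the attribute count from the list avoids a hard-coded `'6'` drifting out of sync when optional parameters are added.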

async def vector_search(index: str,
                        field: str,
                        vector: List[float],
                        filter_expression: Optional[str] = None,
Contributor

Do you need to sanitize this string so that only safe, approved content is allowed and unexpected execution is prevented? Are there any length-limit checks, etc.?
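A conservative validation step for `filter_expression` might look like the sketch below. Everything here is an assumption for illustration: the helper name, the 256-character limit, and the character allowlist are hypothetical, and the right allowlist depends on which query syntax the tool intends to support. The allowlist deliberately excludes `=`, `>`, and `$`, so a caller cannot smuggle in a second `=>[KNN ...]` clause or a parameter reference.

```python
import re
from typing import Optional

MAX_FILTER_LENGTH = 256  # hypothetical limit, not taken from the PR

# Word characters, whitespace, and a small set of query punctuation.
_ALLOWED = re.compile(r'[\w\s@:{}\[\]()|\-+.,*"]*', re.ASCII)
_CLOSERS = {")": "(", "]": "[", "}": "{"}


def validate_filter_expression(expr: Optional[str]) -> str:
    """Return a vetted filter expression, or "*" when no filter is given."""
    if expr is None or not expr.strip():
        return "*"
    expr = expr.strip()
    if len(expr) > MAX_FILTER_LENGTH:
        raise ValueError(f"filter longer than {MAX_FILTER_LENGTH} characters")
    if not _ALLOWED.fullmatch(expr):
        raise ValueError("filter contains characters outside the allowlist")

    # Reject unbalanced or mismatched brackets.
    stack = []
    for ch in expr:
        if ch in "([{":
            stack.append(ch)
        elif ch in _CLOSERS:
            if not stack or stack.pop() != _CLOSERS[ch]:
                raise ValueError("unbalanced brackets in filter")
    if stack:
        raise ValueError("unbalanced brackets in filter")
    return expr
```

An allowlist is stricter than escaping and may reject some legitimate filters; that trade-off seems reasonable for input that originates from an agent.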


# Initialize embeddings provider
try:
    embeddings_provider = create_embeddings_provider()
Contributor

Do you want to support a mock embedding provider, to enable validation without setting up an actual provider?
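A mock provider along these lines could be as small as the sketch below. The class name `MockEmbeddings` and its `embed` method are hypothetical; the real provider interface in this PR is not shown here and may differ. The vectors are derived from a hash of the text, so they are deterministic but carry no semantic meaning. That is enough to exercise indexing and search plumbing, not to judge result quality.

```python
import hashlib
import struct
from typing import List


class MockEmbeddings:
    """Deterministic stand-in for an embeddings provider (illustrative only)."""

    def __init__(self, dimensions: int = 8) -> None:
        if dimensions <= 0:
            raise ValueError("dimensions must be positive")
        self.dimensions = dimensions

    def embed(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        while len(values) < self.dimensions:
            # Each SHA-256 digest yields eight unsigned 32-bit integers.
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            for (n,) in struct.iter_unpack("<I", digest):
                # Scale each integer into the range [-1.0, 1.0].
                values.append(n / 0xFFFFFFFF * 2.0 - 1.0)
            counter += 1
        return values[: self.dimensions]
```

Because identical text always maps to an identical vector, a test can add a document and then search for its exact text, expecting that document as the nearest match.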

@pytest.fixture(autouse=True)
def setup_valkey_config(self):
    """Configure Valkey connection to use environment variables."""
    original_config = VALKEY_CFG.copy()
Contributor

Error out if Valkey is not available or doesn't have the search module.
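A precondition check of this kind could be sketched as follows. It assumes a synchronous client object that exposes `ping()` and `module_list()`, as redis-py-compatible clients do, and it assumes the search module reports its name as `search`; both are assumptions to verify against the client and server actually in use. `SearchModuleUnavailable` and `require_search_module` are hypothetical names.

```python
from typing import Any


class SearchModuleUnavailable(RuntimeError):
    """Raised when Valkey is unreachable or lacks the search module."""


def _as_text(value: Any) -> str:
    # Clients may return bytes unless response decoding is enabled.
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def require_search_module(client: Any, module_name: str = "search") -> None:
    """Fail fast unless the server responds and lists the given module."""
    try:
        client.ping()
        modules = client.module_list()
    except Exception as exc:
        raise SearchModuleUnavailable(f"could not query Valkey: {exc}") from exc

    loaded = set()
    for module in modules:
        fields = {_as_text(key): val for key, val in module.items()}
        loaded.add(_as_text(fields.get("name", "")).lower())

    if module_name.lower() not in loaded:
        raise SearchModuleUnavailable(
            f"module {module_name!r} is not loaded; found: {sorted(loaded)}"
        )
```

Called at the top of the fixture, a check like this would turn a missing server or module into one clear failure rather than many unrelated-looking test errors.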

print("="*70)

@pytest.mark.asyncio
async def test_semantic_search_with_filter_expression(self):
Contributor

Add tests for invalid input

assert isinstance(result, str)

@pytest.mark.asyncio
async def test_vector_search_connection_error(self, mock_connection):
Contributor

Add a test for a missing search module.
