Adding semantic caching with Azure Managed Redis #3024
base: main
- Add SemanticCacheOptions with similarity threshold, max results, TTL
- Add AzureManagedRedisOptions for Redis connection configuration
- Add EmbeddingProviderOptions for Azure OpenAI configuration
- Wire semantic cache options into RuntimeOptions and RuntimeConfig
- Add UserProvided flags following DAB repository patterns

- Add SemanticCacheOptionsConverterFactory with validation
- Add AzureManagedRedisOptionsConverterFactory
- Add EmbeddingProviderOptionsConverterFactory
- Register converters in RuntimeConfigLoader.GetSerializationOptions()
- Validate similarity threshold (0.0-1.0) and numeric fields

… Redis
- Implement AzureOpenAIEmbeddingService with exponential backoff retry
- Implement RedisVectorStore with RediSearch vector similarity (KNN)
- Implement SemanticCacheService orchestration layer
- Add SemanticCacheResult DTO
- Register services in DI with conditional configuration validation
- Use COSINE distance metric for text embeddings
- Support automatic Redis vector index creation

- Architecture overview and component descriptions
- Configuration examples and parameter reference
- Usage patterns and integration examples
- Performance characteristics and scalability guidance
- Troubleshooting guide and monitoring recommendations

Add ValidateSemanticCacheConfiguration() method to RuntimeConfigValidator to ensure semantic cache is properly configured when enabled.
**Validations:**
- Validates Azure Managed Redis connection string is not null/empty
- Validates embedding provider endpoint, API key, and model are configured
- Validates similarity-threshold is between 0.0 and 1.0
- Validates max-results and expire-seconds are positive integers
- Integrated into ValidateConfigProperties() for startup validation

Completes semantic caching infrastructure implementation.
…r semantic caching
…rFactory, updating integration tests, e2e tests and readme file
Hi @robertopc1, once you think the PR is ready for review please change it from a draft to an open PR.
Thank you @RubenCerna2079 - I just did :)
Pull request overview
This PR adds semantic caching support for Data API Builder, enabling repeated or semantically similar SQL queries to be served from cache using vector similarity search rather than exact string matching. The implementation uses Azure OpenAI for generating embeddings and Azure Managed Redis (with RediSearch) for vector storage and similarity search.
Key changes:
- New semantic cache infrastructure with ISemanticCache and IEmbeddingService interfaces
- Integration with SQL query execution pipeline (MSSQL, MySQL, PostgreSQL)
- Runtime configuration support with CLI commands for semantic cache settings
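The vector similarity lookup described in the overview can be illustrated with a minimal sketch. It assumes plain in-memory `float[]` embeddings and a hypothetical `VectorMath` helper; the PR itself delegates the search to Redis, so none of these names come from the PR.

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity of two equal-length vectors, in the range [-1, 1].
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same dimension.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
        {
            return 0; // Convention chosen here: a zero vector matches nothing.
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // A cached entry counts as a hit when its similarity meets the threshold.
    public static bool IsCacheHit(float[] query, float[] cached, double similarityThreshold)
        => CosineSimilarity(query, cached) >= similarityThreshold;
}
```

Raising the threshold toward 1.0 trades cache hit rate for a lower chance of serving a response that belongs to a different query.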
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| src/Service/Startup.cs | Registers semantic cache services with DI container when enabled |
| src/Service/SemanticCache/*.cs | Core semantic cache implementation: service, Redis vector store, Azure OpenAI embedding service |
| src/Core/Services/*.cs | Service interfaces for semantic cache and embeddings |
| src/Core/Resolvers/SqlQueryEngine.cs | Integrates semantic cache into SQL query execution pipeline |
| src/Core/Resolvers/QueryExecutor.cs | Adds semantic cache check/store logic at executor level |
| src/Core/Resolvers/*QueryExecutor.cs | Updates MSSQL/MySQL/PostgreSQL executors to accept semantic cache services |
| src/Core/Resolvers/Factories/*.cs | Updates factories to pass semantic cache services to executors |
| src/Core/Configurations/RuntimeConfigValidator.cs | Adds validation for semantic cache configuration |
| src/Config/ObjectModel/*.cs | New configuration models for semantic cache options |
| src/Config/Converters/*.cs | JSON converters for semantic cache configuration |
| src/Config/RuntimeConfigLoader.cs | Registers semantic cache converters |
| src/Cli/*.cs | CLI support for configuring semantic cache via command line |
| src/Service.Tests/*.cs | Unit, integration, and E2E tests for semantic cache |
| src/Service/SemanticCache/README.md | Comprehensive documentation for semantic cache feature |
| docs/Testing/*.md | Setup guide for testing with real Azure OpenAI |
    // Check semantic cache first if enabled
    if (runtimeConfig.IsSemanticCachingEnabled &&
        _semanticCache is not null &&
        _embeddingService is not null &&
        structure.DbPolicyPredicatesForOperations[EntityActionOperation.Read] == string.Empty)
    {
        _logger.LogInformation(
            "Semantic cache IS ENABLED - will attempt to use it for query: {Query}",
            queryString.Substring(0, Math.Min(100, queryString.Length)));

        try
        {
            // Generate embedding for the query
            float[] embedding = await _embeddingService.GenerateEmbeddingAsync(queryString);

            _logger.LogDebug(
                "Generated embedding with {Dimensions} dimensions",
                embedding.Length);

            // Get semantic cache config
            var semanticCacheConfig = runtimeConfig.Runtime?.SemanticCache;
            int maxResults = semanticCacheConfig?.MaxResults ?? SemanticCacheOptions.DEFAULT_MAX_RESULTS;
            double similarityThreshold = semanticCacheConfig?.SimilarityThreshold ?? SemanticCacheOptions.DEFAULT_SIMILARITY_THRESHOLD;

            // Query semantic cache
            SemanticCacheResult? cacheResult = await _semanticCache.QueryAsync(
                embedding,
                maxResults,
                similarityThreshold);

            if (cacheResult is not null)
            {
                _logger.LogInformation(
                    "Semantic cache hit! Similarity: {Similarity:F4} for query: {Query}",
                    cacheResult.Similarity,
                    queryString.Substring(0, Math.Min(100, queryString.Length)));

                // Parse cached JSON response back to JsonDocument
                return JsonDocument.Parse(cacheResult.Response);
            }

            _logger.LogDebug("Semantic cache miss for query: {Query}",
                queryString.Substring(0, Math.Min(100, queryString.Length)));

            // Execute query against database
            JsonDocument? queryResponse = await ExecuteQueryAndCacheAsync(
                queryExecutor,
                queryString,
                structure,
                dataSourceName,
                embedding,
                runtimeConfig);

            return queryResponse;
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "Semantic cache operation failed, falling back to normal execution");
            // Fall through to normal execution
        }
    }

Copilot AI (Jan 5, 2026):
The embedding generation logic is duplicated in both SqlQueryEngine (lines 334-394) and QueryExecutor (lines 958-1081). This creates a maintenance burden and potential for inconsistency. Consider consolidating the semantic cache check logic into a single location or creating a shared helper method.
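One way to picture the consolidation suggested here is a single helper that owns the lookup-then-store flow and generates the embedding exactly once. This is only a sketch: `SemanticCacheFlow` and its delegate parameters are hypothetical stand-ins for the PR's embedding service, cache, and query executor, not names from the codebase.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public static class SemanticCacheFlow
{
    // All delegates are hypothetical stand-ins for the PR's services.
    public static async Task<string> GetOrExecuteAsync(
        string queryText,
        Func<string, Task<float[]>> embedAsync,
        Func<float[], Task<string?>> lookupAsync,
        Func<Task<string>> executeAsync,
        Func<float[], string, Task> storeAsync)
    {
        // The embedding is generated once and reused for lookup and store.
        float[] embedding = await embedAsync(queryText);

        string? cached = await lookupAsync(embedding);
        if (cached is not null)
        {
            return cached; // Cache hit: skip the database entirely.
        }

        // Cache miss: run the query, then store under the same embedding.
        string response = await executeAsync();
        await storeAsync(embedding, response);
        return response;
    }
}
```

The sketch omits the fall-back-on-failure behavior shown in the snippet above; a real helper would still need to catch cache errors and continue with normal execution.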
    // Note: We'll use a default dimension (1536 for text-embedding-3-small)
    // The actual dimension should match your embedding model
    int defaultDimensions = 1536; // Adjust based on your embedding model

Copilot AI (Jan 5, 2026):
The hardcoded default dimension (1536) is specific to text-embedding-ada-002 and text-embedding-3-small models. If a user configures a different model (like text-embedding-3-large with 3072 dimensions), the index will be created with the wrong dimension size, causing vector search failures. Consider making the dimension configurable or dynamically determining it from the first stored embedding.
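A minimal sketch of the second option (deriving the dimension from the first stored embedding, with an optional configured override) might look like the following. `VectorIndexSchema`, `ResolveDimensions`, and `BuildVectorFieldArgs` are hypothetical names; the argument layout mirrors the `FT.CREATE` call in this PR.

```csharp
#nullable enable
using System;
using System.Globalization;

public static class VectorIndexSchema
{
    // Prefer an explicit setting, then the observed embedding size,
    // and only then fall back to a default.
    public static int ResolveDimensions(int? configured, float[]? firstEmbedding, int fallback = 1536)
    {
        int dimensions = configured ?? firstEmbedding?.Length ?? fallback;
        if (dimensions <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(configured), "Dimensions must be positive.");
        }

        return dimensions;
    }

    // Builds the VECTOR field portion of an FT.CREATE schema.
    // "6" is the number of attribute arguments that follow it.
    public static string[] BuildVectorFieldArgs(string fieldName, int dimensions) => new[]
    {
        fieldName, "VECTOR", "FLAT", "6",
        "TYPE", "FLOAT32",
        "DIM", dimensions.ToString(CultureInfo.InvariantCulture),
        "DISTANCE_METRIC", "COSINE",
    };
}
```

Deriving the size from an actual embedding avoids keeping a model-name table in sync, but it means the index cannot be created until the first embedding has been generated.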
    // Configure HTTP client
    _httpClient.DefaultRequestHeaders.Add("api-key", _options.ApiKey);
    _httpClient.Timeout = TimeSpan.FromSeconds(30);

Copilot AI (Jan 5, 2026):
The API key is added to the HttpClient's DefaultRequestHeaders, which means all instances created by this factory will share the same headers. If the factory creates clients for different purposes, this could leak the API key to unintended endpoints. Consider creating a named HttpClient specifically for Azure OpenAI or setting headers per request instead of on the client.
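Setting the header per request, as suggested, could be sketched as below. `EmbeddingRequestFactory` is a hypothetical helper, and the endpoint and JSON body are whatever the caller already builds; only the `api-key` header name is taken from the snippet above.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class EmbeddingRequestFactory
{
    // Attaches the API key to this request only, so a shared HttpClient
    // never carries the credential to other endpoints.
    public static HttpRequestMessage Create(Uri endpoint, string apiKey, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json"),
        };
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}
```

The resulting message would be sent with `HttpClient.SendAsync`, leaving `DefaultRequestHeaders` untouched.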
    if (semanticCacheOptions.AzureManagedRedis is null ||
        string.IsNullOrWhiteSpace(semanticCacheOptions.AzureManagedRedis.ConnectionString))
    {
        throw new Exception("Semantic Cache: Azure Managed Redis connection string is required when semantic caching is enabled.");
    }

    if (semanticCacheOptions.EmbeddingProvider is null ||
        string.IsNullOrWhiteSpace(semanticCacheOptions.EmbeddingProvider.Endpoint))
    {
        throw new Exception("Semantic Cache: Embedding provider endpoint is required when semantic caching is enabled.");
    }

Copilot AI (Jan 5, 2026):
Using generic Exception for validation errors is too broad and doesn't provide a standardized error response. Consider using DataApiBuilderException with appropriate status codes and subStatusCodes to align with the existing error handling pattern used elsewhere in the codebase (see RuntimeConfigValidator for examples).
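The shape of such validation can be sketched without the DAB assemblies by using a stand-in exception type. `ConfigValidationException` below is hypothetical and only illustrates carrying a status code alongside the message; in the codebase the type would be DataApiBuilderException, whose constructor is not reproduced here, and the choice of `ServiceUnavailable` is an assumption.

```csharp
#nullable enable
using System;
using System.Net;

// Hypothetical stand-in for a structured, status-carrying exception.
public sealed class ConfigValidationException : Exception
{
    public HttpStatusCode StatusCode { get; }

    public ConfigValidationException(string message, HttpStatusCode statusCode)
        : base(message)
    {
        StatusCode = statusCode;
    }
}

public static class SemanticCacheConfigChecks
{
    public static void Validate(
        string? redisConnectionString,
        string? embeddingEndpoint,
        double similarityThreshold,
        int maxResults,
        int expireSeconds)
    {
        Require(!string.IsNullOrWhiteSpace(redisConnectionString),
            "Semantic Cache: Azure Managed Redis connection string is required.");
        Require(!string.IsNullOrWhiteSpace(embeddingEndpoint),
            "Semantic Cache: Embedding provider endpoint is required.");
        // The relational pattern also rejects NaN.
        Require(similarityThreshold is >= 0.0 and <= 1.0,
            "Semantic Cache: similarity-threshold must be between 0.0 and 1.0.");
        Require(maxResults > 0, "Semantic Cache: max-results must be a positive integer.");
        Require(expireSeconds > 0, "Semantic Cache: expire-seconds must be a positive integer.");
    }

    private static void Require(bool condition, string message)
    {
        if (!condition)
        {
            // Status code is an assumption made for this sketch.
            throw new ConfigValidationException(message, HttpStatusCode.ServiceUnavailable);
        }
    }
}
```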
    // Generate embedding for SQL query
    float[] embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);

Copilot AI (Jan 5, 2026):
The query embedding is generated twice: once in SqlQueryEngine for cache lookup and again in QueryExecutor when storing the result. This doubles the cost and latency of embedding generation. The embedding should be passed between these methods to avoid redundant API calls to Azure OpenAI.
Suggested change:

    - // Generate embedding for SQL query
    - float[] embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
    + // Reuse precomputed embedding for SQL query when available, otherwise generate it.
    + float[] embedding;
    + if (httpContext?.Items != null &&
    +     httpContext.Items.TryGetValue("SemanticCache.SqlQueryEmbedding", out object? existingEmbeddingObj) &&
    +     existingEmbeddingObj is float[] existingEmbedding)
    + {
    +     embedding = existingEmbedding;
    + }
    + else
    + {
    +     embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
    + }
    // Build FT.SEARCH query for vector similarity
    // KNN query format: *=>[KNN K @field_name $vector AS score]
    string indexName = GetIndexName();
    string keyPrefix = _options.KeyPrefix ?? "resp:";

Copilot AI (Jan 5, 2026):
This assignment to keyPrefix is useless, since its value is never read.
    // Check if index exists using FT.INFO
    try
    {
        var infoResult = await _database.ExecuteAsync("FT.INFO", indexName);

Copilot AI (Jan 5, 2026):
This assignment to infoResult is useless, since its value is never read.
    var createResult = await _database.ExecuteAsync(
        "FT.CREATE",
        indexName,
        "ON", "HASH",
        "PREFIX", "1", keyPrefix,
        "SCHEMA",
        FIELD_QUERY, "TEXT",
        FIELD_EMBEDDING, "VECTOR", "FLAT", "6",
        "TYPE", "FLOAT32",
        "DIM", defaultDimensions.ToString(),
        "DISTANCE_METRIC", "COSINE",
        FIELD_RESPONSE, "TEXT",
        FIELD_TIMESTAMP, "NUMERIC",
        FIELD_DIMENSIONS, "NUMERIC");

Copilot AI (Jan 5, 2026):
This assignment to createResult is useless, since its value is never read.
    if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
    {
        if (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
            sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
            sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
            sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
            sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

Copilot AI (Jan 5, 2026):
These 'if' statements can be combined.
Suggested change:

    - if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
    - {
    -     if (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
    -         sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
    -         sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
    -         sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
    -         sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase))
    -     {
    -         return false;
    -     }
    + if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase) &&
    +     (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
    +      sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
    +      sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
    +      sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
    +      sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase)))
    + {
    +     return false;
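As an alternative way to read the combined condition, the same check can be sketched as a small predicate over a marker list. `QueryClassifier` and `IsSystemCatalogSelect` are hypothetical names; the markers are copied from the snippet above.

```csharp
using System;
using System.Linq;

public static class QueryClassifier
{
    // Substrings that mark a SELECT as targeting system catalogs.
    private static readonly string[] SystemCatalogMarkers =
    {
        "INFORMATION_SCHEMA", "sys.", "sys ", "FROM sys", "object_id(",
    };

    // True when the statement is a SELECT that mentions any marker.
    public static bool IsSystemCatalogSelect(string sql) =>
        sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase) &&
        SystemCatalogMarkers.Any(marker => sql.Contains(marker, StringComparison.OrdinalIgnoreCase));
}
```

Because this is plain substring matching, a user column or alias ending in "sys" (for example `analysis.`) would also match, so such queries would be excluded from caching as well.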
    if (response.StatusCode == HttpStatusCode.TooManyRequests)
    {
        if (attempt < MAX_RETRIES)
        {
            int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
            _logger.LogWarning(
                "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
                delayMs,
                attempt,
                MAX_RETRIES);
            await Task.Delay(delayMs, cancellationToken);
            continue;
        }

Copilot AI (Jan 5, 2026):
These 'if' statements can be combined.
Suggested change:

    - if (response.StatusCode == HttpStatusCode.TooManyRequests)
    - {
    -     if (attempt < MAX_RETRIES)
    -     {
    -         int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
    -         _logger.LogWarning(
    -             "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
    -             delayMs,
    -             attempt,
    -             MAX_RETRIES);
    -         await Task.Delay(delayMs, cancellationToken);
    -         continue;
    -     }
    + if (response.StatusCode == HttpStatusCode.TooManyRequests && attempt < MAX_RETRIES)
    + {
    +     int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
    +     _logger.LogWarning(
    +         "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
    +         delayMs,
    +         attempt,
    +         MAX_RETRIES);
    +     await Task.Delay(delayMs, cancellationToken);
    +     continue;
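The delay calculation in the snippet above doubles the wait on each attempt. A minimal sketch of the same schedule with integer arithmetic and an upper bound follows; `RetryBackoff`, `DelayMs`, and the `maxDelayMs` cap are hypothetical additions, not part of the PR.

```csharp
using System;

public static class RetryBackoff
{
    // Delay for a 1-based attempt number: initial * 2^(attempt - 1), capped at maxDelayMs.
    public static int DelayMs(int attempt, int initialDelayMs, int maxDelayMs)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt), "Attempt numbers start at 1.");
        }

        // Bound the exponent so the shift stays well inside the range of long.
        int exponent = Math.Min(attempt - 1, 20);
        long delay = (long)initialDelayMs << exponent;
        return (int)Math.Min(delay, maxDelayMs);
    }
}
```

With an initial delay of 1000 ms, attempts 1, 2, and 3 wait 1000, 2000, and 4000 ms. The sketch leaves out jitter and does not read any `Retry-After` value the service may return.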
I am a little concerned that such a large PR was submitted for a new feature by the same author without any planning. Coupling to Azure OpenAI and Azure Redis so tightly feels like we are moving away from our core principles. Then again, I am open to advanced features like this, especially when they bring such high value to our customers. But we need to discuss this before moving forward on this plan.
Why make this change?
Closes #3023 — Adds semantic caching support so repeated (or semantically equivalent) SQL queries can be served from cache instead of re-executing against the database, reducing latency and database load. This also enables “near-duplicate” query reuse by caching against vector similarity rather than exact string matching.
Additional discussion/setup notes: semantic-cache-real-azure-openai-setup.md
What is this change?
- Introduces a new semantic caching pipeline for SQL query execution (MSSQL/MySQL/PostgreSQL) backed by:
  - Embeddings generated via an IEmbeddingService implementation (Azure OpenAI).
  - Vector storage + similarity search via ISemanticCache implemented on top of Azure Managed Redis vector capabilities.
- Adds runtime config support for semantic caching:
  - New config object models (SemanticCacheOptions, EmbeddingProviderOptions, AzureManagedRedisOptions) and JSON converter factories.
  - Runtime config loading + validation updates to enforce required semantic cache configuration.
- Wires semantic cache through the execution stack:
  - Updates QueryManagerFactory / QueryEngineFactory and SQL executors to use the semantic cache-aware QueryExecutor flow.
  - Updates service startup to register semantic cache services.
- Adds CLI support to generate semantic cache configuration via config generation paths (CLI options + config generator updates).
References:
- Real Azure setup + testing guide: semantic-cache-real-azure-openai-setup.md
- Redis vector similarity search concepts (for reviewers): https://redis.io/docs/latest/develop/interact/search-and-query/query/vector-search/
How was this tested?
- Integration Tests: SemanticCacheIntegrationTests.cs
- Unit Tests: SemanticCacheOptionsTests.cs, SemanticCacheServiceTests.cs, AzureOpenAIEmbeddingServiceTests.cs
- E2E Tests: SemanticCacheE2ETests.cs and CLI e2e updates in EndToEndTests.cs
Sample Request(s)
    # First request (cache miss -> DB execution + cache write)
    curl -s 'http://localhost:5000/api/Books?$filter=title%20eq%20%27Dune%27'

    # Second request (expected semantic cache hit -> served from cache)
    curl -s 'http://localhost:5000/api/Books?$filter=title%20eq%20%27Dune%27'

    # First request (cache miss)
    query { books(filter: { title: { eq: "Dune" } }) { id title } }

    # Re-run the same (or semantically equivalent) query (expected cache hit)
    query { books(filter: { title: { eq: "Dune" } }) { id title } }