This is a meta tracking issue for implementing the Auto-optimized hybrid search feature in the Search Relevance Workbench. The goal is to reduce hybrid search optimization from a multi-day manual process to a guided, mostly-automated workflow — from generating test queries to deploying the optimal search pipeline configuration.
Background
The Hybrid Optimizer experiment in the Search Relevance Workbench runs a grid search over normalization techniques, combination methods, and weight configurations (66 variants by default) to find the optimal hybrid search pipeline. However, the current workflow requires users to manually:
- Create query terms and import them
- Set up an LLM connector and generate relevance judgments
- Run the optimizer experiment
- Manually read raw results to identify the best configuration
- Manually create and deploy the search pipeline
This meta issue tracks the work to automate steps 1, 4, and 5 (and step 2 to some extent), significantly reducing the barrier to finding and deploying optimal hybrid search configurations.
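For orientation, the 66-variant default grid can be pictured as a Cartesian product over the three dimensions named above. The specific techniques and weight step below are assumptions chosen to illustrate how such a grid is enumerated (2 × 3 × 11 = 66); the optimizer defines the real defaults.

```python
from itertools import product

# Assumed grid dimensions, for illustration only:
NORMALIZATION = ["min_max", "l2"]
COMBINATION = ["arithmetic_mean", "geometric_mean", "harmonic_mean"]
WEIGHTS = [round(w * 0.1, 1) for w in range(11)]  # 0.0, 0.1, ..., 1.0

# One entry per candidate hybrid pipeline configuration.
grid = [
    {"normalization": n, "combination": c, "weights": [w, round(1 - w, 1)]}
    for n, c, w in product(NORMALIZATION, COMBINATION, WEIGHTS)
]
print(len(grid))  # 66
```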
Components
The implementation is broken down into the following work streams:
- Query Set Generation with LLM — Extend the query sets API (`PUT /query_sets`) with an `llm_generated` sampling mode that generates synthetic search queries from index documents using an LLM, eliminating manual query creation
- Improved Judgment Coverage for Hybrid Optimizer — Add an `expandCoverage` parameter to the judgment creation API that pools documents from multiple hybrid weight configurations to improve rating coverage from ~50% to ~71-78%
  - RFC: [RFC] Improved judgment coverage for Hybrid search optimizer #401
  - Implementation
  - Dashboards UI changes
- Aggregated Experiment Results — Pre-compute and store per-configuration aggregated metrics (mean NDCG, MAP, Precision) at experiment completion, with a new retrieval API. Replaces the current approach, which requires fetching 33K+ raw evaluation result documents
- Experiment Results Summary UI — Replace the current raw results table in the Dashboards experiment view (which breaks at OpenSearch's 10K `max_result_window`) with a ranked configuration summary showing the best hybrid search configuration and all 66 variants with aggregated metrics
- Deploy Optimal Configuration — New API and UI to deploy the best experiment result as an OpenSearch search pipeline and set it as the index-level default, completing the optimization workflow end-to-end
- Judgment Cache Cleanup — Add configurable TTL-based cleanup for the judgment cache index to bound growth from repeated optimization runs
- Documentation
- Integration tests
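The Deploy Optimal Configuration component maps onto existing OpenSearch APIs: create a search pipeline containing a `normalization-processor`, then set it as the index default. A minimal sketch of the two calls involved, where the pipeline name, index name, and "winning" parameters are placeholders rather than values from a real experiment:

```python
# Placeholder winning configuration expressed as a search pipeline body.
search_pipeline = {
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},  # placeholder winner
                },
            }
        }
    ]
}
# 1) PUT /_search/pipeline/optimal-hybrid-pipeline   body: search_pipeline
index_settings = {"index.search.default_pipeline": "optimal-hybrid-pipeline"}
# 2) PUT /my-index/_settings                         body: index_settings
print(search_pipeline["phase_results_processors"][0]
      ["normalization-processor"]["normalization"]["technique"])  # min_max
```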
Relation to Existing Hybrid Optimizer
This feature builds on the existing Hybrid Optimizer (#107) and extends it with LLM-powered automation. The existing experiment API (PUT /experiments with type: HYBRID_OPTIMIZER) and grid search logic remain unchanged. The new components automate the preparation steps (query creation, judgment generation) and post-processing steps (results aggregation, pipeline deployment) that surround the optimizer.
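For reference, a hedged sketch of a request body for the unchanged optimizer entry point. Only the endpoint (`PUT /experiments`) and the `type` value come from this issue; every other field name and value is an illustrative assumption:

```python
# Sketch of a Hybrid Optimizer experiment request. All fields except "type"
# are hypothetical, shown only to indicate the shape of such a call.
experiment_request = {
    "type": "HYBRID_OPTIMIZER",
    "querySetId": "my-query-set-id",                  # hypothetical
    "judgmentIds": ["my-judgment-id"],                # hypothetical
    "searchConfigurationList": ["my-hybrid-config"],  # hypothetical
}
# PUT /experiments   body: experiment_request
print(experiment_request["type"])  # HYBRID_OPTIMIZER
```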
Related Changes
Infrastructure and bug fixes that support this feature:
- Fixed thread pool starvation in LLM judgment processing #387
- Extract reusable BatchedAsyncExecutor; migrate LlmJudgmentTaskManager and ExperimentTaskManager to use it #392
- LLM Judgment customized prompt template implementation: [SRW] LLM Judge Dynamic Template Backend #264
- Version-based index mapping update support: Enable Index Mapping Update with version #344
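The `BatchedAsyncExecutor` change above caps in-flight work so long-running LLM calls cannot starve a shared pool. The actual class is Java and part of the plugin; the snippet below is only a generic Python sketch of the bounded-concurrency pattern it implements:

```python
import asyncio

# Generic bounded-concurrency batching: at most max_in_flight tasks run at
# once, so slow calls (e.g. LLM judgments) cannot monopolize the executor.
async def run_batched(items, worker, max_in_flight=4):
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(item):
        async with sem:
            return await worker(item)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(i) for i in items))

async def demo():
    async def judge(query):
        await asyncio.sleep(0)  # stands in for an LLM call
        return f"judged:{query}"
    return await run_batched(["q1", "q2", "q3"], judge, max_in_flight=2)

results = asyncio.run(demo())
print(results)  # ['judged:q1', 'judged:q2', 'judged:q3']
```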
Future Work (not tracked here)
The following items are planned for later phases and will be tracked separately:
- Auto-Optimized experiment type — single API call orchestrating the full workflow (query generation → judgment creation → grid search → aggregation → deployment)
- Auto-Optimize mode in Dashboards UI with guided form
- On-the-fly LLM model provisioning
- RRF and z_score normalization support in the optimizer grid: [FEATURE] Onboard z_score and RRF normalization and combination techniques on SRW hybrid optimizer experiment #343
- Detailed per-query experiment results view
- Progress monitoring with cancel capability
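The planned Auto-Optimized experiment type would chain the stages listed above in order. A toy sketch of that orchestration, where every helper is a hypothetical stub standing in for one workflow stage; none of these function names exist in the plugin:

```python
# All helpers below are hypothetical stubs, one per workflow stage.
def generate_query_set(index):          return {"queries": ["laptop", "red shoes"]}
def create_judgments(query_set):        return {"ratings": len(query_set["queries"])}
def run_grid_search(index, qs, judg):   return {"best": "min_max/arithmetic_mean/0.3"}
def aggregate_results(experiment):      return experiment["best"]
def deploy(index, config):              return f"{index} -> {config}"

def auto_optimize(index):
    # query generation -> judgment creation -> grid search -> aggregation -> deployment
    query_set = generate_query_set(index)
    judgments = create_judgments(query_set)
    experiment = run_grid_search(index, query_set, judgments)
    best = aggregate_results(experiment)
    return deploy(index, best)

print(auto_optimize("products"))  # products -> min_max/arithmetic_mean/0.3
```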