
[META] Auto-optimized hybrid search #407

@martin-gaievski

Description

This is a meta tracking issue for implementing the Auto-optimized hybrid search feature in the Search Relevance Workbench. The goal is to reduce hybrid search optimization from a multi-day manual process to a guided, mostly-automated workflow — from generating test queries to deploying the optimal search pipeline configuration.

Background

The Hybrid Optimizer experiment in the Search Relevance Workbench runs a grid search over normalization techniques, combination methods, and weight configurations (66 variants by default) to find the optimal hybrid search pipeline. However, the current workflow requires users to manually:

  1. Create query terms and import them
  2. Set up an LLM connector and generate relevance judgments
  3. Run the optimizer experiment
  4. Manually read raw results to identify the best configuration
  5. Manually create and deploy the search pipeline

This meta issue tracks the work to automate steps 1, 4, and 5 (and, to some extent, step 2), significantly reducing the barrier to finding and deploying optimal hybrid search configurations.
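The "66 variants" figure mentioned above is consistent with a grid of two normalization techniques, three combination techniques, and eleven complementary weight splits. The specific defaults below are assumptions for illustration, not taken from this issue:

```python
from itertools import product

# Assumed grid defaults (illustrative, not specified in this issue):
# two normalization techniques, three combination techniques, and
# lexical/neural weight splits from 0.0 to 1.0 in steps of 0.1.
NORMALIZATIONS = ["min_max", "l2"]
COMBINATIONS = ["arithmetic_mean", "harmonic_mean", "geometric_mean"]
WEIGHTS = [round(w / 10, 1) for w in range(11)]  # 0.0, 0.1, ..., 1.0

def grid_variants():
    """Enumerate every (normalization, combination, weights) configuration."""
    return [
        {"normalization": n, "combination": c, "weights": [w, round(1 - w, 1)]}
        for n, c, w in product(NORMALIZATIONS, COMBINATIONS, WEIGHTS)
    ]

print(len(grid_variants()))  # 2 * 3 * 11 = 66
```

Under these assumptions the optimizer evaluates each of the 66 configurations against the query set and judgments, then ranks them by the chosen metric.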

Components

The implementation is broken down into the following work streams:

  • Query Set Generation with LLM — Extend the query sets API (PUT /query_sets) with an llm_generated sampling mode that generates synthetic search queries from index documents using an LLM, eliminating manual query creation
  • Improved Judgment Coverage for Hybrid Optimizer — Add expandCoverage parameter to the judgment creation API that pools documents from multiple hybrid weight configurations to improve rating coverage from ~50% to ~71-78%
  • Aggregated Experiment Results — Pre-compute and store per-configuration aggregated metrics (mean NDCG, MAP, Precision) at experiment completion, with a new retrieval API, replacing the current approach of fetching 33K+ raw evaluation result documents
  • Experiment Results Summary UI — Replace the current raw results table in the Dashboards experiment view (which breaks at OpenSearch's 10K max_result_window) with a ranked configuration summary showing the best hybrid search configuration and all 66 variants with aggregated metrics
  • Deploy Optimal Configuration — New API and UI to deploy the best experiment result as an OpenSearch search pipeline and set it as the index-level default, completing the optimization workflow end-to-end
  • Judgment Cache Cleanup — Add configurable TTL-based cleanup for the judgment cache index to bound growth from repeated optimization runs
  • Documentation
  • Integration tests
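To make the Deploy Optimal Configuration step concrete: in terms of existing OpenSearch APIs, deploying a winning variant amounts to creating a search pipeline and setting it as the index-level default. The pipeline name, index name, and parameter values below are illustrative; the new component would automate requests equivalent to these:

```json
PUT /_search_pipeline/optimal-hybrid-pipeline
{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": { "weights": [0.3, 0.7] }
        }
      }
    }
  ]
}

PUT /my-index/_settings
{
  "index.search.default_pipeline": "optimal-hybrid-pipeline"
}
```

Once the `index.search.default_pipeline` setting points at the pipeline, every hybrid query against the index runs through the optimized normalization and combination configuration without clients having to name the pipeline explicitly.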

Relation to Existing Hybrid Optimizer

This feature builds on the existing Hybrid Optimizer (#107) and extends it with LLM-powered automation. The existing experiment API (PUT /experiments with type: HYBRID_OPTIMIZER) and grid search logic remain unchanged. The new components automate the preparation steps (query creation, judgment generation) and post-processing steps (results aggregation, pipeline deployment) that surround the optimizer.
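For reference, the unchanged entry point looks roughly like this. Only the `PUT /experiments` path and `type: HYBRID_OPTIMIZER` come from this issue; the remaining fields are illustrative placeholders for the query set, search configuration, and judgments an experiment needs:

```json
PUT /experiments
{
  "type": "HYBRID_OPTIMIZER",
  "querySetId": "<query-set-id>",
  "searchConfigurationList": ["<search-configuration-id>"],
  "judgmentList": ["<judgment-id>"],
  "size": 10
}
```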

Related Changes

Infrastructure and bug fixes that support this feature:

Future Work (not tracked here)

The following items are planned for later phases and will be tracked separately:
