[RFC] search-builder project #20731

@fen-qin

Summary

Search Builder is an agentic onboarding experience that enables developers to prototype search applications locally and deploy to AWS in minutes. It addresses the complexity barrier that drives developers to simpler alternatives during early development.

Problem Statement

Search Expertise Gap

Many of the developers building search applications today are data scientists and AI engineers who lack deep search expertise. They view search as a means to an end, not a core competency. This creates barriers to OpenSearch adoption for GenAI and Agentic AI applications.

First-Time User Friction

Developers start with simpler alternatives (SQLite FTS5, Meilisearch, Chroma) for prototypes, only evaluating OpenSearch when hitting production requirements or feature limitations. As AI-powered development platforms enable direct cloud deployment, developers bypass teams with AWS expertise entirely.

Current onboarding requires users to:

  • Choose deployment methods and configure security before seeing value
  • Understand clusters, shards, index mappings, and k-NN configuration
  • Write setup code before any meaningful query
  • Navigate documentation written for search engineers, not AI developers

Result: Time-to-first-success is too long. Users leave before realizing OpenSearch's capabilities.

Proposed Solution

Phase 1: Local Search Builder Experience

Packaged Distribution

  • OpenSearch, Data Prepper, and Search Builder UI
  • Downloadable for any local or cloud-based development platform
  • Works offline except for LLMs/agents

Search Builder MCP Server

  • Integrates with AI development tools (Kiro, Cursor, Vercel, etc.)
  • Converts search requirements into end-to-end applications
  • Maintains a "manifest" describing the search app being built
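The RFC does not define the manifest format. As a minimal sketch of what such a manifest could record, the following uses a plain Python dataclass serialized to JSON. The class name `SearchAppManifest` and every field name are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SearchAppManifest:
    """Hypothetical manifest: a record of what has been generated so far."""
    name: str
    use_case: str  # assumed values: "keyword", "semantic", or "rag"
    data_sources: list[str] = field(default_factory=list)
    index_name: str = ""
    ingest_pipeline: str = ""
    embedding_model: str = ""
    sample_queries: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize every field so the manifest can be stored next to the app.
        return json.dumps(asdict(self), indent=2)


manifest = SearchAppManifest(
    name="movie-search",
    use_case="semantic",
    data_sources=["data/movies.csv"],
    index_name="movies",
)
manifest.sample_queries.append("space adventure with a robot sidekick")
print(manifest.to_json())
```

Keeping the manifest as plain JSON would let both the local MCP server and a later cloud deployment step read the same description of the app.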

Phase 2: AWS OpenSearch Experience

Cloud Search Agent

  • MCP server that deploys locally-built search apps to AWS
  • Creates OSI pipelines, OpenSearch Serverless collections, Bedrock configurations
  • Fully agentic experience in Amazon OpenSearch Service Console

Infrastructure Automation

  • Gathers inputs: availability requirements, expected traffic, data volume
  • Provisions and wires security/networking configurations automatically

Ongoing Optimization

  • Monitors cloud infrastructure
  • Proposes strategies based on usage patterns and new OpenSearch features
  • Assists with upgrades, cost optimization, and performance tuning

User Journey (Golden Path)

Onboarding Tenets

  1. No OpenSearch terminology (shards, replicas, mappings)
  2. Show value quickly, rather than teaching concepts
  3. Generate everything with few prompts or clicks
  4. Let users learn by inspecting generated artifacts
  5. Support 3-5 main use cases, not 100 features
  6. AWS-agnostic: works with any IDE
  7. Everything works offline except LLMs/agents

Steps

Step | Action           | Description
1    | Install          | Download docker-compose.yml or install as Kiro Power
2    | Run              | Execute docker-compose up in terminal
3    | Select Data      | Choose CSV/JSON/PDF files or upload your own
4    | Auto-Configure   | System creates index, mapping, pipeline, model, sample queries
5    | Test & Prototype | Query data, tune relevance, select models
6    | Migrate to AWS   | Export configuration for AWS deployment

Key: Users never manually touch shards, mappings, analyzers, or cluster config.
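To make the auto-configure step concrete, here is a rough sketch of how field types might be inferred from a CSV sample so that users never write a mapping by hand. The function names, the sample data, and the 40-character threshold for choosing `text` over `keyword` are assumptions for illustration, not the project's actual logic.

```python
import csv
import io


def infer_field_type(values):
    """Guess an OpenSearch field type from sample string values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "keyword"

    def all_parse(cast):
        # True when every sample value converts cleanly with `cast`.
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    # Assumed heuristic: long values are free text, short ones are exact-match labels.
    avg_len = sum(len(v) for v in non_empty) / len(non_empty)
    return "text" if avg_len > 40 else "keyword"


def infer_mapping(csv_text):
    """Build a mapping body from the header and rows of a CSV sample."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fields = list(rows[0].keys()) if rows else []
    return {
        "properties": {
            f: {"type": infer_field_type([r[f] for r in rows])} for f in fields
        }
    }


sample = (
    "title,year,rating,plot\n"
    "Heat,1995,8.3,A veteran detective pursues a meticulous crew of thieves across Los Angeles.\n"
    "The Matrix,1999,8.7,A computer hacker learns the true nature of his reality.\n"
)
print(infer_mapping(sample))
```

A real implementation would likely sample more rows and let the LLM or the user override the guesses, for example to index a short title as `text`.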

Target Use Cases

  1. Keyword Search - Traditional text matching
  2. Semantic Search - Vector-based similarity search
  3. RAG - Retrieval-augmented generation for GenAI applications
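As a rough illustration of how the three use cases differ at query time, the sketch below builds the request bodies that might be sent to the `_search` API. The `match` and `knn` query shapes follow the OpenSearch query DSL; the helper names, the field names, and the prompt wording are assumptions.

```python
def keyword_query(field, text, size=10):
    """Keyword search: a full-text `match` query on one field."""
    return {"size": size, "query": {"match": {field: {"query": text}}}}


def semantic_query(vector_field, query_vector, k=10):
    """Semantic search: a k-NN query against a vector field."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


def rag_prompt(question, hits):
    """RAG: place retrieved passages into the prompt sent to the LLM."""
    context = "\n\n".join(hit["_source"]["text"] for hit in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


print(keyword_query("title", "space adventure"))
print(semantic_query("embedding", [0.12, -0.03, 0.44], k=5))
```

In the RAG case the retrieval step can be either of the two queries above; only the final prompt assembly is new.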

Auto-Generated Artifacts

When users provide data and requirements, Search Builder automatically creates:

  • Index with optimized mappings
  • Ingest Pipeline with text chunking
  • Embedding Model (locally hosted)
  • Vector Embeddings for semantic search
  • Sample Queries demonstrating capabilities
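For readers who want to inspect what these artifacts might look like, the following sketch builds an index body and an ingest pipeline body as Python dictionaries. The `text_chunking` and `text_embedding` processors and the `knn_vector` field type are existing OpenSearch features, but the field names, the embedding dimension, the chunking parameters, and the `model_id` placeholder are assumptions; the exact request shapes should be checked against the OpenSearch version in use.

```python
EMBEDDING_DIMENSION = 384  # assumption: depends on the embedding model chosen

# Index body: source text, its chunks, and one vector per chunk.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "body_chunks": {"type": "text"},
            "body_chunk_embedding": {
                "type": "nested",
                "properties": {
                    "knn": {"type": "knn_vector", "dimension": EMBEDDING_DIMENSION}
                },
            },
        }
    },
}


def build_ingest_pipeline(model_id, token_limit=384, overlap_rate=0.2):
    """Pipeline body: split `body` into chunks, then embed each chunk."""
    return {
        "description": "Chunk and embed documents (illustrative sketch)",
        "processors": [
            {
                "text_chunking": {
                    "algorithm": {
                        "fixed_token_length": {
                            "token_limit": token_limit,
                            "overlap_rate": overlap_rate,
                            "tokenizer": "standard",
                        }
                    },
                    "field_map": {"body": "body_chunks"},
                }
            },
            {
                "text_embedding": {
                    "model_id": model_id,
                    "field_map": {"body_chunks": "body_chunk_embedding"},
                }
            },
        ],
    }


pipeline_body = build_ingest_pipeline(model_id="<model-id-from-local-deployment>")
```

These bodies would be sent to the create-index and put-pipeline APIs; Search Builder is expected to generate and apply the equivalent automatically.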

Sample Datasets

Dataset            | Complexity | Description
IMDb               | Medium     | 5K movies, ~10 fields
Jargon             | Low        | Dictionary-style, 2 text-heavy fields
BoardGameGeek      | Medium     | Games with name, description, numeric fields
Amazon Product Q&A | Medium     | 10M user questions/answers with product info
Wikipedia          | High       | Subset of entries, text-heavy

Default: The movies dataset ships as a fully worked example for novice users.

Success Criteria

Metric                            | Target
Time to first successful query    | < 10 minutes
AI development tool integrations  | ≥ 3 (Kiro, Vercel, Cursor)
End-to-end AWS deployment time    | < 10 minutes (excluding data load)

Installation Options

Option A: Docker Compose

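# Download from opensearch.org, GitHub, or Docker Hub
curl -O https://opensearch.org/downloads/search-builder-docker-compose.yml
# The file is not named docker-compose.yml, so pass it explicitly with -f
docker-compose -f search-builder-docker-compose.yml up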

Option B: Kiro Power

Install Search Builder as a Kiro Power package for IDE integration.

Example Workflow

User Request: "Create a semantic knowledge base for PDF documents in this folder"

Search Builder Actions:

  1. Analyzes PDFs
  2. Creates parsing and chunking strategies
  3. Organizes metadata into JSON documents
  4. Builds indexes with appropriate field mappings
  5. Integrates embedding models for semantic search
  6. Configures hybrid search scoring with sensible defaults
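Steps 2 and 3 above can be sketched in a few lines. The example below assumes the PDF text has already been extracted page by page (PDF parsing is out of scope here), splits it into overlapping word-based chunks, and wraps each chunk in a JSON-ready document with its metadata. The function names, the metadata fields, and the default sizes are hypothetical.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of `chunk_size` words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def to_documents(source, page, text, **chunk_options):
    """Attach source metadata to each chunk so hits can be traced back."""
    return [
        {"source": source, "page": page, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_words(text, **chunk_options))
    ]


page_text = " ".join(f"w{i}" for i in range(10))
for doc in to_documents("handbook.pdf", 1, page_text, chunk_size=4, overlap=1):
    print(doc)
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of some duplicated text in the index.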

User Iteration: Test queries in Search Builder UI, provide feedback, refine requirements.

Cloud Deployment (Phase 2)

From the IDE, the user asks the LLM to deploy the search app to AWS. The Cloud Search MCP server then:

  1. Gathers inputs: availability requirements, expected traffic, data volume
  2. Creates OSI pipeline with processors for enrichment, chunking, parsing
  3. Creates OpenSearch Serverless collection with pre-configured index mappings
  4. Configures Bedrock models
  5. Wires security and networking configurations
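The ordering of these steps can be expressed as a simple plan that an agent walks through. The sketch below only models the inputs and the sequence of resources; it does not call any AWS API, and the names `DeploymentInputs` and `plan_deployment`, along with the resource naming scheme, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeploymentInputs:
    """Inputs gathered in step 1 (field names and units are assumed)."""
    availability: str      # e.g. "dev" or "production"
    expected_qps: int      # expected queries per second
    data_volume_gb: float  # expected data volume


def plan_deployment(app_name, inputs):
    """Return the ordered resources to create, mirroring steps 2 to 5."""
    if inputs.expected_qps < 0 or inputs.data_volume_gb < 0:
        raise ValueError("traffic and data volume must be non-negative")
    return [
        ("osi_pipeline", f"{app_name}-ingest"),
        ("serverless_collection", f"{app_name}-collection"),
        ("bedrock_models", f"{app_name}-models"),
        ("security_and_networking", f"{app_name}-access"),
    ]


inputs = DeploymentInputs(availability="dev", expected_qps=5, data_volume_gb=2.5)
for kind, resource_name in plan_deployment("movie-search", inputs):
    print(kind, resource_name)
```

Producing the plan before creating anything would give the user a chance to review what will be provisioned.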

Post-Deployment: Cloud Search Agent monitors infrastructure and proposes optimizations based on usage patterns and new OpenSearch features.

Describe the solution you'd like

N/A

Related component

No response

Describe alternatives you've considered

This is a new repository proposal

Additional context

No response

Metadata

Labels: Search, enhancement
Status: New