[RFC] search-builder project #20731

## Summary

Search Builder is an agentic onboarding experience that enables developers to prototype search applications locally and deploy to AWS in minutes. It addresses the complexity barrier that drives developers to simpler alternatives during early development.

## Problem Statement

### Search Expertise Gap
Many of today's search application developers are data scientists and AI engineers without deep search expertise. They view search as a means to an end, not a core competency, which creates barriers to OpenSearch adoption for GenAI and agentic AI applications.

### First-Time User Friction
Developers prototype with simpler alternatives (SQLite FTS5, Meilisearch, Chroma) and evaluate OpenSearch only when they hit production requirements or feature limitations. As AI-powered development platforms enable direct cloud deployment, developers also bypass teams with AWS expertise entirely.

**Current onboarding requires users to**:
- Choose deployment methods and configure security before seeing value
- Understand clusters, shards, index mappings, and k-NN configuration
- Write setup code before any meaningful query
- Navigate documentation written for search engineers, not AI developers

**Result**: Time-to-first-success is too long. Users leave before they see what OpenSearch can do.

## Proposed Solution

### Phase 1: Local Search Builder Experience

**Packaged Distribution**
- Bundles OpenSearch, Data Prepper, and the Search Builder UI
- Downloadable for any local or cloud-based development platform
- Works offline except for LLMs/agents

**Search Builder MCP Server**
- Integrates with AI development tools (Kiro, Cursor, Vercel, etc.)
- Converts search requirements into end-to-end applications
- Maintains a "manifest" describing the search app being built
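
The RFC does not define the manifest's format. Purely as a hypothetical illustration, a manifest could record the decisions the agent has made so far (data source, use case, index, chunking, model) in one file that the agent regenerates whenever requirements change. Every key and value below is an assumption, not a proposed schema.

```bash
# Hypothetical manifest: the RFC defines no schema, so every key here is an
# illustrative assumption.
APP_NAME="${APP_NAME:-movie-search}"
MANIFEST="${MANIFEST:-search-app.manifest.json}"

manifest() {
  cat <<EOF
{
  "app": "$APP_NAME",
  "use_case": "semantic",
  "data": { "path": "./data/movies.json", "format": "json" },
  "index": { "name": "movies", "text_fields": ["title", "overview"] },
  "ingest": { "chunking": { "token_limit": 128, "overlap_rate": 0.2 } },
  "model": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "deploy_target": "local"
}
EOF
}

# Regenerate the whole file whenever the agent changes a decision.
manifest > "$MANIFEST"
printf 'wrote %s\n' "$MANIFEST"
```

Keeping decisions in one declarative file would let users inspect what was generated (tenet 4) and could plausibly give the Phase 2 agent a single input to deploy from.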

### Phase 2: AWS OpenSearch Experience

**Cloud Search Agent**
- MCP server that deploys locally-built search apps to AWS
- Creates OpenSearch Ingestion (OSI) pipelines, OpenSearch Serverless collections, and Bedrock configurations
- Provides a fully agentic experience in the Amazon OpenSearch Service console

**Infrastructure Automation**
- Gathers inputs: availability requirements, expected traffic, data volume
- Provisions and wires security/networking configurations automatically

**Ongoing Optimization**
- Monitors cloud infrastructure
- Proposes strategies based on usage patterns and new OpenSearch features
- Assists with upgrades, cost optimization, and performance tuning

## User Journey (Golden Path)

### Onboarding Tenets
1. No OpenSearch terminology (shards, replicas, mappings)
2. Show value quickly, rather than teaching concepts
3. Generate everything with few prompts or clicks
4. Let users learn by inspecting generated artifacts
5. Support 3-5 main use cases, not 100 features
6. AWS-agnostic: works with any IDE
7. Everything works offline except LLMs/agents

### Steps

| Step | Action | Description |
|------|--------|-------------|
| 1 | Install | Download docker-compose.yml or install as a Kiro Power |
| 2 | Run | Execute `docker-compose up` in terminal |
| 3 | Select Data | Choose CSV/JSON/PDF files or upload your own |
| 4 | Auto-Configure | System creates index, mapping, pipeline, model, sample queries |
| 5 | Test & Prototype | Query data, tune relevance, select models |
| 6 | Migrate to AWS | Export configuration for AWS deployment |

**Key**: Users never manually touch shards, mappings, analyzers, or cluster config.

## Target Use Cases

1. **Keyword Search** - Traditional text matching
2. **Semantic Search** - Vector-based similarity search
3. **RAG** - Retrieval-augmented generation for GenAI applications
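
To make the first two use cases concrete, here is a hedged sketch of the queries involved. It assumes an index shaped like the hypothetical `movies` index (text field `overview`, nested chunk-vector field `overview_chunk_embedding`), a model deployed through ML Commons whose ID is in `MODEL_ID`, and a local cluster reachable over plain HTTP without authentication; all names are illustrative. RAG typically reuses a retrieval query like one of these and passes the top hits to an LLM, so it is not shown separately. The script only prints each request unless `APPLY=1`.

```bash
# Hypothetical sample queries; index, field, and model names are assumptions.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"  # assumes no auth
INDEX="${INDEX:-movies}"
MODEL_ID="${MODEL_ID:-REPLACE_WITH_MODEL_ID}"
# Interpolated into JSON as-is, so it must not contain double quotes.
QUERY_TEXT="${QUERY_TEXT:-heist movie with a twist ending}"

# Dry run by default: print method, path, and body. APPLY=1 sends via curl.
send() {
  if [ "${APPLY:-0}" = 1 ]; then
    curl -fsS -X "$1" "$OPENSEARCH_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    printf '%s %s\n%s\n' "$1" "$2" "$3"
  fi
}

# Keyword search: BM25 full-text match on the raw text field.
keyword_query() {
  cat <<EOF
{ "query": { "match": { "overview": "$QUERY_TEXT" } } }
EOF
}

# Semantic search: the neural query embeds QUERY_TEXT with the model, runs
# k-NN over the per-chunk vectors, and scores each movie by its best chunk.
semantic_query() {
  cat <<EOF
{
  "query": { "nested": {
    "path": "overview_chunk_embedding",
    "score_mode": "max",
    "query": { "neural": { "overview_chunk_embedding.knn": {
      "query_text": "$QUERY_TEXT", "model_id": "$MODEL_ID", "k": 5 } } }
  } }
}
EOF
}

send GET "/$INDEX/_search" "$(keyword_query)"
send GET "/$INDEX/_search" "$(semantic_query)"
```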

## Auto-Generated Artifacts

When users provide data and requirements, Search Builder automatically creates:

- **Index** with optimized mappings
- **Ingest Pipeline** with text chunking
- **Embedding Model** (locally hosted)
- **Vector Embeddings** for semantic search
- **Sample Queries** demonstrating capabilities
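
A minimal sketch of what the first three artifacts could look like as OpenSearch REST calls, assuming OpenSearch 2.13 or later with the ML Commons and neural-search plugins, a local cluster reachable over plain HTTP without authentication, and hypothetical names (`movies`, `overview`, `movies-chunk-embed`). The endpoints and processors are standard OpenSearch features; the specific choices (model, 128-token chunks, 20% overlap) are illustrative, not what Search Builder ships. By default the script only prints each request; set `APPLY=1` to send it with `curl`.

```bash
# Hypothetical sketch: artifacts Search Builder might generate for a
# semantic-search app. Names and parameter values are illustrative.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"  # assumes no auth
MODEL_ID="${MODEL_ID:-REPLACE_WITH_MODEL_ID}"  # from the ML task API
INDEX="${INDEX:-movies}"
PIPELINE="${PIPELINE:-movies-chunk-embed}"

# Dry run by default: print method, path, and body. APPLY=1 sends via curl.
send() {
  if [ "${APPLY:-0}" = 1 ]; then
    curl -fsS -X "$1" "$OPENSEARCH_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    printf '%s %s\n%s\n' "$1" "$2" "$3"
  fi
}

# Embedding model: a pretrained sentence-transformers model (384 dims).
model_body() {
  cat <<'EOF'
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}
EOF
}

# Ingest pipeline: split "overview" into ~128-token chunks with 20% overlap,
# then embed every chunk with the deployed model.
pipeline_body() {
  cat <<EOF
{
  "description": "Chunk and embed movie overviews",
  "processors": [
    { "text_chunking": {
        "algorithm": { "fixed_token_length": {
          "token_limit": 128, "overlap_rate": 0.2, "tokenizer": "standard" } },
        "field_map": { "overview": "overview_chunk" } } },
    { "text_embedding": {
        "model_id": "$MODEL_ID",
        "field_map": { "overview_chunk": "overview_chunk_embedding" } } }
  ]
}
EOF
}

# Index: text fields for keyword search, plus a nested k-NN field because
# each document yields one vector per chunk.
index_body() {
  cat <<EOF
{
  "settings": { "index": { "knn": true, "default_pipeline": "$PIPELINE" } },
  "mappings": { "properties": {
    "title": { "type": "text" },
    "overview": { "type": "text" },
    "overview_chunk_embedding": { "type": "nested", "properties": {
      "knn": { "type": "knn_vector", "dimension": 384 } } }
  } }
}
EOF
}

send POST "/_plugins/_ml/models/_register?deploy=true" "$(model_body)"
send PUT "/_ingest/pipeline/$PIPELINE" "$(pipeline_body)"
send PUT "/$INDEX" "$(index_body)"
```

Model registration is asynchronous: the register call returns a task ID, and the model ID comes from `GET /_plugins/_ml/tasks/<task_id>`. Export it as `MODEL_ID` before creating the pipeline, and keep the vector `dimension` in step with the chosen model. On a single-node cluster without dedicated ML nodes, the `plugins.ml_commons.only_run_on_ml_node` cluster setting may also need to be `false`.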

## Sample Datasets

| Dataset | Complexity | Description |
|---------|------------|-------------|
| IMDb | Medium | 5K movies, ~10 fields |
| Jargon | Low | Dictionary-style, 2 text-heavy fields |
| BoardGame Geek | Medium | Games with name, description, numeric fields |
| Amazon Product Q&A | Medium | 10M user questions/answers with product info |
| Wikipedia | High | Subset of entries, text-heavy |

**Default**: The IMDb movies dataset ships as a fully worked example for novice users.

## Success Criteria

| Metric | Target |
|--------|--------|
| Time to first successful query | < 10 minutes |
| AI development tool integrations | ≥ 3 (Kiro, Vercel, Cursor) |
| End-to-end AWS deployment time | < 10 minutes (excluding data load) |

## Installation Options

### Option A: Docker Compose
```bash
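# Download from opensearch.org, GitHub, or Docker Hub.
# Save it as docker-compose.yml so `docker-compose up` finds it without -f.
curl -fsSL -o docker-compose.yml https://opensearch.org/downloads/search-builder-docker-compose.yml
docker-compose up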
```

### Option B: Kiro Power
Install Search Builder as a Kiro Power package for IDE integration.

## Example Workflow

**User Request**: "Create a semantic knowledge base for PDF documents in this folder"

**Search Builder Actions**:
1. Analyzes PDFs
2. Creates parsing and chunking strategies
3. Organizes metadata into JSON documents
4. Builds indexes with appropriate field mappings
5. Integrates embedding models for semantic search
6. Configures hybrid search scoring with sensible defaults

**User Iteration**: Test queries in Search Builder UI, provide feedback, refine requirements.
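
The workflow above ends with hybrid search scoring "with sensible defaults", which the RFC does not define. One common OpenSearch setup, sketched here under assumptions, pairs a search pipeline using the `normalization-processor` (OpenSearch 2.10 or later) with a `hybrid` query: keyword and vector scores are min-max normalized and then blended with a weighted arithmetic mean. The index name `pdf-kb`, the fields `chunk_text` and `chunk_vector`, and the 0.3/0.7 weights are illustrative assumptions, not Search Builder behavior. Requests are printed unless `APPLY=1`.

```bash
# Hypothetical hybrid-search sketch; names and weights are assumptions.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"  # assumes no auth
INDEX="${INDEX:-pdf-kb}"
MODEL_ID="${MODEL_ID:-REPLACE_WITH_MODEL_ID}"
SEARCH_PIPELINE="${SEARCH_PIPELINE:-hybrid-defaults}"
# Interpolated into JSON as-is, so it must not contain double quotes.
QUERY_TEXT="${QUERY_TEXT:-warranty terms for replacement parts}"

# Dry run by default: print method, path, and body. APPLY=1 sends via curl.
send() {
  if [ "${APPLY:-0}" = 1 ]; then
    curl -fsS -X "$1" "$OPENSEARCH_URL$2" \
      -H 'Content-Type: application/json' -d "$3"
  else
    printf '%s %s\n%s\n' "$1" "$2" "$3"
  fi
}

# Normalize each sub-query's scores to [0, 1] with min_max, then combine
# them as a weighted arithmetic mean (30% keyword, 70% semantic).
search_pipeline_body() {
  cat <<'EOF'
{
  "phase_results_processors": [
    { "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": { "weights": [0.3, 0.7] } } } }
  ]
}
EOF
}

# Weights apply to the sub-queries in the order listed here.
hybrid_query() {
  cat <<EOF
{
  "query": { "hybrid": { "queries": [
    { "match": { "chunk_text": "$QUERY_TEXT" } },
    { "neural": { "chunk_vector": {
        "query_text": "$QUERY_TEXT", "model_id": "$MODEL_ID", "k": 10 } } }
  ] } }
}
EOF
}

send PUT "/_search/pipeline/$SEARCH_PIPELINE" "$(search_pipeline_body)"
send GET "/$INDEX/_search?search_pipeline=$SEARCH_PIPELINE" "$(hybrid_query)"
```

Adjusting the two weights is one concrete form of the relevance tuning that step 5 of the golden path describes.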

## Cloud Deployment (Phase 2)

From their IDE, the user asks the LLM to deploy the search app to AWS. The Cloud Search MCP server then:

1. Gathers inputs: availability requirements, expected traffic, data volume
2. Creates OSI pipeline with processors for enrichment, chunking, parsing
3. Creates OpenSearch Serverless collection with pre-configured index mappings
4. Configures Bedrock models
5. Wires security and networking configurations
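
Steps 3 and 5 correspond to existing AWS CLI operations. As a minimal sketch, assuming AWS CLI v2 with credentials configured and a hypothetical collection name `movie-search`, creating an OpenSearch Serverless vector-search collection could look like this. A collection cannot be created without a matching encryption policy, and it needs a network policy before it can be reached; `AllowFromPublic: true` is a prototyping shortcut, not a recommendation. Commands are printed unless `APPLY=1`.

```bash
# Hypothetical sketch of steps 3 and 5; the collection name is illustrative.
COLLECTION="${COLLECTION:-movie-search}"

# Dry run by default: print the command. APPLY=1 runs it with the AWS CLI.
run() {
  if [ "${APPLY:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Encryption policy: encrypt the collection with an AWS-owned key.
encryption_policy() {
  printf '{"Rules":[{"ResourceType":"collection","Resource":["collection/%s"]}],"AWSOwnedKey":true}' "$COLLECTION"
}

# Network policy: public endpoint access (prototyping only).
network_policy() {
  printf '[{"Rules":[{"ResourceType":"collection","Resource":["collection/%s"]}],"AllowFromPublic":true}]' "$COLLECTION"
}

run aws opensearchserverless create-security-policy \
  --name "$COLLECTION-enc" --type encryption --policy "$(encryption_policy)"
run aws opensearchserverless create-security-policy \
  --name "$COLLECTION-net" --type network --policy "$(network_policy)"
# Created last because it fails without a matching encryption policy.
run aws opensearchserverless create-collection \
  --name "$COLLECTION" --type VECTORSEARCH
```

A data access policy (`aws opensearchserverless create-access-policy --type data`) is also required before any principal can read or write indexes; it is omitted because its principals depend on the account. The OSI pipeline and Bedrock model access would be separate calls.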

**Post-Deployment**: Cloud Search Agent monitors infrastructure and proposes optimizations based on usage patterns and new OpenSearch features.


### Describe the solution you'd like

N/A

### Related component

_No response_

### Describe alternatives you've considered

This is a new repository proposal

### Additional context

_No response_
