🧠 Agentic Search O11y Autotune

A Mastra-based demo that showcases agentic search relevance tuning using observability signals. Built with Mastra, Elasticsearch, and OpenAI.

📦 Project Structure

This repo uses:

mastra for workflow orchestration
@mastra/* packages for memory, logging, and tool integration
@elastic/elasticsearch for search backend
zod for schema validation
pino for structured logging
TypeScript + tsx for dev ergonomics

🚀 Getting Started

✅ Prerequisites

Node.js >= 20.9.0
Python >=3.10, <3.13
Git

📥 Install

git clone https://github.com/jwilliams-elastic/agentic-search-o11y-autotune.git
cd agentic-search-o11y-autotune
python -m venv .venv
source .venv/bin/activate
pip install -r ./src/python/requirements.txt
npm install

⚙️ Setup

Create one serverless project:
- Choose Elasticsearch optimized for vectors
  - Copy the project URL into the ELASTIC_URL entry in .env
  - Create an API key and copy it into the ELASTIC_API_KEY entry in .env

Create a .env file:

cp .env.example .env

Populate .env with values for the following variables:

PROJECT_HOME=YOUR_PROJECT_HOME_ABSOLUTE_PATH
ELASTIC_URL=YOUR_ELASTIC_URL
ELASTIC_API_KEY=YOUR_ELASTIC_API_KEY
GOOGLE_GENERATIVE_AI_API_KEY=YOUR_GOOGLE_GENERATIVE_AI_API_KEY
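
Before starting Mastra, it can help to confirm that no placeholder values remain in .env. The following is a minimal sketch, not part of the repo: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines as shown above.

```python
from pathlib import Path

# Variable names taken from the list above.
REQUIRED_KEYS = [
    "PROJECT_HOME",
    "ELASTIC_URL",
    "ELASTIC_API_KEY",
    "GOOGLE_GENERATIVE_AI_API_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("YOUR_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    problems = missing_keys(parse_env(text))
    print("Missing or placeholder values:", problems or "none")
```

Run it from the project root after activating the venv; it only reads the file and never prints secret values.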

🛠 Run Mastra

Command	Description
`npm run dev`	Run Mastra in dev mode (hot reload)
`./start.sh`	Activate the Python venv, then start Mastra

🧪 Demo Flow

Open http://localhost:4111/workflows
Run 'elastic-setup-workflow' (the .env file supplies default values, which you can override in the Mastra UI)
Run 'search-autotune-workflow' (the LOW and HIGH options generate different simulated search engagement behavior: HIGH = Luxury, LOW = Affordable)
Open http://localhost:4111/agents and run the "Home Search Agent"
Show the difference between LTR and no-LTR LLM judgment with queries like "affordable home", "luxury home", and "6 bed, 6 bath single family home near orlando fl with garage and pool under 5M with designer finishes throughout"
Trigger engagement by asking for more detail about a specific result (e.g., "tell me more about result #20 in v4 results")
Open the "Agentic Search Analytics" dashboard to see KPIs such as CTR, average click position, and search template usage.

🧪 Development Notes

Heavy use of vibe-coded TypeScript
Mastra workflows and tools live in /src
Logs use pino-pretty during development
Logs are shipped to an Elasticsearch data stream running in the same serverless project
The LTR script is written in Python, with a Mastra tool wrapper for invocation

📁 Folder Structure

agentic-search-o11y-autotune/
├── src/
│   ├── mastra/               # Mastra tools, agents, workflows, and utilities (TypeScript)
│   │   ├── index.ts
│   │   ├── logger.ts
│   │   ├── logger-agentless.ts
│   │   ├── agents/
│   │   ├── tools/
│   │   └── workflows/
│   └── python/               # Python scripts for LTR, feature analysis, and plotting
│       ├── plot_feature_importance.py
│       ├── properties-learn-to-rank.py
│       └── requirements.txt
├── models/                   # ML models, scalers, and metadata
│   ├── feature_importance.png
│   ├── feature_scaler.pkl
│   ├── home_search_ltr_model.json
│   ├── ltr_model_metadata.json
│   └── xgboost_ltr_model.json
├── dashboards/
│   └── sample_kibana_dashboard.ndjson # prebuilt search o11y dashboard
├── data/                     # Data files and JSONL property data
│   └── properties.jsonl
├── search_templates/         # Mustache templates for ES search
│   ├── properties-search-v1.mustache
│   ├── properties-search-v2.mustache
│   ├── properties-search-v3.mustache
│   └── properties-search-v4.mustache
├── event.schema              # Event schema for analytics
├── feature_importance_analysis.ipynb  # Jupyter notebook for feature analysis
├── sample_kibana_dashboard.ndjson     # Sample kibana dashboard for search analytics
├── mastra.config.js          # Mastra project config
├── package.json              # Project metadata and scripts
├── tsconfig.json             # TypeScript config
└── README.md                 # You're here

📊 Observability Features

This demo includes:

Search event logging (Mastra logger + pino -> Elasticsearch data stream)
Search tuning hooks
Elasticsearch query templates
Basic analytics-ready output for ES|QL dashboards
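
As an illustration of how a stored query template is invoked, the sketch below builds a request body for Elasticsearch's search template API, which takes a template `id` and a `params` object. The parameter names (`query`, `bedrooms`, and so on) are assumptions for illustration; the real names are whatever the .mustache files in search_templates/ define, and `build_template_request` is a hypothetical helper.

```python
import json


def build_template_request(template_id: str, query_text: str, filters: dict) -> dict:
    """Build a body for the Elasticsearch search template API (_search/template)."""
    # Hypothetical parameter names; check the .mustache file for the real ones.
    params = {"query": query_text}
    # Drop unset filters so the template only sees values the user supplied.
    params.update({key: value for key, value in filters.items() if value is not None})
    return {"id": template_id, "params": params}


if __name__ == "__main__":
    body = build_template_request(
        "properties-search-v3",
        "2 bedroom in Brooklyn",
        {"bedrooms": 2, "bathrooms": 1, "features": None},
    )
    print(json.dumps(body, indent=2))
```

Switching between template versions (v1 through v4) then only requires changing the `id`, which is what makes template usage easy to track in analytics.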

🙋‍♀️ Questions or Issues?

Create a GitHub issue or email the repo maintainers.

🎯 Unified Learning-to-Rank (LTR) System

NEW: Production-ready LTR system with observability-driven ranking!

🎪 Key LTR Features:

📡 Observability-Driven:

ECS-compliant structured logging
Real-time feature extraction from user behavior
Elasticsearch Data Streams: logs-agentic-search-o11y-autotune.events

🧬 Feature Engineering (from `properties-learn-to-rank.py`):

Position-aware features: Result position, log/reciprocal position, position bias, and engagement at position.
Search performance: Elasticsearch score, search time, and template complexity.
Query analysis: Query length, word count, complexity, and presence of geo/price/bedroom filters.
User interaction: Click/view counts, interaction rate, conversational detection, and engagement score.
Session context: Query count, average position, and session duration.
Text relevance: Overlap and exact match between query and property title/description.
BM25 relevance: Field-specific BM25 scores (title, description, features, headings, combined).
Semantic similarity: Embedding-based similarity between query and property description/features.
Property attributes: Normalized price, bedroom/bathroom match, square footage, tax, and maintenance.
Geo-relevance: Estimated distance, geo relevance, and neighborhood match.
Query-document matching: Exact/partial matches and status relevance (e.g., active listings).

These features are extracted and engineered from both search and engagement events, then used to train and evaluate the LTR model for property search ranking.
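
A few of the features listed above can be sketched in plain Python. This is an illustrative approximation, not the code in `properties-learn-to-rank.py`: the exact formulas (for example `log1p` for the log position) and the names `title_overlap` and `title_exact_match` are assumptions.

```python
import math
import re


def position_features(position: int) -> dict[str, float]:
    """Position-aware features for a 1-based result position."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return {
        "position": float(position),
        # Assumed formula: log(1 + position), so differences shrink at lower ranks.
        "position_log": math.log1p(position),
        # Assumed formula: 1 / position, so rank 1 scores 1.0 and rank 4 scores 0.25.
        "position_reciprocal": 1.0 / position,
    }


def text_relevance_features(query: str, title: str) -> dict[str, float]:
    """Token overlap and exact-match features between a query and a title."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    title_tokens = set(re.findall(r"\w+", title.lower()))
    # Fraction of distinct query tokens that also appear in the title.
    overlap = len(query_tokens & title_tokens) / len(query_tokens) if query_tokens else 0.0
    # 1.0 when the whole query appears verbatim (case-insensitive) in the title.
    cleaned = query.strip().lower()
    exact = 1.0 if cleaned and cleaned in title.lower() else 0.0
    return {"title_overlap": overlap, "title_exact_match": exact}


if __name__ == "__main__":
    print(position_features(4))
    print(text_relevance_features("affordable home", "Luxury home with pool"))
```

In a full pipeline, dictionaries like these would be merged per query-document pair into one feature vector before training.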

Event Schema

{
  "@timestamp": "2025-08-08T12:34:56Z",
  "event.action": "search_result_logged",         // or "property_engagement"
  "event.type": "search",                        // or "engagement"
  "event.category": ["search"],                  // or ["user"]
  "event.outcome": "success",
  "user.id": "user-123",
  "session.id": "session-abc",                   // flat, for easy filtering/joining
  "query.text": "2 bedroom in Brooklyn",
  "query.template_id": "properties-search-v3",
  "query.filters": {
    "bedrooms": 2,
    "bathrooms": 1,
    "maintenance": 500,
    "square_footage": 800,
    "home_price": 1000000,
    "geo": {
      "latitude": 40.6782,
      "longitude": -73.9442,
      "distance": "10km"
    },
    "features": "balcony"
  },
  "result": {
    "document_id": "property-456",
    "position": 1,
    "elasticsearch_score": 12.34
  },
  "interaction": {
    "type": "property_engagement",               // only for engagement events
    "original_message": "I like this one"
  },
  "performance": {
    "search_time_ms": 123,
    "elasticsearch_time_ms": 100
  },
  "service.name": "elasticsearch-search-tool"
}
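
The schema above can be mirrored in a small builder for a single search result event. This is only a sketch of the document shape: the repo logs events from TypeScript through the Mastra logger and pino, and the function `build_search_event` and its arguments are hypothetical. Optional fields such as `query.filters` and `interaction` are omitted.

```python
import json
from datetime import datetime, timezone


def build_search_event(
    user_id: str,
    session_id: str,
    query_text: str,
    template_id: str,
    document_id: str,
    position: int,
    score: float,
    search_time_ms: int,
) -> dict:
    """Build one 'search_result_logged' event following the schema above."""
    return {
        "@timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event.action": "search_result_logged",
        "event.type": "search",
        "event.category": ["search"],
        "event.outcome": "success",
        "user.id": user_id,
        "session.id": session_id,
        "query.text": query_text,
        "query.template_id": template_id,
        "result": {
            "document_id": document_id,
            "position": position,
            "elasticsearch_score": score,
        },
        "performance": {"search_time_ms": search_time_ms},
        "service.name": "elasticsearch-search-tool",
    }


if __name__ == "__main__":
    event = build_search_event(
        "user-123", "session-abc", "2 bedroom in Brooklyn",
        "properties-search-v3", "property-456", 1, 12.34, 123,
    )
    # One JSON object per line is a common shape for shipping structured logs.
    print(json.dumps(event))
```

Keeping `session.id` and `query.template_id` flat, as the schema does, makes it straightforward to join search and engagement events later.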

📚 LTR Documentation:

Complete LTR Guide - Comprehensive system documentation
ESQL Queries - Query examples for feature analysis
Confidence Scoring - Pattern-based confidence calculation
Feature Logs - Feature extraction reference

🎉 LTR Business Value:

Real-time Learning: Continuous improvement from user behavior
Zero Breaking Changes: Seamless integration with existing search
Production-Ready: Enterprise-grade logging and error handling

Top 15 Features

A concise summary of the business logic behind the top 15 features in the LTR model:

position_log: Strongly rewards higher-ranked (top) search results, with diminishing returns for lower positions.
position_bias_factor: Adjusts for user bias toward top results, giving more weight to higher positions.
position_engagement_signal: Captures user engagement (e.g., clicks) at specific positions, boosting results that get attention even at lower ranks.
click_count: Directly measures how many times a property was clicked, indicating user interest.
position: The absolute rank of the result; lower values (top results) are favored.
query_length: Reflects the complexity or specificity of the user's query; longer queries may indicate more intent.
session_avg_position: Average position of results viewed in a session, capturing user browsing patterns.
user_engagement_score: Aggregates various engagement signals (clicks, interactions) into a single score.
view_count: Counts how many times a property was viewed, showing general interest.
elasticsearch_score: The original ES relevance score, representing text-based matching.
same_neighborhood: Indicates if the property is in the same neighborhood as the user's query or filter, boosting local relevance.
time_in_session_ms: Measures how long the user spent in the session, which can correlate with engagement or satisfaction.
position_reciprocal: Another way to emphasize top results, giving a higher score to higher-ranked properties.
bm25_title_score: Measures how well the property title matches the query using BM25 relevance.
search_time_ms: The time taken to perform the search; can be a proxy for query complexity or backend performance.

These features combine user behavior, search ranking, query complexity, and property relevance to optimize which properties are shown at the top of search results.
