A Mastra-based demo that showcases agentic search relevance tuning using observability signals. Built with Mastra, Elasticsearch, and OpenAI.
This repo uses:
- `mastra` for workflow orchestration
- `@mastra/*` packages for memory, logging, and tool integration
- `@elastic/elasticsearch` for search backend
- `zod` for schema validation
- `pino` for structured logging
- TypeScript + `tsx` for dev ergonomics

Prerequisites:
- Node.js >= 20.9.0
- Python >=3.10, <3.13
- Git
git clone https://github.com/jwilliams-elastic/agentic-search-o11y-autotune.git
cd agentic-search-o11y-autotune
python -m venv .venv
source .venv/bin/activate
pip install -r ./src/python/requirements.txt
npm install
- Create 1 serverless project
  - Elasticsearch optimized for vectors
    - you will need to obtain URL for `.env` `ELASTIC_URL` entry
    - you will need to create an API key for `.env` `ELASTIC_API_KEY` entry
- Create a `.env` file:
cp .env.example .env
- Populate `.env` with values for the following variables
PROJECT_HOME=YOUR_PROJECT_HOME_ABSOLUTE_PATH
ELASTIC_URL=YOUR_ELASTIC_URL
ELASTIC_API_KEY=YOUR_ELASTIC_API_KEY
GOOGLE_GENERATIVE_AI_API_KEY=YOUR_GOOGLE_GENERATIVE_AI_API_KEY
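Before starting Mastra, it can help to confirm that none of these entries were left unset or at their `YOUR_...` placeholder. The sketch below is a hypothetical helper, not something that ships with this repo; the function name `findMissingEnvVars` and the placeholder check are assumptions made for illustration.

```typescript
// Hypothetical helper (not part of the repo): lists required .env entries
// that are unset, blank, or still set to their YOUR_... placeholder.
const REQUIRED_ENV_VARS = [
  "PROJECT_HOME",
  "ELASTIC_URL",
  "ELASTIC_API_KEY",
  "GOOGLE_GENERATIVE_AI_API_KEY",
] as const;

type EnvLike = Record<string, string | undefined>;

function findMissingEnvVars(env: EnvLike): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    // Treat undefined, whitespace-only, and untouched placeholders as missing.
    return (
      value === undefined ||
      value.trim() === "" ||
      value.startsWith("YOUR_")
    );
  });
}
```

In a Node.js script you could pass `process.env` as the argument and exit early when the returned list is not empty.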
| Command | Description |
|---|---|
| `npm run dev` | Run Mastra in dev mode (hot reload) |
| `./start.sh` | Set python venv then start mastra. |
- Open http://localhost:4111/workflows
- Run 'elastic-setup-workflow' (.env file has default values but you can override in mastra UI)
- Run 'search-autotune-workflow' (the LOW and HIGH options generate different simulated search engagement behavior: HIGH = Luxury, LOW = Affordable)
- Open http://localhost:4111/agents and run the "Home Search Agent"
- Show the difference between LTR and no-LTR LLM judgment with queries like "affordable home", "luxury home", and "6 bed, 6 bath single family home near orlando fl with garage and pool under 5M with designer finishes throughout"
- You can trigger engagement by asking for more detail on a specific result (e.g., "tell me more about result #20 in v4 results")
- Open the "Agentic Search Analytics" dashboard - KPIs like CTR, Average Click Position and search template usage.
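To make the dashboard KPIs concrete, here is a minimal sketch of how CTR and average click position could be derived from logged events. It is not the dashboard's actual query logic: the simplified `SearchEvent` shape, the `computeKpis` name, and the definition of CTR as engagement events divided by logged results are all assumptions for illustration.

```typescript
// Simplified, hypothetical event shape: one record per logged result
// or per engagement with a result.
type SearchEvent = {
  action: "search_result_logged" | "property_engagement";
  position: number; // 1-based rank of the result
};

type Kpis = {
  ctr: number; // engagements divided by logged results (assumed definition)
  avgClickPosition: number | null; // null when there were no engagements
};

function computeKpis(events: SearchEvent[]): Kpis {
  const impressions = events.filter(
    (e) => e.action === "search_result_logged"
  ).length;
  const clicks = events.filter((e) => e.action === "property_engagement");

  // Guard against division by zero when nothing was logged.
  const ctr = impressions === 0 ? 0 : clicks.length / impressions;
  const avgClickPosition =
    clicks.length === 0
      ? null
      : clicks.reduce((sum, e) => sum + e.position, 0) / clicks.length;

  return { ctr, avgClickPosition };
}
```

A lower average click position means users are engaging with results nearer the top, which is the direction the autotune workflow is trying to push.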
- Heavy use of vibe coded TypeScript
- Mastra workflows and tools live in `/src`
- Logs use pino-pretty during development
- Logs are shipped to an elasticsearch datastream running in the same serverless project
- LTR script is written in Python with a mastra tool wrapper for invocation
agentic-search-o11y-autotune/
├── src/                                  # Mastra tools, agents, workflows, and utilities (TypeScript)
│   └── mastra/
│       ├── index.ts
│       ├── logger.ts
│       ├── logger-agentless.ts
│       ├── agents/
│       ├── tools/
│       └── workflows/
├── python/                               # Python scripts for LTR, feature analysis, and plotting
│   ├── plot_feature_importance.py
│   ├── properties-learn-to-rank.py
│   └── requirements.txt
├── models/                               # ML models, scalers, and metadata
│   ├── feature_importance.png
│   ├── feature_scaler.pkl
│   ├── home_search_ltr_model.json
│   ├── ltr_model_metadata.json
│   └── xgboost_ltr_model.json
├── dashboards/
│   └── sample_kibana_dashboard.ndjson    # prebuilt search o11y dashboard
├── data/                                 # Data files and JSONL property data
│   └── properties.jsonl
├── search_templates/                     # Mustache templates for ES search
│   ├── properties-search-v1.mustache
│   ├── properties-search-v2.mustache
│   ├── properties-search-v3.mustache
│   └── properties-search-v4.mustache
├── event.schema                          # Event schema for analytics
├── feature_importance_analysis.ipynb     # Jupyter notebook for feature analysis
├── sample_kibana_dashboard.ndjson        # Sample kibana dashboard for search analytics
├── mastra.config.js                      # Mastra project config
├── package.json                          # Project metadata and scripts
├── tsconfig.json                         # TypeScript config
└── README.md                             # You're here
This demo includes:
- Search event logging (Mastra logger + pino -> Elasticsearch datastream)
- Search tuning hooks
- Elasticsearch query templates
- Basic analytics-ready output for ES|QL dashboards
Create a GitHub issue or email the repo maintainers.
NEW: Production-ready LTR system with observability-driven ranking!
- ECS-compliant structured logging
- Real-time feature extraction from user behavior
- Elasticsearch Data Streams: `logs-agentic-search-o11y-autotune.events`
- Position-aware features: Result position, log/reciprocal position, position bias, and engagement at position.
- Search performance: Elasticsearch score, search time, and template complexity.
- Query analysis: Query length, word count, complexity, and presence of geo/price/bedroom filters.
- User interaction: Click/view counts, interaction rate, conversational detection, and engagement score.
- Session context: Query count, average position, and session duration.
- Text relevance: Overlap and exact match between query and property title/description.
- BM25 relevance: Field-specific BM25 scores (title, description, features, headings, combined).
- Semantic similarity: Embedding-based similarity between query and property description/features.
- Property attributes: Normalized price, bedroom/bathroom match, square footage, tax, and maintenance.
- Geo-relevance: Estimated distance, geo relevance, and neighborhood match.
- Query-document matching: Exact/partial matches and status relevance (e.g., active listings).
These features are extracted and engineered from both search and engagement events, then used to train and evaluate the LTR model for property search ranking.
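As a rough illustration of the query-analysis and text-relevance groups above, the sketch below derives a few features from a query string, its filters, and a property title. It is not the repo's Python LTR script: the function names, the output keys other than `query_length`, and the token-overlap formula are assumptions chosen for readability.

```typescript
// Hypothetical subset of the filters carried on a search event.
type QueryFilters = {
  bedrooms?: number;
  home_price?: number;
  geo?: { latitude: number; longitude: number; distance: string };
};

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Query-analysis features: length, word count, and filter presence flags.
function extractQueryFeatures(queryText: string, filters: QueryFilters) {
  return {
    query_length: queryText.length,
    word_count: tokenize(queryText).length,
    has_geo_filter: filters.geo !== undefined ? 1 : 0,
    has_price_filter: filters.home_price !== undefined ? 1 : 0,
    has_bedroom_filter: filters.bedrooms !== undefined ? 1 : 0,
  };
}

// Text-relevance feature: share of distinct query tokens found in the title.
function titleOverlap(queryText: string, title: string): number {
  const queryTokens = new Set(tokenize(queryText));
  if (queryTokens.size === 0) return 0;
  const titleTokens = new Set(tokenize(title));
  let matches = 0;
  queryTokens.forEach((token) => {
    if (titleTokens.has(token)) matches += 1;
  });
  return matches / queryTokens.size;
}
```

For the query "2 bedroom in Brooklyn" and the title "Sunny 2 bedroom condo, Brooklyn Heights", three of the four query tokens appear in the title, so this overlap measure would be 0.75.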
{
  "@timestamp": "2025-08-08T12:34:56Z",
  "event.action": "search_result_logged", // or "property_engagement"
  "event.type": "search", // or "engagement"
  "event.category": ["search"], // or ["user"]
  "event.outcome": "success",
  "user.id": "user-123",
  "session.id": "session-abc", // flat, for easy filtering/joining
  "query.text": "2 bedroom in Brooklyn",
  "query.template_id": "properties-search-v3",
  "query.filters": {
    "bedrooms": 2,
    "bathrooms": 1,
    "maintenance": 500,
    "square_footage": 800,
    "home_price": 1000000,
    "geo": {
      "latitude": 40.6782,
      "longitude": -73.9442,
      "distance": "10km"
    },
    "features": "balcony"
  },
  "result": {
    "document_id": "property-456",
    "position": 1,
    "elasticsearch_score": 12.34
  },
  "interaction": {
    "type": "property_engagement", // only for engagement events
    "original_message": "I like this one"
  },
  "performance": {
    "search_time_ms": 123,
    "elasticsearch_time_ms": 100
  },
  "service.name": "elasticsearch-search-tool"
}
- Complete LTR Guide - Comprehensive system documentation
- ESQL Queries - Query examples for feature analysis
- Confidence Scoring - Pattern-based confidence calculation
- Feature Logs - Feature extraction reference
- Real-time Learning: Continuous improvement from user behavior
- Zero Breaking Changes: Seamless integration with existing search
- Production-Ready: Enterprise-grade logging and error handling
Here is a concise business logic summary for the top 15 features in the LTR model:
- position_log: Strongly rewards higher-ranked (top) search results, with diminishing returns for lower positions.
- position_bias_factor: Adjusts for user bias toward top results, giving more weight to higher positions.
- position_engagement_signal: Captures user engagement (e.g., clicks) at specific positions, boosting results that get attention even at lower ranks.
- click_count: Directly measures how many times a property was clicked, indicating user interest.
- position: The absolute rank of the result; lower values (top results) are favored.
- query_length: Reflects the complexity or specificity of the user's query; longer queries may indicate more intent.
- session_avg_position: Average position of results viewed in a session, capturing user browsing patterns.
- user_engagement_score: Aggregates various engagement signals (clicks, interactions) into a single score.
- view_count: Counts how many times a property was viewed, showing general interest.
- elasticsearch_score: The original ES relevance score, representing text-based matching.
- same_neighborhood: Indicates if the property is in the same neighborhood as the user's query or filter, boosting local relevance.
- time_in_session_ms: Measures how long the user spent in the session, which can correlate with engagement or satisfaction.
- position_reciprocal: Another way to emphasize top results, giving a higher score to higher-ranked properties.
- bm25_title_score: Measures how well the property title matches the query using BM25 relevance.
- search_time_ms: The time taken to perform the search; can be a proxy for query complexity or backend performance.
These features combine user behavior, search ranking, query complexity, and property relevance to optimize which properties are shown at the top of search results.
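The position-based features at the top of this list are typically simple transforms of the 1-based result rank. The sketch below shows one plausible way to compute `position_log` and `position_reciprocal`; the exact formulas used by the repo's Python LTR script are not documented here, so `Math.log(1 + position)` and `1 / position` are assumptions, as is the `positionFeatures` function name.

```typescript
// Hypothetical transforms of a 1-based result rank (formulas are assumptions).
type PositionFeatures = {
  position: number;
  position_log: number; // grows slowly, so deep ranks differ less from each other
  position_reciprocal: number; // 1 for the top result, shrinking toward 0
};

function positionFeatures(position: number): PositionFeatures {
  if (!Number.isInteger(position) || position < 1) {
    throw new RangeError("position must be a positive 1-based integer");
  }
  return {
    position,
    position_log: Math.log(1 + position),
    position_reciprocal: 1 / position,
  };
}
```

Both transforms compress differences between lower-ranked results: moving from rank 1 to rank 2 changes these values far more than moving from rank 19 to rank 20, which matches the diminishing-returns behavior described for `position_log`.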
