74 changes: 72 additions & 2 deletions skills/algolia-mcp/SKILL.md
@@ -51,14 +51,84 @@ For clients that don't support commands, see [connection-setup](references/conne
| Trending facets | `algolia_recommendations` | `trending-facets` |
| Visually similar items | `algolia_recommendations` | `looking-similar` |

## Required workflow
## Search Filter Syntax

Filters go in the `algolia_search_index` call alongside `query`:

**facetFilters** (array-based):
```
[["color:red", "color:blue"]] → OR (red OR blue)
[["brand:Nike"], ["category:running"]] → AND (Nike AND running)
[["size:10"], ["color:red", "color:blue"]] → mixed (size 10 AND (red OR blue))
```
Each inner array is OR'd; outer arrays are AND'd.
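
The nesting rule can be checked against a single record with a small evaluator. This is a minimal Python sketch of the semantics described above, not Algolia's implementation: `matches_facet_filters` and the record layout are hypothetical, and only exact `attribute:value` matches are handled.

```python
def matches_facet_filters(record: dict, facet_filters: list) -> bool:
    """Return True if the record satisfies every inner group (AND),
    where a group is satisfied by any one of its members (OR)."""

    def matches_one(expression: str) -> bool:
        attribute, _, value = expression.partition(":")
        if attribute not in record:
            return False
        actual = record[attribute]
        # Treat scalar and list-valued attributes the same way.
        values = actual if isinstance(actual, list) else [actual]
        return value in [str(v) for v in values]

    return all(any(matches_one(e) for e in group) for group in facet_filters)


filters = [["size:10"], ["color:red", "color:blue"]]
print(matches_facet_filters({"size": 10, "color": "blue"}, filters))   # True
print(matches_facet_filters({"size": 10, "color": "green"}, filters))  # False
```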

**numericFilters** (string-based):
```
["price < 100"] → single condition
["price >= 50", "price <= 200"] → range (AND'd)
```

**Date filtering**: Dates must be stored as Unix timestamps. Use `numericFilters: ["timestamp >= 1704067200"]`.
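
A calendar date can be converted to the timestamp used in that filter as follows. This is a sketch that assumes the stored values are seconds since the epoch and that midnight UTC is the intended cutoff; `to_unix_timestamp` is a hypothetical helper, and `timestamp` is the attribute name from the example above.

```python
from datetime import datetime, timezone


def to_unix_timestamp(date_str: str) -> int:
    """Convert a YYYY-MM-DD date to seconds since the epoch at 00:00:00 UTC."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


numeric_filters = [f"timestamp >= {to_unix_timestamp('2024-01-01')}"]
print(numeric_filters)  # ['timestamp >= 1704067200']
```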

**Attribute selection**: Use `attributesToRetrieve: ["name", "price"]` to limit response size.
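
Put together, the arguments for one `algolia_search_index` call might be assembled like this. It is a sketch only: the parameter names come from this document, but the exact tool schema is an assumption, `build_search_args` is a hypothetical helper, and `YOUR_APP_ID` is a placeholder for a value returned by `algolia_search_list_indices`.

```python
import json


def build_search_args(application_id, index_name, query, *,
                      facet_filters=None, numeric_filters=None, attributes=None):
    """Assemble one argument object; optional filters are omitted when unset."""
    args = {"applicationId": application_id, "indexName": index_name, "query": query}
    if facet_filters:
        args["facetFilters"] = facet_filters
    if numeric_filters:
        args["numericFilters"] = numeric_filters
    if attributes:
        args["attributesToRetrieve"] = attributes
    return args


args = build_search_args(
    "YOUR_APP_ID", "products", "shoes",
    facet_filters=[["color:red", "color:blue"]],  # red OR blue
    numeric_filters=["price < 100"],
    attributes=["name", "price"],
)
print(json.dumps(args, indent=2))
```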

## Analytics Key Details

- **`clickAnalytics: true`**: Set this on `algolia_analytics_top_searches` or `algolia_analytics_top_search_results` to include CTR, conversion rate, and click count. Only these two tools support it.
- **`revenueAnalytics: true`**: Set on the same tools to also include add-to-cart rate, purchase rate, and revenue.
- **Data delay**: Recent data has a 1–4 hour processing delay. Use date ranges ending at least 4 hours ago for complete data.
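
One conservative way to honor the delay with day-granular dates is to end the range on the last day that lies entirely before the 4-hour cutoff. This is a sketch under that assumption; `latest_complete_end_date` is a hypothetical helper, and whether the analytics tools interpret dates in UTC is also assumed.

```python
from datetime import datetime, timedelta, timezone


def latest_complete_end_date(now=None, delay_hours=4):
    """Return the most recent YYYY-MM-DD day that ended before now - delay_hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=delay_hours)
    # The day containing the cutoff is still incomplete, so step back one day.
    return (cutoff.date() - timedelta(days=1)).isoformat()


fixed_now = datetime(2026, 3, 18, 10, 0, tzinfo=timezone.utc)
print(latest_complete_end_date(fixed_now))  # 2026-03-17
```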

### Interpreting Results

| No-results rate | Assessment |
|----------------|------------|
| < 5% | Excellent |
| 5–10% | Good |
| 10–20% | Needs improvement |
| > 20% | Poor |

**Click positions**: Healthy = 30–40% of clicks at position 1, decreasing through 10. Even distribution = poor relevance. Concentrated at positions 5–10 = ranking issues.

**Low CTR + high search volume** = poor result relevance. Common causes: missing synonyms, content gaps, mismatched query intent.
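
The no-results-rate bands in the table above can be applied mechanically. The table does not say which band the boundary values 5, 10, and 20 belong to, so this sketch assumes upper-inclusive bands (10% counts as Good, 20% as Needs improvement); `assess_no_results_rate` is a hypothetical helper.

```python
def assess_no_results_rate(rate_percent: float) -> str:
    """Map a no-results rate (in percent) to the assessment bands in the table."""
    if not 0 <= rate_percent <= 100:
        raise ValueError("rate_percent must be between 0 and 100")
    if rate_percent < 5:
        return "Excellent"
    if rate_percent <= 10:   # assumption: boundary value belongs to the lower band
        return "Good"
    if rate_percent <= 20:   # consistent with "> 20%" being Poor
        return "Needs improvement"
    return "Poor"


print(assess_no_results_rate(18))  # Needs improvement
```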

## Recommendation Thresholds

| Threshold | Behavior |
|-----------|----------|
| 50 | More results, lower relevance |
| **60** | **Balanced (good default)** |
| 75 | Fewer results, higher relevance |

**Model parameter requirements**:
- `bought-together`, `related-products`, `looking-similar` → require `objectID`
- `trending-items` → does NOT require `objectID`. Use `facetName` + `facetValue` to filter by category
- `trending-facets` → requires `facetName`
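
These requirements can be checked before issuing a call. The following is a sketch, not part of the MCP server: `missing_params` is a hypothetical helper, and the lookup table only encodes the three rules listed above.

```python
REQUIRED_PARAMS = {
    "bought-together": {"objectID"},
    "related-products": {"objectID"},
    "looking-similar": {"objectID"},
    "trending-items": set(),        # facetName + facetValue are optional filters
    "trending-facets": {"facetName"},
}


def missing_params(model: str, params: dict) -> list:
    """Return the required parameter names that are absent for the given model."""
    if model not in REQUIRED_PARAMS:
        raise ValueError(f"unknown recommendation model: {model}")
    return sorted(REQUIRED_PARAMS[model] - params.keys())


print(missing_params("bought-together", {"threshold": 60}))  # ['objectID']
print(missing_params("trending-items", {"facetName": "category", "facetValue": "shoes"}))  # []
```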

## Required Workflow

1. **Discover first**: Always call `algolia_search_list_indices` before other tools to resolve `applicationId` and `indexName`. The `applicationId` parameter is an enum: select from the values in the tool schema; never guess.
2. **Index names are case-sensitive**: Use the exact name returned by `algolia_search_list_indices`.
3. **Date parameters**: Analytics tools accept `startDate` and `endDate` in `YYYY-MM-DD` format. Default period is the last 8 days.
4. **Permissions**: Not all tools are available to every user. Analytics tools require the Analytics permission; recommendations require the Recommend feature.
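
Date parameters for two consecutive periods, for example when comparing one week against the previous one, could be built as follows. This sketch assumes both `startDate` and `endDate` are inclusive, which this document does not state; `analytics_window` is a hypothetical helper.

```python
from datetime import date, timedelta


def analytics_window(end: date, days: int) -> dict:
    """Return startDate/endDate strings covering `days` days ending on `end`."""
    start = end - timedelta(days=days - 1)  # inclusive range, by assumption
    return {"startDate": start.isoformat(), "endDate": end.isoformat()}


current = analytics_window(date(2026, 3, 17), 7)
previous = analytics_window(date(2026, 3, 17) - timedelta(days=7), 7)
print(current)   # {'startDate': '2026-03-11', 'endDate': '2026-03-17'}
print(previous)  # {'startDate': '2026-03-04', 'endDate': '2026-03-10'}
```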

## Reference docs
## Common Workflows

### Search Quality Audit
1. `algolia_search_list_indices` → get applicationId and index name
2. `algolia_analytics_no_results_rate` → check overall health (< 5% is excellent)
3. `algolia_analytics_searches_no_results` → find the specific failing queries
4. `algolia_analytics_top_searches` with `clickAnalytics: true` → find high-volume queries with low CTR
5. `algolia_analytics_click_positions` → check if clicks are concentrated at position 1 (good) or spread evenly (poor relevance)
6. For each problematic query: `algolia_search_index` with that query to see what results look like
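
The audit steps can be written down as an ordered plan of tool calls. This is a sketch of the sequence only: the tool names come from this document, but passing `indexName` to the analytics tools, the empty argument object for discovery, and both helper functions are assumptions, and no response shapes are modeled.

```python
def audit_plan(application_id, index_name, start_date, end_date):
    """Steps 1-5 as (tool_name, arguments) pairs, in the order they should run."""
    common = {
        "applicationId": application_id,
        "indexName": index_name,  # assumption: analytics tools accept indexName
        "startDate": start_date,
        "endDate": end_date,
    }
    return [
        ("algolia_search_list_indices", {}),
        ("algolia_analytics_no_results_rate", dict(common)),
        ("algolia_analytics_searches_no_results", dict(common)),
        ("algolia_analytics_top_searches", {**common, "clickAnalytics": True}),
        ("algolia_analytics_click_positions", dict(common)),
    ]


def diagnosis_calls(application_id, index_name, queries):
    """Step 6: one algolia_search_index call per problematic query."""
    return [
        ("algolia_search_index",
         {"applicationId": application_id, "indexName": index_name, "query": q})
        for q in queries
    ]


for tool, arguments in audit_plan("YOUR_APP_ID", "ecommerce", "2026-02-15", "2026-03-16"):
    print(tool, arguments)
```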

### Recommendation Setup Check
1. `algolia_search_list_indices` → resolve applicationId
2. Start with `trending-items` (requires least data) to verify Recommend is working
3. Then try `bought-together` or `related-products` with a known product objectID
4. If results are empty, check event volume requirements in [recommendations reference](references/recommendations.md)

## Reference Docs

- [connection-setup](references/connection-setup.md): MCP server configuration and authentication
- [search](references/search.md): Search parameters, filter syntax (`facetFilters`, `numericFilters`), pagination
120 changes: 120 additions & 0 deletions skills/algolia-mcp/evals/EVAL_RESULTS.md
@@ -0,0 +1,120 @@
# Algolia MCP Skill: Evaluation Results

Evaluation performed on 2026-03-18 using Claude Opus 4.6 (1M context).

## Summary

The skill was evaluated across 5 realistic user scenarios, comparing **with-skill** (Claude reads the skill before responding) vs **without-skill** (Claude relies on general knowledge).

| Eval | With Skill | Without Skill | Delta |
|------|:----------:|:-------------:|:-----:|
| **Eval 1**: Search with filters | 100% (6/6) | 17% (1/6) | **+83%** |
| **Eval 2**: Analytics report | 100% (6/6) | 33% (2/6) | **+67%** |
| **Eval 3**: Recommendations | 100% (6/6) | 33% (2/6) | **+67%** |
| **Eval 4**: Multi-step investigation | 100% (6/6) | 17% (1/6) | **+83%** |
| **Eval 5**: Date filtering + pagination | 100% (6/6) | 33% (2/6) | **+67%** |
| **Average** | **100%** | **27%** | **+73%** |

## Eval Details

### Eval 1: Search with Filters

**Prompt:** *"I want to search my 'products' index for shoes under $100 in either red or blue. Show me only the name, price, and color fields. Also, what are the available facet values for the 'brand' attribute?"*

| Assertion | Without Skill | With Skill |
|-----------|:---:|:---:|
| Calls `algolia_search_list_indices` first | FAIL | PASS |
| Uses `facetFilters` with OR syntax `[["color:red", "color:blue"]]` | FAIL | PASS |
| Uses `numericFilters` with string syntax `["price < 100"]` | FAIL | PASS |
| Combines facetFilters AND numericFilters in same call | FAIL | PASS |
| Sets `attributesToRetrieve` to `["name", "price", "color"]` | PASS | PASS |
| Uses `algolia_search_for_facet_values` for brand | FAIL | PASS |

**Key finding:** Without the skill, Claude used a generic `algolia_search` tool with a combined `filters` string instead of the MCP-specific `facetFilters`/`numericFilters` array parameters. It also used the `facets` parameter instead of the dedicated `algolia_search_for_facet_values` tool.

### Eval 2: Analytics Report

**Prompt:** *"Give me a search quality report for my 'ecommerce' index over the last 30 days - I want to know the no-results rate, top searches that have no clicks, and the click position distribution. Include click-through rates where possible."*

| Assertion | Without Skill | With Skill |
|-----------|:---:|:---:|
| Calls `algolia_search_list_indices` first | FAIL | PASS |
| Uses `algolia_analytics_no_results_rate` with correct dates | FAIL | PASS |
| Uses `algolia_analytics_top_searches_without_clicks` | FAIL | PASS |
| Uses `algolia_analytics_click_positions` | FAIL | PASS |
| Sets `clickAnalytics: true` on a supported tool | PASS | PASS |
| Does NOT use algolia-cli commands | PASS | PASS |

**Key finding:** The baseline fabricated all tool names using camelCase (`algolia_getNoResultsRate`, `algolia_getClickThroughRate`) instead of the actual snake_case MCP tool names (`algolia_analytics_no_results_rate`). It also skipped the discovery step entirely.

### Eval 3: Recommendations

**Prompt:** *"For product ID 'SKU-1234' in my 'catalog' index, show me frequently bought together items and related products. Also show me what's trending in the 'shoes' category. Use a balanced relevance threshold."*

| Assertion | Without Skill | With Skill |
|-----------|:---:|:---:|
| Calls `algolia_search_list_indices` first | FAIL | PASS |
| Uses `algolia_recommendations` with `bought-together` + objectID | FAIL | PASS |
| Uses `algolia_recommendations` with `related-products` + objectID | FAIL | PASS |
| Uses `trending-items` with `facetName`/`facetValue` | FAIL | PASS |
| Sets threshold to 60 (balanced default) | PASS | PASS |
| Does NOT pass objectID for trending-items | PASS | PASS |

**Key finding:** The baseline guessed the tool name as `algolia_get_recommendations` (wrong) and used threshold 50 instead of the documented balanced default of 60. It also used `facetFilters` instead of the dedicated `facetName`/`facetValue` parameters for trending-items.

### Eval 4: Multi-Step Investigation (harder)

**Prompt:** *"Our 'ecommerce' index has a no-results rate of 18%. I need to find the specific queries that are failing, then for the top 3 failing queries, actually run those searches to see what results come back. Also check if our click-through rates have been improving - compare the last 7 days vs the previous 7 days."*

| Assertion | Without Skill | With Skill |
|-----------|:---:|:---:|
| Calls `algolia_search_list_indices` first | FAIL | PASS |
| Uses `algolia_analytics_searches_no_results` | FAIL | PASS |
| Uses `algolia_search_index` to test failing queries | FAIL | PASS |
| Uses `algolia_analytics_top_searches` with `clickAnalytics: true` for BOTH date ranges | FAIL | PASS |
| Sets `clickAnalytics: true` on `algolia_analytics_top_searches` specifically | FAIL | PASS |
| Uses correct YYYY-MM-DD date format | PASS | PASS |

**Key finding:** The baseline invented a non-existent `algolia_getClickThroughRate` endpoint instead of using `algolia_analytics_top_searches` with `clickAnalytics: true`. The skill's Search Quality Audit workflow guided the correct multi-step approach. The with-skill run also accounted for the 1–4 hour data processing delay by ending date ranges at the previous day.

### Eval 5: Date Filtering + Pagination (harder)

**Prompt:** *"Search my 'events' index for all conferences happening after January 1st 2025. The date is stored as a Unix timestamp field called 'event_date'. Filter to only events in 'technology' or 'science' categories with a ticket price between $50 and $500. Show me page 3 with 20 results per page."*

| Assertion | Without Skill | With Skill |
|-----------|:---:|:---:|
| Calls `algolia_search_list_indices` first | FAIL | PASS |
| Uses numericFilters with Unix timestamp (1735689600) | FAIL | PASS |
| Uses facetFilters with OR syntax for categories | PASS | PASS |
| Uses numericFilters for price range | FAIL | PASS |
| Combines all filters in a single `algolia_search_index` call | FAIL | PASS |
| Sets page to 2 (0-indexed) and hitsPerPage to 20 | PASS | PASS |

**Key finding:** The baseline used the wrong tool name (`algolia_search`), guessed the field name as `ticket_price` instead of `price`, and used `>` instead of `>=` for the date filter. The skill's explicit Unix timestamp guidance and filter syntax examples prevented all these mistakes.
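
For reference, the arguments described by the assertions above could be assembled like this. It is an illustrative sketch rather than output from the eval run: `page_param` is a hypothetical helper, the attribute names `category` and `price` follow the expectations in `evals/evals.json`, and `YOUR_APP_ID` is a placeholder.

```python
from datetime import datetime, timezone


def page_param(human_page: int) -> int:
    """Convert a 1-based page number to the 0-indexed `page` parameter."""
    if human_page < 1:
        raise ValueError("page numbers start at 1")
    return human_page - 1


# Midnight UTC on 1 January 2025, as seconds since the epoch.
event_cutoff = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())

args = {
    "applicationId": "YOUR_APP_ID",
    "indexName": "events",
    "query": "conferences",
    "facetFilters": [["category:technology", "category:science"]],
    "numericFilters": [f"event_date >= {event_cutoff}", "price >= 50", "price <= 500"],
    "page": page_param(3),
    "hitsPerPage": 20,
}
print(args["numericFilters"][0], args["page"])  # event_date >= 1735689600 2
```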

## What the Skill Adds

The biggest areas where the skill outperforms general knowledge:

1. **Correct MCP tool names**: Every baseline fabricated plausible but wrong tool names (camelCase vs snake_case, missing `analytics_` prefix, wrong base names)
2. **Discovery workflow**: `algolia_search_list_indices` as mandatory first step (every baseline skipped it)
3. **`clickAnalytics: true`**: Knowing this flag exists and which tools support it (`top_searches`, `top_search_results` only)
4. **Filter syntax**: `facetFilters` array-based OR/AND vs `numericFilters` string-based format
5. **Recommendation parameters**: Which models need `objectID` vs `facetName`/`facetValue`, and threshold guidance
6. **Multi-step workflows**: Search Quality Audit pattern: analytics → identify problems → search to diagnose

## Improvements Made

1. **Surfaced filter syntax** (facetFilters OR/AND, numericFilters strings) from reference into main SKILL.md
2. **Surfaced `clickAnalytics: true`** guidance with which tools support it
3. **Surfaced recommendation thresholds** (50/60/75) and model parameter requirements table
4. **Added analytics interpretation benchmarks** (no-results rate thresholds, click position patterns)
5. **Added Common Workflows** section (Search Quality Audit, Recommendation Setup Check)
6. **Added algolia-cli cross-reference** for write operations

## Reproducibility

- Model: Claude Opus 4.6 (1M context)
- Eval definitions: `evals/evals.json`
- Date: 2026-03-18
- Each eval was run once per configuration
75 changes: 75 additions & 0 deletions skills/algolia-mcp/evals/evals.json
@@ -0,0 +1,75 @@
{
"skill_name": "algolia-mcp",
"evals": [
{
"id": 1,
"prompt": "I want to search my 'products' index for shoes under $100 in either red or blue. Show me only the name, price, and color fields. Also, what are the available facet values for the 'brand' attribute?",
"expected_output": "Correct MCP tool calls with proper filter syntax, attribute selection, and facet exploration",
"files": [],
"expectations": [
"Calls algolia_search_list_indices first to discover applicationId and verify index name",
"Uses algolia_search_index with facetFilters using OR syntax for colors: [[\"color:red\", \"color:blue\"]]",
"Uses numericFilters with string syntax: [\"price < 100\"]",
"Combines facetFilters AND numericFilters in the same search call",
"Sets attributesToRetrieve to [\"name\", \"price\", \"color\"] to limit response",
"Uses algolia_search_for_facet_values to explore the 'brand' attribute"
]
},
{
"id": 2,
"prompt": "Give me a search quality report for my 'ecommerce' index over the last 30 days - I want to know the no-results rate, top searches that have no clicks, and the click position distribution. Include click-through rates where possible.",
"expected_output": "Multiple analytics tool calls with correct date params and clickAnalytics enabled",
"files": [],
"expectations": [
"Calls algolia_search_list_indices first to resolve applicationId",
"Uses algolia_analytics_no_results_rate with startDate and endDate in YYYY-MM-DD format spanning 30 days",
"Uses algolia_analytics_top_searches_without_clicks for searches with no clicks",
"Uses algolia_analytics_click_positions for click position distribution",
"Sets clickAnalytics: true on at least one analytics call to include CTR data",
"Does NOT use algolia-cli commands - this is a read-only analytics task"
]
},
{
"id": 3,
"prompt": "For product ID 'SKU-1234' in my 'catalog' index, show me frequently bought together items and related products. Also show me what's trending in the 'shoes' category. Use a balanced relevance threshold.",
"expected_output": "Three recommendation calls with correct model params and threshold",
"files": [],
"expectations": [
"Calls algolia_search_list_indices first to resolve applicationId",
"Uses algolia_recommendations with model 'bought-together' and objectID 'SKU-1234'",
"Uses algolia_recommendations with model 'related-products' and objectID 'SKU-1234'",
"Uses algolia_recommendations with model 'trending-items' with facetName and facetValue for shoes category",
"Sets threshold to 60 (or close) as the balanced default",
"Does NOT pass objectID for the trending-items call (trending-items does not require it)"
]
},
{
"id": 4,
"prompt": "Our 'ecommerce' index has a no-results rate of 18%. I need to find the specific queries that are failing, then for the top 3 failing queries, actually run those searches to see what results come back (or don't). Also check if our click-through rates have been improving - compare the last 7 days vs the previous 7 days.",
"expected_output": "Multi-step investigation workflow: analytics to find failing queries, then search to diagnose, plus two date-range comparisons",
"files": [],
"expectations": [
"Calls algolia_search_list_indices first to resolve applicationId",
"Uses algolia_analytics_searches_no_results to find specific failing queries",
"Uses algolia_search_index to test the failing queries and see actual results",
"Uses algolia_analytics_top_searches with clickAnalytics: true for BOTH date ranges (last 7 days AND previous 7 days) to compare CTR",
"Sets clickAnalytics: true specifically on algolia_analytics_top_searches (not on a tool that doesn't support it)",
"Uses correct YYYY-MM-DD date format for all date parameters"
]
},
{
"id": 5,
"prompt": "Search my 'events' index for all conferences happening after January 1st 2025. The date is stored as a Unix timestamp field called 'event_date'. Filter to only events in 'technology' or 'science' categories with a ticket price between $50 and $500. Show me page 3 with 20 results per page.",
"expected_output": "Search with Unix timestamp numericFilter, facetFilters, pagination, and correct date conversion",
"files": [],
"expectations": [
"Calls algolia_search_list_indices first to resolve applicationId",
"Uses numericFilters with Unix timestamp for date: event_date >= 1735689600 (or equivalent for Jan 1 2025)",
"Uses facetFilters with OR syntax for categories: [[\"category:technology\", \"category:science\"]]",
"Uses numericFilters for price range: [\"price >= 50\", \"price <= 500\"]",
"Combines all three filter types (date, category, price) in a single algolia_search_index call",
"Sets page to 2 (0-indexed, so page 3 = page parameter 2) and hitsPerPage to 20"
]
}
]
}