Commit 7f72647

Your Name and claude committed
feat: enhance algolia-mcp skill with evals and surfaced reference content
Add evaluation framework and improve SKILL.md by surfacing critical details from reference files into the main skill body for more robust behavior.

Changes to SKILL.md:
- Surface filter syntax (facetFilters OR/AND, numericFilters strings)
- Surface clickAnalytics: true guidance with supported tools list
- Surface recommendation thresholds (50/60/75) and model parameter table
- Add analytics interpretation benchmarks (no-results rate, click positions)
- Add Common Workflows (Search Quality Audit, Recommendation Setup Check)
- Add algolia-cli cross-reference for write operations

Evals (5 scenarios, 100% with-skill vs 27% baseline):
- Search with filters (facetFilters + numericFilters + facet values)
- Analytics report (tool selection + clickAnalytics + date params)
- Recommendations (model params + threshold + trending-items)
- Multi-step investigation (diagnose no-results + CTR comparison)
- Date filtering (Unix timestamps + pagination + combined filters)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 53bd356 commit 7f72647

3 files changed: 267 additions and 2 deletions

skills/algolia-mcp/SKILL.md

Lines changed: 72 additions & 2 deletions
@@ -51,14 +51,84 @@ For clients that don't support commands, see [connection-setup](references/conne
5151
| Trending facets | `algolia_recommendations` | `trending-facets` |
5252
| Visually similar items | `algolia_recommendations` | `looking-similar` |
5353

54-
## Required workflow
54+
## Search Filter Syntax
55+
56+
Filters go in the `algolia_search_index` call alongside `query`:
57+
58+
**facetFilters** (array-based):
59+
```
60+
[["color:red", "color:blue"]] → OR (red OR blue)
61+
[["brand:Nike"], ["category:running"]] → AND (Nike AND running)
62+
[["size:10"], ["color:red", "color:blue"]] → mixed (size 10 AND (red OR blue))
63+
```
64+
Values inside each inner array are OR'd; the inner arrays are AND'd with each other.
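
To make the nesting rule concrete, here is a minimal Python sketch of how the OR/AND semantics could be evaluated against a single record. The `matches_facet_filters` helper and the sample record are illustrative assumptions, not part of the Algolia MCP tools; the real filtering happens server-side.

```python
def matches_facet_filters(record, facet_filters):
    """Return True if `record` satisfies `facet_filters`.

    Hypothetical helper for illustration only: values inside an inner
    list are OR'd, and the inner lists are AND'd with each other.
    """
    def matches_one(condition):
        # Split "attribute:value" on the first colon only.
        attribute, _, value = condition.partition(":")
        actual = record.get(attribute)
        # A facet attribute may hold a single value or a list of values.
        values = actual if isinstance(actual, list) else [actual]
        return value in [str(v) for v in values]

    return all(
        any(matches_one(condition) for condition in group)
        for group in facet_filters
    )


# Sample record (made up for this sketch).
shoe = {"brand": "Nike", "color": "red", "size": 10, "category": "running"}

print(matches_facet_filters(shoe, [["color:red", "color:blue"]]))                 # True
print(matches_facet_filters(shoe, [["brand:Nike"], ["category:running"]]))        # True
print(matches_facet_filters(shoe, [["size:10"], ["color:green", "color:blue"]]))  # False
```

The last call fails because the second group has no matching color, even though the first group matches.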
65+
66+
**numericFilters** (string-based):
67+
```
68+
["price < 100"] → single condition
69+
["price >= 50", "price <= 200"] → range (AND'd)
70+
```
71+
72+
**Date filtering**: Dates must be stored as Unix timestamps. Use `numericFilters: ["timestamp >= 1704067200"]`.
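
As a rough illustration, the timestamp in the example above can be derived with Python's standard library. This sketch assumes dates are interpreted as midnight UTC (the skill text does not state a timezone); `to_unix_timestamp` and `date_range_filters` are hypothetical helper names.

```python
from datetime import datetime, timezone


def to_unix_timestamp(year, month, day):
    """Convert a calendar date (midnight UTC, an assumption) to Unix seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def date_range_filters(attribute, start, end=None):
    """Build numericFilters strings for a timestamp attribute.

    `attribute` is whatever field name the index actually uses.
    `start` and `end` are (year, month, day) tuples.
    """
    filters = [f"{attribute} >= {to_unix_timestamp(*start)}"]
    if end is not None:
        filters.append(f"{attribute} <= {to_unix_timestamp(*end)}")
    return filters


print(to_unix_timestamp(2024, 1, 1))                  # 1704067200
print(date_range_filters("timestamp", (2024, 1, 1)))  # ['timestamp >= 1704067200']
```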
73+
74+
**Attribute selection**: Use `attributesToRetrieve: ["name", "price"]` to limit response size.
75+
76+
## Analytics Key Details
77+
78+
- **`clickAnalytics: true`**: Set this on `algolia_analytics_top_searches` or `algolia_analytics_top_search_results` to include CTR, conversion rate, and click count. Only these two tools support it.
79+
- **`revenueAnalytics: true`**: Set on the same tools to also include add-to-cart rate, purchase rate, and revenue.
80+
- **Data delay**: Recent data has a 1–4 hour processing delay. Use date ranges ending at least 4 hours ago for complete data.
81+
82+
### Interpreting Results
83+
84+
| No-results rate | Assessment |
85+
|----------------|------------|
86+
| < 5% | Excellent |
87+
| 5–10% | Good |
88+
| 10–20% | Needs improvement |
89+
| > 20% | Poor |
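
The bands in this table can be applied programmatically. The sketch below is one possible mapping; the function name is hypothetical, and placing exactly 10% in "Good" is an assumption, since the table leaves that boundary ambiguous.

```python
def assess_no_results_rate(rate_percent):
    """Map a no-results rate (in percent) to the assessment labels.

    Assumption: exactly 10% is treated as "Good"; the table lists 10 in
    two bands. Exactly 20% is "Needs improvement" because "Poor" is > 20%.
    """
    if rate_percent < 0 or rate_percent > 100:
        raise ValueError("rate_percent must be between 0 and 100")
    if rate_percent < 5:
        return "Excellent"
    if rate_percent <= 10:
        return "Good"
    if rate_percent <= 20:
        return "Needs improvement"
    return "Poor"


print(assess_no_results_rate(3.2))  # Excellent
print(assess_no_results_rate(18))   # Needs improvement
print(assess_no_results_rate(25))   # Poor
```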
90+
91+
**Click positions**: Healthy = 30–40% of clicks at position 1, decreasing through 10. Even distribution = poor relevance. Concentrated at positions 5–10 = ranking issues.
92+
93+
**Low CTR + high search volume** = poor result relevance. Common causes: missing synonyms, content gaps, mismatched query intent.
94+
95+
## Recommendation Thresholds
96+
97+
| Threshold | Behavior |
98+
|-----------|----------|
99+
| 50 | More results, lower relevance |
100+
| **60** | **Balanced (good default)** |
101+
| 75 | Fewer results, higher relevance |
102+
103+
**Model parameter requirements**:
104+
- `bought-together`, `related-products`, `looking-similar` → require `objectID`
105+
- `trending-items` → does NOT require `objectID`. Use `facetName` + `facetValue` to filter by category
106+
- `trending-facets` → requires `facetName`
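
The requirements above can be checked before making a call. This is a minimal client-side sketch based only on the three rules listed here; `check_recommendation_params` is a hypothetical helper, and the actual tool may enforce additional constraints.

```python
# Models that need a source record, per the list above.
REQUIRES_OBJECT_ID = {"bought-together", "related-products", "looking-similar"}
KNOWN_MODELS = REQUIRES_OBJECT_ID | {"trending-items", "trending-facets"}


def check_recommendation_params(model, params):
    """Return a list of missing-parameter problems for `model`.

    Hypothetical pre-flight check; an empty list means the documented
    requirements are met.
    """
    if model not in KNOWN_MODELS:
        return [f"unknown model: {model}"]
    problems = []
    if model in REQUIRES_OBJECT_ID and "objectID" not in params:
        problems.append(f"{model} requires objectID")
    if model == "trending-facets" and "facetName" not in params:
        problems.append("trending-facets requires facetName")
    # trending-items has no required parameter; facetName + facetValue
    # are optional and narrow the results to one category.
    return problems


print(check_recommendation_params("bought-together", {"threshold": 60}))
# ['bought-together requires objectID']
print(check_recommendation_params(
    "trending-items", {"facetName": "category", "facetValue": "shoes"}))
# []
```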
107+
108+
## Required Workflow
55109

56110
1. **Discover first**: Always call `algolia_search_list_indices` before other tools to resolve `applicationId` and `indexName`. The `applicationId` parameter is an enum — select from the values in the tool schema, never guess.
57111
2. **Index names are case-sensitive**: Use the exact name returned by `algolia_search_list_indices`.
58112
3. **Date parameters**: Analytics tools accept `startDate` and `endDate` in `YYYY-MM-DD` format. Default period is the last 8 days.
59113
4. **Permissions**: Not all tools are available to every user. Analytics tools require the Analytics permission; recommendations require the Recommend feature.
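
For the date parameters in step 3, a small helper can produce `startDate`/`endDate` strings in the expected format. This is a sketch with hypothetical function names; ending the window on the previous day by default is an assumption, chosen to stay clear of recent data that may still be processing.

```python
from datetime import date, timedelta


def analytics_window(days, end=None, today=None):
    """Return (startDate, endDate) as YYYY-MM-DD strings covering `days` days.

    Assumption: the window ends yesterday unless `end` is given.
    """
    today = today or date.today()
    end = end or today - timedelta(days=1)
    start = end - timedelta(days=days - 1)
    return start.isoformat(), end.isoformat()


def comparison_windows(days, today=None):
    """Return two back-to-back windows as (previous, current)."""
    current = analytics_window(days, today=today)
    previous_end = date.fromisoformat(current[0]) - timedelta(days=1)
    previous = analytics_window(days, end=previous_end, today=today)
    return previous, current


previous, current = comparison_windows(7, today=date(2026, 3, 18))
print(current)   # ('2026-03-11', '2026-03-17')
print(previous)  # ('2026-03-04', '2026-03-10')
```

Two adjacent windows of equal length make period-over-period comparisons (for example, CTR this week against last week) straightforward.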
60114

61-
## Reference docs
115+
## Common Workflows
116+
117+
### Search Quality Audit
118+
1. `algolia_search_list_indices` → get applicationId and index name
119+
2. `algolia_analytics_no_results_rate` → check overall health (< 5% is excellent)
120+
3. `algolia_analytics_searches_no_results` → find the specific failing queries
121+
4. `algolia_analytics_top_searches` with `clickAnalytics: true` → find high-volume queries with low CTR
122+
5. `algolia_analytics_click_positions` → check if clicks are concentrated at position 1 (good) or spread evenly (poor relevance)
123+
6. For each problematic query: `algolia_search_index` with that query to see what results look like
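
The six steps above can be written down as an ordered plan of tool calls. The sketch below only builds that plan as data and does not contact any MCP server; `search_quality_audit_plan` is a hypothetical helper, and the exact argument layout of each tool (for instance, whether analytics tools take `indexName`) is an assumption based on the parameter names used in this skill.

```python
def search_quality_audit_plan(application_id, index_name, start_date, end_date,
                              problem_queries=()):
    """Return the audit as an ordered list of (tool_name, arguments) pairs.

    `application_id` and `index_name` stand for the values that step 1
    (algolia_search_list_indices) would resolve in a real session.
    """
    base = {"applicationId": application_id, "indexName": index_name}
    period = {**base, "startDate": start_date, "endDate": end_date}
    plan = [
        ("algolia_search_list_indices", {}),
        ("algolia_analytics_no_results_rate", dict(period)),
        ("algolia_analytics_searches_no_results", dict(period)),
        ("algolia_analytics_top_searches", {**period, "clickAnalytics": True}),
        ("algolia_analytics_click_positions", dict(period)),
    ]
    # Step 6: replay each problematic query to inspect the actual results.
    for query in problem_queries:
        plan.append(("algolia_search_index", {**base, "query": query}))
    return plan


# Placeholder identifiers and queries, made up for this sketch.
plan = search_quality_audit_plan(
    "APP_ID", "ecommerce", "2026-02-16", "2026-03-17",
    problem_queries=["wireles headphones", "gift card"],
)
for tool, arguments in plan:
    print(tool, arguments)
```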
124+
125+
### Recommendation Setup Check
126+
1. `algolia_search_list_indices` → resolve applicationId
127+
2. Start with `trending-items` (requires the least data) to verify Recommend is working
128+
3. Then try `bought-together` or `related-products` with a known product objectID
129+
4. If results are empty, check event volume requirements in [recommendations reference](references/recommendations.md)
130+
131+
## Reference Docs
62132

63133
- [connection-setup](references/connection-setup.md) — MCP server configuration and authentication
64134
- [search](references/search.md) — Search parameters, filter syntax (`facetFilters`, `numericFilters`), pagination
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
1+
# Algolia MCP Skill — Evaluation Results
2+
3+
Evaluation performed on 2026-03-18 using Claude Opus 4.6 (1M context).
4+
5+
## Summary
6+
7+
The skill was evaluated across 5 realistic user scenarios, comparing **with-skill** (Claude reads the skill before responding) vs **without-skill** (Claude relies on general knowledge).
8+
9+
| Eval | With Skill | Without Skill | Delta |
10+
|------|:----------:|:-------------:|:-----:|
11+
| **Eval 1** — Search with filters | 100% (6/6) | 17% (1/6) | **+83%** |
12+
| **Eval 2** — Analytics report | 100% (6/6) | 33% (2/6) | **+67%** |
13+
| **Eval 3** — Recommendations | 100% (6/6) | 33% (2/6) | **+67%** |
14+
| **Eval 4** — Multi-step investigation | 100% (6/6) | 17% (1/6) | **+83%** |
15+
| **Eval 5** — Date filtering + pagination | 100% (6/6) | 33% (2/6) | **+67%** |
16+
| **Average** | **100%** | **27%** | **+73%** |
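
The averages in this table can be re-derived from the pass counts. The short check below assumes each eval is weighted equally and treats the deltas as differences in percentage points.

```python
# (passed, total) per eval, copied from the summary table.
results = {
    "Eval 1": {"with": (6, 6), "without": (1, 6)},
    "Eval 2": {"with": (6, 6), "without": (2, 6)},
    "Eval 3": {"with": (6, 6), "without": (2, 6)},
    "Eval 4": {"with": (6, 6), "without": (1, 6)},
    "Eval 5": {"with": (6, 6), "without": (2, 6)},
}


def pass_rate(passed, total):
    """Pass rate as a percentage."""
    return 100 * passed / total


with_avg = sum(pass_rate(*r["with"]) for r in results.values()) / len(results)
without_avg = sum(pass_rate(*r["without"]) for r in results.values()) / len(results)

print(round(with_avg), round(without_avg), round(with_avg - without_avg))
# 100 27 73
```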
17+
18+
## Eval Details
19+
20+
### Eval 1: Search with Filters
21+
22+
**Prompt:** *"I want to search my 'products' index for shoes under $100 in either red or blue. Show me only the name, price, and color fields. Also, what are the available facet values for the 'brand' attribute?"*
23+
24+
| Assertion | Without Skill | With Skill |
25+
|-----------|:---:|:---:|
26+
| Calls `algolia_search_list_indices` first | FAIL | PASS |
27+
| Uses `facetFilters` with OR syntax `[["color:red", "color:blue"]]` | FAIL | PASS |
28+
| Uses `numericFilters` with string syntax `["price < 100"]` | FAIL | PASS |
29+
| Combines facetFilters AND numericFilters in same call | FAIL | PASS |
30+
| Sets `attributesToRetrieve` to `["name", "price", "color"]` | PASS | PASS |
31+
| Uses `algolia_search_for_facet_values` for brand | FAIL | PASS |
32+
33+
**Key finding:** Without the skill, Claude used a generic `algolia_search` tool with a combined `filters` string instead of the MCP-specific `facetFilters`/`numericFilters` array parameters. It also used the `facets` parameter instead of the dedicated `algolia_search_for_facet_values` tool.
34+
35+
### Eval 2: Analytics Report
36+
37+
**Prompt:** *"Give me a search quality report for my 'ecommerce' index over the last 30 days — I want to know the no-results rate, top searches that have no clicks, and the click position distribution. Include click-through rates where possible."*
38+
39+
| Assertion | Without Skill | With Skill |
40+
|-----------|:---:|:---:|
41+
| Calls `algolia_search_list_indices` first | FAIL | PASS |
42+
| Uses `algolia_analytics_no_results_rate` with correct dates | FAIL | PASS |
43+
| Uses `algolia_analytics_top_searches_without_clicks` | FAIL | PASS |
44+
| Uses `algolia_analytics_click_positions` | FAIL | PASS |
45+
| Sets `clickAnalytics: true` on a supported tool | PASS | PASS |
46+
| Does NOT use algolia-cli commands | PASS | PASS |
47+
48+
**Key finding:** The baseline fabricated all tool names using camelCase (`algolia_getNoResultsRate`, `algolia_getClickThroughRate`) instead of the actual snake_case MCP tool names (`algolia_analytics_no_results_rate`). It also skipped the discovery step entirely.
49+
50+
### Eval 3: Recommendations
51+
52+
**Prompt:** *"For product ID 'SKU-1234' in my 'catalog' index, show me frequently bought together items and related products. Also show me what's trending in the 'shoes' category. Use a balanced relevance threshold."*
53+
54+
| Assertion | Without Skill | With Skill |
55+
|-----------|:---:|:---:|
56+
| Calls `algolia_search_list_indices` first | FAIL | PASS |
57+
| Uses `algolia_recommendations` with `bought-together` + objectID | FAIL | PASS |
58+
| Uses `algolia_recommendations` with `related-products` + objectID | FAIL | PASS |
59+
| Uses `trending-items` with `facetName`/`facetValue` | FAIL | PASS |
60+
| Sets threshold to 60 (balanced default) | PASS | PASS |
61+
| Does NOT pass objectID for trending-items | PASS | PASS |
62+
63+
**Key finding:** The baseline guessed the tool name as `algolia_get_recommendations` (wrong) and used threshold 50 instead of the documented balanced default of 60. It also used `facetFilters` instead of the dedicated `facetName`/`facetValue` parameters for trending-items.
64+
65+
### Eval 4: Multi-Step Investigation (harder)
66+
67+
**Prompt:** *"Our 'ecommerce' index has a no-results rate of 18%. I need to find the specific queries that are failing, then for the top 3 failing queries, actually run those searches to see what results come back. Also check if our click-through rates have been improving — compare the last 7 days vs the previous 7 days."*
68+
69+
| Assertion | Without Skill | With Skill |
70+
|-----------|:---:|:---:|
71+
| Calls `algolia_search_list_indices` first | FAIL | PASS |
72+
| Uses `algolia_analytics_searches_no_results` | FAIL | PASS |
73+
| Uses `algolia_search_index` to test failing queries | FAIL | PASS |
74+
| Uses `algolia_analytics_top_searches` with `clickAnalytics: true` for BOTH date ranges | FAIL | PASS |
75+
| Sets `clickAnalytics: true` on `algolia_analytics_top_searches` specifically | FAIL | PASS |
76+
| Uses correct YYYY-MM-DD date format | PASS | PASS |
77+
78+
**Key finding:** The baseline invented a nonexistent `algolia_getClickThroughRate` endpoint instead of using `algolia_analytics_top_searches` with `clickAnalytics: true`. The skill's Search Quality Audit workflow guided the correct multi-step approach. The with-skill run also accounted for the 1–4 hour data processing delay by ending date ranges at the previous day.
79+
80+
### Eval 5: Date Filtering + Pagination (harder)
81+
82+
**Prompt:** *"Search my 'events' index for all conferences happening after January 1st 2025. The date is stored as a Unix timestamp field called 'event_date'. Filter to only events in 'technology' or 'science' categories with a ticket price between $50 and $500. Show me page 3 with 20 results per page."*
83+
84+
| Assertion | Without Skill | With Skill |
85+
|-----------|:---:|:---:|
86+
| Calls `algolia_search_list_indices` first | FAIL | PASS |
87+
| Uses numericFilters with Unix timestamp (1735689600) | FAIL | PASS |
88+
| Uses facetFilters with OR syntax for categories | PASS | PASS |
89+
| Uses numericFilters for price range | FAIL | PASS |
90+
| Combines all filters in a single `algolia_search_index` call | FAIL | PASS |
91+
| Sets page to 2 (0-indexed) and hitsPerPage to 20 | PASS | PASS |
92+
93+
**Key finding:** The baseline used the wrong tool name (`algolia_search`), guessed the field name as `ticket_price` instead of `price`, and used `>` instead of `>=` for the date filter. The skill's explicit Unix timestamp guidance and filter syntax examples prevented all these mistakes.
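
For reference, the arguments this eval expects could be assembled as in the sketch below. It assumes midnight UTC for the date conversion and uses `"conference"` as the query text, which is an interpretation of the prompt; `page_param` is a hypothetical helper, and `applicationId` is omitted because it would come from `algolia_search_list_indices`.

```python
from datetime import datetime, timezone


def page_param(human_page):
    """Convert a 1-based page number to the 0-based `page` parameter."""
    if human_page < 1:
        raise ValueError("page numbers start at 1")
    return human_page - 1


# January 1st 2025 at midnight UTC (the timezone is an assumption).
start = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())

search_args = {
    "indexName": "events",
    "query": "conference",  # assumed query text
    "facetFilters": [["category:technology", "category:science"]],
    "numericFilters": [f"event_date >= {start}", "price >= 50", "price <= 500"],
    "page": page_param(3),
    "hitsPerPage": 20,
}

print(search_args["numericFilters"][0])  # event_date >= 1735689600
print(search_args["page"])               # 2
```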
94+
95+
## What the Skill Adds
96+
97+
The biggest areas where the skill outperforms general knowledge:
98+
99+
1. **Correct MCP tool names** — Every baseline fabricated plausible but wrong tool names (camelCase vs snake_case, missing `analytics_` prefix, wrong base names)
100+
2. **Discovery workflow**: `algolia_search_list_indices` as mandatory first step (every baseline skipped it)
101+
3. **`clickAnalytics: true`** — Knowing this flag exists and which tools support it (`top_searches`, `top_search_results` only)
102+
4. **Filter syntax**: `facetFilters` array-based OR/AND vs `numericFilters` string-based format
103+
5. **Recommendation parameters** — Which models need `objectID` vs `facetName`/`facetValue`, and threshold guidance
104+
6. **Multi-step workflows** — Search Quality Audit pattern: analytics → identify problems → search to diagnose
105+
106+
## Improvements Made
107+
108+
1. **Surfaced filter syntax** (facetFilters OR/AND, numericFilters strings) from reference into main SKILL.md
109+
2. **Surfaced `clickAnalytics: true`** guidance with which tools support it
110+
3. **Surfaced recommendation thresholds** (50/60/75) and model parameter requirements table
111+
4. **Added analytics interpretation benchmarks** (no-results rate thresholds, click position patterns)
112+
5. **Added Common Workflows** section (Search Quality Audit, Recommendation Setup Check)
113+
6. **Added algolia-cli cross-reference** for write operations
114+
115+
## Reproducibility
116+
117+
- Model: Claude Opus 4.6 (1M context)
118+
- Eval definitions: `evals/evals.json`
119+
- Date: 2026-03-18
120+
- Each eval was run once per configuration
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
{
2+
"skill_name": "algolia-mcp",
3+
"evals": [
4+
{
5+
"id": 1,
6+
"prompt": "I want to search my 'products' index for shoes under $100 in either red or blue. Show me only the name, price, and color fields. Also, what are the available facet values for the 'brand' attribute?",
7+
"expected_output": "Correct MCP tool calls with proper filter syntax, attribute selection, and facet exploration",
8+
"files": [],
9+
"expectations": [
10+
"Calls algolia_search_list_indices first to discover applicationId and verify index name",
11+
"Uses algolia_search_index with facetFilters using OR syntax for colors: [[\"color:red\", \"color:blue\"]]",
12+
"Uses numericFilters with string syntax: [\"price < 100\"]",
13+
"Combines facetFilters AND numericFilters in the same search call",
14+
"Sets attributesToRetrieve to [\"name\", \"price\", \"color\"] to limit response",
15+
"Uses algolia_search_for_facet_values to explore the 'brand' attribute"
16+
]
17+
},
18+
{
19+
"id": 2,
20+
"prompt": "Give me a search quality report for my 'ecommerce' index over the last 30 days — I want to know the no-results rate, top searches that have no clicks, and the click position distribution. Include click-through rates where possible.",
21+
"expected_output": "Multiple analytics tool calls with correct date params and clickAnalytics enabled",
22+
"files": [],
23+
"expectations": [
24+
"Calls algolia_search_list_indices first to resolve applicationId",
25+
"Uses algolia_analytics_no_results_rate with startDate and endDate in YYYY-MM-DD format spanning 30 days",
26+
"Uses algolia_analytics_top_searches_without_clicks for searches with no clicks",
27+
"Uses algolia_analytics_click_positions for click position distribution",
28+
"Sets clickAnalytics: true on at least one analytics call to include CTR data",
29+
"Does NOT use algolia-cli commands — this is a read-only analytics task"
30+
]
31+
},
32+
{
33+
"id": 3,
34+
"prompt": "For product ID 'SKU-1234' in my 'catalog' index, show me frequently bought together items and related products. Also show me what's trending in the 'shoes' category. Use a balanced relevance threshold.",
35+
"expected_output": "Three recommendation calls with correct model params and threshold",
36+
"files": [],
37+
"expectations": [
38+
"Calls algolia_search_list_indices first to resolve applicationId",
39+
"Uses algolia_recommendations with model 'bought-together' and objectID 'SKU-1234'",
40+
"Uses algolia_recommendations with model 'related-products' and objectID 'SKU-1234'",
41+
"Uses algolia_recommendations with model 'trending-items' with facetName and facetValue for shoes category",
42+
"Sets threshold to 60 (or close) as the balanced default",
43+
"Does NOT pass objectID for the trending-items call (trending-items does not require it)"
44+
]
45+
},
46+
{
47+
"id": 4,
48+
"prompt": "Our 'ecommerce' index has a no-results rate of 18%. I need to find the specific queries that are failing, then for the top 3 failing queries, actually run those searches to see what results come back (or don't). Also check if our click-through rates have been improving — compare the last 7 days vs the previous 7 days.",
49+
"expected_output": "Multi-step investigation workflow: analytics to find failing queries, then search to diagnose, plus two date-range comparisons",
50+
"files": [],
51+
"expectations": [
52+
"Calls algolia_search_list_indices first to resolve applicationId",
53+
"Uses algolia_analytics_searches_no_results to find specific failing queries",
54+
"Uses algolia_search_index to test the failing queries and see actual results",
55+
"Uses algolia_analytics_top_searches with clickAnalytics: true for BOTH date ranges (last 7 days AND previous 7 days) to compare CTR",
56+
"Sets clickAnalytics: true specifically on algolia_analytics_top_searches (not on a tool that doesn't support it)",
57+
"Uses correct YYYY-MM-DD date format for all date parameters"
58+
]
59+
},
60+
{
61+
"id": 5,
62+
"prompt": "Search my 'events' index for all conferences happening after January 1st 2025. The date is stored as a Unix timestamp field called 'event_date'. Filter to only events in 'technology' or 'science' categories with a ticket price between $50 and $500. Show me page 3 with 20 results per page.",
63+
"expected_output": "Search with Unix timestamp numericFilter, facetFilters, pagination, and correct date conversion",
64+
"files": [],
65+
"expectations": [
66+
"Calls algolia_search_list_indices first to resolve applicationId",
67+
"Uses numericFilters with Unix timestamp for date: event_date >= 1735689600 (or equivalent for Jan 1 2025)",
68+
"Uses facetFilters with OR syntax for categories: [[\"category:technology\", \"category:science\"]]",
69+
"Uses numericFilters for price range: [\"price >= 50\", \"price <= 500\"]",
70+
"Combines all three filter types (date, category, price) in a single algolia_search_index call",
71+
"Sets page to 2 (0-indexed, so page 3 = page parameter 2) and hitsPerPage to 20"
72+
]
73+
}
74+
]
75+
}
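
A file with this shape can be sanity-checked before running the evals. The following sketch is not part of the commit; `validate_evals` is a hypothetical helper, and the set of required keys is inferred from the entries shown above rather than from any published schema.

```python
# Keys that every entry in the file above carries (inferred, not a formal schema).
REQUIRED_KEYS = {"id", "prompt", "expected_output", "files", "expectations"}


def validate_evals(document):
    """Return a list of structural problems found in an evals document."""
    problems = []
    seen_ids = set()
    for position, entry in enumerate(document.get("evals", []), start=1):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {position}: missing keys {sorted(missing)}")
        eval_id = entry.get("id")
        if eval_id in seen_ids:
            problems.append(f"entry {position}: duplicate id {eval_id}")
        seen_ids.add(eval_id)
        if not entry.get("expectations"):
            problems.append(f"entry {position}: no expectations listed")
    return problems


# Tiny stand-in document; in practice this would come from json.load().
sample = {
    "skill_name": "algolia-mcp",
    "evals": [
        {
            "id": 1,
            "prompt": "example prompt",
            "expected_output": "example description",
            "files": [],
            "expectations": ["Calls algolia_search_list_indices first"],
        },
        {"id": 1, "prompt": "entry with a repeated id and missing keys"},
    ],
}

for problem in validate_evals(sample):
    print(problem)
```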
