Skyll uses a multi-signal ranking algorithm to order search results by relevance. Each skill receives a score from 0-100 based on weighted factors, with a small additional boost for curated registry skills.
Today, developers pre-install skills manually before agents can use them. With Skyll, agents can explore and make their own decisions. The ranked list enables:
- Agent autonomy: Agents choose skills based on user requests, task context, or their own judgment
- Discovery: Surface popular and trending skills the agent (or developer) didn't know existed
- Flexibility: Filter by score, pick the top match, or let the agent evaluate multiple options
Options give agents the freedom to discover and decide, rather than being limited to pre-installed choices.
```
relevance_score = content + references + query_match + popularity + curated_boost
```
The base score is 0-100. The curated boost can add up to 8 additional points for highly relevant registry skills.
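To make the weighting concrete, here is a minimal sketch of how the signals combine. The helper functions and skill attributes (`install_count`, `is_curated`, and so on) are illustrative names rather than Skyll's actual API; each helper is sketched in the sections below.

```python
# Minimal sketch of the combined score (helper and attribute names are illustrative).
def relevance_score(skill, query: str, include_references: bool) -> float:
    match = query_match(skill, query)                 # 0.0-1.0, best matching field
    return (
        content_score(skill)                          # 0 or 40 points
        + reference_score(skill, include_references)  # 0 or 15 points
        + match * 30                                  # 0-30 points
        + popularity(skill.install_count)             # 0-15 points, log-scaled
        + curated_boost(skill, match)                 # 0-8 points, relevance-scaled
    )
```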
Skills with successfully fetched SKILL.md content receive 40 points. Skills without content are sorted last (not filtered) since they may still serve as pointers.
| Condition | Points |
|---|---|
| Has content | 40 |
| No content (fetch failed) | 0 |
When include_references=true, skills with reference files receive a 15-point boost. This surfaces skills with richer documentation.
| Condition | Points |
|---|---|
| Has references (when requested) | 15 |
| No references | 0 |
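These two signals are simple binary checks. A minimal sketch, assuming the skill object exposes `content` and `references` attributes:

```python
# Binary availability signals (attribute names are assumed for illustration).
def content_score(skill) -> float:
    return 40.0 if skill.content else 0.0

def reference_score(skill, include_references: bool) -> float:
    return 15.0 if include_references and skill.references else 0.0
```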
The query match score measures how well the skill matches the search query. The ranker checks multiple fields in priority order:
ID Matching (strongest signal):
| Match Type | Score | Points |
|---|---|---|
| Exact ID match | 1.0 | 30 |
| All query terms in ID | 0.9 | 27 |
| All ID terms in query | 0.85 | 25.5 |
Title Matching:
| Match Type | Score | Points |
|---|---|---|
| All query terms in title | 0.8 | 24 |
| All terms across ID + title combined | 0.75 | 22.5 |
Description Matching:
| Match Type | Score | Points |
|---|---|---|
| All query terms in description | 0.7 | 21 |
| Partial description match | 0.0-0.35 | 0-10.5 |
Content Matching (weakest signal, first 2000 chars):
| Match Type | Score | Points |
|---|---|---|
| Partial content match | 0.0-0.15 | 0-4.5 |
The ranker computes all applicable scores and takes the highest. For example, a skill whose ID partially matches but whose description fully matches will use the description score (0.7) rather than the partial ID score.
Example: Query "deep research" on skill "gpt-researcher"
- ID "gpt-researcher" → partial match ("research" found) → 0.25
- Description "Autonomous deep research agent..." → all terms found → 0.7
- Best score: 0.7 → 21 points
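A simplified sketch of this field-by-field comparison, keeping the best score across fields. The tokenization and partial-match heuristics here are illustrative, not the exact production logic:

```python
# Simplified multi-field matching sketch (tokenization heuristics are illustrative).
def query_match(skill, query: str) -> float:
    q_terms = set(query.lower().split())
    if not q_terms:
        return 0.0

    id_terms = set(skill.id.lower().replace("-", " ").split())
    title_terms = set((skill.title or "").lower().split())
    desc_terms = set((skill.description or "").lower().split())
    content_terms = set((skill.content or "")[:2000].lower().split())

    scores = [0.0]
    if skill.id.lower() == query.lower():
        scores.append(1.0)                             # exact ID match
    if q_terms <= id_terms:
        scores.append(0.9)                             # all query terms in ID
    if id_terms and id_terms <= q_terms:
        scores.append(0.85)                            # all ID terms in query
    if q_terms <= title_terms:
        scores.append(0.8)                             # all query terms in title
    if q_terms <= id_terms | title_terms:
        scores.append(0.75)                            # all terms across ID + title
    if q_terms <= desc_terms:
        scores.append(0.7)                             # all query terms in description
    else:
        overlap = len(q_terms & desc_terms) / len(q_terms)
        scores.append(0.35 * overlap)                  # partial description match
    overlap = len(q_terms & content_terms) / len(q_terms)
    scores.append(0.15 * overlap)                      # partial content match

    return max(scores)                                 # highest applicable score wins
```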
The popularity score is derived from the install count on skills.sh, using logarithmic scaling:
```
popularity = min(log10(install_count + 1) / 4, 1.0) × 15
```

| Install Count | Score | Points |
|---|---|---|
| 0 | 0 | 0 |
| 10 | 0.25 | 3.75 |
| 100 | 0.5 | 7.5 |
| 1,000 | 0.75 | 11.25 |
| 10,000+ | 1.0 | 15 |
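The same formula in Python, assuming a non-negative install count:

```python
import math

# Log-scaled popularity: 10,000+ installs saturates at the full 15 points.
def popularity(install_count: int) -> float:
    return min(math.log10(install_count + 1) / 4, 1.0) * 15
```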
Skills from the local curated registry (registry/SKILLS.md) receive a boost scaled by their query relevance. This rewards hand-picked quality skills without letting irrelevant registry entries jump the ranks.
```
curated_boost = is_curated × 8 × query_match
```

| Query Match | Boost |
|---|---|
| Strong match (0.9) | +7.2 |
| Description match (0.7) | +5.6 |
| Partial match (0.25) | +2 |
| No match (0.0) | 0 |
Example: Curated skill "gpt-researcher" for query "deep research"
- Query match: 0.7 (description match)
- Curated boost: 8 × 0.7 = 5.6 points
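As code, assuming `is_curated` is a boolean flag on the skill and `match` is the 0-1 query match score:

```python
# Curated boost scales with query relevance, capped at 8 points.
def curated_boost(skill, match: float) -> float:
    return 8.0 * match if skill.is_curated else 0.0
```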
Before ranking, the search pipeline:
- Over-fetches from sources (2× requested limit) to give the ranker more candidates
- Deduplicates by source/ID, preferring skills.sh results (which have install counts)
- Fetches content from GitHub for all candidates in parallel
- Ranks using the scoring formula above
- Trims to the requested limit
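A high-level sketch of that flow, with `search_sources`, `dedupe`, and `fetch_content` standing in for the real source-query, deduplication, and GitHub-fetch helpers (all names are illustrative):

```python
import asyncio

# Illustrative search pipeline: over-fetch, dedupe, fetch content, rank, trim.
async def search(query: str, limit: int, ranker, include_references: bool = False):
    candidates = await search_sources(query, limit * 2)           # over-fetch 2x
    candidates = dedupe(candidates, prefer="skills.sh")            # keep install counts
    await asyncio.gather(*(fetch_content(c) for c in candidates))  # parallel content fetch
    ranked = ranker.rank(candidates, query=query,
                         include_references=include_references)
    return ranked[:limit]                                          # trim to requested limit
```

The table below shows representative totals this scoring produces.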
| Scenario | Content | Refs | Query | Pop. | Curated | Total |
|---|---|---|---|---|---|---|
| Exact match, popular, with content | 40 | 15 | 30 | 15 | 0 | 100 |
| Good match, popular, with content | 40 | 0 | 25 | 12 | 0 | 77 |
| Curated, description match, no installs | 40 | 0 | 21 | 0 | 5.6 | 66.6 |
| Partial match, new skill | 40 | 0 | 15 | 0 | 0 | 55 |
| No content (fetch failed) | 0 | 0 | 30 | 15 | 0 | 45 |
The weights reflect a few design principles:
- Content is king: Skills without content are less useful, so content availability dominates the score.
- Query relevance matters: Exact matches should rank above partial matches, regardless of popularity.
- Multi-field matching: Skills can match via ID, title, description, or content. A skill about "deep research" should surface even if its ID is "gpt-researcher" - the description match catches this.
- Popularity is a signal, not the answer: Log scaling prevents extremely popular skills from dominating. A skill with 100 installs and a good query match can outrank a skill with 10,000 installs and a poor match.
- Curated skills get a nudge: Registry skills are hand-picked for quality. The boost is scaled by relevance to prevent irrelevant registry skills from jumping the ranks.
- References add value: When users request references, skills that provide them are more valuable.
The ranking algorithm is modular. To create a custom ranker:
```python
from src.ranking.base import Ranker

class MyCustomRanker(Ranker):
    def rank(self, skills, query="", include_references=False):
        for skill in skills:
            # Your scoring logic
            skill.relevance_score = ...
        return sorted(skills, key=lambda s: s.relevance_score, reverse=True)
```

Register in src/core/service.py:

```python
self._ranker = MyCustomRanker()
```

The ranking system is designed for extension:
- Semantic search: Use embeddings to match query intent, not just keywords
- Recency: Boost recently updated skills
- Quality signals: Factor in documentation completeness, test coverage
- User feedback: Learn from click-through rates
We welcome community contributions to improve ranking. Open an issue or PR to discuss!