
Add hf-best-model skill#125

Merged
burtenshaw merged 13 commits into huggingface:main from NathanHB:feat/hf-best-model
Apr 23, 2026
Conversation

@NathanHB
Member

@NathanHB NathanHB commented Apr 20, 2026

Adds a skill to make it easier for agents to use benchmarks on the Hub and the leaderboard feature.
For example, when prompted:

What's the best model to parse my parking tickets locally?

The model should fetch leaderboards on the Hub, find the relevant benchmarks, get the top models according to the available hardware, and run one if the user wants to, all while using the Hugging Face Hub and leaderboards as the backend.

NathanHB and others added 10 commits April 20, 2026 16:30
Skill that finds the best HuggingFace model for a given task and device.
It queries official benchmark leaderboards via the HF REST API, enriches
results with model metadata (parameter count, license), filters by device
constraints (MacBook/RTX/CPU), and returns a ranked comparison table with
benchmark scores and how-to-run snippets (Ollama + transformers).
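The filter-and-rank step described above can be sketched as follows. This is a hypothetical illustration, not the skill's actual code: the model entries and the device budget are made-up examples, whereas the real skill pulls leaderboard results and model metadata from the HF REST API.

```python
# Hypothetical sketch of the filter-and-rank step: keep models that fit the
# device's parameter budget, then order them by benchmark score (best first).

def rank_models(models, max_params_b):
    """Return models fitting within max_params_b billion params, best score first."""
    fitting = [m for m in models if m["params_b"] <= max_params_b]
    return sorted(fitting, key=lambda m: m["score"], reverse=True)

# Made-up example entries standing in for real leaderboard/API results.
models = [
    {"id": "org/small-3b", "params_b": 3, "score": 61.2, "license": "apache-2.0"},
    {"id": "org/mid-8b", "params_b": 8, "score": 70.5, "license": "mit"},
    {"id": "org/big-70b", "params_b": 70, "score": 82.1, "license": "other"},
]

# e.g. a 16 GB machine running Q4 quantization fits roughly 16 * 2 = 32 B params
ranked = rank_models(models, max_params_b=32)
print([m["id"] for m in ranked])  # → ['org/mid-8b', 'org/small-3b']
```

The ranked list is what the skill would render as the comparison table, with the how-to-run snippets attached per entry.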

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Return highest-performing models from leaderboards unconditionally
when the user doesn't mention a device.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of a hardcoded lookup table, compute max params from
available memory using: fp16 = RAM/2 B, Q4 = RAM*2 B.
Works for any device without needing to enumerate them all.
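The rule in this commit message can be written down directly. A minimal sketch, assuming fp16 weights take about 2 bytes per parameter and Q4 about 0.5 bytes, with RAM measured in GB (so the result is in billions of parameters):

```python
# Memory-to-parameter-count rule: params (in billions) = RAM_GB / bytes_per_param.
# Assumed costs: fp16 ≈ 2 bytes/param, Q4 ≈ 0.5 bytes/param.

def max_params_billion(ram_gb, quant="fp16"):
    bytes_per_param = {"fp16": 2.0, "q4": 0.5}[quant]
    return ram_gb / bytes_per_param

print(max_params_billion(16, "fp16"))  # 16 / 2   → 8.0  B params
print(max_params_billion(16, "q4"))    # 16 * 2   → 32.0 B params
```

Because the rule is a simple ratio, it generalizes to any RAM size without the hardcoded per-device lookup table the commit removes.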

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Let the API results speak for themselves.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renames the plugin to huggingface-best so users can install with
`hf skills add huggingface-best`, and the internal skill name to `best`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

@burtenshaw burtenshaw left a comment


Looks good. Just two nits.

Comment thread docs/superpowers/specs/2026-04-21-whisper-small-fr-space-design.md Outdated
Comment thread skills/huggingface-best/SKILL.md Outdated
@evalstate
Collaborator

evalstate commented Apr 22, 2026

@NathanHB -- in the llm-trainer skill we have a benchmarks script (I've just pushed an update #125 here -- help text in the PR for easy review). Wondering if that is useful to include here too - or maybe as an hf plugin @hanouticelina ? @merveenoyan think we've also been discussing "best" model recently?

[edit] -- cool skill 😎

NathanHB and others added 2 commits April 22, 2026 15:01
- Remove whisper spec doc that doesn't belong in this PR
- Add REST API and CLI equivalents for hub_repo_details MCP tool

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NathanHB
Member Author

Yes, I think we should have this in the CLI too: first, to reduce the number of skills to install; second, to make it more natural and easier for agents to use.

@NathanHB
Member Author

Not a fan of having a tool with hardcoded categories and use cases, as this adds a bit of bloat. I would rather keep it super simple: point the agent at how and where to get the data, and let it infer what to do. For example, here it gets the list of benchmarks on the Hub and infers from the user prompt which ones to use.

@evalstate
Collaborator

I meant #128

@burtenshaw burtenshaw merged commit ddcf680 into huggingface:main Apr 23, 2026
1 check passed

3 participants