Add hf-best-model skill #125
Conversation
Skill that finds the best HuggingFace model for a given task and device. It queries official benchmark leaderboards via the HF REST API, enriches results with model metadata (parameter count, license), filters by device constraints (MacBook/RTX/CPU), and returns a ranked comparison table with benchmark scores and how-to-run snippets (Ollama + transformers). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
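A minimal sketch of the metadata-enrichment step this describes, assuming `huggingface_hub`'s `HfApi.model_info` (the repo id below is illustrative, not part of the skill):

```python
# Sketch only: enrich a leaderboard hit with parameter count and license
# via huggingface_hub. The repo id is illustrative.
from huggingface_hub import HfApi

api = HfApi()

def enrich(repo_id: str) -> dict:
    info = api.model_info(repo_id)
    # Safetensors metadata carries the total parameter count when available.
    params = info.safetensors.total if info.safetensors else None
    license_ = info.card_data.license if info.card_data else None
    return {"model": repo_id, "params": params, "license": license_}

print(enrich("Qwen/Qwen2.5-7B-Instruct"))
```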
Return highest-performing models from leaderboards unconditionally when the user doesn't mention a device. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of a hardcoded lookup table, compute max params from available memory: fp16 needs ~2 bytes per parameter (max params ≈ RAM/2), Q4 needs ~0.5 bytes per parameter (max params ≈ RAM×2). Works for any device without needing to enumerate them all. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
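A quick sketch of that heuristic (headroom for activations and KV cache is ignored here and would need to be budgeted in practice):

```python
# fp16 stores ~2 bytes per parameter, Q4 ~0.5 bytes per parameter,
# so max params = RAM/2 for fp16 and RAM*2 for Q4.
def max_params(ram_gb: float) -> dict:
    ram_bytes = ram_gb * 1e9
    return {"fp16": ram_bytes / 2, "q4": ram_bytes * 2}

# A 16 GB MacBook fits ~8B params at fp16, ~32B at Q4 (before overhead).
print({k: f"{v / 1e9:.0f}B" for k, v in max_params(16).items()})
```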
Let the API results speak for themselves. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renames the plugin to huggingface-best so users can install with `hf skills add huggingface-best`, and the internal skill name to `best`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
burtenshaw left a comment
Looks good. Just two nits.
@NathanHB -- in the llm-trainer skill we have a benchmarks script (I've just pushed an update in #125 -- help text in the PR for easy review). Wondering if that is useful to include here too, or maybe as an [edit] -- cool skill 😎
- Remove whisper spec doc that doesn't belong in this PR
- Add REST API and CLI equivalents for hub_repo_details MCP tool

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
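For reference, the REST equivalent could look something like this; the Hub serves public repo metadata at `https://huggingface.co/api/models/{repo_id}` (the repo id below is illustrative):

```python
# Sketch of the REST equivalent: GET the public model-info endpoint.
import requests

resp = requests.get("https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct")
resp.raise_for_status()
data = resp.json()
print(data["id"], data.get("downloads"), data.get("tags", [])[:5])
```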
Yes, I think we should have this in the CLI too: first, to reduce the number of skills to install; second, to make it more natural / easier for agents to use.
Not a fan of having a tool with hardcoded categories and use cases, as this adds a bit of bloat. I would rather keep it super simple: point the agent at how / where to get the data and let it infer what to do. For example, here it gets the list of benchmarks on the hub and infers from the user prompt which ones to use.
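A hedged sketch of what "point the agent at the data" could look like, assuming leaderboard results live in dataset repos under an org such as `open-llm-leaderboard` (the org name is an assumption, not something this PR specifies):

```python
# Assumption: leaderboard results are dataset repos under one org; the agent
# scans the listing and picks benchmarks matching the user's prompt.
from huggingface_hub import HfApi

api = HfApi()
for ds in api.list_datasets(author="open-llm-leaderboard", limit=10):
    print(ds.id)
```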
I meant #128 |
Adds a skill to make it easier for agents to use benchmarks on the hub and the leaderboard feature.
For example, when prompted, the model should fetch leaderboards on the hub, find the relevant benchmarks, get the top models according to the available hardware, and run them if the user wants to, all while using the Hugging Face Hub and leaderboards as the backend.
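A hedged sketch of the final "run it" step with transformers (the model id is a placeholder; the skill would substitute the top-ranked result):

```python
# Sketch only: run the selected model with transformers' text-generation
# pipeline. The model id stands in for whatever ranked first.
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
print(pipe("What benchmark measures code generation?")[0]["generated_text"])
```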