Claude Code skills for transformers-api #43340
What does this PR do?
Fixes #42971
This PR adds a Claude Skill for `huggingface/transformers` to help contributors navigate the codebase and common development workflows more efficiently.

What's included
What’s not included
The original issue mentions a plugin request as well, but this PR focuses on delivering the Skill first as a minimal, useful step. Plugin support can be handled in a follow-up PR.
How to test
A few of the many questions I tested:
API existence / anti-hallucination check:
“Does Transformers have a public argument called `temperature_decay` on `generate()`? If yes, show the exact signature location. If no, point to the closest real knobs and where they're defined.”
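For context on the expected answer: `temperature_decay` is deliberately made up. A minimal sketch of the real sampling knobs on `generate()`, which are declared on `GenerationConfig`; the checkpoint name is only an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example checkpoint only; any causal LM works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
# There is no `temperature_decay`; the real sampling knobs are fields on
# `GenerationConfig` (src/transformers/generation/configuration_utils.py).
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```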
Repo navigation / backend dispatch:
“Where is the logic that decides which backend (PyTorch vs TensorFlow vs Flax) gets used when calling `AutoModel.from_pretrained()`? Point to the exact files and decision flow.”
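A sketch of the answer I'd expect the Skill to surface: framework choice happens at the Auto-class level rather than inside `from_pretrained()`; the file paths below reflect the current repo layout:

```python
from transformers import AutoModel

# AutoModel -> PyTorch, TFAutoModel -> TensorFlow, FlaxAutoModel -> Flax.
# Each Auto class resolves the config's model_type through its own mapping in
# src/transformers/models/auto/ (modeling_auto.py, modeling_tf_auto.py,
# modeling_flax_auto.py), so the "backend decision" is made by which class
# you call, not by runtime dispatch.
model = AutoModel.from_pretrained("bert-base-uncased")  # example checkpoint
print(type(model))  # e.g. transformers.models.bert.modeling_bert.BertModel
```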
Generation internals / repetition debugging:
“I'm getting repetitive text in long generations even with `repetition_penalty` set. What knobs interact most strongly with repetition, and which files apply these penalties during decoding?”
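A minimal sketch of the knobs in play here; `repetition_penalty` and `no_repeat_ngram_size` are applied by logits processors in src/transformers/generation/logits_process.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Once upon a time", return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,
    repetition_penalty=1.2,   # RepetitionPenaltyLogitsProcessor
    no_repeat_ngram_size=3,   # NoRepeatNGramLogitsProcessor
    temperature=0.8,          # sampling temperature also interacts with repetition
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```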
Quantization & loading performance troubleshooting:
“Loading a 7B causal LM with 4-bit quantization and `device_map="auto"` is causing slow CPU offload and high RAM usage. What are the likely causes in the loading path, what knobs should I change, and where are they handled in code?”
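A minimal sketch of the loading setup being described, assuming `bitsandbytes` is installed; the checkpoint name is only an example. Layers that don't fit in VRAM get offloaded to CPU by `device_map="auto"`, which is typically the slow path the question refers to:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the 4-bit layers
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",   # example 7B checkpoint
    quantization_config=bnb_config,
    device_map="auto",  # accelerate places layers across GPU/CPU/disk
)
```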
Serving/export reality check:
“Is there a supported CLI command `transformers serve` for text generation with batching? If not, what are the supported alternatives in the Transformers ecosystem, and where are the relevant docs/code in this repo?”

PS: This is just an initial draft I put together so maintainers and other community folks can try it out first. Once people test it and share feedback, we can iterate on it and polish it further.
For review: @Rocketknight1, @stevhliu, @ArthurZucker
CC: @Emasoft, @coolgalsandiego