[Examples] Reorganize examples by model/scheme/algo hierarchy#2510
Conversation
- Restructure all examples into model-first folders (llama3, llama4, qwen1.5, qwen2, qwen2.5, qwen3, qwen3_next, qwen3.5, gemma2, gemma3, medgemma, phi3, granite4, mistral3, pixtral, mixtral, deepseek_r1, glm4, glm5, llava, idefics3, internvl3, whisper, gpt_oss, kimi_k2, omnicoder) with scheme/ then algo/ subfolders
- Add top-level ddp/ folder for all distributed examples
- Keep quantization_kv_cache/, quantization_non_uniform/, transform/, compressed_inference/, disk_offloading/, big_models_with_sequential_onloading/, sparse_2of4_quantization_fp8/, model_free_ptq/ as-is or README-only
- Keep multimodal_vision/, multimodal_audio/, quantizing_moe/ with README only
- Split model_free_ptq examples into respective model folders
- Move autoround/quantization_w4a4_fp4/README.md into separate llama3 and qwen3 READMEs
- Move AWQ and AutoRound README content into docs/steps/choosing-algo.md
- Remove old scheme-first folder structure

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this label is required to complete the testing suite; please only add it once the PR is code complete and local testing has been performed.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the examples directory by implementing a new model-first hierarchical structure. This change aims to enhance the clarity and organization of examples, making it much easier for users to locate and understand quantization implementations for specific models and schemes. Additionally, it centralizes key documentation for AWQ and AutoRound, providing a single source of truth for these techniques, and introduces new utility scripts and examples to support the updated structure.
…ors_index.py

- Move all model-specific folders under examples/models/
- Remove examples/model_free_ptq/create_safetensors_index.py

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request significantly updates the documentation for AWQ and AutoRound quantization, consolidating details into docs/steps/choosing-algo.md and removing older, separate READMEs. It also introduces new examples for model-free PTQ, including an OmniCoder FP8-Dynamic example, and adds a utility script to generate model.safetensors.index.json files. A new mixed quantization recipe is included, and a run_model.py script is added for testing quantized model generation. Feedback from the review points out a potential issue with an absolute path in the documentation link, a possible bug in the create_safetensors_index.py script regarding unknown data types, and unused imports in run_model.py.
I am having trouble creating individual review comments, so my feedback is consolidated below.
docs/steps/choosing-algo.md (51)
The link to the mappings registry (/src/llmcompressor/modifiers/awq/mappings.py) is an absolute path from the repository root. This may not resolve correctly when the documentation is built and hosted. It's safer to use a full URL to the file on GitHub to ensure the link is always valid.
To add support for a new model family, supply your own mappings via the `mappings` argument or contribute them to the [mappings registry](https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/modifiers/awq/mappings.py).
examples/model_free_ptq/create_safetensors_index.py (54)
The dtype_sizes dictionary is missing some unsigned integer types (e.g., U16, U32, U64) and some FP8 variants. If an unknown dtype is encountered, element_size defaults to 1, which could lead to an incorrect total_size in the generated index file. It would be safer to explicitly handle unknown dtypes, for example by raising an error or printing a warning.
    element_size = dtype_sizes.get(dtype)
    if element_size is None:
        print(
            f"Warning: Unknown dtype '{dtype}' found. Assuming size 1. "
            "This may result in an incorrect total_size."
        )
        element_size = 1
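As a worked illustration of the reviewer's point, here is a minimal sketch of an index-building script with strict dtype handling. This is a hypothetical reconstruction, not the PR's actual `create_safetensors_index.py`; the dtype names and the 8-byte little-endian header-length prefix follow the safetensors file format.

```python
import json
import struct
from pathlib import Path

# Bytes per element for common safetensors dtypes; extend as new types appear.
DTYPE_SIZES = {
    "F64": 8, "I64": 8, "U64": 8,
    "F32": 4, "I32": 4, "U32": 4,
    "F16": 2, "BF16": 2, "I16": 2, "U16": 2,
    "I8": 1, "U8": 1, "BOOL": 1,
    "F8_E4M3": 1, "F8_E5M2": 1,
}


def read_header(path):
    """Parse the JSON header of a .safetensors file (u64 length prefix + JSON)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def build_index(shard_dir):
    """Build a model.safetensors.index.json dict for all shards in a directory."""
    weight_map, total_size = {}, 0
    for shard in sorted(Path(shard_dir).glob("*.safetensors")):
        for name, info in read_header(shard).items():
            if name == "__metadata__":
                continue
            dtype = info["dtype"]
            if dtype not in DTYPE_SIZES:
                # Fail loudly rather than silently miscounting total_size.
                raise ValueError(f"Unknown dtype {dtype!r} for tensor {name!r}")
            n_elems = 1
            for dim in info["shape"]:
                n_elems *= dim
            total_size += n_elems * DTYPE_SIZES[dtype]
            weight_map[name] = shard.name
    return {"metadata": {"total_size": total_size}, "weight_map": weight_map}
```

Raising on unknown dtypes (rather than warning and guessing) keeps `total_size` trustworthy, at the cost of requiring the table to be updated when new formats land.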
run_model.py (3-4)
The imports oneshot and QuantizationModifier are not used in this script. It's good practice to remove unused imports to keep the code clean and avoid confusion.
brian-dellabetta
left a comment
Being able to look up by algorithm has been useful for me, especially model_free_ptq and disk_offloading. WDYT about an additional examples/topics folder that symlinks so users can still look up things by algorithm scheme?
example/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py -> example/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py
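The proposed topic view could be generated with a small helper like the sketch below. This is purely illustrative of the suggestion above: `link_topic` and the `examples/topics/` layout are assumptions, not anything present in the PR.

```python
import os
from pathlib import Path


def link_topic(repo_root, topic, canonical):
    """Symlink examples/topics/<topic>/<file> -> its canonical model-first path.

    `canonical` is a path relative to the repo root, e.g.
    "examples/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py".
    """
    canonical = Path(repo_root) / canonical
    topic_dir = Path(repo_root) / "examples" / "topics" / topic
    topic_dir.mkdir(parents=True, exist_ok=True)
    link = topic_dir / canonical.name
    # Use a relative target so the repo checkout stays relocatable.
    link.symlink_to(os.path.relpath(canonical, topic_dir))
    return link
```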
Those are different pathways that we can add, but the flow of model → scheme → algo is how users quantize. What you're describing is not the algorithm but a potential implementation detail within that top-level flow. Once you know you want to quantize Kimi K2 to FP8, you then need to consider whether to use model-free PTQ or oneshot. That's the flow we want our users to follow and how our docs are currently set up. We don't want the thinking to be "I want to use model_free_ptq, let me try GPTQ", as that way of thinking is not supported and incorrect. For certain useful guides, such as DDP or disk offloading, we would maintain top-level READMEs and could just link to the relevant examples. I don't think symlinks are useful.
Summary
Restructures the `examples/` directory from an inconsistent mix of scheme-first, algo-first, and model-type-first organization into a consistent model → scheme → algo hierarchy under a new `examples/models/` folder. NOTE: this structure is currently a WIP, as I still need to restructure the top-level READMEs and update their content.
New structure
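Roughly, the layout looks like this (an illustrative subset reconstructed from the description; only the `kimi_k2` path appears verbatim in the discussion):

```text
examples/
├── models/                      # model-first hierarchy
│   ├── llama3/
│   │   └── <scheme>/            # scheme folder, e.g. a w4a16_fp4/-style name
│   │       └── <algo>/          # awq/, autoround/ where applicable
│   └── kimi_k2/
│       └── w4a16_fp4/
│           └── kimi_k2_thinking_nvfp4a16.py
├── ddp/                         # consolidated distributed examples
└── transform/                   # kept as-is or README-only
```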
Key changes
- New `examples/models/` folder with per-model directories: `llama3`, `llama4`, `qwen1.5`, `qwen2`, `qwen2.5`, `qwen3`, `qwen3_next`, `qwen3.5`, `gemma2`, `gemma3`, `medgemma`, `phi3`, `granite4`, `mistral3`, `pixtral`, `mixtral`, `deepseek_r1`, `glm4`, `glm5`, `llava`, `idefics3`, `internvl3`, `whisper`, `gpt_oss`, `kimi_k2`, `omnicoder`
- Algorithm folders (`awq/`, `autoround/`) appear under scheme folders where applicable
- A new top-level `ddp/` folder consolidates all distributed training examples
- `model_free_ptq/` examples moved into their respective model folders under `models/`, directly under the scheme folder (no `model_free_ptq/` subfolder)
- Removed old scheme-first folders: `awq/`, `autoround/`, `quantization_w4a16/`, `quantization_w4a16_fp4/`, `quantization_w4a4_fp4/`, `quantization_w4a8/`, `quantization_w4a8_fp8/`, `quantization_w8a8_fp8/`, `quantization_w8a8_int8/`, `quantizing_moe/` (python files only)
- AWQ and AutoRound README content moved into `docs/steps/choosing-algo.md`
- `autoround/quantization_w4a4_fp4/README.md` split into separate llama3 and qwen3 READMEs
- `granite4/` and `internvl3/` READMEs moved into their respective model/scheme folders

🤖 Generated with Claude Code