
[Examples] Reorganize examples by model/scheme/algo hierarchy#2510

Draft
dsikka wants to merge 4 commits into main from reorganize_examples_impl

Conversation

dsikka (Collaborator) commented Mar 24, 2026

Summary

Restructures the examples/ directory from an inconsistent mix of scheme-first, algo-first, and model-type-first organization into a consistent model → scheme → algo hierarchy under a new examples/models/ folder.

NOTE: this structure is currently a WIP; I still need to restructure the top-level READMEs and update their content.

New structure

examples/
├── models/                          # all model-specific examples
│   ├── llama3/w4a16/awq/
│   ├── llama4/w8a8_fp8/autoround/
│   ├── qwen3/w4a16/awq/
│   ├── ...                          # 26 model folders total
├── ddp/                             # distributed examples by model/scheme/algo
├── model_free_ptq/                  # README only (examples moved into models/)
├── quantization_kv_cache/           # kept as-is
├── quantization_non_uniform/        # kept as-is
├── multimodal_vision/               # README only
├── multimodal_audio/                # README only
├── quantizing_moe/                  # README only
├── big_models_with_sequential_onloading/
├── compressed_inference/
├── disk_offloading/
├── sparse_2of4_quantization_fp8/
└── transform/

Key changes

  • 26 model folders created under examples/models/: llama3, llama4, qwen1.5, qwen2, qwen2.5, qwen3, qwen3_next, qwen3.5, gemma2, gemma3, medgemma, phi3, granite4, mistral3, pixtral, mixtral, deepseek_r1, glm4, glm5, llava, idefics3, internvl3, whisper, gpt_oss, kimi_k2, omnicoder
  • Algo subfolders (awq/, autoround/) appear under scheme folders where applicable
  • Top-level ddp/ folder consolidates all distributed training examples
  • model_free_ptq/ examples moved into their respective model folders under models/, placed directly under the scheme folder (no model_free_ptq/ subfolder)
  • Old scheme-first folders removed: awq/, autoround/, quantization_w4a16/, quantization_w4a16_fp4/, quantization_w4a4_fp4/, quantization_w4a8/, quantization_w4a8_fp8/, quantization_w8a8_fp8/, quantization_w8a8_int8/, quantizing_moe/ (python files only)
  • AWQ and AutoRound README content moved into docs/steps/choosing-algo.md
  • autoround/quantization_w4a4_fp4/README.md split into separate llama3 and qwen3 READMEs
  • granite4/ and internvl3/ READMEs moved into their respective model/scheme folders
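The move pattern described in the bullets above can be sketched with a few shell commands. This is an illustrative toy only: the file name `llama3_example.py` and the exact old path are hypothetical stand-ins, not taken from the PR.

```shell
# Toy recreation of one move from the old scheme-first layout to the
# new model-first layout (paths here are illustrative, not from the PR).
mkdir -p examples/awq
touch examples/awq/llama3_example.py            # stand-in example file
mkdir -p examples/models/llama3/w4a16/awq       # new model/scheme/algo path
mv examples/awq/llama3_example.py examples/models/llama3/w4a16/awq/
```

In the actual PR, `git mv` would be used instead of `mv` so history follows the files.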

🤖 Generated with Claude Code

- Restructure all examples into model-first folders (llama3, llama4,
  qwen1.5, qwen2, qwen2.5, qwen3, qwen3_next, qwen3.5, gemma2, gemma3,
  medgemma, phi3, granite4, mistral3, pixtral, mixtral, deepseek_r1,
  glm4, glm5, llava, idefics3, internvl3, whisper, gpt_oss, kimi_k2,
  omnicoder) with scheme/ then algo/ subfolders
- Add top-level ddp/ folder for all distributed examples
- Keep quantization_kv_cache/, quantization_non_uniform/, transform/,
  compressed_inference/, disk_offloading/, big_models_with_sequential_onloading/,
  sparse_2of4_quantization_fp8/, model_free_ptq/ as-is or README-only
- Keep multimodal_vision/, multimodal_audio/, quantizing_moe/ with README only
- Split model_free_ptq examples into respective model folders
- Move autoround/quantization_w4a4_fp4/README.md into separate
  llama3 and qwen3 READMEs
- Move AWQ and AutoRound README content into docs/steps/choosing-algo.md
- Remove old scheme-first folder structure

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 24, 2026
gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the examples directory by implementing a new model-first hierarchical structure. This change aims to enhance the clarity and organization of examples, making it much easier for users to locate and understand quantization implementations for specific models and schemes. Additionally, it centralizes key documentation for AWQ and AutoRound, providing a single source of truth for these techniques, and introduces new utility scripts and examples to support the updated structure.

Highlights

  • Example Reorganization: Restructured all examples into a model-first folder hierarchy (e.g., llama3, qwen1.5), with scheme and algorithm subfolders for improved navigation.
  • Distributed Examples Consolidation: Introduced a top-level ddp/ folder to centralize all distributed examples.
  • Documentation Centralization: Moved detailed content for AWQ and AutoRound quantization methods from individual READMEs into the central docs/steps/choosing-algo.md documentation.
  • Model-Free PTQ Integration: Split model_free_ptq examples and integrated them into their respective model folders, enhancing discoverability.
  • Legacy Structure Removal: Removed the old scheme-first folder structure, streamlining the examples directory.



dsikka and others added 2 commits March 24, 2026 19:46
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ors_index.py

- Move all model-specific folders under examples/models/
- Remove examples/model_free_ptq/create_safetensors_index.py

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request significantly updates the documentation for AWQ and AutoRound quantization, consolidating details into docs/steps/choosing-algo.md and removing older, separate READMEs. It also introduces new examples for model-free PTQ, including an OmniCoder FP8-Dynamic example, and adds a utility script to generate model.safetensors.index.json files. A new mixed quantization recipe is included, and a run_model.py script is added for testing quantized model generation. Feedback from the review points out a potential issue with an absolute path in the documentation link, a possible bug in the create_safetensors_index.py script regarding unknown data types, and unused imports in run_model.py.

I am having trouble creating individual review comments, so my feedback is included inline below.

docs/steps/choosing-algo.md (51)

medium

The link to the mappings registry (/src/llmcompressor/modifiers/awq/mappings.py) is an absolute path from the repository root. This may not resolve correctly when the documentation is built and hosted. It's safer to use a full URL to the file on GitHub to ensure the link is always valid.

To add support for a new model family, supply your own mappings via the `mappings` argument or contribute them to the [mappings registry](https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/modifiers/awq/mappings.py).

examples/model_free_ptq/create_safetensors_index.py (54)

medium

The dtype_sizes dictionary is missing some unsigned integer types (e.g., U16, U32, U64) and some FP8 variants. If an unknown dtype is encountered, element_size defaults to 1, which could lead to an incorrect total_size in the generated index file. It would be safer to explicitly handle unknown dtypes, for example by raising an error or printing a warning.

        element_size = dtype_sizes.get(dtype)
        if element_size is None:
            print(f"Warning: Unknown dtype '{dtype}' found. Assuming size 1. This may result in an incorrect total_size.")
            element_size = 1
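A stricter alternative to the warning above is to fail loudly on unknown dtypes. The sketch below is a hypothetical replacement for the `dtype_sizes` lookup in `create_safetensors_index.py`, assuming the safetensors header dtype naming convention ("F32", "BF16", "F8_E4M3", ...); it is not the PR's actual implementation.

```python
# Hypothetical stricter dtype-size lookup for create_safetensors_index.py.
# Dtype names follow the safetensors header convention; values are element
# widths in bytes, used to compute total_size for the index file.
DTYPE_SIZES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "F8_E4M3": 1, "F8_E5M2": 1,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1,
    "U64": 8, "U32": 4, "U16": 2, "U8": 1,
    "BOOL": 1,
}

def element_size(dtype: str) -> int:
    """Return the element width in bytes, failing loudly on unknown dtypes."""
    try:
        return DTYPE_SIZES[dtype]
    except KeyError:
        raise ValueError(
            f"Unknown dtype {dtype!r}; add it to DTYPE_SIZES rather than "
            "guessing a size, or the generated total_size will be wrong."
        )
```

Raising instead of defaulting to 1 turns a silently wrong `total_size` into an immediate, diagnosable failure.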

run_model.py (3-4)

medium

The imports oneshot and QuantizationModifier are not used in this script. It's good practice to remove unused imports to keep the code clean and avoid confusion.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
brian-dellabetta (Collaborator) left a comment


Being able to look up by algorithm has been useful for me, especially model_free_ptq and disk_offloading. WDYT about an additional examples/topics folder that symlinks so users can still look up things by algorithm scheme?

example/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py -> example/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py
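A minimal sketch of this symlink proposal, assuming the PR's new model-first layout (the paths mirror the example above but are otherwise illustrative):

```shell
# Create the model-first target, then a relative symlink in a topics/ view
mkdir -p examples/models/kimi_k2/w4a16_fp4 examples/topics/disk_offloading
touch examples/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py
# A relative link keeps the repo relocatable across clones
ln -sfn ../../models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py \
  examples/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py
```

One practical caveat: git-tracked symlinks check out as plain text files on Windows without special configuration, which is a common argument for README links over symlinks.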

dsikka (Collaborator, Author) commented Mar 24, 2026

Being able to look up by algorithm has been useful for me, especially model_free_ptq and disk_offloading. WDYT about an additional examples/topics folder that symlinks so users can still look up things by algorithm scheme?

example/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py -> example/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py

Those are different pathways that we can add, but the flow of model -> scheme -> algo is how users quantize. What you're describing is not an algorithm but a potential implementation detail within that top-level flow. Once you know you want to quantize Kimi K2 to FP8, you need to consider whether to use model-free PTQ or oneshot. That's the flow we want our users to follow, and it's how our docs are currently set up. We don't want the thinking to be "I want to use model_free_ptq, let me try gptq", as that way of thinking is not supported and incorrect.

For certain useful guides, such as for ddp or disk offloading, we would maintain top level readmes and could just link to the relevant examples

I don’t think symlinks are useful
