
[Examples] Reorganize examples by model/scheme/algo hierarchy#2510

Draft
dsikka wants to merge 4 commits into main from reorganize_examples_impl

Conversation

dsikka (Collaborator) commented Mar 24, 2026

Summary

Restructures the examples/ directory from an inconsistent mix of scheme-first, algo-first, and model-type-first organization into a consistent model → scheme → algo hierarchy under a new examples/models/ folder.

NOTE: this structure is currently a WIP; I still need to restructure the top-level READMEs and update their content.

New structure

examples/
├── models/                          # all model-specific examples
│   ├── llama3/w4a16/awq/
│   ├── llama4/w8a8_fp8/autoround/
│   ├── qwen3/w4a16/awq/
│   ├── ...                          # 26 model folders total
├── ddp/                             # distributed examples by model/scheme/algo
├── model_free_ptq/                  # README only (examples moved into models/)
├── quantization_kv_cache/           # kept as-is
├── quantization_non_uniform/        # kept as-is
├── multimodal_vision/               # README only
├── multimodal_audio/                # README only
├── quantizing_moe/                  # README only
├── big_models_with_sequential_onloading/
├── compressed_inference/
├── disk_offloading/
├── sparse_2of4_quantization_fp8/
└── transform/

Key changes

  • 26 model folders created under examples/models/: llama3, llama4, qwen1.5, qwen2, qwen2.5, qwen3, qwen3_next, qwen3.5, gemma2, gemma3, medgemma, phi3, granite4, mistral3, pixtral, mixtral, deepseek_r1, glm4, glm5, llava, idefics3, internvl3, whisper, gpt_oss, kimi_k2, omnicoder
  • Algo subfolders (awq/, autoround/) appear under scheme folders where applicable
  • Top-level ddp/ folder consolidates all distributed training examples
  • model_free_ptq/ examples moved into their respective model folders under models/, placed directly under the scheme folder (no model_free_ptq/ subfolder)
  • Old scheme-first folders removed: awq/, autoround/, quantization_w4a16/, quantization_w4a16_fp4/, quantization_w4a4_fp4/, quantization_w4a8/, quantization_w4a8_fp8/, quantization_w8a8_fp8/, quantization_w8a8_int8/, quantizing_moe/ (python files only)
  • AWQ and AutoRound README content moved into docs/steps/choosing-algo.md
  • autoround/quantization_w4a4_fp4/README.md split into separate llama3 and qwen3 READMEs
  • granite4/ and internvl3/ READMEs moved into their respective model/scheme folders
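The move pattern described in the bullets above can be sketched with a few shell commands. This is an illustrative toy only: the file name `llama3_example.py` and the exact old path are hypothetical stand-ins, not taken from the PR.

```shell
# Toy recreation of one move from the old scheme-first layout to the
# new model-first layout (paths here are illustrative, not from the PR).
mkdir -p examples/awq
touch examples/awq/llama3_example.py            # stand-in example file
mkdir -p examples/models/llama3/w4a16/awq       # new model/scheme/algo path
mv examples/awq/llama3_example.py examples/models/llama3/w4a16/awq/
```

In the actual PR, `git mv` would be used instead of `mv` so history follows the files.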

🤖 Generated with Claude Code

- Restructure all examples into model-first folders (llama3, llama4,
  qwen1.5, qwen2, qwen2.5, qwen3, qwen3_next, qwen3.5, gemma2, gemma3,
  medgemma, phi3, granite4, mistral3, pixtral, mixtral, deepseek_r1,
  glm4, glm5, llava, idefics3, internvl3, whisper, gpt_oss, kimi_k2,
  omnicoder) with scheme/ then algo/ subfolders
- Add top-level ddp/ folder for all distributed examples
- Keep quantization_kv_cache/, quantization_non_uniform/, transform/,
  compressed_inference/, disk_offloading/, big_models_with_sequential_onloading/,
  sparse_2of4_quantization_fp8/, model_free_ptq/ as-is or README-only
- Keep multimodal_vision/, multimodal_audio/, quantizing_moe/ with README only
- Split model_free_ptq examples into respective model folders
- Move autoround/quantization_w4a4_fp4/README.md into separate
  llama3 and qwen3 READMEs
- Move AWQ and AutoRound README content into docs/steps/choosing-algo.md
- Remove old scheme-first folder structure

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 24, 2026
gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the examples directory by implementing a new model-first hierarchical structure. This change aims to enhance the clarity and organization of examples, making it much easier for users to locate and understand quantization implementations for specific models and schemes. Additionally, it centralizes key documentation for AWQ and AutoRound, providing a single source of truth for these techniques, and introduces new utility scripts and examples to support the updated structure.

Highlights

  • Example Reorganization: Restructured all examples into a model-first folder hierarchy (e.g., llama3, qwen1.5), with scheme and algorithm subfolders for improved navigation.
  • Distributed Examples Consolidation: Introduced a top-level ddp/ folder to centralize all distributed examples.
  • Documentation Centralization: Moved detailed content for AWQ and AutoRound quantization methods from individual READMEs into the central docs/steps/choosing-algo.md documentation.
  • Model-Free PTQ Integration: Split model_free_ptq examples and integrated them into their respective model folders, enhancing discoverability.
  • Legacy Structure Removal: Removed the old scheme-first folder structure, streamlining the examples directory.



dsikka and others added 2 commits March 24, 2026 19:46
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…ors_index.py

- Move all model-specific folders under examples/models/
- Remove examples/model_free_ptq/create_safetensors_index.py

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request significantly updates the documentation for AWQ and AutoRound quantization, consolidating details into docs/steps/choosing-algo.md and removing older, separate READMEs. It also introduces new examples for model-free PTQ, including an OmniCoder FP8-Dynamic example, and adds a utility script to generate model.safetensors.index.json files. A new mixed quantization recipe is included, and a run_model.py script is added for testing quantized model generation. Feedback from the review points out a potential issue with an absolute path in the documentation link, a possible bug in the create_safetensors_index.py script regarding unknown data types, and unused imports in run_model.py.

I am having trouble creating individual review comments, so my feedback is included inline below.

docs/steps/choosing-algo.md (51)

medium

The link to the mappings registry (/src/llmcompressor/modifiers/awq/mappings.py) is an absolute path from the repository root. This may not resolve correctly when the documentation is built and hosted. It's safer to use a full URL to the file on GitHub to ensure the link is always valid.

To add support for a new model family, supply your own mappings via the `mappings` argument or contribute them to the [mappings registry](https://github.com/vllm-project/llm-compressor/blob/main/src/llmcompressor/modifiers/awq/mappings.py).

examples/model_free_ptq/create_safetensors_index.py (54)

medium

The dtype_sizes dictionary is missing some unsigned integer types (e.g., U16, U32, U64) and some FP8 variants. If an unknown dtype is encountered, element_size defaults to 1, which could lead to an incorrect total_size in the generated index file. It would be safer to explicitly handle unknown dtypes, for example by raising an error or printing a warning.

        element_size = dtype_sizes.get(dtype)
        if element_size is None:
            print(f"Warning: Unknown dtype '{dtype}' found. Assuming size 1. This may result in an incorrect total_size.")
            element_size = 1
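A stricter alternative to the warning above is to fail loudly on unknown dtypes. The sketch below is a hypothetical replacement for the `dtype_sizes` lookup in `create_safetensors_index.py`, assuming the safetensors header dtype naming convention ("F32", "BF16", "F8_E4M3", ...); it is not the PR's actual implementation.

```python
# Hypothetical stricter dtype-size lookup for create_safetensors_index.py.
# Dtype names follow the safetensors header convention; values are element
# widths in bytes, used to compute total_size for the index file.
DTYPE_SIZES = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "F8_E4M3": 1, "F8_E5M2": 1,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1,
    "U64": 8, "U32": 4, "U16": 2, "U8": 1,
    "BOOL": 1,
}

def element_size(dtype: str) -> int:
    """Return the element width in bytes, failing loudly on unknown dtypes."""
    try:
        return DTYPE_SIZES[dtype]
    except KeyError:
        raise ValueError(
            f"Unknown dtype {dtype!r}; add it to DTYPE_SIZES rather than "
            "guessing a size, or the generated total_size will be wrong."
        )
```

Raising instead of defaulting to 1 turns a silently wrong `total_size` into an immediate, diagnosable failure.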

run_model.py (3-4)

medium

The imports oneshot and QuantizationModifier are not used in this script. It's good practice to remove unused imports to keep the code clean and avoid confusion.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
brian-dellabetta (Collaborator) left a comment


Being able to look up by algorithm has been useful for me, especially model_free_ptq and disk_offloading. WDYT about an additional examples/topics folder that symlinks so users can still look up things by algorithm scheme?

example/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py -> example/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py
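A minimal sketch of this symlink proposal, assuming the PR's new model-first layout (the paths mirror the example above but are otherwise illustrative):

```shell
# Create the model-first target, then a relative symlink in a topics/ view
mkdir -p examples/models/kimi_k2/w4a16_fp4 examples/topics/disk_offloading
touch examples/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py
# A relative link keeps the repo relocatable across clones
ln -sfn ../../models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py \
  examples/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py
```

One practical caveat: git-tracked symlinks check out as plain text files on Windows without special configuration, which is a common argument for README links over symlinks.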

dsikka (Collaborator, Author) commented Mar 24, 2026

Being able to look up by algorithm has been useful for me, especially model_free_ptq and disk_offloading. WDYT about an additional examples/topics folder that symlinks so users can still look up things by algorithm scheme?

example/topics/disk_offloading/kimi_k2_thinking_nvfp4a16.py -> example/models/kimi_k2/w4a16_fp4/kimi_k2_thinking_nvfp4a16.py

Those are different pathways that we can add, but the flow of model -> scheme -> algo is how users quantize. What you're describing is not an algorithm but a potential implementation detail within that top-level flow. Once you know you want to quantize Kimi K2 to FP8, you need to consider whether to use model-free PTQ or oneshot. That's the flow we want our users to follow, and it's how our docs are currently set up. We don't want the thinking to be "I want to use model_free_ptq, let me try gptq", as that way of thinking is not supported and incorrect.

For certain useful guides, such as for ddp or disk offloading, we would maintain top level readmes and could just link to the relevant examples

I don’t think symlinks are useful
