
DDP PoC #19

Closed
yiliu30 wants to merge 22 commits into main from ddp

Conversation

@yiliu30
Owner

@yiliu30 yiliu30 commented Feb 5, 2026

SUMMARY:
"please provide a brief summary"

TEST PLAN:
"please outline how the changes were tested"

Signed-off-by: yiliu30 <yi4.liu@intel.com>
@github-actions

github-actions bot commented Feb 5, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@yiliu30 yiliu30 changed the title from Ddp to DDP PoC on Feb 5, 2026
@gemini-code-assist

Summary of Changes

Hello @yiliu30, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces Distributed Data Parallel (DDP) capabilities to the AutoRound quantization process within the llmcompressor library. This enhancement allows for more efficient quantization of large language models by distributing the computational workload across multiple GPUs. The changes include a new example script showcasing this functionality, modifications to the data loading utilities for DDP compatibility, and the addition of debugging and device management tools to support distributed operations.

Highlights

  • Distributed Data Parallel (DDP) Integration: Implemented DDP support for the AutoRound quantization modifier, enabling multi-GPU quantization of large language models.
  • New DDP Example Script: Added experimental/ddp/ddp_qwen3_example.py to demonstrate how to perform AutoRound quantization using DDP for Qwen3 models, including DDP setup, model loading, and quantization execution.
  • Distributed Data Sampling: Modified the dataset utility in src/llmcompressor/datasets/utils.py to automatically use torch.utils.data.distributed.DistributedSampler when DDP is active, ensuring calibration datasets are correctly distributed across processes.
  • DDP Debugging and Device Management Utilities: Introduced ForkedPdb for debugging multiprocessing children and helper functions like rank_log and check_device for DDP-aware logging. The apply_autoround method was updated to explicitly move the model to the correct GPU device for each DDP process.
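The DDP wiring described in these highlights can be sketched as a minimal per-process setup/teardown pair for a torchrun-launched script. This is an illustrative sketch, not the PR's code: the names setup_ddp and cleanup_ddp are assumptions.

```python
# Minimal sketch of per-process DDP setup/teardown for a torchrun-launched
# script; function names (setup_ddp, cleanup_ddp) are illustrative.
import os

import torch
import torch.distributed as dist


def setup_ddp() -> int:
    """Join the default process group and pin this process to one GPU."""
    rank = int(os.environ["RANK"])              # injected by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])  # GPU index on this node
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend)    # reads MASTER_ADDR/PORT from env
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)       # one GPU per process
    return rank


def cleanup_ddp() -> None:
    """Tear down the process group at the end of the run."""
    dist.destroy_process_group()
```

A launch such as `torchrun --nproc_per_node=8 script.py` would spawn one such process per GPU, with RANK, LOCAL_RANK, and WORLD_SIZE set by the launcher.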
Changelog
  • examples/quantization_w4a4_fp4/llama3_example.py
    • Temporarily updated MODEL_ID to a local Qwen3 model path for testing purposes.
  • experimental/ddp/README.md
    • Added a new README file documenting the DDP + AutoRound Proof of Concept, its current status, and outlining future development steps.
  • experimental/ddp/ddp_qwen3_example.py
    • Added a new example script that demonstrates DDP-enabled AutoRound quantization for Qwen3 models.
    • Includes functions for DDP setup and cleanup, deterministic configuration, model and tokenizer loading, calibration dataset preparation, and applying the AutoRound modifier.
    • Handles model saving and sample generation specifically on rank 0, while other ranks perform quantization without saving.
  • src/llmcompressor/datasets/utils.py
    • Added an is_ddp function to check if a Distributed Data Parallel environment is active.
    • Modified the _make_sampler function to utilize torch.utils.data.distributed.DistributedSampler when DDP is detected, ensuring proper data distribution for calibration.
  • src/llmcompressor/modifiers/autoround/base.py
    • Introduced ForkedPdb, a pdb subclass for debugging in forked multiprocessing environments.
    • Added rank_log and check_device utility functions for logging messages and model device information in a DDP-aware manner.
    • Updated the apply_autoround method to explicitly move the wrapped_model to the current GPU rank, ensuring correct device placement during multi-GPU quantization.
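The sampler selection described above for src/llmcompressor/datasets/utils.py can be sketched as follows. This is a reconstruction under the assumption that the helper simply branches on an active process group; the PR's actual is_ddp and _make_sampler may differ in signature and detail.

```python
# Sketch of DDP-aware sampler selection: shard the calibration set with
# DistributedSampler when a process group is active, otherwise fall back
# to an ordinary sampler. Illustrative only; not the PR's exact code.
import torch.distributed as dist
from torch.utils.data import Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler


def is_ddp() -> bool:
    """True when torch.distributed has an initialized process group."""
    return dist.is_available() and dist.is_initialized()


def make_sampler(dataset: Dataset, shuffle: bool = True):
    if is_ddp():
        # Each rank iterates a disjoint 1/world_size shard of the dataset,
        # so no calibration sample is processed twice across processes.
        return DistributedSampler(dataset, shuffle=shuffle)
    return RandomSampler(dataset) if shuffle else SequentialSampler(dataset)
```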
Activity
  • The author yiliu30 created this pull request with a minimal title 'Ddp' and placeholder descriptions for the summary and test plan.
  • The changes introduce significant new functionality for Distributed Data Parallel (DDP) support in AutoRound quantization.
  • The experimental/ddp/README.md indicates that a Proof of Concept (PoC) implementation is complete and functional, with clear next steps for further development and verification.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a proof-of-concept for Distributed Data Parallel (DDP) support, primarily for AutoRound quantization. It adds a new DDP example, modifies dataset utilities to support distributed sampling, and updates the AutoRound modifier to be DDP-aware.

My review focuses on cleaning up the proof-of-concept code. I've identified several instances of dead or commented-out code in the new example files that should be removed for clarity. I also suggest replacing a print statement with a proper logger call in the AutoRoundModifier and removing an unused debugging utility class. These changes will improve the code's readability and maintainability as it moves from a PoC to a more permanent feature.

Comment on lines +8 to +9
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

MODEL_ID = "/storage/yiliu7/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8"


medium

The MODEL_ID variable is defined on line 8 and then immediately overwritten on line 9. The first assignment is now dead code and can be confusing. Please remove the unused assignment.

Suggested change:
- MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
- MODEL_ID = "/storage/yiliu7/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8"
+ MODEL_ID = "/storage/yiliu7/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8"

Comment on lines +14 to +15
model_id = "Qwen/Qwen3-235B-A22B"
model_id = "Qwen/Qwen3-8B"


medium

The model_id variable is defined on line 14 and then immediately overwritten on line 15. The first assignment is dead code. Please remove it to improve clarity.

Suggested change:
- model_id = "Qwen/Qwen3-235B-A22B"
- model_id = "Qwen/Qwen3-8B"
+ model_id = "Qwen/Qwen3-8B"

Comment on lines +150 to +155
# except Exception as e:
# logger.info(f"[Rank {rank}] Error during quantization: {e}")
# raise

# finally:
# # Cleanup DDP


medium

This commented-out try-except-finally block appears to be leftover from debugging. It should be removed to clean up the code.

Comment on lines +210 to +218
# Parse scheme from string if needed
from auto_round import schemes as ar_schemes

scheme_map = {
    "FP8_STATIC": ar_schemes.FP8_STATIC,
    "MXFP8": ar_schemes.MXFP8,
    "MXFP4": ar_schemes.MXFP4,
}
# scheme = scheme_map.get(args.scheme, args.scheme)


medium

This block for parsing the scheme seems to be unused as the line that would use scheme_map is commented out. This makes the import and the map itself dead code. Please remove this block if it's not needed.

Comment on lines +220 to +223
# # Check if running with torchrun
# if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
#     logger.info("Detected torchrun environment")
#     main_torchrun(model_name, scheme, args.iters, args.nsamples)


medium

This commented-out block for handling torchrun is unused. It should be removed to improve code clarity.

Comment on lines +37 to +54
import pdb
import sys


class ForkedPdb(pdb.Pdb):
    """A Pdb subclass that may be used
    from a forked multiprocessing child
    """

    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            sys.stdin = open("/dev/stdin")
            pdb.Pdb.interaction(self, *args, **kwargs)
        finally:
            sys.stdin = _stdin



medium

The ForkedPdb class and its imports (pdb, sys) are defined but not used within this file, making it dead code. This appears to be a debugging utility and should be removed from the production codebase.

kwargs["device_map"] = (
    f"cuda:{rank}" if torch.cuda.is_available() else "cpu"
)
print(f"AutoRoundModifier: moving wrapped_model to {kwargs['device_map']}", flush=True)


medium

A print statement is used for logging. It's better to use the existing logging framework for consistency and better control over log levels and output. Please replace this with a call to rank_log.

Suggested change:
- print(f"AutoRoundModifier: moving wrapped_model to {kwargs['device_map']}", flush=True)
+ rank_log(f"AutoRoundModifier: moving wrapped_model to {kwargs['device_map']}")
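For reference, a rank_log-style helper of the kind the reviewer suggests might look like the following. The signature is an assumption, since the PR's rank_log implementation isn't shown in this excerpt, and the standard logging module stands in for whatever logger llm-compressor actually uses.

```python
# Hedged sketch of a rank-aware logging helper; rank_log's real signature
# in the PR may differ. Falls back to rank 0 when no process group exists.
import logging

import torch.distributed as dist

logger = logging.getLogger(__name__)


def rank_log(msg: str) -> None:
    """Log msg with a [Rank N] prefix so interleaved DDP output is readable."""
    rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
    logger.info("[Rank %d] %s", rank, msg)
```

Routing through a logger keeps rank-tagged messages subject to normal level filtering and handler configuration, unlike a bare print.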

Signed-off-by: yiliu30 <yi4.liu@intel.com>
@yiliu30 yiliu30 closed this Feb 5, 2026
