Refactor logging, CompressionLogger, support distributed#2408

Merged
kylesayrs merged 5 commits into main from kylesayrs/better-compression-logger
Mar 5, 2026

Conversation


@kylesayrs kylesayrs commented Feb 25, 2026

Purpose

  • Remove misleading information about module size after compression
  • Support loguru logging, which reports which rank each log message comes from
  • Support compression logging that is specific to distributed workloads

Changes

  • Refactor CompressionLogger
    • Remove the NVIDIA/AMD-specific logic and use the torch.cuda interface instead
      • torch.cuda already accounts for "CUDA/AMD_VISIBLE_DEVICES", so there is no need to hard-code these environment variables
    • Remove the "module size" log, which is misleading: the module size does not actually change as optimization occurs (quantize-dequantize, QDQ)
    • Limit devices to just the current device in distributed cases
  • Refactor loguru logger configuration
    • configure_logger can now be called multiple times
    • When oneshot occurs, configure_logger is called again with the rank set
    • Logger now prints rank if applicable
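The rank-aware format described above can be sketched as follows. This is a hypothetical helper, not the PR's actual code: the function name `build_log_format` and the exact format string are illustrative, assuming a loguru-style format with `{time}`, `{function}`, `{level}`, and `{message}` fields matching the log lines shown under Testing.

```python
def build_log_format(rank=None):
    """Return a loguru-style format string, with a rank prefix when set.

    Hypothetical sketch: the real configure_logger presumably rebuilds the
    loguru sink with a format like this once the distributed rank is known.
    """
    base = "{time:YYYY-MM-DDTHH:mm:ss.SSSS} | {function} | {level} - {message}"
    if rank is not None:
        # Distributed runs prefix each line with the originating rank.
        return f"[Rank {rank}] " + base
    return base

# Single-process format has no rank prefix; distributed format does.
print(build_log_format())
print(build_log_format(1))
```

Rebuilding the format string on each call is what lets configure_logger be invoked again safely once oneshot knows the rank.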

Testing

Single-thread

2026-02-25T17:04:36.8189 | compress_module_list | INFO - Quantizing model.layers.0.mlp.gate_proj using 512 samples
2026-02-25T17:04:38.5924 | GPTQ | METRIC - time 1.77s
2026-02-25T17:04:38.5926 | GPTQ | METRIC - error 663.60
2026-02-25T17:04:38.5932 | GPTQ | METRIC - GPU 0 | usage: 4.45% | total memory: 85.1 GB
2026-02-25T17:04:38.5933 | GPTQ | METRIC - GPU 1 | usage: 0.00% | total memory: 85.1 GB

Distributed

[Rank 1] 2026-02-25T17:10:18.8569 | compress_module_list | INFO - Quantizing model.layers.2.self_attn.o_proj using 512 samples
[Rank 1] 2026-02-25T17:10:20.4585 | GPTQ | METRIC - time 1.60s
[Rank 1] 2026-02-25T17:10:20.4586 | GPTQ | METRIC - error 1.27
[Rank 1] 2026-02-25T17:10:20.4593 | GPTQ | METRIC - GPU 1 | usage: 4.45% | total memory: 85.1 Gb
[Rank 1] 2026-02-25T17:10:20.4637 | compress_module_list | INFO - Quantizing model.layers.2.mlp.up_proj using 512 samples
[Rank 0] 2026-02-25T17:10:20.7379 | GPTQ | METRIC - time 6.59s
[Rank 0] 2026-02-25T17:10:20.7381 | GPTQ | METRIC - error 7.45
[Rank 0] 2026-02-25T17:10:20.7401 | GPTQ | METRIC - GPU 0 | usage: 5.98% | total memory: 85.1 Gb
[Rank 0] 2026-02-25T17:10:20.7590 | compress_module_list | INFO - Quantizing model.layers.2.mlp.gate_proj using 512 samples

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @kylesayrs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the CompressionLogger utility to streamline GPU memory monitoring and enhance its compatibility with distributed workloads. It removes hardware-specific GPU monitoring logic in favor of a unified torch.cuda interface and ensures that only relevant device metrics are logged in distributed settings. Additionally, a potentially misleading metric related to compressed module size has been removed for clarity.

Highlights

  • Simplified GPU Memory Logging: The CompressionLogger now directly utilizes torch.cuda functions for GPU memory metrics, eliminating the need for separate NVIDIA (pynvml) and AMD (amdsmi) specific implementations.
  • Distributed Workload Support: The logging mechanism has been enhanced to correctly identify and monitor only the current device when operating in a distributed environment, preventing redundant logging across multiple processes.
  • Removed Misleading Module Size Metric: The logging of 'Compressed module size' was removed as it provided potentially misleading information regarding the module's actual size post-compression.
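The unified memory query highlighted above reduces to comparing allocated against total bytes per device. A minimal sketch of that computation follows; the helper name `gpu_usage_percent` is hypothetical, and in the real logger the two inputs would presumably come from torch.cuda.memory_allocated(i) and torch.cuda.get_device_properties(i).total_memory. A zero-total guard is included because the review below flags a possible ZeroDivisionError.

```python
def gpu_usage_percent(allocated_bytes, total_bytes):
    """Percentage of a device's memory currently allocated.

    Hypothetical sketch; real values would come from the torch.cuda API.
    """
    if total_bytes == 0:
        # Guard against the ZeroDivisionError edge case noted in review.
        return 0.0
    return 100.0 * allocated_bytes / total_bytes
```

For example, 4.45 GB allocated on an 85.1 GB device would log as roughly the "usage: 4.45%" seen in the Testing output when total memory is normalized accordingly.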


Changelog
  • src/llmcompressor/utils/metric_logging.py
    • Removed GPUMemory namedtuple, GPUType enum, and get_layer_size_mb function.
    • Removed os import and added Iterable and is_distributed imports.
    • Refactored CompressionLogger initialization to remove GPU type detection and visible device parsing.
    • Updated CompressionLogger.__exit__ to directly query torch.cuda for memory usage.
    • Removed get_GPU_memory_usage, _get_GPU_usage_nv, and _get_GPU_usage_amd methods.
    • Introduced _get_visible_devices helper function to determine devices to monitor based on distributed status.
    • Removed logging of 'Compressed module size'.
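The _get_visible_devices behavior listed in the changelog can be sketched as below. The signature is hypothetical: the real helper presumably derives these three inputs from is_distributed and the torch.cuda API rather than taking them as parameters.

```python
def get_visible_devices(is_distributed, current_device, device_count):
    """Devices this process should monitor (hypothetical sketch).

    Distributed: each rank monitors only its own device, so the same GPU
    is not reported redundantly by every process.
    Single-process: monitor every visible CUDA device.
    """
    if is_distributed:
        return [current_device]
    return list(range(device_count))
```

This matches the Testing output above: the single-process run reports GPU 0 and GPU 1, while each distributed rank reports only its own GPU.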
Activity
  • No specific activity (comments, reviews, progress) was provided in the context.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request simplifies the CompressionLogger by removing hardware-specific GPU monitoring code in favor of the torch.cuda interface and adds support for distributed environments. These changes are a good step towards simplification and better maintainability. However, I've identified a critical issue in the new memory usage logging logic which causes incorrect metrics to be reported and can lead to a ZeroDivisionError. I've provided a detailed comment with a suggested fix.

@kylesayrs kylesayrs marked this pull request as ready for review February 25, 2026 22:17
@kylesayrs kylesayrs changed the title from "Simplify CompressionLogger, support distributed" to "Refactor logging, CompressionLogger, support distributed" Feb 25, 2026

@brian-dellabetta brian-dellabetta left a comment


So much cleaner! I didn't know you could do this all through the torch.cuda API.


@HDCharles HDCharles left a comment


See comment; otherwise looks good.


brian-dellabetta commented Feb 27, 2026

Tried running GPTQ on this branch on an AMD device; the output looks good.

Preparing cache: 100%|███████████████████████████████████████████████████████████| 512/512 [00:00<00:00, 1222.47it/s]
(1/33): Calibrating: 100%|█████████████████████████████████████████████████████████| 512/512 [00:13<00:00, 37.81it/s]
2026-02-27T22:02:21.4799 | compress_module_list | INFO - Quantizing model.layers.0.self_attn.q_proj using 512 samples
2026-02-27T22:02:29.4214 | GPTQ | METRIC - time 7.94s
2026-02-27T22:02:29.4216 | GPTQ | METRIC - error 1121.90
2026-02-27T22:02:29.4218 | GPTQ | METRIC - GPU 0 | usage: 1.87% | total memory: 206.1 Gb
2026-02-27T22:02:29.4230 | compress_module_list | INFO - Quantizing model.layers.0.self_attn.k_proj using 512 samples
2026-02-27T22:02:30.5821 | GPTQ | METRIC - time 1.16s
2026-02-27T22:02:30.5823 | GPTQ | METRIC - error 593.86
2026-02-27T22:02:30.5825 | GPTQ | METRIC - GPU 0 | usage: 1.87% | total memory: 206.1 Gb
2026-02-27T22:02:30.5830 | compress_module_list | INFO - Quantizing model.layers.0.self_attn.v_proj using 512 samples
2026-02-27T22:02:31.7442 | GPTQ | METRIC - time 1.16s
2026-02-27T22:02:31.7443 | GPTQ | METRIC - error 17.22
2026-02-27T22:02:31.7445 | GPTQ | METRIC - GPU 0 | usage: 1.87% | total memory: 206.1 Gb
2026-02-27T22:02:31.7450 | compress_module_list | INFO - Quantizing model.layers.0.self_attn.o_proj using 512 samples
2026-02-27T22:02:32.9174 | GPTQ | METRIC - time 1.17s
2026-02-27T22:02:32.9175 | GPTQ | METRIC - error 0.31
2026-02-27T22:02:32.9177 | GPTQ | METRIC - GPU 0 | usage: 1.87% | total memory: 206.1 Gb
2026-02-27T22:02:32.9187 | compress_module_list | INFO - Quantizing model.layers.0.mlp.gate_proj using 512 samples
2026-02-27T22:02:34.2249 | GPTQ | METRIC - time 1.31s
2026-02-27T22:02:34.2251 | GPTQ | METRIC - error 663.50
2026-02-27T22:02:34.2253 | GPTQ | METRIC - GPU 0 | usage: 1.87% | total memory: 206.1 Gb
2026-02-27T22:02:34.2282 | compress_module_list | INFO - Quantizing model.layers.0.mlp.up_proj using 512 samples
2026-02-27T22:02:35.5349 | GPTQ | METRIC - time 1.31s
2026-02-27T22:02:35.5351 | GPTQ | METRIC - error 526.16
2026-02-27T22:02:35.5353 | GPTQ | METRIC - GPU 0 | usage: 1.87% | total memory: 206.1 Gb
2026-02-27T22:02:35.5382 | compress_module_list | INFO - Quantizing model.layers.0.mlp.down_proj using 512 samples
2026-02-27T22:02:40.8984 | GPTQ | METRIC - time 5.36s
2026-02-27T22:02:40.8985 | GPTQ | METRIC - error 1.87
2026-02-27T22:02:40.8988 | GPTQ | METRIC - GPU 0 | usage: 2.53% | total memory: 206.1 Gb
(1/33): Propagating: 100%|████████████████████████████████████████████████████████| 512/512 [00:02<00:00, 182.29it/s]


mergify bot commented Mar 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 2, 2026
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@kylesayrs kylesayrs force-pushed the kylesayrs/better-compression-logger branch from 04ca9bc to 0bcef4f on March 5, 2026 15:06
@kylesayrs kylesayrs added the ready When a PR is ready for review label Mar 5, 2026
@mergify mergify bot removed the needs-rebase label Mar 5, 2026
@kylesayrs kylesayrs merged commit 6d73ce6 into main Mar 5, 2026
13 of 18 checks passed
@kylesayrs kylesayrs deleted the kylesayrs/better-compression-logger branch March 5, 2026 19:25

Labels

ready When a PR is ready for review