Deprecate is_rank0#2468

Open
kylesayrs wants to merge 5 commits into main from kylesayrs/deprecate-isrank0

Conversation

@kylesayrs
Collaborator

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request streamlines the llmcompressor codebase by deprecating the is_rank0 function in favor of is_main_process for better consistency with the accelerate library. It also involves a significant refactoring of compression logic, moving the compress_module function to a more central location within compressed_tensors. Crucially, support for sparse compression has been removed from llmcompressor, leading to a cleaner and more focused codebase, reflected in the substantial simplification of the test suite, particularly for model loading and decompression scenarios.

Highlights

  • Deprecation of is_rank0: The is_rank0 utility function has been deprecated and replaced with is_main_process for improved clarity and alignment with accelerate library conventions.
  • Refactoring of Compression Logic: The compress_module function has been moved from llmcompressor.entrypoints.model_free.lifecycle to compressed_tensors.compressors, centralizing core compression functionality.
  • Removal of Sparse Compression Support: Sparse compression is no longer supported within llmcompressor's compressed_tensors_utils, and related inference logic and tests have been removed. A warning is now logged if sparse compression parameters are provided.
  • Test Suite Simplification: Numerous tests related to sparse compression and complex model loading/decompression scenarios have been removed or simplified, streamlining the test suite and focusing on current quantization capabilities.
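The deprecation highlighted above follows a common pattern: keep the old name as a thin shim that warns and delegates to the new one. The sketch below is a hypothetical illustration of that pattern, not the actual llm-compressor source; the names `is_rank0` and `is_main_process` mirror the PR, but the bodies are assumptions (the real `is_main_process` would consult distributed-process state).

```python
import warnings


def is_main_process() -> bool:
    # The real helper would query torch.distributed / accelerate state;
    # stubbed to True here so the sketch is self-contained.
    return True


def is_rank0() -> bool:
    # Deprecated shim: warn, then delegate to the replacement.
    warnings.warn(
        "is_rank0 is deprecated, use is_main_process instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return is_main_process()
```

Keeping the shim for a release cycle lets downstream callers migrate without an immediate breaking change.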


Changelog
  • src/llmcompressor/entrypoints/model_free/lifecycle.py
    • Removed imports for BaseCompressor and _get_quant_compression_format.
    • Removed compress_module from the module's __all__ export list.
    • Deleted the compress_module function, relocating its functionality.
  • src/llmcompressor/entrypoints/model_free/process.py
    • Added import for compress_module from compressed_tensors.compressors.
    • Removed import for compress_module from llmcompressor.entrypoints.model_free.lifecycle.
  • src/llmcompressor/transformers/compression/compressed_tensors_utils.py
    • Removed import for get_state_dict_offloaded_model from accelerate.accelerator.
    • Replaced import of is_rank0 with is_main_process from compressed_tensors.offload.
    • Removed import for SparsityConfigMetadata.
    • Updated usage of is_rank0() to is_main_process().
    • Refactored get_model_compressor to remove all sparse compression related logic and added a warning for unsupported sparse compression.
    • Removed sparsity_config_or_format argument from ModelCompressor.from_pretrained_model call.
  • tests/llmcompressor/transformers/compression/decompression_configs/fp8_dynamic.yaml
    • Removed skeleton_model_stub entry.
  • tests/llmcompressor/transformers/compression/decompression_configs/w4a16.yaml
    • Removed skeleton_model_stub entry.
  • tests/llmcompressor/transformers/compression/decompression_configs/w8a16_dense.yaml
    • Removed skeleton_model_stub entry.
  • tests/llmcompressor/transformers/compression/decompression_configs/w8a8.yaml
    • Removed skeleton_model_stub entry.
  • tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py
    • Removed imports for math, CompressionFormat, ModelCompressor, BitmaskConfig, DenseSparsityConfig, reset_session, tensor_sparsity, get_model_compressor, and SparsityConfigMetadata.
    • Added import for infer_model_format from compressed_tensors.compressors.format.
    • Removed test_sparse_model_reload, test_dense_model_save, test_compressor_stacking, test_sparse_24_compressor_is_lossless, and test_disable_sparse_compression_flag test functions.
    • Updated test_quant_model_reload to use torch.bfloat16 for dense format, adjusted calibration splits, simplified quant_config parsing, and added _remove_zp helper function.
    • Simplified DummyLinearModel.__init__ to directly assign weight_scale and zero_point parameters.
    • Removed _make_24_sparse helper function.
    • Updated test_correct_compressor_inferred to remove is_24 parameter and related sparsity checks, now asserting against infer_model_format.
  • tests/llmcompressor/transformers/compression/test_decompress.py
    • Removed imports for copy, QUANTIZATION_CONFIG_NAME, QuantizationStatus, and AutoConfig.
    • Simplified test_hf_quantizer_decompress_match_manual_decompress by removing skeleton_model_stub and manual decompression steps, directly comparing generations from hf_quantizer_model and manual_model.
  • tests/llmcompressor/transformers/compression/test_quantization.py
    • Modified _get_quant_info to skip zero points and simplify the returned tuples for weight and input quantization information.
    • Adjusted assertions in test_quantization_reload to align with the changes in _get_quant_info, specifically removing zero-point checks.
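The "warn and ignore" behavior described for sparse compression parameters can be sketched as follows. This is a hypothetical stand-in: the real `get_model_compressor` operates on a model and builds a `ModelCompressor`, and the parameter names here are illustrative assumptions based on the changelog.

```python
import logging

logger = logging.getLogger("llmcompressor.sketch")


def get_model_compressor_sketch(quantization_format=None, sparsity_config=None):
    # Sparse compression is no longer supported: warn and drop the config
    # rather than raising, so existing call sites keep working.
    if sparsity_config is not None:
        logger.warning(
            "Sparse compression is no longer supported; "
            "ignoring the provided sparsity configuration."
        )
        sparsity_config = None
    return {
        "quantization_format": quantization_format,
        "sparsity_config": sparsity_config,
    }
```

Warning instead of raising is a deliberate soft-deprecation choice: callers that still pass sparsity options degrade gracefully to quantization-only compression.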
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@kylesayrs kylesayrs changed the base branch from main to kylesayrs/remove-sparse-compression March 12, 2026 22:17
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the codebase to align with changes in the compressed-tensors library. The main changes include deprecating is_rank0 in favor of is_main_process and removing support for sparsity compression, which is now reflected by a warning. The tests have been significantly updated to remove sparsity-related checks and adapt to the new APIs.

I've found a couple of issues in the test suite updates. One is a bug in test_quant_model_reload where a function call has no effect due to not using its return value. The other is a potential robustness issue in the DummyLinearModel test helper. Please see my detailed comments.

I am having trouble creating individual review comments, so my feedback is included below.

tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py (90)

high

The function _remove_zp returns a new dictionary and does not modify its argument in-place. The result of _remove_zp(og_state_dict) is not assigned to any variable, so this line has no effect. This will likely cause the assertion on line 92 to fail if og_state_dict contains zero-point keys that are not present in reconstructed_state_dict.

You should reassign the result back to og_state_dict.

    og_state_dict = _remove_zp(og_state_dict)  # HACK: remove extra zero points added during quant init

tests/llmcompressor/transformers/compression/test_compress_tensor_utils.py (175-176)

medium

The removal of the is not None checks for weight_scale and zero_point makes this test utility less robust. If None is passed for these arguments, nn.Parameter(None) will be created. This changes the behavior from the attribute not existing (as in the previous implementation) to the attribute existing with a None value. This could lead to unexpected behavior in code that checks for the presence of these attributes. It would be safer to restore the if ... is not None checks.

        if weight_scale is not None:
            self.linear.weight_scale = nn.Parameter(weight_scale, requires_grad=False)
        if zero_point is not None:
            self.linear.weight_zero_point = nn.Parameter(zero_point, requires_grad=False)
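The behavioral difference the review points to can be shown without torch: with the guard, passing None leaves the attribute absent; without it, the attribute always exists, so hasattr-based presence checks change meaning. A minimal pure-Python illustration (class names are hypothetical):

```python
class GuardedModule:
    """Mirrors the previous implementation: set the attribute only when provided."""

    def __init__(self, weight_scale=None):
        if weight_scale is not None:
            self.weight_scale = weight_scale


class UnguardedModule:
    """Mirrors the new implementation: the attribute is always assigned."""

    def __init__(self, weight_scale=None):
        self.weight_scale = weight_scale


assert not hasattr(GuardedModule(), "weight_scale")  # attribute absent
assert hasattr(UnguardedModule(), "weight_scale")    # present, but None
```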

@kylesayrs kylesayrs force-pushed the kylesayrs/remove-sparse-compression branch from e29a4f1 to e25b87b Compare March 13, 2026 17:29
@mergify
Contributor

mergify bot commented Mar 13, 2026

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 13, 2026
Collaborator

@HDCharles HDCharles left a comment


can we have a PR for the is_rank0 changes and another doing the refactor stuff?

Base automatically changed from kylesayrs/remove-sparse-compression to main March 17, 2026 22:24