
[model_free_ptq] build job cleanup#2545

Merged
brian-dellabetta merged 19 commits into main from bdellabe/model-free-ptq-cleanup on Mar 31, 2026

Conversation

@brian-dellabetta
Collaborator

SUMMARY:
Follow-up to #2498 and precursor to landing #2491.

This PR cleans up a few things:

  • Uses the same function signature for building standard jobs, microscale jobs, and validation jobs. This will be needed for DeepSeek V3.2 support (#2491).
  • Renames the microscale-specific build_inverse_weights_map -> build_microscale_inverse_weights_map, because other reindexing logic will need different functionality when determining fused tensors.
  • Prunes the unused _get_all_tensor_names
  • Breaks out the loading logic for inverse_weights_map into a helper that can be moved to compressed-tensors (CT) in follow-up DeepSeek V3.2 support (#2491)
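The unified per-job loading semantics described above could look roughly like the following. This is an illustrative sketch, not the actual llm-compressor API: the names `load_job_tensors` and `read_tensors` are hypothetical, and only the dict convention (`{path: None}` loads everything, `{path: [names]}` loads a subset) is taken from the PR description.

```python
from typing import Any, Callable, Dict, List, Optional

def load_job_tensors(
    inverse_weights_map: Dict[str, Optional[List[str]]],
    read_tensors: Callable[[str, Optional[List[str]]], Dict[str, Any]],
) -> Dict[str, Any]:
    """Load the tensors one job needs, under a single interface.

    Each key is a resolved safetensors file path. A value of None means
    "load every tensor in that file" (standard jobs); a list of names means
    "load only those tensors" (microscale jobs with cross-shard partners).
    """
    tensors: Dict[str, Any] = {}
    for path, names in inverse_weights_map.items():
        tensors.update(read_tensors(path, names))
    return tensors
```

Because standard, microscale, and validation jobs all pass the same dict shape, the loader itself no longer needs to know which kind of job it is serving.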

TEST PLAN:
No net-new functionality; if all tests pass, this should be good to go.

dzhengAP and others added 12 commits March 25, 2026 01:19
… reads

Each shard is processed independently with full parallelism. When fused
weight sets (q/k/v, gate/up) span multiple shards, only the specific
partner tensors needed for global scale fusion are fetched via targeted
partial safetensors reads using safe_open.

- build_weights_map(): maps tensor names to source files via index.json
- _fetch_fused_partners(): partial reads of only fused partner tensors
- validate_file(): add optional weights_map param for future use
- One job per shard, no grouping, no cross-process coordination required
- validate.py: remove NotImplementedError, cross-shard handled natively

Closes #2497
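The index-based mapping described in the commit message can be sketched as follows. This is an assumption-laden illustration: `build_weights_map` here only parses a Hugging Face-style `model.safetensors.index.json`, and `invert_weights_map` is a hypothetical helper showing how a per-file grouping (shard file -> tensor names) could be derived so each shard job knows which partner tensors to fetch.

```python
import json
from collections import defaultdict
from typing import Dict, List

def build_weights_map(index_json_path: str) -> Dict[str, str]:
    # tensor name -> shard file, read from a HF-style safetensors index
    with open(index_json_path) as f:
        return json.load(f)["weight_map"]

def invert_weights_map(
    weights_map: Dict[str, str], tensor_names: List[str]
) -> Dict[str, List[str]]:
    # shard file -> the subset of tensor_names stored in that file
    inverse: Dict[str, List[str]] = defaultdict(list)
    for name in tensor_names:
        inverse[weights_map[name]].append(name)
    return dict(inverse)
```

Per the commit message, the actual partial reads of fused partner tensors then go through safetensors' `safe_open`, so only the named tensors are pulled from each partner shard rather than the whole file.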

Signed-off-by: David Zheng <dqzheng1996@gmail.com>
- Always use inverse_weights_map dict format for all jobs
- Standard jobs: {resolved_path: None} = load all tensors
- Microscale jobs: {src: [tensors]} = selective loading with cross-shard partners
- Update process_file and validate_file to accept inverse_weights_map format
- Add backward compatibility: isinstance check BEFORE .keys() call in both functions
- Move build_inverse_weights_map to microscale.py for better code organization (per Brian's feedback)
- Fix _build_validate_jobs to pass inverse_weights_map dict
- Update build_inverse_weights_map to handle empty weight_map
- Fix imports: get_checkpoint_files, is_weights_file from compressed_tensors.entrypoints.convert.file_utils
- Add match_name helper with 're:' regex pattern support to microscale.py
- Fix __all__ syntax in microscale.py
- Remove non-existent update_safetensors_index import and call
- Fix test imports, argument order, shard names, and ALL assertions to match new inverse_weights_map return format
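The `match_name` helper with `'re:'` pattern support mentioned above might behave along these lines. A minimal sketch, assuming the common compressed-tensors convention that a pattern prefixed with `re:` is treated as a regular expression and anything else as an exact name; the real signature and semantics in microscale.py may differ.

```python
import re

def match_name(name: str, pattern: str) -> bool:
    """Match a tensor name against a target pattern.

    Patterns prefixed with "re:" are matched as regular expressions
    (anchored at the start of the name); all other patterns require an
    exact string match.
    """
    if pattern.startswith("re:"):
        return re.match(pattern[3:], name) is not None
    return name == pattern
```

For example, `"re:.*q_proj.*"` would select every layer's q_proj weight, while a bare `"lm_head.weight"` selects only that one tensor.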

Testing:
- pytest tests/llmcompressor/entrypoints/model_free/ -v: 16 passed, 1 skipped
- make style && make quality: all checks pass

Reviewer Feedback:
- Brian: Unified signature, inverse_weights_map per-job scope, single interface; move build_inverse_weights_map to microscale.py
- Kyle: Precomputed map, safe_open partial reads, partner re-saved
- Gemini: Single-file fallback, top-level imports, simplified discovery

Breaking Changes: None — internal refactoring only. Public API unchanged.

Signed-off-by: David Zheng <dqzheng1996@gmail.com>
Signed-off-by: David Zheng <dqzheng1996@gmail.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the model-free PTQ entry point by unifying job construction for different quantization schemes and centralizing tensor loading logic into a new helper function. Key changes include renaming microscale-specific functions for clarity and updating tests to match the new API. Feedback identifies efficiency issues in the tensor loading loop and potential OOM risks when loading directly to a GPU, as well as a typo in a docstring.

@mergify

mergify bot commented Mar 30, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need the dev
optional install to get the required linting packages:
https://github.com/vllm-project/llm-compressor/blob/main/CONTRIBUTING.md

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@mergify mergify bot removed the quality-failed label Mar 30, 2026

brian-dellabetta and others added 2 commits March 30, 2026 17:39
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@mergify mergify bot removed the quality-failed label Mar 30, 2026
@brian-dellabetta added the ready (When a PR is ready for review) label Mar 31, 2026
@brian-dellabetta merged commit 031d912 into main Mar 31, 2026
13 of 15 checks passed
@brian-dellabetta deleted the bdellabe/model-free-ptq-cleanup branch March 31, 2026 19:52

Labels

ready When a PR is ready for review


4 participants