
Conversation

@kylesayrs (Collaborator) commented on Oct 7, 2025

Purpose

  • FP4
    • Fix a bug, discovered here, where dynamic="local" NVFP4 calculations would increment the observer twice as fast as normal
    • Enable MSE observer to be used with FP4
  • Simplification
    • Make supporting attention calibration easier by separating out weight/activation/attention reshaping
    • Improve the readability of the observer code by removing many levels of function indirection
    • Drop support for calibration with non-divisible group sizes. This is not really a loss, since the quantized forward pass makes the same assumption
[Before/after screenshots]

Changes

  • Standardize reshaping using flatten_for_calibration
    • This function reshapes all observed values to (num_observations, *qparams_shape, group_size)
    • This function removes the complexity of passing "reduce dims" and of handling weights, activations, and attention states all in one function (a sketch of the reshaping appears after this list)
    • In the future, this function could also be applied to the quantization forward pass, although there's probably no need to do so beyond standardization
  • Implement get_global_scale on Observer base
    • This function decouples minmax calculations from regular qparam calculations (avoiding the double increment bug)
    • This function enables the MSE observer to be used with FP4 global scales (see the second sketch after this list)
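
For illustration, here is a minimal sketch of the flatten_for_calibration reshaping contract for a group-quantized 2D weight. The function body and shape bookkeeping are assumptions based on this description, not the PR's exact code:

```python
import torch

def flatten_for_calibration(value: torch.Tensor, group_size: int) -> torch.Tensor:
    """Sketch: reshape a 2D weight (out_features, in_features) so every
    observed value has shape (num_observations, *qparams_shape, group_size)."""
    out_features, in_features = value.shape
    # Calibration assumes divisible group sizes, matching the forward pass
    assert in_features % group_size == 0
    num_groups = in_features // group_size
    # num_observations == 1 for weights; qparams_shape == (out_features, num_groups)
    return value.reshape(1, out_features, num_groups, group_size)

weight = torch.randn(128, 256)
flat = flatten_for_calibration(weight, group_size=64)
# Reducing over the last dim now yields qparams-shaped statistics directly,
# with no per-case "reduce dims" bookkeeping:
min_vals = flat.amin(dim=-1)  # shape (1, 128, 4)
max_vals = flat.amax(dim=-1)  # shape (1, 128, 4)
```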
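And a sketch of the decoupled global-scale path. The FP8/FP4 max constants reflect the usual NVFP4 convention and, like the function body, are assumptions here rather than the library's actual implementation:

```python
import torch

FP8_E4M3_MAX = 448.0  # assumed NVFP4 convention: per-group scales stored as FP8
FP4_E2M1_MAX = 6.0    # max representable FP4 (E2M1) magnitude

def get_global_scale(value: torch.Tensor) -> torch.Tensor:
    """Sketch: compute the FP4 global scale from a single tensor-wide
    min/max, independent of the per-group qparam path. Because this skips
    the regular qparam calculation, the observer's averaging counter is
    not incremented a second time."""
    value = value.reshape(1, 1, -1)  # one observation, one "global" group
    abs_max = torch.max(value.amax().abs(), value.amin().abs())
    abs_max = abs_max.clamp(min=torch.finfo(torch.float32).tiny)  # avoid div-by-zero
    return (FP8_E4M3_MAX * FP4_E2M1_MAX / abs_max).to(torch.float32)
```

Because this path depends only on a tensor-wide min/max, it can live on the Observer base class and be inherited unchanged by the MSE observer, which only searches over the local (per-group) scales.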

Testing

  • Added additional minmax tests which check the exact values of computed scales. These tests pass on both main and this branch, demonstrating that minmax observer behavior is unchanged (a minimal example of this style of test is sketched below)
  • Added additional MSE tests which check the exact values of MSE losses. These tests pass on both main and this branch, demonstrating that MSE observer behavior is unchanged
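
For reference, a minimal example of an exact-value scale test. It uses a generic asymmetric int8 min/max scale formula rather than the library's Observer API, so all names here are illustrative:

```python
import torch

def minmax_scale(value: torch.Tensor, quant_min: int = -128, quant_max: int = 127) -> torch.Tensor:
    # Generic asymmetric min/max scale (not the library's Observer API)
    min_val = value.amin().clamp(max=0.0)
    max_val = value.amax().clamp(min=0.0)
    return (max_val - min_val) / (quant_max - quant_min)

def test_exact_scale_value():
    value = torch.tensor([-1.0, 0.0, 2.0, 3.0])
    # range = 3.0 - (-1.0) = 4.0, spread over 255 quantization steps
    assert torch.isclose(minmax_scale(value), torch.tensor(4.0 / 255))
```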


github-actions bot commented Oct 7, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs force-pushed the kylesayrs/observers-refactor branch from 0027707 to 79c7e86 on October 7, 2025 21:41
@kylesayrs changed the title from "[Observers] Refactor to fix gparam bug, support attention, readability" to "[Observers] Refactor for better FP4 support, easier attention support" on Oct 7, 2025
@kylesayrs marked this pull request as ready for review on October 7, 2025 21:46