Feat (brevitas_examples/llm): support for fully custom quantizers #1454

Closed
Giuseppe5 wants to merge 6 commits into Xilinx:dev from Giuseppe5:custom_quantizer

@Giuseppe5
Collaborator

Reason for this PR

Currently, supporting a new quantization type requires modifying our entry point in several places (e.g., new args, new options in the dict), and this process does not scale to more advanced quantization schemes.

Changes Made in this PR

This PR allows the user to specify a file with custom quantizers to use for our LLM entrypoint.
The user can optionally specify up to seven quantizers:

  • weight_quantizer
  • input_linear_quantizer: quantizer used specifically in linear layers
  • input_quant: quantizer used for all other, non-linear layers (e.g., convolutional layers, if the network has any)
  • q_scaled_quant, k_transposed_quant, v_quant: quantizers for QKV in scaled dot product
  • attn_output_weights_quant: quantizer for the output of the softmax in scaled dot product

These quantizers should be put in a dict, using the keys specified above.
If a key is not specified, the corresponding quantizer is set to None (equivalent to no quantization).
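
As a rough illustration of the dict described above, here is a minimal sketch. It is not the code in this PR: the variable name `custom_quantizers`, the helper `resolve_quantizers`, and the placeholder classes `MyWeightQuant` and `MyLinearInputQuant` are all hypothetical, and rejecting unknown keys is an assumption rather than documented behavior. Only the seven key names and the "missing key means None" rule come from the description.

```python
from typing import Any, Dict, Optional

# The seven keys listed in the PR description.
QUANTIZER_KEYS = (
    "weight_quantizer",
    "input_linear_quantizer",
    "input_quant",
    "q_scaled_quant",
    "k_transposed_quant",
    "v_quant",
    "attn_output_weights_quant",
)


class MyWeightQuant:
    """Hypothetical placeholder standing in for a real weight quantizer class."""


class MyLinearInputQuant:
    """Hypothetical placeholder standing in for a real activation quantizer class."""


# Hypothetical content of the user-provided file: only two of the seven
# quantizers are specified.
custom_quantizers: Dict[str, Any] = {
    "weight_quantizer": MyWeightQuant,
    "input_linear_quantizer": MyLinearInputQuant,
}


def resolve_quantizers(user_quantizers: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return a dict with all seven keys, using None for unspecified ones."""
    # Assumption: unknown keys are treated as a user error (likely a typo).
    unknown = set(user_quantizers) - set(QUANTIZER_KEYS)
    if unknown:
        raise KeyError(f"Unknown quantizer keys: {sorted(unknown)}")
    # Missing keys default to None, i.e. no quantization for that slot.
    return {key: user_quantizers.get(key) for key in QUANTIZER_KEYS}


resolved = resolve_quantizers(custom_quantizers)
```

With this sketch, `resolved["weight_quantizer"]` is `MyWeightQuant`, while the five unspecified entries (for example `resolved["v_quant"]`) are `None`.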

Testing Summary

TBD

@Giuseppe5 Giuseppe5 changed the title from "Custom quantizer" to "Feat (brevitas_examples/llm): support for fully custom quantizers" Feb 11, 2026
@Giuseppe5 Giuseppe5 requested a review from nickfraser February 11, 2026 18:48
@Giuseppe5 Giuseppe5 added the "next release" label (PRs which should be merged for the next release) Feb 11, 2026
@Giuseppe5 Giuseppe5 self-assigned this Feb 11, 2026
@Giuseppe5 Giuseppe5 closed this Feb 12, 2026