fix(embedding): restore default FP16Clip transform for automodel#881

Merged
quic-rishinr merged 3 commits into quic:main from vbaddi:bug/fix_embedding_transforms
Mar 26, 2026

Conversation

@vbaddi
Contributor

@vbaddi vbaddi commented Mar 24, 2026

This PR restores FP16ClipTransform for embedding models (QEFFAutoModel) in the default (non-proxy) path, while preserving existing proxy-gated behavior for other model categories.

What changed

  • Added per-model support for always-on ONNX transforms in proxy configuration.
  • Set embedding models to always keep FP16ClipTransform enabled by default.
  • Embedding accuracy on HW depends on FP16 clipping, so clip must remain enabled for embedding even when proxy is disabled.

Tests verified

  • python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -k "test_text_embedding_fp16_clip_transform_and_export"

cc: @anujgupt-github @quic-rishinr @quic-hemagnih

@vbaddi vbaddi self-assigned this Mar 24, 2026
@vbaddi vbaddi added the bug Something isn't working label Mar 24, 2026
_hf_auto_class = AutoModel
_pytorch_transforms = [CustomOpsTransform, AwqToMatmulNbitsTransform, GPTQToMatmulNbitsTransform]
_onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
_always_on_onnx_transforms = (FP16ClipTransform,)
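The `_always_on_onnx_transforms` attribute from the original proposal can be read as a per-class allowlist that survives even when the proxy path is disabled. A minimal sketch of how an export routine might consume it, assuming stub transform classes and a hypothetical `active_onnx_transforms` helper (neither is the actual QEfficient implementation):

```python
# Stand-in stubs for the real ONNX transform classes in QEfficient.
class FP16ClipTransform: ...
class SplitTensorsTransform: ...


class QEFFAutoModelSketch:
    # Full transform list, applied when the proxy path is enabled.
    _onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
    # Subset that must run even with proxy disabled (hypothetical name
    # mirroring the PR's original proposal).
    _always_on_onnx_transforms = (FP16ClipTransform,)

    @classmethod
    def active_onnx_transforms(cls, proxy_enabled: bool):
        """Return the transforms to run for this export."""
        if proxy_enabled:
            return list(cls._onnx_transforms)
        # Proxy disabled: keep only the always-on subset, preserving order.
        return [t for t in cls._onnx_transforms
                if t in cls._always_on_onnx_transforms]
```

With this shape, embedding export keeps FP16 clipping unconditionally while other transforms stay proxy-gated.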
Contributor

Can we reuse `_onnx_transforms` itself? We could remove `[FP16ClipTransform, SplitTensorsTransform]` for all auto classes and keep only `[FP16ClipTransform]` for the embedding model.

Contributor Author

@vbaddi commented Mar 25, 2026

We cannot remove it fully; we want these transforms to be applied for the proxy model. Maybe I can try to unify this and apply a hard FP16 clip for embedding through the same list. Will update, thanks!

Contributor Author

I updated this to reuse `_onnx_transforms` as the source of truth and replaced the separate embedding "always-on" path with `_proxy_only_onnx_transforms`, so embedding keeps FP16ClipTransform from its own `_onnx_transforms` and only marks SplitTensorsTransform as proxy-only.

Contributor Author

If we only use `_onnx_transforms`, then with proxy off we either keep both transforms or remove both together (depending on the gating), so we can't express:

  - embedding, proxy off: FP16 yes, Split no
  - embedding, proxy on: FP16 yes, Split yes

To get that behavior cleanly, we need one extra signal for the proxy-only subset (like `_proxy_only_onnx_transforms = (SplitTensorsTransform,)`) or a hardcoded class check. That's why we can still reuse `_onnx_transforms` as the base and keep `_proxy_only_onnx_transforms` to encode the split-only toggle.
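The two-list design described above can be sketched as follows; the class and attribute names are taken from the discussion, while the transform classes are stubs and `active_onnx_transforms` is a hypothetical helper, not the actual QEfficient code:

```python
# Stand-in stubs for the real ONNX transform classes.
class FP16ClipTransform: ...
class SplitTensorsTransform: ...


class EmbeddingModelSketch:
    # Base list: everything the model may apply.
    _onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
    # Subset that only runs when the proxy path is enabled.
    _proxy_only_onnx_transforms = (SplitTensorsTransform,)

    @classmethod
    def active_onnx_transforms(cls, proxy_enabled: bool):
        if proxy_enabled:
            return list(cls._onnx_transforms)
        # Proxy off: drop the proxy-only subset, keep everything else
        # (so FP16 clipping still runs for embedding exports).
        return [t for t in cls._onnx_transforms
                if t not in cls._proxy_only_onnx_transforms]
```

This expresses exactly the matrix above: FP16 yes / Split no with proxy off, and both with proxy on.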

Contributor

In modelling_utils.py we already have `_PROXY_ONLY_ONNX_TRANSFORMS = (FP16ClipTransform, SplitTensorsTransform)`, so it would be safe to make `_onnx_transforms = [FP16ClipTransform]` for AutoModel and `_onnx_transforms = []` for the other auto classes. We can completely get rid of `_proxy_only_onnx_transforms`, IMO.
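The suggested final shape, sketched under the same assumptions (stub transform classes, hypothetical sketch classes, and an illustrative merge helper; only the constant name `_PROXY_ONLY_ONNX_TRANSFORMS` comes from the discussion):

```python
# Stand-in stubs for the real ONNX transform classes.
class FP16ClipTransform: ...
class SplitTensorsTransform: ...

# Module-level constant: transforms the proxy path always adds.
_PROXY_ONLY_ONNX_TRANSFORMS = (FP16ClipTransform, SplitTensorsTransform)


class AutoModelSketch:
    # Embedding path: FP16 clipping runs unconditionally.
    _onnx_transforms = [FP16ClipTransform]


class OtherAutoModelSketch:
    # Other auto classes: no unconditional ONNX transforms.
    _onnx_transforms = []


def active_onnx_transforms(cls, proxy_enabled: bool):
    """Merge the class's own transforms with the proxy-only set."""
    transforms = list(cls._onnx_transforms)
    if proxy_enabled:
        # Add proxy-only transforms, skipping any already present.
        transforms += [t for t in _PROXY_ONLY_ONNX_TRANSFORMS
                       if t not in transforms]
    return transforms
```

This removes the per-class `_proxy_only_onnx_transforms` attribute entirely: each class declares only what it needs unconditionally, and the proxy path layers the rest on top.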

Contributor Author

Done

vbaddi added 3 commits March 26, 2026 13:23
…nx transforms

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@quic-rishinr quic-rishinr force-pushed the bug/fix_embedding_transforms branch from f92c2fa to a9f113c Compare March 26, 2026 07:53
@quic-rishinr quic-rishinr merged commit a071142 into quic:main Mar 26, 2026
5 checks passed
quic-akuruvil pushed a commit to quic-akuruvil/efficient_transformers that referenced this pull request Mar 26, 2026
…c#881)

This PR restores FP16ClipTransform for embedding models
(`QEFFAutoModel`) in the default (non-proxy) path, while preserving
existing proxy-gated behavior for other model categories.

### What changed
- Added per-model support for always-on ONNX transforms in proxy
configuration.
- Set embedding models to always keep FP16ClipTransform enabled by
default.
- Embedding accuracy on HW depends on FP16 clipping, so clip must remain
enabled for embedding even when proxy is disabled.

### Tests verified

- `python -m pytest -q tests/unit_test/models/test_model_quickcheck.py
-k "test_text_embedding_fp16_clip_transform_and_export"`

cc: @anujgupt-github @quic-rishinr @quic-hemagnih

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>