fix(embedding): restore default FP16Clip transform for automodel#881

Merged
quic-rishinr merged 3 commits into quic:main from vbaddi:bug/fix_embedding_transforms
Mar 26, 2026

Conversation

@vbaddi
Contributor

@vbaddi vbaddi commented Mar 24, 2026

This PR restores FP16ClipTransform for embedding models (QEFFAutoModel) in the default (non-proxy) path, while preserving existing proxy-gated behavior for other model categories.

What changed

  • Added per-model support for always-on ONNX transforms in proxy configuration.
  • Set embedding models to always keep FP16ClipTransform enabled by default.
  • Embedding accuracy on HW depends on FP16 clipping, so clip must remain enabled for embedding even when proxy is disabled.

Tests verified

  • python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -k "test_text_embedding_fp16_clip_transform_and_export"

cc: @anujgupt-github @quic-rishinr @quic-hemagnih

@vbaddi vbaddi self-assigned this Mar 24, 2026
@vbaddi vbaddi added the bug Something isn't working label Mar 24, 2026
_hf_auto_class = AutoModel
_pytorch_transforms = [CustomOpsTransform, AwqToMatmulNbitsTransform, GPTQToMatmulNbitsTransform]
_onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
_always_on_onnx_transforms = (FP16ClipTransform,)
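The `_always_on_onnx_transforms` attribute from the original proposal can be read as a per-class allowlist that survives even when the proxy path is disabled. A minimal sketch of how an export routine might consume it, assuming stub transform classes and a hypothetical `active_onnx_transforms` helper (neither is the actual QEfficient implementation):

```python
# Stand-in stubs for the real ONNX transform classes in QEfficient.
class FP16ClipTransform: ...
class SplitTensorsTransform: ...


class QEFFAutoModelSketch:
    # Full transform list, applied when the proxy path is enabled.
    _onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
    # Subset that must run even with proxy disabled (hypothetical name
    # mirroring the PR's original proposal).
    _always_on_onnx_transforms = (FP16ClipTransform,)

    @classmethod
    def active_onnx_transforms(cls, proxy_enabled: bool):
        """Return the transforms to run for this export."""
        if proxy_enabled:
            return list(cls._onnx_transforms)
        # Proxy disabled: keep only the always-on subset, preserving order.
        return [t for t in cls._onnx_transforms
                if t in cls._always_on_onnx_transforms]
```

With this shape, embedding export keeps FP16 clipping unconditionally while other transforms stay proxy-gated.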
Contributor

Can we reuse `_onnx_transforms` itself? We could remove `[FP16ClipTransform, SplitTensorsTransform]` for all auto classes and keep only `[FP16ClipTransform]` for the embedding model.

Contributor Author

@vbaddi commented Mar 25, 2026

We cannot remove it fully; we want these transforms to be applied for the proxy model. Maybe I can try to unify this and apply a hard FP16 clip for embedding through the same list. Will update, thanks!

Contributor Author

I updated this to reuse `_onnx_transforms` as the source of truth and replaced the separate embedding "always-on" path with `_proxy_only_onnx_transforms`, so embedding keeps FP16ClipTransform from its own `_onnx_transforms` and only marks SplitTensorsTransform as proxy-only.

Contributor Author

If we only use `_onnx_transforms`, then with proxy off we either keep both transforms or remove both together (depending on the gating), so we can't express:

  - embedding, proxy off: FP16 yes, Split no
  - embedding, proxy on: FP16 yes, Split yes

To get that behavior cleanly, we need one extra signal for the proxy-only subset (like `_proxy_only_onnx_transforms = (SplitTensorsTransform,)`) or a hardcoded class check. That's why we can still reuse `_onnx_transforms` as the base and keep `_proxy_only_onnx_transforms` to encode the split-only toggle.
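The two-list design described above can be sketched as follows; the class and attribute names are taken from the discussion, while the transform classes are stubs and `active_onnx_transforms` is a hypothetical helper, not the actual QEfficient code:

```python
# Stand-in stubs for the real ONNX transform classes.
class FP16ClipTransform: ...
class SplitTensorsTransform: ...


class EmbeddingModelSketch:
    # Base list: everything the model may apply.
    _onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
    # Subset that only runs when the proxy path is enabled.
    _proxy_only_onnx_transforms = (SplitTensorsTransform,)

    @classmethod
    def active_onnx_transforms(cls, proxy_enabled: bool):
        if proxy_enabled:
            return list(cls._onnx_transforms)
        # Proxy off: drop the proxy-only subset, keep everything else
        # (so FP16 clipping still runs for embedding exports).
        return [t for t in cls._onnx_transforms
                if t not in cls._proxy_only_onnx_transforms]
```

This expresses exactly the matrix above: FP16 yes / Split no with proxy off, and both with proxy on.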

Contributor

In modelling_utils.py we already have `_PROXY_ONLY_ONNX_TRANSFORMS = (FP16ClipTransform, SplitTensorsTransform)`, so it would be safe to make `_onnx_transforms = [FP16ClipTransform]` for AutoModel and `_onnx_transforms = []` for the other auto classes. We can completely get rid of `_proxy_only_onnx_transforms`, IMO.
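The suggested final shape, sketched under the same assumptions (stub transform classes, hypothetical sketch classes, and an illustrative merge helper; only the constant name `_PROXY_ONLY_ONNX_TRANSFORMS` comes from the discussion):

```python
# Stand-in stubs for the real ONNX transform classes.
class FP16ClipTransform: ...
class SplitTensorsTransform: ...

# Module-level constant: transforms the proxy path always adds.
_PROXY_ONLY_ONNX_TRANSFORMS = (FP16ClipTransform, SplitTensorsTransform)


class AutoModelSketch:
    # Embedding path: FP16 clipping runs unconditionally.
    _onnx_transforms = [FP16ClipTransform]


class OtherAutoModelSketch:
    # Other auto classes: no unconditional ONNX transforms.
    _onnx_transforms = []


def active_onnx_transforms(cls, proxy_enabled: bool):
    """Merge the class's own transforms with the proxy-only set."""
    transforms = list(cls._onnx_transforms)
    if proxy_enabled:
        # Add proxy-only transforms, skipping any already present.
        transforms += [t for t in _PROXY_ONLY_ONNX_TRANSFORMS
                       if t not in transforms]
    return transforms
```

This removes the per-class `_proxy_only_onnx_transforms` attribute entirely: each class declares only what it needs unconditionally, and the proxy path layers the rest on top.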

Contributor Author

Done

vbaddi added 3 commits March 26, 2026 13:23
…nx transforms

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@quic-rishinr quic-rishinr force-pushed the bug/fix_embedding_transforms branch from f92c2fa to a9f113c Compare March 26, 2026 07:53
@quic-rishinr quic-rishinr merged commit a071142 into quic:main Mar 26, 2026
5 checks passed
quic-akuruvil pushed a commit to quic-akuruvil/efficient_transformers that referenced this pull request Mar 26, 2026
…c#881)

This PR restores FP16ClipTransform for embedding models
(`QEFFAutoModel`) in the default (non-proxy) path, while preserving
existing proxy-gated behavior for other model categories.

### What changed
- Added per-model support for always-on ONNX transforms in proxy
configuration.
- Set embedding models to always keep FP16ClipTransform enabled by
default.
- Embedding accuracy on HW depends on FP16 clipping, so clip must remain
enabled for embedding even when proxy is disabled.

### Tests verified

- `python -m pytest -q tests/unit_test/models/test_model_quickcheck.py
-k "test_text_embedding_fp16_clip_transform_and_export"`

cc: @anujgupt-github @quic-rishinr @quic-hemagnih

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>