fix(embedding): restore default FP16Clip transform for automodel #881

quic-rishinr merged 3 commits into quic:main from
Conversation
```python
_hf_auto_class = AutoModel
_pytorch_transforms = [CustomOpsTransform, AwqToMatmulNbitsTransform, GPTQToMatmulNbitsTransform]
_onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
_always_on_onnx_transforms = (FP16ClipTransform,)
```
Can we reuse `_onnx_transforms` itself? We could remove `[FP16ClipTransform, SplitTensorsTransform]` for all auto classes and maintain only `[FP16ClipTransform]` for the embedding model.
We cannot remove it fully; we want these transforms to be applied for the proxy model. Maybe I can try to unify and use a hard FP16 clip for embedding via the same list. Will update, thanks!
I updated this to reuse `_onnx_transforms` as the source of truth and removed the separate embedding "always-on" path in favor of `_proxy_only_onnx_transforms`, so embedding keeps `FP16ClipTransform` from its own `_onnx_transforms` and only marks `SplitTensorsTransform` as proxy-only.
If we only use `_onnx_transforms`, proxy-off keeps both transforms or removes both together (depending on gating), so we can't express:
- embedding, proxy off: FP16 yes, Split no
- embedding, proxy on: FP16 yes, Split yes

To get that behavior cleanly, we need one extra signal for the "proxy-only subset" (like `_proxy_only_onnx_transforms = (SplitTensorsTransform,)`) or a hardcoded class check. That's why we still reuse `_onnx_transforms` as the base and keep `_proxy_only_onnx_transforms` to encode the split-only toggle.
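The gating described above can be sketched roughly like this. The transform names come from the diff in this PR; `resolve_onnx_transforms` is a hypothetical helper for illustration, not the repo's actual API:

```python
# Marker classes standing in for the real ONNX transforms.
class FP16ClipTransform:
    pass

class SplitTensorsTransform:
    pass

def resolve_onnx_transforms(onnx_transforms, proxy_only, proxy_enabled):
    """Apply the full base list under proxy; drop proxy-only entries otherwise."""
    if proxy_enabled:
        return list(onnx_transforms)
    return [t for t in onnx_transforms if t not in proxy_only]

# Embedding model: FP16 clip stays on either way, split is proxy-only.
embedding_onnx_transforms = [FP16ClipTransform, SplitTensorsTransform]
proxy_only = (SplitTensorsTransform,)

# proxy off -> [FP16ClipTransform]
assert resolve_onnx_transforms(embedding_onnx_transforms, proxy_only, proxy_enabled=False) == [FP16ClipTransform]
# proxy on -> [FP16ClipTransform, SplitTensorsTransform]
assert resolve_onnx_transforms(embedding_onnx_transforms, proxy_only, proxy_enabled=True) == [FP16ClipTransform, SplitTensorsTransform]
```

With the per-class `_proxy_only_onnx_transforms` tuple, both embedding rows from the table above are expressible without a hardcoded class check.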
In `modelling_utils.py` we already have `_PROXY_ONLY_ONNX_TRANSFORMS = (FP16ClipTransform, SplitTensorsTransform)`, so it would be safe to make `_onnx_transforms = [FP16ClipTransform]` for AutoModel and `_onnx_transforms = []` for the other automodels. We could completely get rid of `_proxy_only_onnx_transforms`, IMO.
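A sketch of this alternative: one module-level proxy-only tuple, minimal per-class base lists, and no per-class `_proxy_only_onnx_transforms`. `transforms_for` is a hypothetical helper added for illustration:

```python
# Marker classes standing in for the real ONNX transforms.
class FP16ClipTransform:
    pass

class SplitTensorsTransform:
    pass

# Module-level tuple, as already present in modelling_utils.py per the comment.
_PROXY_ONLY_ONNX_TRANSFORMS = (FP16ClipTransform, SplitTensorsTransform)

def transforms_for(base_onnx_transforms, proxy_enabled):
    """Base list as-is when proxy is off; append missing proxy-only ones when on."""
    if not proxy_enabled:
        return list(base_onnx_transforms)
    extra = [t for t in _PROXY_ONLY_ONNX_TRANSFORMS if t not in base_onnx_transforms]
    return list(base_onnx_transforms) + extra

# AutoModel (embedding): base keeps FP16 clip even with proxy off.
assert transforms_for([FP16ClipTransform], proxy_enabled=False) == [FP16ClipTransform]
# Other automodels: empty base, so proxy on adds both and proxy off applies none.
assert transforms_for([], proxy_enabled=True) == [FP16ClipTransform, SplitTensorsTransform]
assert transforms_for([], proxy_enabled=False) == []
```

The trade-off versus the per-class tuple is that the proxy-only set becomes global, which is simpler as long as no model class ever needs a different proxy-only subset.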
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
…nx transforms
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
f92c2fa to a9f113c
…c#881)

This PR restores FP16ClipTransform for embedding models (`QEFFAutoModel`) in the default (non-proxy) path, while preserving existing proxy-gated behavior for other model categories.

### What changed

- Added per-model support for always-on ONNX transforms in proxy configuration.
- Set embedding models to always keep FP16ClipTransform enabled by default.
- Embedding accuracy on HW depends on FP16 clipping, so the clip must remain enabled for embedding even when proxy is disabled.

### Tests verified

- `python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -k "test_text_embedding_fp16_clip_transform_and_export"`

cc: @anujgupt-github @quic-rishinr @quic-hemagnih

---------

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>