[ML] Add quantized model ops to pytorch_inference allowlist #2991
valeriy42 merged 1 commit into elastic:main
Pull request overview
Extends the TorchScript op extraction + allowlist validation tooling to support dynamically quantized HuggingFace models (mirroring Eland’s quantize_dynamic(nn.Linear) behavior) and updates the C++ allowed-ops set accordingly.
Changes:
- Add a quantize flag to model specs in the dev tools, and apply dynamic quantization before tracing when enabled.
- Add quantized model variants to the reference/validation model configs and include quantized ops in the C++ allowlist.
- Update (part of) the golden reference-model ops JSON to include a quantized variant.
Reviewed changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| dev-tools/extract_model_ops/validation_models.json | Adds quantized model variants to the validation set. |
| dev-tools/extract_model_ops/reference_models.json | Adds quantized model variants and an explanatory comment entry. |
| dev-tools/extract_model_ops/validate_allowlist.py | Supports dict-based model specs and passes quantize through to tracing. |
| dev-tools/extract_model_ops/torchscript_utils.py | Implements optional dynamic quantization before tracing. |
| dev-tools/extract_model_ops/extract_model_ops.py | Normalizes config entries into {model_id, quantize} specs and emits a quantization flag in golden output. |
| bin/pytorch_inference/unittest/testfiles/reference_model_ops.json | Partially updates golden data and adds one quantized model’s op set. |
| bin/pytorch_inference/CSupportedOperations.cc | Allows quantized::linear_dynamic and aten::mul_ for quantized models. |
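
The normalization step described for extract_model_ops.py can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `normalize_entry` and `normalize_config` are hypothetical, and treating keys that start with an underscore as comment entries is an assumption about how the explanatory comment entry is stored.

```python
from typing import Dict, Union


def normalize_entry(entry: Union[str, dict]) -> dict:
    """Turn one config entry into a {"model_id", "quantize"} spec."""
    if isinstance(entry, str):
        # A plain string is a model id with no quantization requested.
        return {"model_id": entry, "quantize": False}
    if isinstance(entry, dict) and "model_id" in entry:
        return {
            "model_id": entry["model_id"],
            "quantize": bool(entry.get("quantize", False)),
        }
    raise ValueError(f"unsupported model config entry: {entry!r}")


def normalize_config(raw: Dict[str, Union[str, dict]]) -> Dict[str, dict]:
    """Normalize every entry, skipping comment keys (assumed to start with '_')."""
    return {
        name: normalize_entry(entry)
        for name, entry in raw.items()
        if not name.startswith("_")
    }


if __name__ == "__main__":
    raw = {
        "_comment": "hypothetical comment entry",
        "elastic-eis-elser-v2": "elastic/eis-elser-v2",
        # Hypothetical name for a quantized variant of the same model.
        "elastic-eis-elser-v2-quantized": {
            "model_id": "elastic/eis-elser-v2",
            "quantize": True,
        },
    }
    for name, spec in normalize_config(raw).items():
        print(name, spec)
```

With specs in this shape, the tracing code only has to read `spec["quantize"]` and never needs to care whether the config entry was originally a string or a dict.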
Add aten::mul_ and quantized::linear_dynamic to the allowed operations list, fixing validation failures for dynamically quantized models such as ELSER v2 when imported via Eland with torch.quantization.quantize_dynamic. Also update the model extraction tooling to support a "quantize" flag in reference_models.json so that quantized variants are traced with dynamic quantization applied before graph extraction, mirroring the Eland import pipeline. Made-with: Cursor
52c168a to 2c4e51c
Add aten::mul_ and quantized::linear_dynamic to the allowed operations list, fixing validation failures for dynamically quantized models such as ELSER v2 when imported via Eland with torch.quantization.quantize_dynamic. Also update the model extraction tooling to support a "quantize" flag in reference_models.json so that quantized variants are traced with dynamic quantization applied before graph extraction, mirroring the Eland import pipeline. (cherry picked from commit 92432d6)
💚 All backports created successfully
    },
    "elastic-eis-elser-v2": {
        "model_id": "elastic/eis-elser-v2",
        "quantized": false,
nit: the name "quantized": true/false may be oversimplified, and it hides an important nuance.
I read this flag as "this model is quantized", while in reality it means "this model has dynamic quantization applied to specific layer types (nn.Linear)".
Calling it dynamic_quantization would be a slight improvement, since it gives a better signal of what this flag controls.
Long term, if there is ever a need, we could replace the boolean with:
"quantize_layers": ["nn.Linear"],
"target": "qint8"
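
One way to read the structured form suggested above, while still accepting the existing boolean, might look like the sketch below. It is only an illustration: `QuantizationSpec` and `parse_quantization` are hypothetical names, the layer is written as nn.Linear to match the PyTorch class name, and mapping a legacy `"quantized": true` to nn.Linear layers with a qint8 target is an assumption based on the comment's description of what the flag does today.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuantizationSpec:
    """Which layer types get dynamic quantization, and the target dtype name."""
    layers: Tuple[str, ...]
    target: str


def parse_quantization(entry: dict) -> Optional[QuantizationSpec]:
    """Return a QuantizationSpec for a model entry, or None if unquantized."""
    if "quantize_layers" in entry:
        # Structured form from the review suggestion.
        return QuantizationSpec(
            layers=tuple(entry["quantize_layers"]),
            target=entry.get("target", "qint8"),  # default is an assumption
        )
    if entry.get("quantized", False):
        # Legacy boolean: assumed to mean dynamic quantization of nn.Linear.
        return QuantizationSpec(layers=("nn.Linear",), target="qint8")
    return None


if __name__ == "__main__":
    print(parse_quantization({"model_id": "elastic/eis-elser-v2", "quantized": False}))
    print(parse_quantization({"model_id": "elastic/eis-elser-v2", "quantized": True}))
    print(parse_quantization({
        "model_id": "elastic/eis-elser-v2",
        "quantize_layers": ["nn.Linear"],
        "target": "qint8",
    }))
```

Keeping the boolean as a shorthand would let existing config files keep working if the structured form were ever introduced.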
    def load_model_config(config_path: Path) -> dict[str, dict]:
        """Load a model config JSON file and normalise entries.

        Each entry is either a plain model-name string or a dict with
nit: alternatively, we could make the JSON consistent and drop the normalization altogether.
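
If every entry in the JSON were already a dict, the loader could shrink to a validation-only pass, roughly as sketched here. This is a hedged illustration of the alternative, not code from the PR: `load_model_config_strict` is a hypothetical name, it takes the JSON text rather than a path to stay self-contained, and requiring both "model_id" and a boolean "quantized" key on every entry is an assumption.

```python
import json
from typing import Dict


def load_model_config_strict(text: str) -> Dict[str, dict]:
    """Parse a model config and reject any entry that is not a full dict spec."""
    config = json.loads(text)
    for name, spec in config.items():
        if not isinstance(spec, dict):
            raise ValueError(f"{name}: expected an object, got {type(spec).__name__}")
        if not isinstance(spec.get("model_id"), str):
            raise ValueError(f"{name}: missing string 'model_id'")
        if not isinstance(spec.get("quantized"), bool):
            raise ValueError(f"{name}: missing boolean 'quantized'")
    return config


if __name__ == "__main__":
    text = """
    {
        "elastic-eis-elser-v2": {
            "model_id": "elastic/eis-elser-v2",
            "quantized": false
        }
    }
    """
    print(load_model_config_strict(text))
```

The trade-off is more verbose config files in exchange for a loader with a single accepted entry shape.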
…lastic#2991)" This reverts commit 92432d6.
This reverts commit 4f1ec3e.
Summary
- Add aten::mul_ and quantized::linear_dynamic to the ALLOWED_OPERATIONS set in CSupportedOperations.cc, fixing model graph validation failures for dynamically quantized models (e.g. ELSER v2 imported via Eland with torch.quantization.quantize_dynamic).
- Update the model extraction tooling (extract_model_ops.py, validate_allowlist.py, torchscript_utils.py) to support a "quantize": true flag in reference model configs, so that quantized variants are traced with dynamic quantization applied before graph extraction, mirroring the Eland import pipeline.
- Update reference_models.json, validation_models.json, and the reference_model_ops.json test fixture.

Context
The kibana-elasticsearch-snapshot-verify pipeline was failing because pytorch_inference rejected ELSER v2 models with: "Model graph does not match any supported architecture. Unrecognised operations: aten::mul_, quantized::linear_dynamic". These operations are legitimate: they appear when Eland applies torch.quantization.quantize_dynamic on nn.Linear layers during model import.

Test plan
- test_pytorch_inference passes (includes CModelGraphValidatorTest, which validates ops against the reference golden file)
- kibana-elasticsearch-snapshot-verify pipeline should no longer fail on ELSER v2 model loading

Labelling as >non-issue as this relates to an as yet unreleased enhancement.

Relates #2936
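
The failure described in the context boils down to a set difference between the operations found in a model graph and the allowed set. The sketch below illustrates that check in Python; the real validation lives in the C++ code, so the helper names here are hypothetical, and the "before" allowlist is a small illustrative subset rather than the actual contents of ALLOWED_OPERATIONS.

```python
from typing import Iterable, List, Optional


def unrecognised_operations(model_ops: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return the sorted operations that appear in the model but not in the allowlist."""
    return sorted(set(model_ops) - set(allowed))


def validation_message(model_ops: Iterable[str], allowed: Iterable[str]) -> Optional[str]:
    """Return an error message for unknown ops, or None if the graph is accepted."""
    unknown = unrecognised_operations(model_ops, allowed)
    if not unknown:
        return None
    return (
        "Model graph does not match any supported architecture. "
        "Unrecognised operations: " + ", ".join(unknown)
    )


if __name__ == "__main__":
    # Illustrative subset only; not the real allowlist.
    allowed_before = {"aten::add", "aten::linear"}
    allowed_after = allowed_before | {"aten::mul_", "quantized::linear_dynamic"}

    # Ops a dynamically quantized model might contribute (from the error above).
    quantized_model_ops = ["aten::add", "quantized::linear_dynamic", "aten::mul_"]

    print(validation_message(quantized_model_ops, allowed_before))
    print(validation_message(quantized_model_ops, allowed_after))
```

Against the "before" set the sketch reproduces the shape of the reported error, and against the "after" set it accepts the same graph, which is the behaviour change this PR is aiming for.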