
[ML] Add quantized model ops to pytorch_inference allowlist #2991

Merged
valeriy42 merged 1 commit into elastic:main from edsavage:fix/elser-quantized-ops on Mar 13, 2026

@edsavage (Contributor) commented Mar 13, 2026

Summary

  • Adds aten::mul_ and quantized::linear_dynamic to the ALLOWED_OPERATIONS set in CSupportedOperations.cc, fixing model graph validation failures for dynamically quantized models (e.g. ELSER v2 imported via Eland with torch.quantization.quantize_dynamic).
  • Updates the model extraction tooling (extract_model_ops.py, validate_allowlist.py, torchscript_utils.py) to support a "quantize": true flag in reference model configs, so that quantized variants are traced with dynamic quantization applied before graph extraction, mirroring the Eland import pipeline.
  • Adds quantized ELSER v2 entries to reference_models.json, validation_models.json, and the reference_model_ops.json test fixture.
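A minimal sketch of the config normalisation described above, assuming each config file is a JSON object that maps a short name to either a plain model-id string or a dict with "model_id" and an optional "quantize" flag. The helpers normalise_entry and normalise_config and the short names in the example are hypothetical; this illustrates the idea and is not the code in extract_model_ops.py.

```python
import json
from typing import Any, Dict


def normalise_entry(name: str, entry: Any) -> Dict[str, Any]:
    """Turn one config entry into a {"model_id", "quantize"} spec (hypothetical helper)."""
    if isinstance(entry, str):
        # Plain string: a model id that is traced without quantization.
        return {"model_id": entry, "quantize": False}
    if isinstance(entry, dict) and "model_id" in entry:
        # Dict form: "quantize" is optional and defaults to False.
        return {"model_id": entry["model_id"], "quantize": bool(entry.get("quantize", False))}
    raise ValueError(f"unsupported config entry for {name!r}: {entry!r}")


def normalise_config(raw: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Normalise every entry of a name -> entry mapping."""
    return {name: normalise_entry(name, entry) for name, entry in raw.items()}


# Illustrative config; the short names are assumptions.
raw = json.loads(
    """
    {
      "elser-v2": "elastic/eis-elser-v2",
      "elser-v2-quantized": {"model_id": "elastic/eis-elser-v2", "quantize": true}
    }
    """
)
specs = normalise_config(raw)
print(specs["elser-v2-quantized"])  # {'model_id': 'elastic/eis-elser-v2', 'quantize': True}
```

A spec whose "quantize" value is True is the cue for the tracing step to apply dynamic quantization before extracting the graph.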

Context

The kibana-elasticsearch-snapshot-verify pipeline was failing because pytorch_inference rejected ELSER v2 models with: "Model graph does not match any supported architecture. Unrecognised operations: aten::mul_, quantized::linear_dynamic". These operations are legitimate: they appear when Eland applies torch.quantization.quantize_dynamic to nn.Linear layers during model import.
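At its core, the rejection in this Context section is a set difference between the operations found in the traced graph and the allowlist. The sketch below is a simplified Python illustration, not the C++ validator: the trimmed allowlists and the op list for the quantized graph are hypothetical. For a real TorchScript module, op names could be gathered with something like {node.kind() for node in scripted.inlined_graph.nodes()}.

```python
from typing import AbstractSet, Iterable, List, Optional

# Hypothetical, heavily trimmed allowlists for illustration only; the real
# ALLOWED_OPERATIONS set lives in CSupportedOperations.cc.
ALLOWED_BEFORE_FIX = frozenset({"aten::add", "aten::linear", "prim::Constant"})
ALLOWED_AFTER_FIX = ALLOWED_BEFORE_FIX | {"aten::mul_", "quantized::linear_dynamic"}


def unrecognised_operations(graph_ops: Iterable[str], allowed: AbstractSet[str]) -> List[str]:
    """Return the graph's ops that are missing from the allowlist, sorted."""
    return sorted(set(graph_ops) - allowed)


def validation_error(graph_ops: Iterable[str], allowed: AbstractSet[str]) -> Optional[str]:
    """Build an error string in the style of the pytorch_inference message, or None."""
    missing = unrecognised_operations(graph_ops, allowed)
    if not missing:
        return None
    return (
        "Model graph does not match any supported architecture. "
        "Unrecognised operations: " + ", ".join(missing)
    )


# Hypothetical op list for a graph traced after quantize_dynamic on nn.Linear.
quantized_ops = ["prim::Constant", "quantized::linear_dynamic", "aten::mul_", "aten::add"]

# Prints the same message quoted above, listing aten::mul_ and quantized::linear_dynamic.
print(validation_error(quantized_ops, ALLOWED_BEFORE_FIX))
print(validation_error(quantized_ops, ALLOWED_AFTER_FIX))  # None
```

Adding the two operations to the allowlist is enough to make the same hypothetical graph pass, which mirrors the change this PR makes to ALLOWED_OPERATIONS.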

Test plan

  • Local build succeeds
  • test_pytorch_inference passes (includes CModelGraphValidatorTest which validates ops against the reference golden file)
  • CI builds pass on all platforms
  • kibana-elasticsearch-snapshot-verify pipeline should no longer fail on ELSER v2 model loading

Labelling as >non-issue because this relates to an as-yet-unreleased enhancement.

Relates #2936

@prodsecmachine commented Mar 13, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine | Critical | High | Medium | Low | Total (0)
Open Source Security | 0 | 0 | 0 | 0 | 0 issues
Licenses | 0 | 0 | 0 | 0 | 0 issues


Copilot AI left a comment

Pull request overview

Extends the TorchScript op extraction and allowlist validation tooling to support dynamically quantized Hugging Face models (mirroring Eland’s quantize_dynamic(nn.Linear) behavior) and updates the C++ allowed-ops set accordingly.

Changes:

  • Add a quantize flag to model specs in the dev tools, and apply dynamic quantization before tracing when enabled.
  • Add quantized model variants to the reference/validation model configs and include quantized ops in the C++ allowlist.
  • Update (part of) the golden reference-model ops JSON to include a quantized variant.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.

File | Description
dev-tools/extract_model_ops/validation_models.json | Adds quantized model variants to the validation set.
dev-tools/extract_model_ops/reference_models.json | Adds quantized model variants and an explanatory comment entry.
dev-tools/extract_model_ops/validate_allowlist.py | Supports dict-based model specs and passes quantize through to tracing.
dev-tools/extract_model_ops/torchscript_utils.py | Implements optional dynamic quantization before tracing.
dev-tools/extract_model_ops/extract_model_ops.py | Normalizes config entries into {model_id, quantize} specs and emits a quantization flag in golden output.
bin/pytorch_inference/unittest/testfiles/reference_model_ops.json | Partially updates golden data and adds one quantized model’s op set.
bin/pytorch_inference/CSupportedOperations.cc | Allows quantized::linear_dynamic and aten::mul_ for quantized models.


@edsavage added the auto-backport label (Automatically merge backport PRs when CI passes) Mar 13, 2026
Add aten::mul_ and quantized::linear_dynamic to the allowed operations
list, fixing validation failures for dynamically quantized models such
as ELSER v2 when imported via Eland with torch.quantization.quantize_dynamic.

Also update the model extraction tooling to support a "quantize" flag in
reference_models.json so that quantized variants are traced with dynamic
quantization applied before graph extraction, mirroring the Eland import
pipeline.

Made-with: Cursor
@edsavage force-pushed the fix/elser-quantized-ops branch from 52c168a to 2c4e51c March 13, 2026 03:31
@valeriy42 (Contributor) left a comment

LGTM

@valeriy42 merged commit 92432d6 into elastic:main Mar 13, 2026
16 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 13, 2026
Add aten::mul_ and quantized::linear_dynamic to the allowed operations
list, fixing validation failures for dynamically quantized models such
as ELSER v2 when imported via Eland with torch.quantization.quantize_dynamic.

Also update the model extraction tooling to support a "quantize" flag in
reference_models.json so that quantized variants are traced with dynamic
quantization applied before graph extraction, mirroring the Eland import
pipeline.

(cherry picked from commit 92432d6)
github-actions bot pushed two more commits with the same message that referenced this pull request Mar 13, 2026 (each cherry picked from commit 92432d6)
@github-actions

💚 All backports created successfully

Backport branches: 9.3, 9.2, 8.19

Questions?

Please refer to the Backport tool documentation and see the GitHub Actions logs for details.

},
"elastic-eis-elser-v2": {
"model_id": "elastic/eis-elser-v2",
"quantized": false,
Contributor review comment:

nit: maybe the name "quantized": true/false is oversimplified and hides an important nuance. I read this flag as "this model is quantized", while in reality it means "this model has dynamic quantization applied to specific layer types (nn.Linear)".

Calling it dynamic_quantization would be a slight improvement: it gives a better signal of what this flag controls.

Long term, if there is ever a need, we could replace the boolean with:

{
  "quantize_layers": ["nn.Linear"],
  "target": "qint8"
}
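The longer-term schema suggested in this comment could coexist with the current boolean during a migration. This is a sketch under stated assumptions: the QuantizationSpec dataclass and parse_quantization helper are hypothetical, and reading the legacy "quantize": true as ["nn.Linear"] with target "qint8" assumes the boolean keeps its current meaning (dynamic quantization of nn.Linear layers; qint8 is the default dtype of torch.quantization.quantize_dynamic).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuantizationSpec:
    """Hypothetical richer replacement for the boolean quantize flag."""

    # Layer types to quantize dynamically; empty means "no quantization".
    quantize_layers: Tuple[str, ...] = ()
    # Target dtype name such as "qint8"; None when quantization is off.
    target: Optional[str] = None

    @property
    def enabled(self) -> bool:
        return bool(self.quantize_layers)


def parse_quantization(entry: dict) -> QuantizationSpec:
    """Read either the proposed schema or the legacy boolean from a config entry."""
    if "quantize_layers" in entry:
        # Proposed schema: explicit layer list plus target dtype (assumed default "qint8").
        return QuantizationSpec(
            quantize_layers=tuple(entry["quantize_layers"]),
            target=entry.get("target", "qint8"),
        )
    if entry.get("quantize", False):
        # Legacy boolean, assumed to mean dynamic quantization of nn.Linear to qint8.
        return QuantizationSpec(quantize_layers=("nn.Linear",), target="qint8")
    return QuantizationSpec()


legacy = parse_quantization({"model_id": "elastic/eis-elser-v2", "quantize": True})
proposed = parse_quantization(
    {"model_id": "elastic/eis-elser-v2", "quantize_layers": ["nn.Linear"], "target": "qint8"}
)
print(legacy == proposed)  # True
```

Mapping the boolean onto the richer form keeps old configs valid while letting new entries spell out exactly which layers are quantized and to what dtype.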

def load_model_config(config_path: Path) -> dict[str, dict]:
"""Load a model config JSON file and normalise entries.

Each entry is either a plain model-name string or a dict with
Contributor review comment:

nit: alternatively, we could make the JSON consistent and drop the normalization altogether.

valeriy42 added a commit to valeriy42/ml-cpp that referenced this pull request Mar 13, 2026
valeriy42 added a commit that referenced this pull request Mar 13, 2026
* Revert "[ML] Add quantized model ops to pytorch_inference allowlist (#2991)"

This reverts commit 92432d6.

* Revert "[ML] Harden pytorch_inference with TorchScript model graph validation (#2936)"

This reverts commit 38f6653.

* fix run_qa_tests buildkite step
edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 15, 2026

5 participants