[ML] Add quantized model ops to pytorch_inference allowlist by edsavage · Pull Request #2991 · elastic/ml-cpp

edsavage · 2026-03-13T03:17:20Z

Summary

Adds aten::mul_ and quantized::linear_dynamic to the ALLOWED_OPERATIONS set in CSupportedOperations.cc, fixing model graph validation failures for dynamically quantized models (e.g. ELSER v2 imported via Eland with torch.quantization.quantize_dynamic).
Updates the model extraction tooling (extract_model_ops.py, validate_allowlist.py, torchscript_utils.py) to support a "quantize": true flag in reference model configs, so quantized variants are traced with dynamic quantization applied before graph extraction — mirroring the Eland import pipeline.
Adds quantized ELSER v2 entries to reference_models.json, validation_models.json, and the reference_model_ops.json test fixture.

Context

The kibana-elasticsearch-snapshot-verify pipeline was failing because pytorch_inference rejected ELSER v2 models with: "Model graph does not match any supported architecture. Unrecognised operations: aten::mul_, quantized::linear_dynamic". These operations are legitimate — they appear when Eland applies torch.quantization.quantize_dynamic on nn.Linear layers during model import.
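To illustrate the failure described above, here is a minimal Python sketch of an allowlist check. The real validation is done in C++ in pytorch_inference; the function names, the sample allowlist entries and the sample graph ops below are assumptions made for illustration, and only the error text and the two added op names come from this PR.

```python
# Illustrative only: these entries are NOT the real contents of ALLOWED_OPERATIONS.
ALLOWED_BEFORE = frozenset({"aten::linear", "aten::softmax"})
ALLOWED_AFTER = ALLOWED_BEFORE | {"aten::mul_", "quantized::linear_dynamic"}


def unrecognised_ops(graph_ops, allowed):
    """Return the sorted op names present in the graph but not in the allowlist."""
    return sorted(set(graph_ops) - set(allowed))


def validate_graph(graph_ops, allowed):
    """Raise ValueError if the graph uses any op outside the allowlist."""
    unknown = unrecognised_ops(graph_ops, allowed)
    if unknown:
        raise ValueError(
            "Model graph does not match any supported architecture. "
            "Unrecognised operations: " + ", ".join(unknown)
        )


# Ops a dynamically quantized model might contain (hypothetical sample).
quantized_graph_ops = ["aten::softmax", "quantized::linear_dynamic", "aten::mul_"]

try:
    validate_graph(quantized_graph_ops, ALLOWED_BEFORE)
except ValueError as err:
    print(err)  # reports the two ops missing from the old allowlist

# With both ops added to the allowlist, the same graph validates.
validate_graph(quantized_graph_ops, ALLOWED_AFTER)
```

The sketch treats the allowlist as a plain set difference, which is why adding the two op names is enough to make the quantized graph pass.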

Test plan

Local build succeeds
test_pytorch_inference passes (includes CModelGraphValidatorTest which validates ops against the reference golden file)
CI builds pass on all platforms
kibana-elasticsearch-snapshot-verify pipeline should no longer fail on ELSER v2 model loading

Labelling as >non-issue as this relates to an as yet unreleased enhancement.

Relates #2936

prodsecmachine · 2026-03-13T03:17:33Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scan Engine	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

Copilot

Pull request overview

Extends the TorchScript op extraction + allowlist validation tooling to support dynamically quantized HuggingFace models (mirroring Eland’s quantize_dynamic(nn.Linear) behavior) and updates the C++ allowed-ops set accordingly.

Changes:

Add a quantize flag to model specs in the dev tools, and apply dynamic quantization before tracing when enabled.
Add quantized model variants to the reference/validation model configs and include quantized ops in the C++ allowlist.
Update (part of) the golden reference-model ops JSON to include a quantized variant.

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 3 comments.

File	Description
dev-tools/extract_model_ops/validation_models.json	Adds quantized model variants to the validation set.
dev-tools/extract_model_ops/reference_models.json	Adds quantized model variants and an explanatory comment entry.
dev-tools/extract_model_ops/validate_allowlist.py	Supports dict-based model specs and passes `quantize` through to tracing.
dev-tools/extract_model_ops/torchscript_utils.py	Implements optional dynamic quantization before tracing.
dev-tools/extract_model_ops/extract_model_ops.py	Normalizes config entries into `{model_id, quantize}` specs and emits a quantization flag in golden output.
bin/pytorch_inference/unittest/testfiles/reference_model_ops.json	Partially updates golden data and adds one quantized model’s op set.
bin/pytorch_inference/CSupportedOperations.cc	Allows `quantized::linear_dynamic` and `aten::mul_` for quantized models.

Add aten::mul_ and quantized::linear_dynamic to the allowed operations list, fixing validation failures for dynamically quantized models such as ELSER v2 when imported via Eland with torch.quantization.quantize_dynamic. Also update the model extraction tooling to support a "quantize" flag in reference_models.json so that quantized variants are traced with dynamic quantization applied before graph extraction, mirroring the Eland import pipeline. Made-with: Cursor

valeriy42

LGTM

Add aten::mul_ and quantized::linear_dynamic to the allowed operations list, fixing validation failures for dynamically quantized models such as ELSER v2 when imported via Eland with torch.quantization.quantize_dynamic. Also update the model extraction tooling to support a "quantize" flag in reference_models.json so that quantized variants are traced with dynamic quantization applied before graph extraction, mirroring the Eland import pipeline. (cherry picked from commit 92432d6)

github-actions · 2026-03-13T09:15:52Z

💚 All backports created successfully

Status	Branch	Result
✅	9.3
✅	9.2
✅	8.19

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

darius-vil · 2026-03-13T10:11:30Z

bin/pytorch_inference/unittest/testfiles/reference_model_ops.json

    },
    "elastic-eis-elser-v2": {
      "model_id": "elastic/eis-elser-v2",
+      "quantized": false,


nit: the name "quantized": true/false may be an oversimplification that hides an important nuance.
I read this flag as "this model is quantized", while in reality it means "this model has dynamic quantization applied to specific layer types (nn.Linear)".

Calling it dynamic_quantization would be a slight improvement, since it gives a better signal of what this flag controls.

Long term, if there is ever a need, we could replace the boolean with:

"quantize_layers": ["nn.Linear"], "target": "qint8"

darius-vil · 2026-03-13T10:14:36Z

dev-tools/extract_model_ops/torchscript_utils.py

+def load_model_config(config_path: Path) -> dict[str, dict]:
+    """Load a model config JSON file and normalise entries.
+
+    Each entry is either a plain model-name string or a dict with


nit: alternatively, we could make the JSON consistent and drop the normalization altogether
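For context, the normalisation being discussed could look roughly like the sketch below. It is not the code from torchscript_utils.py: the helper names normalise_entry and normalise_config are hypothetical, the quantized config key in the example is made up, and the only details taken from this PR are that an entry may be a plain model-name string or a dict, and that specs end up in the {model_id, quantize} shape.

```python
import json


def normalise_entry(entry):
    """Turn a config entry into a {"model_id", "quantize"} spec.

    Assumption: a plain string is a model id with quantization disabled.
    """
    if isinstance(entry, str):
        return {"model_id": entry, "quantize": False}
    if isinstance(entry, dict):
        if "model_id" not in entry:
            raise ValueError("dict entries must contain 'model_id'")
        return {
            "model_id": entry["model_id"],
            "quantize": bool(entry.get("quantize", False)),
        }
    raise TypeError(f"unsupported config entry type: {type(entry).__name__}")


def normalise_config(raw):
    """Normalise every entry of an already-parsed config mapping."""
    return {name: normalise_entry(entry) for name, entry in raw.items()}


# Mixed-style config; the "-quantized" key is a hypothetical example name.
raw_config = json.loads("""
{
  "elastic-eis-elser-v2": "elastic/eis-elser-v2",
  "elastic-eis-elser-v2-quantized": {
    "model_id": "elastic/eis-elser-v2",
    "quantize": true
  }
}
""")

specs = normalise_config(raw_config)
for name, spec in specs.items():
    print(name, spec)
```

If every entry in the JSON were written in the dict form, as suggested in the comment above, the string branch would become unnecessary.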

* Revert "[ML] Add quantized model ops to pytorch_inference allowlist (#2991)" This reverts commit 92432d6. * Revert "[ML] Harden pytorch_inference with TorchScript model graph validation (#2936)" This reverts commit 38f6653. * fix run_qa_tests buildkite step

edsavage added >bug :ml labels Mar 13, 2026

edsavage added >non-issue v9.4.0 v9.3.3 v9.2.8 v8.19.14 and removed >bug labels Mar 13, 2026

edsavage requested review from Copilot and valeriy42 March 13, 2026 03:21

Copilot started reviewing on behalf of edsavage March 13, 2026 03:22

Copilot AI reviewed Mar 13, 2026

dev-tools/extract_model_ops/extract_model_ops.py (2 comments, outdated, resolved)

dev-tools/extract_model_ops/validate_allowlist.py (1 comment, outdated, resolved)

edsavage added the auto-backport Automatically merge backport PRs when CI passes label Mar 13, 2026

edsavage force-pushed the fix/elser-quantized-ops branch from 52c168a to 2c4e51c March 13, 2026 03:31

valeriy42 approved these changes Mar 13, 2026

valeriy42 merged commit 92432d6 into elastic:main Mar 13, 2026
16 checks passed

github-actions bot mentioned this pull request Mar 13, 2026

[9.3] [ML] Add quantized model ops to pytorch_inference allowlist (#2991) #2992

Closed

github-actions bot added the backport-pending label Mar 13, 2026

github-actions bot mentioned this pull request Mar 13, 2026

[9.2] [ML] Add quantized model ops to pytorch_inference allowlist (#2991) #2993

Closed

github-actions bot mentioned this pull request Mar 13, 2026

[8.19] [ML] Add quantized model ops to pytorch_inference allowlist (#2991) #2994

Closed

darius-vil reviewed Mar 13, 2026

valeriy42 added a commit to valeriy42/ml-cpp that referenced this pull request Mar 13, 2026

Revert "[ML] Add quantized model ops to pytorch_inference allowlist (e…

d03b286

…lastic#2991)" This reverts commit 92432d6.

edsavage added a commit to edsavage/ml-cpp that referenced this pull request Mar 15, 2026

Revert "[ML] Revert elastic#2991 and elastic#2936 (elastic#2995)"

efb9d68

This reverts commit 4f1ec3e.

edsavage mentioned this pull request Mar 15, 2026

[ML] Harden pytorch_inference with TorchScript model graph validation #2999

Open

3 tasks
