
Feature Request: Add lowering support for aten._scaled_dot_product_efficient_attention #827

@lordlugo

Description of the bug:
When attempting to convert a PyTorch model that uses a modern Hugging Face transformers backbone (e.g., facebook/dinov3-vitb16-pretrain-lvd1689m), the conversion process fails. This appears to be because recent versions of the transformers library default to an efficient scaled-dot-product-attention backend (e.g., memory-efficient/Flash attention), which surfaces in the exported graph as the aten._scaled_dot_product_efficient_attention.default operator.

The ai-edge-torch converter does not currently have a lowering rule for this operator, so the graph cannot be translated into a TFLite-compatible form. This prevents many state-of-the-art vision models from being converted out of the box.
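
As a point of comparison (and a possible interim workaround, which I have not verified for this particular model), the efficient-attention op should not appear in the exported graph if the model is loaded with transformers' eager attention implementation:

```python
from transformers import AutoModel

# Untested workaround sketch: request the eager (pure-PyTorch) attention
# implementation so the traced graph contains plain matmul/softmax ops instead
# of aten._scaled_dot_product_efficient_attention.
model = AutoModel.from_pretrained(
    "facebook/dinov3-vitb16-pretrain-lvd1689m",
    attn_implementation="eager",
).eval()
```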

Minimal Code to Reproduce:

```python
import torch
import ai_edge_torch
from transformers import AutoModel

# 1. Define device and model name
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = 'facebook/dinov3-vitb16-pretrain-lvd1689m'

# 2. Load the model using the default attention implementation
#    (this configuration fails during conversion)
model = AutoModel.from_pretrained(model_name).to(device).eval()

# 3. Create a sample input
sample_input = (torch.randn(1, 3, 400, 400).to(device),)

# 4. Attempt to convert the model; this line raises the RuntimeError
try:
    edge_model = ai_edge_torch.convert(model, sample_input)
    print("Conversion successful!")
except Exception as e:
    print(f"Conversion failed with error:\n{e}")
```

Actual vs expected behavior:

Actual behavior:
The ai_edge_torch.convert() call fails with the following RuntimeError:

```
RuntimeError: Lowering not found: aten._scaled_dot_product_efficient_attention.default
While executing %_scaled_dot_product_efficient_attention : [num_users=1] = call_function[target=torch.ops.aten._scaled_dot_product_efficient_attention.default](args = (...))
```
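
For reference, the offending op can also be confirmed in the torch.export graph before ai-edge-torch is involved. This is only a diagnostic sketch; which aten SDPA variant actually gets recorded depends on the device and the SDPA backends available at trace time:

```python
import torch

# Export the model and list any scaled-dot-product-attention ops in the graph.
ep = torch.export.export(model, sample_input)
sdpa_ops = [
    node.target
    for node in ep.graph.nodes
    if node.op == "call_function" and "scaled_dot_product" in str(node.target)
]
# Expected (on this setup) to include
# aten._scaled_dot_product_efficient_attention.default.
print(sdpa_ops)
```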

Expected behavior:
The model should be converted successfully into a TFLite model. Ideally, ai-edge-torch would recognize this operator and lower it to a standard, TFLite-compatible multi-head attention implementation.
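
For what it's worth, the fused op is mathematically just softmax(Q Kᵀ / √d) · V, so a lowering could in principle decompose it into ops the converter already handles (matmul, add, softmax). Below is a minimal reference sketch of that decomposition, ignoring dropout and the auxiliary outputs (logsumexp, RNG state) that the fused aten op also returns; it is not the converter's actual lowering:

```python
import math
import torch

def sdpa_reference(query, key, value, attn_bias=None, scale=None):
    """Reference decomposition of scaled dot-product attention into plain
    matmul/softmax ops. Sketch only: dropout and the fused op's auxiliary
    outputs are omitted."""
    head_dim = query.shape[-1]
    scale = 1.0 / math.sqrt(head_dim) if scale is None else scale
    scores = torch.matmul(query, key.transpose(-2, -1)) * scale
    if attn_bias is not None:
        scores = scores + attn_bias
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value)
```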

Any other information you'd like to share?

No response
