Description of the bug:
When attempting to convert a PyTorch model that uses a modern Hugging Face transformers backbone (e.g., facebook/dinov3-vitb16-pretrain-lvd1689m), the conversion process fails. This appears to be because recent versions of the transformers library default to an efficient fused attention implementation (such as Flash Attention / memory-efficient SDPA), which shows up in the exported graph as the aten._scaled_dot_product_efficient_attention.default operator.
The ai-edge-torch converter does not currently have a "lowering" rule for this operator, so it cannot be translated into a TFLite-compatible format. This prevents many SOTA vision models from being converted out of the box.
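Note: as a possible workaround (not verified here), recent transformers releases allow selecting the eager attention path at load time via the attn_implementation argument, which should keep the fused SDPA operator out of the traced graph. A minimal sketch:

```python
from transformers import AutoModel

# Possible workaround sketch (unverified): request the eager attention
# implementation so the exported graph contains plain matmul/softmax ops
# instead of aten._scaled_dot_product_efficient_attention.
model = AutoModel.from_pretrained(
    "facebook/dinov3-vitb16-pretrain-lvd1689m",
    attn_implementation="eager",
).eval()
```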
Minimal Code to Reproduce:
```python
import torch
import ai_edge_torch
from transformers import AutoModel

# 1. Define device and model name
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = 'facebook/dinov3-vitb16-pretrain-lvd1689m'

# 2. Load the model using the default attention implementation
#    This will fail during conversion
model = AutoModel.from_pretrained(model_name).to(device).eval()

# 3. Create a sample input
sample_input = (torch.randn(1, 3, 400, 400).to(device),)

# 4. Attempt to convert the model
#    This line will raise the RuntimeError
try:
    edge_model = ai_edge_torch.convert(model, sample_input)
    print("Conversion successful!")
except Exception as e:
    print(f"Conversion failed with error:\n{e}")
```
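To double-check which attention operator actually ends up in the graph, the exported program can be inspected directly. A rough diagnostic sketch (assuming PyTorch 2.x with torch.export; the export call itself may need adjustments for this model):

```python
# Diagnostic sketch: list any scaled-dot-product attention nodes in the
# FX graph produced by torch.export.
exported = torch.export.export(model, sample_input)
sdpa_nodes = [
    node for node in exported.graph.nodes
    if node.op == "call_function" and "scaled_dot_product" in str(node.target)
]
print(sdpa_nodes)
```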
Actual vs expected behavior:
Actual behavior:
The ai_edge_torch.convert() call fails with the following RuntimeError:
```
RuntimeError: Lowering not found: aten._scaled_dot_product_efficient_attention.default

While executing %_scaled_dot_product_efficient_attention : [num_users=1] = call_function[target=torch.ops.aten._scaled_dot_product_efficient_attention.default](args = (...))
```
Expected behavior:
The model should be converted successfully into a TFLite model. Ideally, ai-edge-torch would recognize this operator and lower it to a standard, TFLite-compatible multi-head attention implementation.
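For reference, the computation such a lowering would need to reproduce is ordinary scaled dot-product attention. A minimal eager-mode sketch (plain PyTorch, illustrative only; dropout and the extra outputs of the aten op are omitted):

```python
import math
import torch

def sdpa_reference(query, key, value, attn_mask=None):
    # Reference semantics of scaled dot-product attention:
    # softmax(Q @ K^T / sqrt(d_k)) @ V, optionally with an additive mask.
    d_k = query.shape[-1]
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if attn_mask is not None:
        scores = scores + attn_mask
    weights = torch.softmax(scores, dim=-1)
    return weights @ value
```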
Any other information you'd like to share?
No response