This repository provides a set of tools and examples for converting and using two powerful vision models, DINOv3 and EdgeTAM (SAM2), within the ONNX ecosystem. The focus is on efficient, PyTorch-independent inference pipelines for tasks such as one-shot segmentation, foreground extraction, and robust video object tracking, as sketched below. TFLite/LiteRT exports are also included.
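For orientation, the sketch below shows what a PyTorch-free inference call looks like with `onnxruntime` on an exported DINOv3 feature extractor. The file name `dinov3.onnx`, the 518x518 input size, and the ImageNet normalization are assumptions for illustration; check the export notebook for the exact input specification of the model you produce.

```python
# Minimal sketch: running an exported DINOv3 feature extractor with onnxruntime.
# "dinov3.onnx", the 518x518 input size, and the tensor layout are assumptions;
# see the export notebook for the actual names and shapes.
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession("dinov3.onnx", providers=["CPUExecutionProvider"])

# Preprocess: resize, scale to [0, 1], normalize with ImageNet statistics, NCHW layout.
image = Image.open("example.jpg").convert("RGB").resize((518, 518))
x = np.asarray(image, dtype=np.float32) / 255.0
x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
x = x.transpose(2, 0, 1)[None].astype(np.float32)

input_name = session.get_inputs()[0].name
features = session.run(None, {input_name: x})[0]
print(features.shape)  # e.g. (1, num_patches, feature_dim) patch embeddings
```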
```
├── notebooks/
│   ├── dinov3_onnx_export.ipynb                     # Exports DINOv3 to ONNX
│   ├── dinov3_tflite_export.ipynb                   # Exports DINOv3 to TFLite
│   ├── edgetam_onnx_export.ipynb                    # Exports EdgeTAM encoder/decoder to ONNX
│   ├── foreground_segmentation_onnx_export.ipynb    # Trains and exports a foreground classifier
│   ├── dinov3_one_shot_segmentation_onnx.ipynb      # Demo for one-shot segmentation with ONNX
│   └── dinov3_one_shot_segmentation_tflite.ipynb    # Demo for one-shot segmentation with TFLite
│
└── scripts/
    └── hybrid_tracker.py                            # Video object tracking script
```

Each notebook is self-contained and can be run directly in Google Colab.
| Notebook | Description | Link |
|---|---|---|
| `dinov3_onnx_export.ipynb` | Converts the DINOv3 Vision Transformer (ViT) feature extractor to ONNX format. | link |
| `dinov3_tflite_export.ipynb` | Converts the DINOv3 Vision Transformer (ViT) feature extractor to TFLite format. | link |
| `edgetam_onnx_export.ipynb` | Exports the EdgeTAM image encoder and mask decoder models to ONNX for efficient segmentation. | link |
| `foreground_segmentation_onnx_export.ipynb` | Trains a logistic regression classifier on DINOv3 features for foreground segmentation and exports it to ONNX. | link |
| `dinov3_one_shot_segmentation_onnx.ipynb` | Demonstrates one-shot segmentation using DINOv3 features and a reference mask, all in ONNX (see the sketch after this table). | link |
| `dinov3_one_shot_segmentation_tflite.ipynb` | Demonstrates one-shot segmentation using DINOv3 features and a reference mask, all in TFLite. | link |
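To make the one-shot segmentation idea concrete, here is a minimal sketch of one way to go from DINOv3 patch features plus a reference mask to a query mask: average the reference features under the mask into a prototype, then threshold the cosine similarity of every query patch against that prototype. The function name, the patch-grid handling, and the `0.6` threshold are illustrative assumptions; the notebooks may differ in the details.

```python
# Illustrative one-shot segmentation from patch features; the notebooks may
# implement the details differently.
import numpy as np

def one_shot_mask(ref_feats, ref_mask, query_feats, grid, threshold=0.6):
    """ref_feats, query_feats: (num_patches, dim) DINOv3 patch embeddings.
    ref_mask: (H, W) binary reference mask; grid: (rows, cols) patch grid.
    threshold: assumed cosine-similarity cutoff."""
    rows, cols = grid

    # Downsample the reference mask to the patch grid, then build a prototype
    # by averaging the reference features on masked patches.
    mask_small = ref_mask.reshape(rows, ref_mask.shape[0] // rows,
                                  cols, ref_mask.shape[1] // cols).mean(axis=(1, 3)) > 0.5
    prototype = ref_feats[mask_small.reshape(-1)].mean(axis=0)

    # Cosine similarity between each query patch and the prototype.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    sim = q @ p

    # Patches above the threshold form the predicted low-resolution mask,
    # which can then be upsampled to the original image size.
    return (sim > threshold).reshape(rows, cols)
```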
This work builds upon the official implementations and research from the following projects:
- DINOv3: facebookresearch/dinov3
- EdgeTAM: facebookresearch/EdgeTAM
- Space-Time Correspondence as a Contrastive Random Walk: ajabri/videowalk