A framework for developing, benchmarking, and deploying zero-shot visual prompting algorithms on the edge.
Visual prompting offers a powerful alternative to traditional training. Instead of curating thousands of labeled images, you simply show the model one or a few examples of what you are looking for. The model effectively "learns" instantly, detecting and segmenting similar objects in new images or live video streams without retraining.
- Library & Application: A unified framework providing a modular Python library for research and development, and a Full-Stack Application for deploying those algorithms on live video streams.
- Simple & Modular API: A composable design where developers can mix and match components (backbones, matchers) to create custom pipelines.
- Algorithms & Models: A wide collection of ready-to-use zero-shot and few-shot algorithms (e.g., Matcher, GroundedSAM, SAM 3) and foundation models (e.g., SAM 2, DINOv2).
- Hardware Acceleration: Built-in support for model optimization and export to OpenVINO™ for fast inference on Intel hardware (CPU, GPU, NPU).
- Multiple Backends: Seamless switching between PyTorch for flexibility/research and OpenVINO for optimized deployment.
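To make the composable design more concrete, here is a minimal sketch of how a backbone and a matcher could be paired in a pipeline. Every name in it (`Backbone`, `FeatureMatcher`, `Pipeline`, `mean_backbone`, `range_backbone`, `close_matcher`) is hypothetical and chosen for illustration only; it does not reflect the library's actual classes or signatures.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical component types; the real library's interfaces may differ.
Features = list[float]
Backbone = Callable[[Sequence[float]], Features]
FeatureMatcher = Callable[[Features, Features], bool]


@dataclass
class Pipeline:
    """Toy pipeline that pairs any backbone with any matcher."""

    backbone: Backbone
    matcher: FeatureMatcher

    def matches(self, reference: Sequence[float], target: Sequence[float]) -> bool:
        # Extract features from both inputs, then let the matcher compare them.
        return self.matcher(self.backbone(reference), self.backbone(target))


def mean_backbone(pixels: Sequence[float]) -> Features:
    # Stand-in feature extractor: a single feature, the mean intensity.
    return [sum(pixels) / len(pixels)]


def range_backbone(pixels: Sequence[float]) -> Features:
    # Stand-in feature extractor: minimum and maximum intensity.
    return [min(pixels), max(pixels)]


def close_matcher(a: Features, b: Features) -> bool:
    # Stand-in matcher: every feature dimension must differ by less than 10.
    return all(abs(x - y) < 10 for x, y in zip(a, b))


# Swap the backbone without touching the pipeline or the matcher.
pipeline_a = Pipeline(backbone=mean_backbone, matcher=close_matcher)
pipeline_b = Pipeline(backbone=range_backbone, matcher=close_matcher)

reference = [50, 100, 150]
target = [0, 100, 200]
print(pipeline_a.matches(reference, target))  # True: both means are 100
print(pipeline_b.matches(reference, target))  # False: min and max differ by 50
```

Because the pipeline depends only on the two callables, replacing one component leaves the rest of the code unchanged, which is the property the bullet above describes.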
Geti Instant Learn consists of two core components:
- Python Library: The foundation for research and for developing zero-shot and few-shot algorithms.
- Full Stack Application: Leverages the library to enable real-time inference on live streams, video files, and images.
Prerequisites
Install the library:
```bash
cd library
# Choose the extra that matches your hardware:
uv sync --extra xpu  # Intel XPU (recommended)
uv sync --extra cpu  # CPU only
uv sync --extra gpu  # CUDA support
```

Or with pip:

```bash
pip install "./library[xpu]"  # or [cpu], [gpu]
```

SAM3 performs zero-shot segmentation using text prompts (category names) or bounding boxes, so no reference mask is needed. You provide the list of categories you want to segment in each image.
```python
from instantlearn.models import SAM3
from instantlearn.data import Sample

# Initialize SAM3 (device: "xpu", "cuda", or "cpu")
model = SAM3(device="xpu")

# SAM3 is zero-shot: no fit() required. Just provide categories per sample.
predictions = model.predict([
    Sample(image_path="library/examples/assets/coco/000000286874.jpg", categories=["elephant"]),
    Sample(image_path="library/examples/assets/coco/000000173279.jpg", categories=["elephant"]),
])
```

Tip: Calling `model.fit(sample)` is optional for SAM3. If called, the fitted categories are reused for all subsequent `predict()` calls, so you don't need to specify categories on every target sample. If not called, categories are taken from each target sample directly.
For more examples of SAM3 capabilities, see the SAM3 aerial & maritime notebook.
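The category rule from the tip above can be sketched as a small stand-alone class. This is a toy model of the described behavior, not the library's code: `CategoryResolver` and its methods are hypothetical names, and two details are assumptions the source does not state, namely that fitted categories take precedence when a sample also carries its own, and that an error is raised when neither is available.

```python
from typing import Optional, Sequence


class CategoryResolver:
    """Toy stand-in for the fit/predict category rule; not the library's code."""

    def __init__(self) -> None:
        self._fitted: Optional[list[str]] = None

    def fit(self, categories: Sequence[str]) -> None:
        # Remember the categories so later calls can omit them.
        self._fitted = list(categories)

    def resolve(self, sample_categories: Optional[Sequence[str]]) -> list[str]:
        # Assumption: fitted categories win when both sources are present.
        if self._fitted is not None:
            return list(self._fitted)
        # Assumption: missing categories without a prior fit() is an error.
        if not sample_categories:
            raise ValueError("no categories: call fit() or set them on the sample")
        return list(sample_categories)


resolver = CategoryResolver()
print(resolver.resolve(["elephant"]))  # no fit(): taken from the sample

resolver.fit(["elephant", "giraffe"])
print(resolver.resolve(None))  # after fit(): fitted categories are reused
```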
SAM3 needs a text prompt for every sample unless fit() is used. Matcher takes a different approach: you fit once with a reference image and mask (one-shot), then predict on any number of new images without providing prompts again.
```python
from instantlearn.models import Matcher
from instantlearn.data import Sample

# Initialize Matcher (device: "xpu", "cuda", or "cpu")
model = Matcher(device="xpu")

# Create reference sample (auto-loads image and mask from paths)
ref_sample = Sample(
    image_path="library/examples/assets/coco/000000286874.jpg",
    mask_paths="library/examples/assets/coco/000000286874_mask.png",
)

# Fit once on reference
model.fit(ref_sample)

# Predict on multiple target images; no prompts needed
predictions = model.predict([
    "library/examples/assets/coco/000000390341.jpg",
    "library/examples/assets/coco/000000173279.jpg",
    "library/examples/assets/coco/000000267704.jpg",
])

# Access results for each image
for pred in predictions:
    masks = pred["pred_masks"]  # Predicted segmentation masks
```

For interactive mask generation with SAM, CLI usage, and benchmarking, see the Library README.
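Once you have predicted masks, a common next step is to derive bounding boxes or pixel counts from them. The sketch below shows one way to do that on plain nested lists. It assumes each predicted mask can be converted to a 2D grid of 0/1 values (for example with `.tolist()` on a tensor or array); the source does not specify the type or shape of `pred["pred_masks"]`, and `mask_to_box` and `mask_area` are hypothetical helpers, not library functions.

```python
from typing import Optional, Sequence

Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive
Mask = Sequence[Sequence[int]]   # rows of 0/1 values


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tight bounding box of the nonzero pixels, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols), max(rows))


def mask_area(mask: Mask) -> int:
    """Count the nonzero pixels in the mask."""
    return sum(1 for row in mask for value in row if value)


# Synthetic 4x5 mask standing in for one predicted mask.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(example_mask))  # (1, 1, 3, 2)
print(mask_area(example_mask))    # 5
```

For large masks, an array library would be a faster choice than nested lists; the pure-Python version is used here only to keep the example dependency-free.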
Full-stack web interface for real-time inference.
Deploy models on live video streams, cameras, and video files.
```bash
just application/dev
```

Access at: http://localhost:3000
View Application Documentation →
Geti Instant Learn supports a variety of foundation models and visual prompting algorithms, optimized for different performance needs.
| Algorithm | Description | Paper | Repository |
|---|---|---|---|
| Matcher | Standard feature matching pipeline using SAM. | Matcher | Matcher |
| SoftMatcher | Enhanced matching pipeline with soft feature comparison, inspired by Optimal Transport. | IJCAI 2024 | N/A |
| PerDino | Personalized DINO-based prompting, leveraging DINOv2/v3 features for robust matching. | PerSAM | Personalize-SAM |
| GroundedSAM | Combines Grounding DINO and SAM for text-based visual prompting and segmentation. | Grounding DINO, SAM | GroundedSAM |
| SAM 3 | Open-vocabulary segmentation using concept-based prompts. | SAM 3 | SAM 3 |
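
The feature matching mentioned in the table can be illustrated with a toy example: average the reference features under the mask into a prototype, then rank target patches by cosine similarity to it. This is a simplified sketch of the general idea only. It is not the Matcher algorithm or the library's implementation, and all names (`cosine_similarity`, `masked_prototype`, `best_matching_patches`) and the 2-D feature values are made up for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def masked_prototype(features: Sequence[Vector], mask: Sequence[int]) -> list[float]:
    """Average the reference feature vectors whose mask value is nonzero."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("reference mask selects no patches")
    return [sum(dim) / len(selected) for dim in zip(*selected)]


def best_matching_patches(
    prototype: Vector, target_features: Sequence[Vector], top_k: int = 1
) -> list[int]:
    """Indices of the target patches most similar to the prototype."""
    scores = [cosine_similarity(prototype, f) for f in target_features]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy "patch features": two object patches and one background patch.
reference_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
reference_mask = [1, 1, 0]
target_features = [[0.1, 0.9], [0.8, 0.2], [0.0, 1.0]]

prototype = masked_prototype(reference_features, reference_mask)
print(best_matching_patches(prototype, target_features))  # [1]
```

In a real pipeline the features would come from a vision backbone, and the best-matching locations could then be passed to a segmentation model as prompts.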

Foundation models:

| Family | Models | Description | Paper | Repository |
|---|---|---|---|---|
| SAM | SAM-HQ, SAM-HQ-tiny | High-quality variants of the original Segment Anything Model. | Segment Anything, SAM-HQ | SAM, SAM-HQ |
| SAM 2 | SAM2-tiny, SAM2-small, SAM2-base, SAM2-large | The next generation of Segment Anything, offering improved performance and speed. | SAM 2 | sam2 |
| SAM 3 | SAM 3 | Segment Anything with Concepts, supporting open-vocabulary prompts. | SAM 3 | SAM 3 |
| DINOv2 | Small, Base, Large, Giant | Self-supervised vision transformers with registers, used for feature extraction. | DINOv2, Registers | dinov2 |
| DINOv3 | Small, Small+, Base, Large, Huge | The latest iteration of DINO models. | DINOv3 | dinov3 |
| Grounding DINO | (Integrated in GroundedSAM) | Open-set object detection model. | Grounding DINO | GroundingDINO |

Component documentation:

| Component | README | Documentation |
|---|---|---|
| Library | library/README.md | library/docs |
| Application | application/README.md | application/docs |
- To report a bug or submit a feature request, please open a GitHub issue.
- Ask questions via GitHub Discussions.
Geti Instant Learn is licensed under the Apache License 2.0.
FFmpeg is an open source project licensed under LGPL and GPL. See https://www.ffmpeg.org/legal.html. You are solely responsible for determining if your use of FFmpeg requires any additional licenses. Intel is not responsible for obtaining any such licenses, nor liable for any licensing fees due, in connection with your use of FFmpeg.


