Skip to content

A modular framework for few-shot visual segmentation using visual prompting techniques. Enables easy experimentation with different algorithms, backbones (SAM, MobileSAM, DinoV2), and pipeline components for finding and segmenting objects from just a few examples.

License

Notifications You must be signed in to change notification settings

open-edge-platform/geti-instant-learn

Geti Instant Learn Logo - A framework for developing, benchmarking, and deploying zero-shot visual prompting algorithms on the edge

A framework for developing, benchmarking, and deploying zero-shot visual prompting algorithms on the edge.

python license

Introduction

Visual prompting offers a powerful alternative to traditional training. Instead of curating thousands of labeled images, you simply show the model one or a few examples of what you are looking for. The model effectively "learns" instantly, detecting and segmenting similar objects in new images or live video streams without retraining.

Key Features

  • Library & Application: A unified framework providing a modular Python library for research and development, and a Full-Stack Application for deploying those algorithms on live video streams.
  • Simple & Modular API: A composable design where developers can mix and match components (backbones, matchers) to create custom pipelines.
  • Algorithms & Models: A wide collection of ready-to-use zero-shot and few-shot algorithms (e.g., SAM 2, Matcher, GroundedSAM) and foundation models.
  • Hardware Acceleration: Built-in support for model optimization and export to OpenVINO™ for fast inference on Intel hardware (CPU, GPU, NPU).
  • Multiple Backends: Seamless switching between PyTorch for flexibility/research and OpenVINO for optimized deployment.

Getting Started

Geti Instant Learn consists of two core components:

  • Python Library: The foundation for research and zero/few shots algorithm development.
  • Full Stack Application: Leverages the library to enable real-time inference on live streams, video files, and images.
Prerequisites
  • Python 3.12+
  • uv (Python package manager)
  • Just (Command runner)
  • Node.js (v24.2.0) (Required only for the UI Application)
  • Docker (Optional, for containerized deployment)

Geti Instant Learn Library

Install the library:

cd library
uv sync --extra xpu    # Intel XPU (recommended)
uv sync --extra cpu    # CPU only
uv sync --extra gpu    # CUDA support

Or with pip:

pip install ./library[xpu]  # or [cpu], [gpu]

SAM3: Zero-Shot Text Prompting

SAM3 performs zero-shot segmentation using text prompts (category names) or bounding boxes — no reference mask needed. You provide a list of categories you want to segment in any image.

SAM3 Example: Text-prompted segmentation on multiple elephant images

from instantlearn.models import SAM3
from instantlearn.data import Sample

# Initialize SAM3 (device: "xpu", "cuda", or "cpu")
model = SAM3(device="xpu")

# SAM3 is zero-shot — no fit() required. Just provide categories per sample.
predictions = model.predict([
    Sample(image_path="library/examples/assets/coco/000000286874.jpg", categories=["elephant"]),
    Sample(image_path="library/examples/assets/coco/000000173279.jpg", categories=["elephant"]),
])

Tip: Calling model.fit(sample) is optional for SAM3. If called, the fitted categories are reused for all subsequent predict() calls so you don't need to specify categories on every target sample. If not called, categories are taken from each target sample directly.

For more examples of SAM3 capabilities, see the SAM3 aerial & maritime notebook.

Since SAM3 requires a text prompt for every sample (unless fit() is used), this is where Matcher comes in — you fit once with a reference mask (one-shot) and predict on any number of new images without providing prompts again.

Matcher: One-Shot Visual Prompting

Matcher Example: Reference Image → Reference Mask → 3 Predictions

from instantlearn.models import Matcher
from instantlearn.data import Sample

# Initialize Matcher (device: "xpu", "cuda", or "cpu")
model = Matcher(device="xpu")

# Create reference sample (auto-loads image and mask from paths)
ref_sample = Sample(
    image_path="library/examples/assets/coco/000000286874.jpg",
    mask_paths="library/examples/assets/coco/000000286874_mask.png",
)

# Fit once on reference
model.fit(ref_sample)

# Predict on multiple target images — no prompts needed
predictions = model.predict([
    "library/examples/assets/coco/000000390341.jpg",
    "library/examples/assets/coco/000000173279.jpg",
    "library/examples/assets/coco/000000267704.jpg",
])

# Access results for each image
for pred in predictions:
    masks = pred["pred_masks"]  # Predicted segmentation masks

For interactive mask generation with SAM, CLI usage, and benchmarking, see the Library README.

Geti Instant Learn Application

Full-stack web interface for real-time inference.

Deploy models on live video streams, cameras, and video files.

just application/dev

Access at: http://localhost:3000

View Application Documentation →

Supported Models and Algorithms

Geti Instant Learn supports a variety of foundation models and visual prompting algorithms, optimized for different performance needs.

Visual Prompting Algorithms

Algorithm Description Paper Repository
Matcher Standard feature matching pipeline using SAM. Matcher Matcher
SoftMatcher Enhanced matching pipeline with soft feature comparison, inspired by Optimal Transport. IJCAI 2024 N/A
PerDino Personalized DINO-based prompting, leveraging DINOv2/v3 features for robust matching. PerSAM Personalize-SAM
GroundedSAM Combines Grounding DINO and SAM for text-based visual prompting and segmentation. Grounding DINO, SAM GroundedSAM
SAM 3 Open-vocabulary segmentation using concept-based prompts. SAM 3 SAM 3

Foundation Models (Backbones)

Family Models Description Paper Repository
SAM SAM-HQ, SAM-HQ-tiny High-quality variants of the original Segment Anything Model. Segment Anything, SAM-HQ SAM, SAM-HQ
SAM 2 SAM2-tiny, SAM2-small, SAM2-base, SAM2-large The next generation of Segment Anything, offering improved performance and speed. SAM 2 sam2
SAM 3 SAM 3 Segment Anything with Concepts, supporting open-vocabulary prompts. SAM 3 SAM 3
DINOv2 Small, Base, Large, Giant Self-supervised vision transformers with registers, used for feature extraction. DINOv2, Registers dinov2
DINOv3 Small, Small+, Base, Large, Huge The latest iteration of DINO models. DINOv3 dinov3
Grounding DINO (Integrated in GroundedSAM) Open-set object detection model. Grounding DINO GroundingDINO

Documentation

Component README Documentation
Library library/README.md library/docs
Application application/README.md application/docs

Community

License

Geti Instant Learn is licensed under the Apache License 2.0.

FFmpeg is an open source project licensed under LGPL and GPL. See https://www.ffmpeg.org/legal.html. You are solely responsible for determining if your use of FFmpeg requires any additional licenses. Intel is not responsible for obtaining any such licenses, nor liable for any licensing fees due, in connection with your use of FFmpeg.

About

A modular framework for few-shot visual segmentation using visual prompting techniques. Enables easy experimentation with different algorithms, backbones (SAM, MobileSAM, DinoV2), and pipeline components for finding and segmenting objects from just a few examples.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

No packages published

Contributors 21