Introduction

A framework for developing, benchmarking, and deploying zero-shot visual prompting algorithms on the edge.

Visual prompting offers a powerful alternative to traditional training. Instead of curating thousands of labeled images, you simply show the model one or a few examples of what you are looking for. The model effectively "learns" instantly, detecting and segmenting similar objects in new images or live video streams without retraining.

Key Features

Library & Application: A unified framework providing a modular Python library for research and development, and a Full-Stack Application for deploying those algorithms on live video streams.
Simple & Modular API: A composable design where developers can mix and match components (backbones, matchers) to create custom pipelines.
Algorithms & Models: A wide collection of ready-to-use zero-shot and few-shot algorithms (e.g., SAM 2, Matcher, GroundedSAM) and foundation models.
Hardware Acceleration: Built-in support for model optimization and export to OpenVINO™ for fast inference on Intel hardware (CPU, GPU, NPU).
Multiple Backends: Seamless switching between PyTorch for flexibility/research and OpenVINO for optimized deployment.

Getting Started

Geti Instant Learn consists of two core components:

Python Library: The foundation for research and for developing zero-shot and few-shot algorithms.
Full Stack Application: Leverages the library to enable real-time inference on live streams, video files, and images.

Prerequisites

Python 3.12+
uv (Python package manager)
Just (Command runner)
Node.js (v24.2.0) (Required only for the UI Application)
Docker (Optional, for containerized deployment)

Geti Instant Learn Library

Install the library:

cd library
# Run one of the following, depending on your hardware:
uv sync --extra xpu    # Intel XPU (recommended)
uv sync --extra cpu    # CPU only
uv sync --extra gpu    # CUDA support

Or with pip:

pip install "./library[xpu]"  # or "./library[cpu]", "./library[gpu]"; quotes keep shells such as zsh from expanding the brackets
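
The extra you install determines which device string ("xpu", "cuda", or "cpu") is usable at runtime. As a minimal sketch, assuming PyTorch may or may not be importable in the current environment, a device string could be chosen like this. `pick_device` is a hypothetical helper written for illustration and is not part of the library:

```python
def pick_device(preferred=("xpu", "cuda", "cpu")):
    """Return the first usable device string from `preferred`.

    Hypothetical helper for illustration; not part of instantlearn.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to probe.

    for name in preferred:
        if name == "cpu":
            return "cpu"
        # torch.xpu and torch.cuda both expose is_available() in recent
        # PyTorch releases; getattr guards against older builds without them.
        backend = getattr(torch, name, None)
        is_available = getattr(backend, "is_available", None)
        if callable(is_available) and is_available():
            return name
    return "cpu"


device = pick_device()
print(device)
```

The resulting string can then be passed as the `device` argument shown in the model examples.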

SAM3: Zero-Shot Text Prompting

SAM3 performs zero-shot segmentation using text prompts (category names) or bounding boxes — no reference mask needed. You provide a list of categories you want to segment in any image.

from instantlearn.models import SAM3
from instantlearn.data import Sample

# Initialize SAM3 (device: "xpu", "cuda", or "cpu")
model = SAM3(device="xpu")

# SAM3 is zero-shot — no fit() required. Just provide categories per sample.
predictions = model.predict([
    Sample(image_path="library/examples/assets/coco/000000286874.jpg", categories=["elephant"]),
    Sample(image_path="library/examples/assets/coco/000000173279.jpg", categories=["elephant"]),
])

Tip: Calling model.fit(sample) is optional for SAM3. If called, the fitted categories are reused for all subsequent predict() calls so you don't need to specify categories on every target sample. If not called, categories are taken from each target sample directly.
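
The rule in the tip can be pictured with a small standalone sketch. This is not the library's code: `resolve_categories` is a hypothetical function, and the choice that fitted categories win when both are present is an assumption made for illustration.

```python
def resolve_categories(fitted, sample_categories):
    """Pick the categories to use for one target sample.

    Assumption (not confirmed by the README): categories stored by fit()
    take precedence over categories set on the sample.
    """
    if fitted:
        return list(fitted)
    if sample_categories:
        return list(sample_categories)
    raise ValueError("No categories: call fit() or set categories on the sample.")


# Without fit(): each target sample carries its own categories.
print(resolve_categories(None, ["elephant"]))  # ['elephant']

# After fit(): the fitted categories are reused for every target sample.
print(resolve_categories(["elephant"], None))  # ['elephant']
```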

For more examples of SAM3 capabilities, see the SAM3 aerial & maritime notebook.

SAM3 needs a text prompt for every sample unless fit() is used. Matcher takes a different approach: you fit once with a reference mask (one-shot) and then predict on any number of new images without providing prompts again.

Matcher: One-Shot Visual Prompting

from instantlearn.models import Matcher
from instantlearn.data import Sample

# Initialize Matcher (device: "xpu", "cuda", or "cpu")
model = Matcher(device="xpu")

# Create reference sample (auto-loads image and mask from paths)
ref_sample = Sample(
    image_path="library/examples/assets/coco/000000286874.jpg",
    mask_paths="library/examples/assets/coco/000000286874_mask.png",
)

# Fit once on reference
model.fit(ref_sample)

# Predict on multiple target images — no prompts needed
predictions = model.predict([
    "library/examples/assets/coco/000000390341.jpg",
    "library/examples/assets/coco/000000173279.jpg",
    "library/examples/assets/coco/000000267704.jpg",
])

# Access results for each image
for pred in predictions:
    masks = pred["pred_masks"]  # Predicted segmentation masks
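
Once you have the masks, a common next step is to summarize them. The sketch below assumes a single predicted mask can be converted to a 2D grid of 0/1 values (for example with a `tolist()` call); this README does not state the type of `pred["pred_masks"]`, so treat that as an assumption. `mask_summary` is a hypothetical helper, not a library function.

```python
def mask_summary(mask):
    """Return (area, bounding_box) for a 2D binary mask given as rows of 0/1.

    bounding_box is (x_min, y_min, x_max, y_max) in pixel indices,
    or None when the mask has no foreground pixels.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return 0, None
    return len(xs), (min(xs), min(ys), max(xs), max(ys))


toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]
print(mask_summary(toy_mask))  # (5, (1, 1, 3, 2))
```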

For interactive mask generation with SAM, CLI usage, and benchmarking, see the Library README.

Geti Instant Learn Application

Full-stack web interface for real-time inference.

Deploy models on live video streams, cameras, and video files.

just application/dev

Access at: http://localhost:3000

View Application Documentation →

Supported Models and Algorithms

Geti Instant Learn supports a variety of foundation models and visual prompting algorithms, optimized for different performance needs.

Visual Prompting Algorithms

| Algorithm | Description | Paper | Repository |
| --- | --- | --- | --- |
| Matcher | Standard feature matching pipeline using SAM. | Matcher | Matcher |
| SoftMatcher | Enhanced matching pipeline with soft feature comparison, inspired by Optimal Transport. | IJCAI 2024 | N/A |
| PerDino | Personalized DINO-based prompting, leveraging DINOv2/v3 features for robust matching. | PerSAM | Personalize-SAM |
| GroundedSAM | Combines Grounding DINO and SAM for text-based visual prompting and segmentation. | Grounding DINO, SAM | GroundedSAM |
| SAM 3 | Open-vocabulary segmentation using concept-based prompts. | SAM 3 | SAM 3 |
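
To give a feel for what "feature matching" means in the Matcher row, here is a toy sketch of the general idea: compare target patch features against the reference patches covered by the mask, and keep the target patches that are similar enough. It is not the library's implementation. The function names, the 2-dimensional features, and the 0.8 threshold are all made up for illustration; a real pipeline would take features from a backbone such as DINOv2 and pass the matched locations to SAM as prompts.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_patches(ref_features, ref_mask, target_features, threshold=0.8):
    """Return indices of target patches that resemble masked reference patches."""
    # Keep only the reference patches that fall inside the reference mask.
    foreground = [f for f, keep in zip(ref_features, ref_mask) if keep]
    matches = []
    for index, feature in enumerate(target_features):
        # Score each target patch by its best match among foreground patches.
        best = max((cosine(feature, ref) for ref in foreground), default=0.0)
        if best >= threshold:
            matches.append(index)
    return matches


ref_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_mask = [1, 1, 0]  # the first two patches belong to the object
target_features = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(match_patches(ref_features, ref_mask, target_features))  # [1]
```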

Foundation Models (Backbones)

| Family | Models | Description | Paper | Repository |
| --- | --- | --- | --- | --- |
| SAM | SAM-HQ, SAM-HQ-tiny | High-quality variants of the original Segment Anything Model. | Segment Anything, SAM-HQ | SAM, SAM-HQ |
| SAM 2 | SAM2-tiny, SAM2-small, SAM2-base, SAM2-large | The next generation of Segment Anything, offering improved performance and speed. | SAM 2 | sam2 |
| SAM 3 | SAM 3 | Segment Anything with Concepts, supporting open-vocabulary prompts. | SAM 3 | SAM 3 |
| DINOv2 | Small, Base, Large, Giant | Self-supervised vision transformers with registers, used for feature extraction. | DINOv2, Registers | dinov2 |
| DINOv3 | Small, Small+, Base, Large, Huge | The latest iteration of DINO models. | DINOv3 | dinov3 |
| Grounding DINO | (Integrated in GroundedSAM) | Open-set object detection model. | Grounding DINO | GroundingDINO |

Documentation

| Component | README | Documentation |
| --- | --- | --- |
| Library | library/README.md | library/docs |
| Application | application/README.md | application/docs |

Community

To report a bug or submit a feature request, please open a GitHub issue.
Ask questions via GitHub Discussions.

License

Geti Instant Learn is licensed under the Apache License 2.0.

FFmpeg is an open source project licensed under LGPL and GPL. See https://www.ffmpeg.org/legal.html. You are solely responsible for determining if your use of FFmpeg requires any additional licenses. Intel is not responsible for obtaining any such licenses, nor liable for any licensing fees due, in connection with your use of FFmpeg.
