8 changes: 7 additions & 1 deletion .gitignore
@@ -8,7 +8,13 @@ wheels/

# Virtual environments
.venv
venv/
apps/api-inference/.venv
apps/api-inference-yolo/.venv

# SAM3 Model Weights
apps/sam3.pt
sam3.pt

# Environment variables
# .env
@@ -76,4 +82,4 @@ htmlcov/
.codex
.agent
backend/docs
node_modules/.vite
node_modules/.vite
15 changes: 15 additions & 0 deletions README.md
Expand Up @@ -122,6 +122,21 @@ Diagram source: `docs/architecture/system-overview.mmd`.
| PostgreSQL | Core data store | 5432 |
| Redis | Cache and task queue | - |

### Backend Options

**🚀 Ultralytics SAM3 (Recommended)** - `api-inference-yolo`
- ✅ Faster inference with FP16 support
- ✅ Better text prompt segmentation with semantic understanding
- ✅ Bounding box exemplar-based segmentation for finding similar objects
- ✅ No HuggingFace account required (but model download is manual)
- ✅ Supports single, auto-apply, and batch processing modes
- 📦 Uses: `ultralytics`, PyTorch, SAM3SemanticPredictor

**🔄 HuggingFace SAM3** - `api-inference`
- ✅ Auto-downloads model on first run
- ⚠️ Requires HuggingFace account and gated model access
- 📦 Uses: `transformers`, `huggingface-hub`

## Quick Start

### Prerequisites
28 changes: 28 additions & 0 deletions apps/api-inference-yolo/.env.example
@@ -0,0 +1,28 @@
# Application Settings
DEBUG=true
APP_HOST=0.0.0.0
APP_PORT=8000
LOG_LEVEL=INFO

# HuggingFace Authentication (REQUIRED for SAM3)
# Get your token from: https://huggingface.co/settings/tokens
HF_TOKEN=hf_your_token_here

# SAM3 Model Settings
SAM3_MODEL_NAME=facebook/sam3
SAM3_DEVICE=auto
SAM3_DEFAULT_THRESHOLD=0.5
SAM3_DEFAULT_MASK_THRESHOLD=0.5

# API Limits
MAX_IMAGE_SIZE_MB=10
MAX_BATCH_SIZE=10
MAX_IMAGE_DIMENSION=4096

# Visualization
VISUALIZATION_FORMAT=PNG
VISUALIZATION_QUALITY=95

# CORS
# Must be a JSON array. Examples: ["*"], ["http://localhost:3000","http://localhost:5173"]
ALLOWED_ORIGINS=["*"]
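
The JSON-array requirement for `ALLOWED_ORIGINS` is easy to get wrong (a bare `*` is not valid JSON). A minimal sketch of how these variables could be parsed — `load_settings` is a hypothetical helper for illustration; the real app reads them via its `config.py`:

```python
import json
import os

def load_settings(env=None):
    """Parse the API's environment settings (variable names from .env.example)."""
    env = dict(os.environ if env is None else env)
    return {
        "debug": env.get("DEBUG", "false").lower() == "true",
        "app_host": env.get("APP_HOST", "0.0.0.0"),
        "app_port": int(env.get("APP_PORT", "8000")),
        "sam3_device": env.get("SAM3_DEVICE", "auto"),
        "max_batch_size": int(env.get("MAX_BATCH_SIZE", "10")),
        # ALLOWED_ORIGINS must be a JSON array, e.g. ["*"] — a bare * would fail here
        "allowed_origins": json.loads(env.get("ALLOWED_ORIGINS", '["*"]')),
    }

settings = load_settings({"DEBUG": "true", "ALLOWED_ORIGINS": '["http://localhost:3000"]'})
```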
2 changes: 2 additions & 0 deletions apps/api-inference-yolo/.gitignore
@@ -0,0 +1,2 @@
.venv
.env
1 change: 1 addition & 0 deletions apps/api-inference-yolo/.python-version
@@ -0,0 +1 @@
3.12
45 changes: 45 additions & 0 deletions apps/api-inference-yolo/Dockerfile
@@ -0,0 +1,45 @@
# 1. Change Base Image to NVIDIA CUDA (Devel version ensures compatibility with Transformers/PyTorch extensions)
# Note: Use CUDA 12.1 or 12.4 depending on your PyTorch version requirements. 12.1 is widely supported.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

WORKDIR /code

# 2. Install system dependencies
# We need software-properties-common to install Python easily if needed,
# but 'uv' can actually manage Python for us.
RUN apt-get update && apt-get install -y \
git \
curl \
ca-certificates \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*

# 3. Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:0.9.11 /uv /uvx /bin/

# 4. Configure uv to install Python 3.12 automatically
# Since the base image is Ubuntu without Python 3.12, uv will fetch it.
ENV UV_PYTHON=3.12
ENV UV_COMPILE_BYTECODE=1

# 5. Install dependencies
# Using --frozen ensures we respect the lockfile
RUN --mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
uv sync --frozen --no-cache

# 6. Install transformers (and ensure PyTorch uses CUDA)
# Note: If you need specific CUDA PyTorch, you might need to define extra-index-url in pyproject.toml
RUN uv pip install git+https://github.com/huggingface/transformers.git

# Copy application code
COPY src/ ./src/

EXPOSE 8000

# Set Python path
ENV PYTHONPATH=/code/src

# 7. Run application
# We use 'uv run' which ensures the correct python environment is used
CMD ["uv", "run", "app/main.py"]
Comment on lines +35 to +45

6. Docker cmd wrong path 🐞 Bug ⛯ Reliability

- The `api-inference-yolo` image copies code under `/code/src` but runs `uv run app/main.py`, which does not exist at that path.
- This will cause the backend container to fail immediately on startup, blocking the recommended backend option.
Agent Prompt
### Issue description
The `apps/api-inference-yolo` Docker image will not start because the `CMD` points at `app/main.py`, but the file is located at `src/app/main.py` inside the container.

### Issue Context
The Dockerfile copies `src/` to `/code/src` and sets `PYTHONPATH=/code/src`, so Python imports should use `app.*`.

### Fix Focus Areas
- apps/api-inference-yolo/Dockerfile[35-45]


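The remediation the review asks for is a one-line change to the final instruction. One possible fix, keeping the `PYTHONPATH=/code/src` already set earlier in the Dockerfile so `app.*` imports resolve:

```dockerfile
# Point CMD at the copied location; /code/src stays on PYTHONPATH for app.* imports
CMD ["uv", "run", "src/app/main.py"]
```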
180 changes: 180 additions & 0 deletions apps/api-inference-yolo/README.md
@@ -0,0 +1,180 @@
# SAM3 FastAPI Backend

REST API for **Segment Anything Model 3 (SAM3)** inference using HuggingFace Transformers.

## Features

- ✅ **Text Prompt Inference** - Segment objects using natural language ("cat", "laptop", etc.)
- ✅ **Bounding Box Inference** - Segment using coordinate-based prompts
- ✅ **Batch Processing** - Process multiple images in one request
- ✅ **Mask Polygon Coordinates** - Get precise segmentation masks as polygon points
- ✅ **Optional Visualizations** - Get images with drawn masks/boxes (base64 encoded)
- ✅ **Processing Time Metadata** - Track inference performance
- ✅ **GPU Auto-detection** - Automatic CUDA/CPU selection
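
Because masks come back as polygon coordinates, downstream code can do useful work without any imaging library. For example, a mask's pixel area follows from the shoelace formula — a sketch assuming the API returns each mask as a list of `[x, y]` vertex pairs:

```python
def polygon_area(points):
    """Shoelace formula: area (in pixels) of a polygon given as [x, y] vertex pairs."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# A 4x4 axis-aligned square covers 16 pixels
square = [[0, 0], [4, 0], [4, 4], [0, 4]]
```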

## Architecture

```
backend/
├── src/app/
│   ├── main.py              # FastAPI app + model loading
│   ├── config.py            # Settings and environment config
│   ├── integrations/sam3/
│   │   ├── inference.py     # SAM3 model inference
│   │   ├── mask_utils.py    # Mask processing utilities
│   │   └── visualizer.py    # Mask and box visualization
│   ├── routers/sam3.py      # API endpoints
│   ├── schemas/sam3.py      # Pydantic request/response models
│   └── helpers/
│       ├── response_api.py  # JSON response formatting
│       └── logger.py        # Logging setup
├── docs/                    # API documentation
├── Dockerfile               # Python 3.12-slim + uv
├── pyproject.toml           # Dependencies
└── example_sam.py           # Usage examples
```

## Prerequisites

### 1. HuggingFace Access Token (REQUIRED)

SAM3 is a **gated model** on HuggingFace. You need to:

1. **Create a HuggingFace account**: https://huggingface.co/join
2. **Request access to SAM3**: Visit https://huggingface.co/facebook/sam3 and accept the license
3. **Generate an access token**: https://huggingface.co/settings/tokens
- Click "New token"
- Name it (e.g., "sam3-api")
- Select "Read" permissions
- Copy the token (starts with `hf_...`)

4. **Add token to `.env` file**:
```bash
cp .env.example .env
# Edit .env and replace:
HF_TOKEN=hf_your_actual_token_here
```

⚠️ **Without a valid HF_TOKEN, the application will fail to load the model!**

### 2. System Requirements

- **Python 3.12+**
- **NVIDIA GPU with CUDA** (optional, for faster inference)

## Local Development

```bash
# 1. Setup HuggingFace token
cp .env.example .env
nano .env # Add your HF_TOKEN

# 2. Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# 3. Install dependencies
uv sync

# 4. Run application
uv run app/main.py

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

## API Endpoints

### 1. Text Prompt Inference

```bash
POST /api/v1/sam3/inference/text

curl -X POST http://localhost:8000/api/v1/sam3/inference/text \
-F "image=@cat.jpg" \
-F "text_prompt=ear" \
-F "threshold=0.5" \
-F "return_visualization=true"
```

### 2. Bounding Box Inference

```bash
POST /api/v1/sam3/inference/bbox

curl -X POST http://localhost:8000/api/v1/sam3/inference/bbox \
-F "image=@kitchen.jpg" \
-F 'bounding_boxes=[[59, 144, 76, 163, 1]]' \
-F "threshold=0.5"
```
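
Each box in `bounding_boxes` carries five values. A client-side sanity check is cheap — a sketch that assumes the fifth element is a positive/negative label flag (the Pydantic models in `schemas/sam3.py` are the authoritative format):

```python
def validate_bbox_prompts(boxes):
    """Sanity-check prompts shaped like [[x1, y1, x2, y2, label], ...].

    The fifth element is assumed here to be a 0/1 label flag; verify
    against schemas/sam3.py before relying on this.
    """
    for box in boxes:
        if len(box) != 5:
            raise ValueError(f"expected [x1, y1, x2, y2, label], got {box}")
        x1, y1, x2, y2, label = box
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"box must have positive width and height: {box}")
        if label not in (0, 1):
            raise ValueError(f"label flag expected to be 0 or 1: {box}")
    return True
```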

### 3. Batch Processing

```bash
POST /api/v1/sam3/inference/batch

curl -X POST http://localhost:8000/api/v1/sam3/inference/batch \
-F "images=@cat.jpg" \
-F "images=@dog.jpg" \
-F 'text_prompts=["cat", "dog"]'
```
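
The curl example suggests one prompt per uploaded image, in order. A sketch of the pairing and the `MAX_BATCH_SIZE` check a client could run before submitting — the one-to-one pairing is inferred from the example above, not confirmed by the API schema:

```python
def pair_batch(image_names, text_prompts, max_batch_size=10):
    """Pair each uploaded image with its prompt; enforce the MAX_BATCH_SIZE limit."""
    if len(image_names) != len(text_prompts):
        raise ValueError("images and text_prompts must have the same length")
    if len(image_names) > max_batch_size:
        raise ValueError(f"batch of {len(image_names)} exceeds MAX_BATCH_SIZE={max_batch_size}")
    return list(zip(image_names, text_prompts))
```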

### 4. Health Check

```bash
GET /api/v1/sam3/health
```

## Configuration

Edit `.env` file:

```bash
# Application
DEBUG=true
APP_HOST=0.0.0.0
APP_PORT=8000
LOG_LEVEL=INFO

# HuggingFace (REQUIRED)
HF_TOKEN=hf_your_token_here

# SAM3 Model
SAM3_MODEL_NAME=facebook/sam3
SAM3_DEVICE=auto # auto, cpu, cuda
SAM3_DEFAULT_THRESHOLD=0.5

# API Limits
MAX_IMAGE_SIZE_MB=10
MAX_BATCH_SIZE=10
MAX_IMAGE_DIMENSION=4096

# Visualization
VISUALIZATION_FORMAT=PNG # PNG or JPEG
VISUALIZATION_QUALITY=95

# CORS
ALLOWED_ORIGINS=["*"] # JSON array of origins, e.g. ["http://localhost:3000"]
```
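
`SAM3_DEVICE=auto` implies a runtime probe for CUDA. A sketch of that resolution logic — `resolve_device` is a hypothetical helper; the actual selection lives in the app's inference code:

```python
def resolve_device(setting="auto"):
    """Resolve SAM3_DEVICE: pass 'cpu'/'cuda' through, probe CUDA for 'auto'."""
    if setting in ("cpu", "cuda"):
        return setting
    try:
        import torch  # falls back to CPU when torch is not installed
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```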

## Docker Deployment

See root `docker-compose.yml` for container setup.

## Performance

- **GPU (CUDA)**: Fast inference (~200-500ms per image)
- **CPU**: 5-10x slower than GPU

Batch processing is more efficient for multiple images.

## API Documentation

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json

## References

- [SAM3 Documentation](https://huggingface.co/facebook/sam3)
- [SAM3 GitHub](https://github.com/facebookresearch/sam3)
- [FastAPI](https://fastapi.tiangolo.com/)