8 changes: 7 additions & 1 deletion .gitignore
@@ -8,7 +8,13 @@ wheels/

# Virtual environments
.venv
venv/
apps/api-inference/.venv
apps/api-inference-yolo/.venv

# SAM3 Model Weights
apps/sam3.pt
sam3.pt

# Environment variables
# .env
@@ -76,4 +82,4 @@ htmlcov/
.codex
.agent
backend/docs
node_modules/.vite
node_modules/.vite
15 changes: 15 additions & 0 deletions README.md
Expand Up @@ -122,6 +122,21 @@ Diagram source: `docs/architecture/system-overview.mmd`.
| PostgreSQL | Core data store | 5432 |
| Redis | Cache and task queue | - |

### Backend Options

**🚀 Ultralytics SAM3 (Recommended)** - `api-inference-yolo`
- ✅ Faster inference with FP16 support
- ✅ Better text prompt segmentation with semantic understanding
- ✅ Bounding box exemplar-based segmentation for finding similar objects
- ✅ No HuggingFace account required (but model download is manual)
- ✅ Supports single, auto-apply, and batch processing modes
- 📦 Uses: `ultralytics`, PyTorch, SAM3SemanticPredictor

**🔄 HuggingFace SAM3** - `api-inference`
- ✅ Auto-downloads model on first run
- ⚠️ Requires HuggingFace account and gated model access
- 📦 Uses: `transformers`, `huggingface-hub`

## Quick Start

### Prerequisites
28 changes: 28 additions & 0 deletions apps/api-inference-yolo/.env.example
@@ -0,0 +1,28 @@
# Application Settings
DEBUG=true
APP_HOST=0.0.0.0
APP_PORT=8000
LOG_LEVEL=INFO

# HuggingFace Authentication (REQUIRED for SAM3)
# Get your token from: https://huggingface.co/settings/tokens
HF_TOKEN=hf_your_token_here

# SAM3 Model Settings
SAM3_MODEL_NAME=facebook/sam3
SAM3_DEVICE=auto
SAM3_DEFAULT_THRESHOLD=0.5
SAM3_DEFAULT_MASK_THRESHOLD=0.5

# API Limits
MAX_IMAGE_SIZE_MB=10
MAX_BATCH_SIZE=10
MAX_IMAGE_DIMENSION=4096

# Visualization
VISUALIZATION_FORMAT=PNG
VISUALIZATION_QUALITY=95

# CORS
# Must be a JSON array. Examples: ["*"], ["http://localhost:3000","http://localhost:5173"]
ALLOWED_ORIGINS=["*"]
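
The JSON-array requirement for `ALLOWED_ORIGINS` is easy to get wrong (a bare `*` is not valid JSON). A minimal sketch of how these variables could be parsed — `load_settings` is a hypothetical helper for illustration; the real app reads them via its `config.py`:

```python
import json
import os

def load_settings(env=None):
    """Parse the API's environment settings (variable names from .env.example)."""
    env = dict(os.environ if env is None else env)
    return {
        "debug": env.get("DEBUG", "false").lower() == "true",
        "app_host": env.get("APP_HOST", "0.0.0.0"),
        "app_port": int(env.get("APP_PORT", "8000")),
        "sam3_device": env.get("SAM3_DEVICE", "auto"),
        "max_batch_size": int(env.get("MAX_BATCH_SIZE", "10")),
        # ALLOWED_ORIGINS must be a JSON array, e.g. ["*"] — a bare * would fail here
        "allowed_origins": json.loads(env.get("ALLOWED_ORIGINS", '["*"]')),
    }

settings = load_settings({"DEBUG": "true", "ALLOWED_ORIGINS": '["http://localhost:3000"]'})
```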
2 changes: 2 additions & 0 deletions apps/api-inference-yolo/.gitignore
@@ -0,0 +1,2 @@
.venv
.env
1 change: 1 addition & 0 deletions apps/api-inference-yolo/.python-version
@@ -0,0 +1 @@
3.12
45 changes: 45 additions & 0 deletions apps/api-inference-yolo/Dockerfile
@@ -0,0 +1,45 @@
# 1. Change Base Image to NVIDIA CUDA (Devel version ensures compatibility with Transformers/PyTorch extensions)
# Note: Use CUDA 12.1 or 12.4 depending on your PyTorch version requirements. 12.1 is widely supported.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

WORKDIR /code

# 2. Install system dependencies
# We need software-properties-common to install Python easily if needed,
# but 'uv' can actually manage Python for us.
RUN apt-get update && apt-get install -y \
git \
curl \
ca-certificates \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*

# 3. Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:0.9.11 /uv /uvx /bin/

# 4. Configure uv to install Python 3.12 automatically
# Since the base image is Ubuntu without Python 3.12, uv will fetch it.
ENV UV_PYTHON=3.12
ENV UV_COMPILE_BYTECODE=1

# 5. Install dependencies
# Using --frozen ensures we respect the lockfile
RUN --mount=type=bind,source=uv.lock,target=uv.lock \
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
uv sync --frozen --no-cache

# 6. Install transformers (and ensure PyTorch uses CUDA)
# Note: If you need specific CUDA PyTorch, you might need to define extra-index-url in pyproject.toml
RUN uv pip install git+https://github.com/huggingface/transformers.git

# Copy application code
COPY src/ ./src/

EXPOSE 8000

# Set Python path
ENV PYTHONPATH=/code/src

# 7. Run application
# We use 'uv run' which ensures the correct python environment is used
CMD ["uv", "run", "app/main.py"]
Comment on lines +35 to +45

6. Docker cmd wrong path 🐞 Bug ⛯ Reliability

- The `api-inference-yolo` image copies code under `/code/src` but runs `uv run app/main.py`, which does not exist at that path.
- This will cause the backend container to fail immediately on startup, blocking the recommended backend option.
Agent Prompt
### Issue description
The `apps/api-inference-yolo` Docker image will not start because the `CMD` points at `app/main.py`, but the file is located at `src/app/main.py` inside the container.

### Issue Context
The Dockerfile copies `src/` to `/code/src` and sets `PYTHONPATH=/code/src`, so Python imports should use `app.*`.

### Fix Focus Areas
- apps/api-inference-yolo/Dockerfile[35-45]


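The remediation the review asks for is a one-line change to the final instruction. One possible fix, keeping the `PYTHONPATH=/code/src` already set earlier in the Dockerfile so `app.*` imports resolve:

```dockerfile
# Point CMD at the copied location; /code/src stays on PYTHONPATH for app.* imports
CMD ["uv", "run", "src/app/main.py"]
```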
180 changes: 180 additions & 0 deletions apps/api-inference-yolo/README.md
@@ -0,0 +1,180 @@
# SAM3 FastAPI Backend

REST API for **Segment Anything Model 3 (SAM3)** inference using HuggingFace Transformers.

## Features

- ✅ **Text Prompt Inference** - Segment objects using natural language ("cat", "laptop", etc.)
- ✅ **Bounding Box Inference** - Segment using coordinate-based prompts
- ✅ **Batch Processing** - Process multiple images in one request
- ✅ **Mask Polygon Coordinates** - Get precise segmentation masks as polygon points
- ✅ **Optional Visualizations** - Get images with drawn masks/boxes (base64 encoded)
- ✅ **Processing Time Metadata** - Track inference performance
- ✅ **GPU Auto-detection** - Automatic CUDA/CPU selection
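
Because masks come back as polygon coordinates, downstream code can do useful work without any imaging library. For example, a mask's pixel area follows from the shoelace formula — a sketch assuming the API returns each mask as a list of `[x, y]` vertex pairs:

```python
def polygon_area(points):
    """Shoelace formula: area (in pixels) of a polygon given as [x, y] vertex pairs."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# A 4x4 axis-aligned square covers 16 pixels
square = [[0, 0], [4, 0], [4, 4], [0, 4]]
```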

## Architecture

```
backend/
├── src/app/
│   ├── main.py              # FastAPI app + model loading
│   ├── config.py            # Settings and environment config
│   ├── integrations/sam3/
│   │   ├── inference.py     # SAM3 model inference
│   │   ├── mask_utils.py    # Mask processing utilities
│   │   └── visualizer.py    # Mask and box visualization
│   ├── routers/sam3.py      # API endpoints
│   ├── schemas/sam3.py      # Pydantic request/response models
│   └── helpers/
│       ├── response_api.py  # JSON response formatting
│       └── logger.py        # Logging setup
├── docs/                    # API documentation
├── Dockerfile               # Python 3.12-slim + uv
├── pyproject.toml           # Dependencies
└── example_sam.py           # Usage examples
```

## Prerequisites

### 1. HuggingFace Access Token (REQUIRED)

SAM3 is a **gated model** on HuggingFace. You need to:

1. **Create a HuggingFace account**: https://huggingface.co/join
2. **Request access to SAM3**: Visit https://huggingface.co/facebook/sam3 and accept the license
3. **Generate an access token**: https://huggingface.co/settings/tokens
- Click "New token"
- Name it (e.g., "sam3-api")
- Select "Read" permissions
- Copy the token (starts with `hf_...`)

4. **Add token to `.env` file**:
```bash
cp .env.example .env
# Edit .env and replace:
HF_TOKEN=hf_your_actual_token_here
```

⚠️ **Without a valid HF_TOKEN, the application will fail to load the model!**

### 2. System Requirements

- **Python 3.12+**
- **NVIDIA GPU with CUDA** (optional, for faster inference)

## Local Development

```bash
# 1. Setup HuggingFace token
cp .env.example .env
nano .env # Add your HF_TOKEN

# 2. Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# 3. Install dependencies
uv sync

# 4. Run application
uv run app/main.py

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

## API Endpoints

### 1. Text Prompt Inference

```bash
POST /api/v1/sam3/inference/text

curl -X POST http://localhost:8000/api/v1/sam3/inference/text \
-F "image=@cat.jpg" \
-F "text_prompt=ear" \
-F "threshold=0.5" \
-F "return_visualization=true"
```

### 2. Bounding Box Inference

```bash
POST /api/v1/sam3/inference/bbox

curl -X POST http://localhost:8000/api/v1/sam3/inference/bbox \
-F "image=@kitchen.jpg" \
-F 'bounding_boxes=[[59, 144, 76, 163, 1]]' \
-F "threshold=0.5"
```
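
Each box in `bounding_boxes` carries five values. A client-side sanity check is cheap — a sketch that assumes the fifth element is a positive/negative label flag (the Pydantic models in `schemas/sam3.py` are the authoritative format):

```python
def validate_bbox_prompts(boxes):
    """Sanity-check prompts shaped like [[x1, y1, x2, y2, label], ...].

    The fifth element is assumed here to be a 0/1 label flag; verify
    against schemas/sam3.py before relying on this.
    """
    for box in boxes:
        if len(box) != 5:
            raise ValueError(f"expected [x1, y1, x2, y2, label], got {box}")
        x1, y1, x2, y2, label = box
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"box must have positive width and height: {box}")
        if label not in (0, 1):
            raise ValueError(f"label flag expected to be 0 or 1: {box}")
    return True
```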

### 3. Batch Processing

```bash
POST /api/v1/sam3/inference/batch

curl -X POST http://localhost:8000/api/v1/sam3/inference/batch \
-F "images=@cat.jpg" \
-F "images=@dog.jpg" \
-F 'text_prompts=["cat", "dog"]'
```
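
The curl example suggests one prompt per uploaded image, in order. A sketch of the pairing and the `MAX_BATCH_SIZE` check a client could run before submitting — the one-to-one pairing is inferred from the example above, not confirmed by the API schema:

```python
def pair_batch(image_names, text_prompts, max_batch_size=10):
    """Pair each uploaded image with its prompt; enforce the MAX_BATCH_SIZE limit."""
    if len(image_names) != len(text_prompts):
        raise ValueError("images and text_prompts must have the same length")
    if len(image_names) > max_batch_size:
        raise ValueError(f"batch of {len(image_names)} exceeds MAX_BATCH_SIZE={max_batch_size}")
    return list(zip(image_names, text_prompts))
```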

### 4. Health Check

```bash
GET /api/v1/sam3/health
```

## Configuration

Edit `.env` file:

```bash
# Application
DEBUG=true
APP_HOST=0.0.0.0
APP_PORT=8000
LOG_LEVEL=INFO

# HuggingFace (REQUIRED)
HF_TOKEN=hf_your_token_here

# SAM3 Model
SAM3_MODEL_NAME=facebook/sam3
SAM3_DEVICE=auto # auto, cpu, cuda
SAM3_DEFAULT_THRESHOLD=0.5

# API Limits
MAX_IMAGE_SIZE_MB=10
MAX_BATCH_SIZE=10
MAX_IMAGE_DIMENSION=4096

# Visualization
VISUALIZATION_FORMAT=PNG # PNG or JPEG
VISUALIZATION_QUALITY=95

# CORS
ALLOWED_ORIGINS=["*"] # JSON array of origins, e.g. ["http://localhost:3000"]
```
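
`SAM3_DEVICE=auto` implies a runtime probe for CUDA. A sketch of that resolution logic — `resolve_device` is a hypothetical helper; the actual selection lives in the app's inference code:

```python
def resolve_device(setting="auto"):
    """Resolve SAM3_DEVICE: pass 'cpu'/'cuda' through, probe CUDA for 'auto'."""
    if setting in ("cpu", "cuda"):
        return setting
    try:
        import torch  # falls back to CPU when torch is not installed
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```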

## Docker Deployment

See root `docker-compose.yml` for container setup.

## Performance

- **GPU (CUDA)**: Fast inference (~200-500ms per image)
- **CPU**: 5-10x slower than GPU

Batch processing is more efficient for multiple images.

## API Documentation

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json

## References

- [SAM3 Documentation](https://huggingface.co/facebook/sam3)
- [SAM3 GitHub](https://github.com/facebookresearch/sam3)
- [FastAPI](https://fastapi.tiangolo.com/)