feat: Add Ultralytics SAM3 backend with enhanced features #5
Open
Rajkisan wants to merge 5 commits into agfianf:main from Rajkisan:feature/ultralytics-sam3-enhancements
bfb6c14 feat: Add Ultralytics SAM3 backend with enhanced features (AIViz-Tech)
ab8050e "changes on yolo inference" (AIViz-Tech)
e7b63e7 feat: add react-router-dom dependency and maintain text prompt persis… (Rajkisan)
ed8ce79 new bug fixes (Rajkisan)
742a2fc Merge branch 'main' into feature/ultralytics-sam3-enhancements (Rajkisan)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Application Settings | ||
| DEBUG=true | ||
| APP_HOST=0.0.0.0 | ||
| APP_PORT=8000 | ||
| LOG_LEVEL=INFO | ||
| | ||
| # HuggingFace Authentication (REQUIRED for SAM3) | ||
| # Get your token from: https://huggingface.co/settings/tokens | ||
| HF_TOKEN=hf_your_token_here | ||
| | ||
| # SAM3 Model Settings | ||
| SAM3_MODEL_NAME=facebook/sam3 | ||
| SAM3_DEVICE=auto | ||
| SAM3_DEFAULT_THRESHOLD=0.5 | ||
| SAM3_DEFAULT_MASK_THRESHOLD=0.5 | ||
| | ||
| # API Limits | ||
| MAX_IMAGE_SIZE_MB=10 | ||
| MAX_BATCH_SIZE=10 | ||
| MAX_IMAGE_DIMENSION=4096 | ||
| | ||
| # Visualization | ||
| VISUALIZATION_FORMAT=PNG | ||
| VISUALIZATION_QUALITY=95 | ||
| | ||
| # CORS | ||
| # Must be a JSON array. Examples: ["*"], ["http://localhost:3000","http://localhost:5173"] | ||
| ALLOWED_ORIGINS=["*"] |
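The comment on `ALLOWED_ORIGINS` says the value must be a JSON array. A minimal sketch of how such a value could be parsed and validated follows. `parse_allowed_origins` is a hypothetical helper for illustration, not code from this PR, and the project's actual settings loader may behave differently.

```python
import json


def parse_allowed_origins(raw: str) -> list[str]:
    """Parse an ALLOWED_ORIGINS value expected to be a JSON array of strings.

    Hypothetical helper for illustration; not part of this PR.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"ALLOWED_ORIGINS is not valid JSON: {raw!r}") from exc
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("ALLOWED_ORIGINS must be a JSON array of strings")
    return value


if __name__ == "__main__":
    print(parse_allowed_origins('["*"]'))
    print(parse_allowed_origins('["http://localhost:3000","http://localhost:5173"]'))
    try:
        parse_allowed_origins("*")  # a bare asterisk is not valid JSON
    except ValueError as err:
        print("rejected:", err)
```

Under this reading, a bare `*` or a comma-separated string would be rejected, so the value has to be written as `["*"]`.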
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| .venv | ||
| .env |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| 3.12 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| # 1. Change Base Image to NVIDIA CUDA (Devel version ensures compatibility with Transformers/PyTorch extensions) | ||
| # Note: Use CUDA 12.1 or 12.4 depending on your PyTorch version requirements. 12.1 is widely supported. | ||
| FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 | ||
| | ||
| WORKDIR /code | ||
| | ||
| # 2. Install system dependencies | ||
| # software-properties-common is not installed here because uv manages the | ||
| # Python installation itself (see step 4). | ||
| RUN apt-get update && apt-get install -y \ | ||
| git \ | ||
| curl \ | ||
| ca-certificates \ | ||
| libsndfile1 \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
| | ||
| # 3. Install uv package manager | ||
| COPY --from=ghcr.io/astral-sh/uv:0.9.11 /uv /uvx /bin/ | ||
| | ||
| # 4. Configure uv to install Python 3.12 automatically | ||
| # Since the base image is Ubuntu without Python 3.12, uv will fetch it. | ||
| ENV UV_PYTHON=3.12 | ||
| ENV UV_COMPILE_BYTECODE=1 | ||
| | ||
| # 5. Install dependencies | ||
| # Using --frozen ensures we respect the lockfile | ||
| RUN --mount=type=bind,source=uv.lock,target=uv.lock \ | ||
| --mount=type=bind,source=pyproject.toml,target=pyproject.toml \ | ||
| uv sync --frozen --no-cache | ||
| | ||
| # 6. Install transformers (and ensure PyTorch uses CUDA) | ||
| # Note: If you need specific CUDA PyTorch, you might need to define extra-index-url in pyproject.toml | ||
| RUN uv pip install git+https://github.com/huggingface/transformers.git | ||
| | ||
| # Copy application code | ||
| COPY src/ ./src/ | ||
| | ||
| EXPOSE 8000 | ||
| | ||
| # Set Python path | ||
| ENV PYTHONPATH=/code/src | ||
| | ||
| # 7. Run application | ||
| # We use 'uv run' which ensures the correct python environment is used | ||
| CMD ["uv", "run", "app/main.py"] | ||
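To illustrate how the relative script path in `CMD` resolves against `WORKDIR`, the sketch below recreates the layout implied by `COPY src/ ./src/` in a temporary directory and checks which candidate paths exist. The temporary directory stands in for `/code`, and `script_exists` and `build_layout` are hypothetical helpers. This models plain path resolution only; it does not reproduce how `uv run` itself locates scripts or modules.

```python
import tempfile
from pathlib import Path


def script_exists(workdir: Path, relative_script: str) -> bool:
    """Return True if the script path, resolved against workdir, is a file."""
    return (workdir / relative_script).is_file()


def build_layout(root: Path) -> None:
    """Recreate the layout produced by `COPY src/ ./src/` (assumed from the README tree)."""
    app_dir = root / "src" / "app"
    app_dir.mkdir(parents=True)
    (app_dir / "main.py").write_text("print('hello')\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)  # stands in for WORKDIR /code
        build_layout(workdir)
        for candidate in ("app/main.py", "src/app/main.py"):
            print(candidate, "->", script_exists(workdir, candidate))
```

Under these assumptions only `src/app/main.py` resolves to a file, which is consistent with the review finding about the `CMD` path.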
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,180 @@ | ||
| # SAM3 FastAPI Backend | ||
| | ||
| REST API for **Segment Anything Model 3 (SAM3)** inference using HuggingFace Transformers. | ||
| | ||
| ## Features | ||
| | ||
| - ✅ **Text Prompt Inference** - Segment objects using natural language ("cat", "laptop", etc.) | ||
| - ✅ **Bounding Box Inference** - Segment using coordinate-based prompts | ||
| - ✅ **Batch Processing** - Process multiple images in one request | ||
| - ✅ **Mask Polygon Coordinates** - Get precise segmentation masks as polygon points | ||
| - ✅ **Optional Visualizations** - Get images with drawn masks/boxes (base64 encoded) | ||
| - ✅ **Processing Time Metadata** - Track inference performance | ||
| - ✅ **GPU Auto-detection** - Automatic CUDA/CPU selection | ||
| | ||
| ## Architecture | ||
| | ||
| ``` | ||
| backend/ | ||
| ├── src/app/ | ||
| │ ├── main.py # FastAPI app + model loading | ||
| │ ├── config.py # Settings and environment config | ||
| │ ├── integrations/sam3/ | ||
| │ │ ├── inference.py # SAM3 model inference | ||
| │ │ ├── mask_utils.py # Mask processing utilities | ||
| │ │ └── visualizer.py # Mask and box visualization | ||
| │ ├── routers/sam3.py # API endpoints | ||
| │ ├── schemas/sam3.py # Pydantic request/response models | ||
| │ └── helpers/ | ||
| │ ├── response_api.py # JSON response formatting | ||
| │ └── logger.py # Logging setup | ||
| ├── docs/ # API documentation | ||
| ├── Dockerfile # NVIDIA CUDA 12.1 base image + uv (Python 3.12) | ||
| ├── pyproject.toml # Dependencies | ||
| └── example_sam.py # Usage examples | ||
| ``` | ||
| | ||
| ## Prerequisites | ||
| | ||
| ### 1. HuggingFace Access Token (REQUIRED) | ||
| | ||
| SAM3 is a **gated model** on HuggingFace. You need to: | ||
| | ||
| 1. **Create a HuggingFace account**: https://huggingface.co/join | ||
| 2. **Request access to SAM3**: Visit https://huggingface.co/facebook/sam3 and accept the license | ||
| 3. **Generate an access token**: https://huggingface.co/settings/tokens | ||
| - Click "New token" | ||
| - Name it (e.g., "sam3-api") | ||
| - Select "Read" permissions | ||
| - Copy the token (starts with `hf_...`) | ||
| | ||
| 4. **Add token to `.env` file**: | ||
| ```bash | ||
| cp .env.example .env | ||
| # Edit .env and replace: | ||
| HF_TOKEN=hf_your_actual_token_here | ||
| ``` | ||
| | ||
| ⚠️ **Without a valid HF_TOKEN, the application will fail to load the model!** | ||
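Because a missing token only surfaces when the model fails to load, a fail-fast check at startup can give a clearer error. The following is a minimal sketch under assumptions: `require_hf_token` is a hypothetical helper, the placeholder strings are the ones used in the example files, and the `hf_` prefix check is based only on the README's note that tokens start with `hf_`.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Placeholder values that appear in the example .env and in the README.
PLACEHOLDERS = {"hf_your_token_here", "hf_your_actual_token_here"}


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN, or raise a clear error before any model loading is attempted."""
    source = os.environ if env is None else env
    token = source.get("HF_TOKEN", "").strip()
    if not token or token in PLACEHOLDERS:
        raise RuntimeError("HF_TOKEN is missing or still set to a placeholder value")
    if not token.startswith("hf_"):
        raise RuntimeError("HF_TOKEN does not start with the expected 'hf_' prefix")
    return token


if __name__ == "__main__":
    try:
        require_hf_token()
        print("HF_TOKEN looks usable")
    except RuntimeError as err:
        print("configuration error:", err)
```

This check only inspects the string; whether the token actually grants access to the gated model can only be confirmed by HuggingFace at download time.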
| | ||
| ### 2. System Requirements | ||
| | ||
| - **Python 3.12+** | ||
| - **NVIDIA GPU with CUDA** (optional, for faster inference) | ||
| | ||
| ## Local Development | ||
| | ||
| ```bash | ||
| # 1. Setup HuggingFace token | ||
| cp .env.example .env | ||
| nano .env # Add your HF_TOKEN | ||
| | ||
| # 2. Install uv package manager | ||
| curl -LsSf https://astral.sh/uv/install.sh | sh | ||
| | ||
| # 3. Install dependencies | ||
| uv sync | ||
| | ||
| # 4. Run application | ||
| uv run app/main.py | ||
| | ||
| # API available at http://localhost:8000 | ||
| # Docs at http://localhost:8000/docs | ||
| ``` | ||
| | ||
| ## API Endpoints | ||
| | ||
| ### 1. Text Prompt Inference | ||
| | ||
| ```bash | ||
| POST /api/v1/sam3/inference/text | ||
| | ||
| curl -X POST http://localhost:8000/api/v1/sam3/inference/text \ | ||
| -F "image=@cat.jpg" \ | ||
| -F "text_prompt=ear" \ | ||
| -F "threshold=0.5" \ | ||
| -F "return_visualization=true" | ||
| ``` | ||
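The same multipart request as the `curl` call above can be built from Python using only the standard library. This is a sketch, not a client shipped with the PR: the endpoint path and form field names are copied from the `curl` example, `encode_multipart` and `build_text_request` are hypothetical helpers, and the response format is not described in the README, so it is left uninterpreted.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes, file_type: str = "image/jpeg") -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_text_request(image_bytes: bytes, text_prompt: str, threshold: float = 0.5,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) the POST request for the text prompt endpoint."""
    body, content_type = encode_multipart(
        {"text_prompt": text_prompt, "threshold": str(threshold),
         "return_visualization": "true"},
        file_field="image", filename="cat.jpg", file_bytes=image_bytes,
    )
    return urllib.request.Request(
        f"{base_url}/api/v1/sam3/inference/text",
        data=body, headers={"Content-Type": content_type}, method="POST",
    )


if __name__ == "__main__":
    request = build_text_request(b"\xff\xd8 not a real image", "ear")
    print(request.get_method(), request.full_url)
    # Sending requires a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read()[:200])
```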
| | ||
| ### 2. Bounding Box Inference | ||
| | ||
| ```bash | ||
| POST /api/v1/sam3/inference/bbox | ||
| | ||
| curl -X POST http://localhost:8000/api/v1/sam3/inference/bbox \ | ||
| -F "image=@kitchen.jpg" \ | ||
| -F 'bounding_boxes=[[59, 144, 76, 163, 1]]' \ | ||
| -F "threshold=0.5" | ||
| ``` | ||
| | ||
| ### 3. Batch Processing | ||
| | ||
| ```bash | ||
| POST /api/v1/sam3/inference/batch | ||
| | ||
| curl -X POST http://localhost:8000/api/v1/sam3/inference/batch \ | ||
| -F "images=@cat.jpg" \ | ||
| -F "images=@dog.jpg" \ | ||
| -F 'text_prompts=["cat", "dog"]' | ||
| ``` | ||
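The batch example sends two images and two prompts. A sketch of the kind of pre-flight check a client or handler might apply is shown below. It rests on two assumptions that the README does not state: prompts are matched to images by position, and the batch is capped by `MAX_BATCH_SIZE` (10 in the example configuration). `pair_batch` is a hypothetical helper.

```python
MAX_BATCH_SIZE = 10  # default taken from the example configuration


def pair_batch(image_names: list[str], text_prompts: list[str],
               max_batch_size: int = MAX_BATCH_SIZE) -> list[tuple[str, str]]:
    """Pair each image with its prompt after basic batch checks.

    Assumption: one prompt per image, matched by position.
    """
    if not image_names:
        raise ValueError("at least one image is required")
    if len(image_names) > max_batch_size:
        raise ValueError(
            f"batch of {len(image_names)} images exceeds the limit of {max_batch_size}"
        )
    if len(image_names) != len(text_prompts):
        raise ValueError("number of prompts must match number of images")
    return list(zip(image_names, text_prompts))


if __name__ == "__main__":
    print(pair_batch(["cat.jpg", "dog.jpg"], ["cat", "dog"]))
```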
| | ||
| ### 4. Health Check | ||
| | ||
| ```bash | ||
| GET /api/v1/sam3/health | ||
| ``` | ||
| | ||
| ## Configuration | ||
| | ||
| Edit `.env` file: | ||
| | ||
| ```bash | ||
| # Application | ||
| DEBUG=true | ||
| APP_HOST=0.0.0.0 | ||
| APP_PORT=8000 | ||
| LOG_LEVEL=INFO | ||
| | ||
| # HuggingFace (REQUIRED) | ||
| HF_TOKEN=hf_your_token_here | ||
| | ||
| # SAM3 Model | ||
| SAM3_MODEL_NAME=facebook/sam3 | ||
| SAM3_DEVICE=auto # auto, cpu, cuda | ||
| SAM3_DEFAULT_THRESHOLD=0.5 | ||
| SAM3_DEFAULT_MASK_THRESHOLD=0.5 | ||
| | ||
| # API Limits | ||
| MAX_IMAGE_SIZE_MB=10 | ||
| MAX_BATCH_SIZE=10 | ||
| MAX_IMAGE_DIMENSION=4096 | ||
| | ||
| # Visualization | ||
| VISUALIZATION_FORMAT=PNG # PNG or JPEG | ||
| VISUALIZATION_QUALITY=95 | ||
| | ||
| # CORS | ||
| ALLOWED_ORIGINS=["*"] # JSON array of origins | ||
| ``` | ||
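The `SAM3_DEVICE` setting accepts `auto`, `cpu`, or `cuda`. One plausible way to turn that setting into a concrete device is sketched below. `resolve_device` is a hypothetical helper and not the PR's implementation. CUDA availability is passed in as a flag so the sketch runs without PyTorch; in a real application it would typically come from `torch.cuda.is_available()`.

```python
def resolve_device(setting: str, cuda_available: bool) -> str:
    """Map a SAM3_DEVICE value ('auto', 'cpu', 'cuda') to a concrete device name."""
    normalized = setting.strip().lower()
    if normalized == "auto":
        # Prefer the GPU when one is available, otherwise fall back to the CPU.
        return "cuda" if cuda_available else "cpu"
    if normalized == "cpu":
        return "cpu"
    if normalized == "cuda":
        if not cuda_available:
            raise RuntimeError("SAM3_DEVICE=cuda was requested but CUDA is not available")
        return "cuda"
    raise ValueError(f"unsupported SAM3_DEVICE value: {setting!r}")


if __name__ == "__main__":
    for available in (True, False):
        print("auto with cuda_available =", available, "->", resolve_device("auto", available))
```

Raising on an explicit `cuda` request without a GPU is a design choice made for this sketch; silently falling back to the CPU would be an equally valid policy.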
| | ||
| ## Docker Deployment | ||
| | ||
| See root `docker-compose.yml` for container setup. | ||
| | ||
| ## Performance | ||
| | ||
| - **GPU (CUDA)**: Fast inference (~200-500ms per image) | ||
| - **CPU**: 5-10x slower than GPU | ||
| | ||
| Batch processing is more efficient for multiple images. | ||
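The per-image latency figures and the "Processing Time Metadata" feature imply that each inference call is timed. How the backend measures this is not shown in the diff; the sketch below is one simple way to do it with `time.perf_counter`, where `timed` is a hypothetical helper and `fake_inference` stands in for a real model call.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_inference(prompt: str) -> dict[str, str]:
    """Stand-in for a model call; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return {"prompt": prompt}


if __name__ == "__main__":
    result, elapsed_ms = timed(fake_inference, "cat")
    print(result, f"{elapsed_ms:.1f} ms")
```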
| | ||
| ## API Documentation | ||
| | ||
| - **Swagger UI**: http://localhost:8000/docs | ||
| - **ReDoc**: http://localhost:8000/redoc | ||
| - **OpenAPI JSON**: http://localhost:8000/openapi.json | ||
| | ||
| ## References | ||
| | ||
| - [SAM3 Documentation](https://huggingface.co/facebook/sam3) | ||
| - [SAM3 GitHub](https://github.com/facebookresearch/sam3) | ||
| - [FastAPI](https://fastapi.tiangolo.com/) |
6. Docker cmd wrong path
🐞 Bug ⛯ Reliability