A professional video annotation tool for research and development, powered by SAM2 (Segment Anything Model 2) for intelligent object segmentation and tracking.
FlyMeThrough is a React-based web application that enables researchers to efficiently annotate video sequences with object segmentation masks and bounding boxes. The tool leverages state-of-the-art computer vision models to provide semi-automated annotation capabilities, significantly reducing the time and effort required for video annotation tasks.
- Intelligent Segmentation: Utilizes SAM2 (Segment Anything Model 2) for high-quality object segmentation
- Interactive Annotation: Point-and-click interface for positive/negative point annotations
- Real-time Processing: Live mask generation and refinement as you annotate
- Multi-frame Tracking: Automatic propagation of annotations across video frames
- Video Review Mode: Playback functionality to review annotations at 2fps
- Compression-aware: Supports both compressed and raw frame formats
- Export Capabilities: Structured annotation data export for downstream processing
- Professional UI: Clean, responsive interface built with DaisyUI and Tailwind CSS
- Node.js (v16 or higher)
- npm or yarn
- Backend video server (see Backend Setup section)
- Clone the repository:
git clone <repository-url>
cd diam-annotator
- Install dependencies:
npm install
- Start the development server:
npm start
- Open http://localhost:3000 to view the application in your browser.
The application requires a backend server that provides:
- Video metadata and frame lists via `GET /videos`
- Compressed frame images at `http://localhost/video_images_compressed/{video_name}/{frame}`
- Raw frame images at `http://localhost/video_images_raw/{video_name}/{frame}`
- An annotation processing endpoint at `/process` for SAM2 inference
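As a sketch, the frame URL patterns above can be assembled with a small helper. The host constant and function name here are illustrative assumptions, not part of the actual codebase:

```typescript
// Builds a frame image URL for the backend endpoints described above.
// FRAME_HOST and buildFrameUrl are illustrative names, not the tool's API.
const FRAME_HOST = "http://localhost";

function buildFrameUrl(videoName: string, frame: string, compressed: boolean): string {
  const dir = compressed ? "video_images_compressed" : "video_images_raw";
  return `${FRAME_HOST}/${dir}/${encodeURIComponent(videoName)}/${encodeURIComponent(frame)}`;
}
```

A caller would use the compressed variant for display and the raw variant when preparing frames for processing.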
- Browse available videos from the backend
- Preview video thumbnails
- Select video for annotation
- Point Annotation: Click to add positive (foreground) or negative (background) points
- Real-time Segmentation: SAM2 generates masks instantly based on point inputs
- Mask Refinement: Add additional points to improve segmentation quality
- Annotation Confirmation: Save annotations with descriptive labels
- Frame Navigation: Use arrow keys or frame panel to navigate through video
- Annotation Propagation: Backend processes annotations across multiple frames
- Bounding Box Visualization: View automatically generated bounding boxes
- Review Mode: Automatic playback through annotated frames
- Statistics Display: View annotation count and processing times
- Quality Control: Visual verification of annotation accuracy
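The 2 fps review playback described above amounts to advancing a frame index on a fixed timer. A minimal sketch, with names that are assumptions rather than the tool's actual code:

```typescript
// Review-mode playback at 2 fps: advance one frame every 500 ms.
// REVIEW_FPS and nextFrameIndex are illustrative names.
const REVIEW_FPS = 2;
const REVIEW_INTERVAL_MS = 1000 / REVIEW_FPS; // 500 ms per frame

// Returns the next frame index, wrapping back to the start of the sequence.
function nextFrameIndex(current: number, frameCount: number): number {
  return frameCount > 0 ? (current + 1) % frameCount : 0;
}
```

In the UI this would typically be driven by `setInterval(..., REVIEW_INTERVAL_MS)`, cleared when review mode ends.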
src/
├── Components/
│ ├── Annotation.tsx # Core annotation data model
│ ├── Video.tsx # Video data management class
│ ├── AnnotationInterface/ # Annotation UI components
│ │ ├── AnnotationPanel.tsx # Main annotation canvas
│ │ ├── AnnotationResultsPanel.tsx # Results display
│ │ ├── AnnotationToolsPanel.tsx # Annotation tools
│ │ └── FramesPanel.tsx # Frame navigation
│ ├── SelectionInterface/ # Video selection components
│ │ ├── VideoCard.tsx # Individual video preview
│ │ └── VideoList.tsx # Video grid display
│ └── SAM/ # SAM2 model integration
│ ├── encoder.tsx # SAM2 encoder wrapper
│ ├── decoder.tsx # SAM2 decoder/predictor
│ └── mask.tsx # Mask processing utilities
├── Pages/
│ ├── SelectionPage.tsx # Video selection page
│ └── AnnotationPage.tsx # Main annotation interface
├── App.tsx # Root application component
└── index.tsx # Application entry point
- Manages individual annotation instances
- Handles point collections, masks, and bounding boxes
- Provides validation and data export functionality
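A minimal sketch of the annotation data model described above, assuming point collections with positive/negative labels; the field and function names are assumptions, not the tool's actual API:

```typescript
// Illustrative shapes for an annotation's point collection and bounding box.
interface AnnotationPoint {
  x: number;
  y: number;
  positive: boolean; // true = foreground click, false = background click
}

interface BoundingBox { x: number; y: number; width: number; height: number }

// Derives a bounding box enclosing the positive (foreground) points.
function bboxFromPoints(points: AnnotationPoint[]): BoundingBox | null {
  const fg = points.filter((p) => p.positive);
  if (fg.length === 0) return null;
  const xs = fg.map((p) => p.x);
  const ys = fg.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return { x: minX, y: minY, width: Math.max(...xs) - minX, height: Math.max(...ys) - minY };
}
```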
- Centralized video data management
- Frame navigation and URL generation
- Annotation collection and SAM encoding storage
- Main annotation interface orchestration
- State management for annotation workflow
- Integration with SAM2 models and backend processing
- Encoder: Processes frame images to generate embeddings
- Decoder/Predictor: Generates masks from embeddings and point prompts
- Mask Processing: Handles mask visualization and data compression
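The decoder step above takes point prompts alongside the frame embedding. As a sketch, user clicks can be flattened into the coordinate and label arrays a SAM-style ONNX decoder expects (1 = positive, 0 = negative); the exact input names and shapes depend on the exported model and are assumptions here:

```typescript
// Flattens clicks into coordinate/label arrays for a SAM-style point prompt.
// Click and encodePointPrompts are illustrative names.
interface Click { x: number; y: number; positive: boolean }

function encodePointPrompts(clicks: Click[]): { coords: Float32Array; labels: Float32Array } {
  const coords = new Float32Array(clicks.length * 2); // [x0, y0, x1, y1, ...]
  const labels = new Float32Array(clicks.length);     // 1 = foreground, 0 = background
  clicks.forEach((c, i) => {
    coords[2 * i] = c.x;
    coords[2 * i + 1] = c.y;
    labels[i] = c.positive ? 1 : 0;
  });
  return { coords, labels };
}
```

With ONNX Runtime Web these arrays would then be wrapped in tensors (e.g. shape `[1, N, 2]` for coordinates) before being fed to the decoder session, though the precise tensor layout varies per model export.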
- Video Loading: Backend provides video metadata and frame lists
- Frame Processing: SAM2 encoder generates frame embeddings
- User Interaction: Point clicks trigger SAM2 decoder for mask generation
- Annotation Storage: Masks and metadata stored in Video/Annotation objects
- Backend Processing: Compressed annotation data sent for multi-frame processing
- Results Integration: Backend returns bounding boxes integrated into video data
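Step 5 above sends compressed annotation data to the backend. A simple run-length encoding over a flattened binary mask illustrates the general idea; the tool's actual compression scheme is not specified here and may differ:

```typescript
// Run-length encodes a flattened binary mask as [count0, count1, count0, ...],
// always starting with the run of zeros (a common convention; the real
// wire format used by the backend may differ).
function rleEncode(mask: Uint8Array): number[] {
  const runs: number[] = [];
  let current = 0; // the value the current run is counting
  let length = 0;
  for (const v of mask) {
    if (v === current) {
      length++;
    } else {
      runs.push(length);
      current = v;
      length = 1;
    }
  }
  runs.push(length);
  return runs;
}
```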
- React 18: Modern React with hooks and functional components
- TypeScript: Type-safe development with comprehensive interfaces
- React Router: Client-side routing for navigation
- Konva/React-Konva: Canvas-based graphics for annotation visualization
- Axios: HTTP client for backend communication
- Tailwind CSS: Utility-first CSS framework
- DaisyUI: Tailwind CSS component library
- ONNX Runtime Web: Client-side model inference
- SAM2: Segment Anything Model 2 for object segmentation
- Create React App: Development and build toolchain
- PostCSS: CSS processing and optimization
- Use TypeScript for type safety
- Document all public methods with JSDoc comments
- Follow React functional component patterns
- Implement proper error handling and validation
- Use meaningful variable and function names
- Separate concerns between data models and UI components
- Use React hooks for state management
- Implement proper cleanup for event listeners and intervals
- Follow single responsibility principle
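The cleanup guideline above can be captured by having attach functions return a disposer, which is exactly what a `useEffect` cleanup would call on unmount. The names here are illustrative, not from the codebase:

```typescript
// Attaches a keydown listener and returns a disposer that removes it.
// attachKeyHandler is an illustrative name; in a component you would
// return this disposer from a useEffect callback so React runs it on unmount.
function attachKeyHandler(target: EventTarget, onKey: (e: Event) => void): () => void {
  target.addEventListener("keydown", onKey);
  return () => target.removeEventListener("keydown", onKey);
}
```

Intervals (e.g. review-mode playback timers) follow the same shape: start in the effect, clear in the returned disposer.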
- Optimize SAM2 model loading and inference
- Implement efficient frame caching strategies
- Use appropriate data structures for large annotation datasets
- Minimize re-renders through proper state management
- `npm start`: Development server with hot reload
- `npm test`: Run the test suite
- `npm run build`: Production build
- `npm run eject`: Eject from Create React App (irreversible)
GET /videos
Returns an array of video objects with name, image_count, and frame metadata.
POST /process
Accepts compressed annotation data and returns multi-frame bounding box results.
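Illustrative TypeScript shapes for the two endpoints above; beyond the documented `name` and `image_count` fields, the field names and box layout are assumptions:

```typescript
// Assumed response shape for GET /videos (frames field is a guess at the
// documented "frame metadata").
interface VideoInfo {
  name: string;
  image_count: number;
  frames: string[]; // frame file names
}

// Assumed response shape for POST /process: bounding boxes keyed by frame,
// each box as [x, y, width, height].
interface ProcessResult {
  boxes: Record<string, [number, number, number, number][]>;
}
```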
- Follow the established code structure and patterns
- Add comprehensive documentation for new features
- Test annotation workflows thoroughly
- Ensure backend compatibility for new functionality
- Update README documentation for significant changes
This project is designed for research and development purposes. Please ensure appropriate licensing for production use.
- Model Loading: Ensure ONNX model files are accessible in the public directory
- Backend Connection: Verify backend server is running and accessible
- CORS Issues: Configure backend to allow cross-origin requests
- Memory Usage: Large videos may require chunked processing for performance
- Use compressed frames for display, raw frames for processing
- Limit concurrent SAM2 inference operations
- Implement frame preloading for smoother navigation
- Monitor memory usage during long annotation sessions
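One way to bound memory while still caching frames is a small LRU cache; this is a sketch of the caching strategy the tips above suggest, not the tool's actual implementation:

```typescript
// Simple LRU cache for decoded frames or embeddings. A Map preserves
// insertion order, so the first key is always the least recently used.
// Class name and capacity are illustrative.
class FrameCache<V> {
  private cache = new Map<string, V>();
  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    const value = this.cache.get(key);
    if (value !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.cache.delete(key);
      this.cache.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.cache.keys().next().value as string;
      this.cache.delete(oldest);
    }
  }
}
```

Preloading then becomes: on frame navigation, fetch the next few frames and `set` them into the cache ahead of display.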