- Project Overview
- Features
- Tech Stack
- Installation & Setup
- Environment Variables
- Project Structure
- Usage Guide
- API Endpoints
- Caption Styles
- Configuration
- Troubleshooting
- Future Improvements
Captify is a web application that automatically generates captions for video files and overlays them on the video. It uses AssemblyAI for speech-to-text transcription and Remotion for video rendering with customizable caption styles. The application supports multiple caption styles and Hinglish (Hindi + English) text rendering. Videos are rendered client-side; a server-side rendering endpoint exists but currently returns a placeholder.
- Automatic Transcription: Uses AssemblyAI API for accurate speech-to-text conversion
- Multiple Caption Styles: 4 predefined caption styles (TikTok, Bottom-Centered, Top-Bar, Karaoke)
- Hinglish Support: Renders mixed Hindi (Devanagari) and English text correctly
- Video Rendering: Client-side rendering with canvas overlay and audio capture
- Download Support: Export rendered videos as MP4 files
- Real-time Preview: Live preview using Remotion Player
- Video Upload
  - Upload MP4 files from local device
  - Secure upload to AWS S3 using presigned URLs
  - Support for various video formats
- Auto-Captioning
  - One-click transcription using AssemblyAI
  - Word-level timing information
  - Speaker labels support
  - Language detection with code-switching support
- Caption Style Presets
  - TikTok Style: Bottom-centered with bold text, transparent background
  - Standard Subtitles: Bottom-centered with white text and shadow
  - News Bar: Semi-transparent bar at the top
  - Karaoke Highlight: Words highlight as they are spoken
- Hinglish Support
  - Proper rendering of mixed Hindi (Devanagari script) and English
  - Uses Noto Sans and Noto Sans Devanagari fonts
  - Correct text alignment and encoding
- Video Preview
  - Real-time preview with Remotion Player
  - Frame-accurate caption timing
  - Interactive playback controls
- Video Rendering
  - Client-side rendering with canvas overlay
  - Audio capture and synchronization
  - Progress tracking during rendering
  - Export as MP4 format
- Download Functionality
  - Download rendered videos as MP4 files
  - Automatic filename: `Captify_by_Vishal.mp4`
  - Support for blob URLs, data URLs, and regular URLs
- Next.js 16.0.3 - React framework with App Router
- React 19.2.0 - UI library
- TypeScript 5 - Type safety
- Tailwind CSS 4 - Styling
- Zustand 5.0.8 - State management
- Remotion 4.0.375 - Video rendering library
- Motion 12.23.24 - Animation library
- Next.js API Routes - Server-side endpoints
- AWS SDK v3 - S3 integration
- AssemblyAI 4.19.0 - Speech-to-text API
- @remotion/player - Video preview player
- @remotion/bundler - Server-side rendering (optional)
- @remotion/renderer - Video rendering (optional)
- axios - HTTP client
- lucide-react - Icons
- Node.js 18+ and npm/yarn/pnpm
- AWS S3 bucket with appropriate permissions
- AssemblyAI API key
- FFmpeg (for server-side rendering, optional)
- Clone the repository

      git clone <repository-url>
      cd catfy

- Install dependencies

      npm install
      # or
      yarn install
      # or
      pnpm install

- Set up environment variables: create a `.env.local` file in the root directory:

      AWS_ACCESS_KEY_ID=your_aws_access_key
      AWS_SECRET_ACCESS_KEY=your_aws_secret_key
      AWS_REGION=your_aws_region
      AWS_S3_BUCKET_NAME=your_bucket_name
      ASSEMBLYAI_API_KEY=your_assemblyai_api_key

- Run the development server

      npm run dev
      # or
      yarn dev
      # or
      pnpm dev

- Open your browser and navigate to http://localhost:3000
For a production build:

    npm run build
    npm start

| Variable | Description | Example |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | `AKIAIOSFODNN7EXAMPLE` |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY` |
| `AWS_REGION` | AWS region for S3 bucket | `ap-northeast-3` |
| `AWS_S3_BUCKET_NAME` | S3 bucket name | `assembly-ai-bucket` |
| `ASSEMBLYAI_API_KEY` | AssemblyAI API key | `your_api_key_here` |

- `NODE_ENV` - Environment mode (development/production)
- `NEXT_PUBLIC_*` - Public environment variables accessible in client
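
A missing variable tends to surface only later, as a failed upload or transcription call, so it can help to check for the five required variables at startup. The sketch below is one way to do that; `REQUIRED_ENV_VARS`, `missingEnvVars`, and `assertEnv` are illustrative names and not part of the project.

```typescript
// Names of the required variables listed above.
const REQUIRED_ENV_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_S3_BUCKET_NAME",
  "ASSEMBLYAI_API_KEY",
] as const;

type EnvSource = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws a single error that names every missing variable.
function assertEnv(env: EnvSource): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a server-side module this could be called as `assertEnv(process.env)` before any S3 or AssemblyAI client is created.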
catfy/
├── app/ # Next.js App Router directory
│ ├── api/ # API routes
│ │ ├── polling/ # Transcription polling endpoint
│ │ ├── render/ # Video rendering endpoint
│ │ ├── transcription/ # Transcription initiation endpoint
│ │ └── upload-url/ # S3 presigned URL generation
│ ├── componentss/ # Custom components
│ │ ├── FileUpload.tsx # File upload component
│ │ ├── Video.tsx # Video player component
│ │ └── ...
│ ├── dashboard/ # Dashboard page
│ ├── generator/ # Video generation page
│ │ ├── assembly.tsx # Main caption generation component
│ │ └── page.tsx # Generator page wrapper
│ ├── store/ # State management
│ │ └── uploadStore.ts # Zustand store
│ ├── layout.tsx # Root layout
│ └── page.tsx # Home page
├── remotion/ # Remotion compositions
│ ├── VideoWithCaptions.tsx # Main video composition
│ ├── TikTokCaption.tsx # TikTok-style captions
│ ├── BottomCenteredCaption.tsx # Standard subtitles
│ ├── TopBarCaption.tsx # News bar style
│ ├── KaraokeCaption.tsx # Karaoke highlighting
│ ├── CaptionStyles.ts # Caption style definitions
│ ├── Root.tsx # Remotion root component
│ ├── fonts.ts # Font configuration
│ └── index.tsx # Remotion entry point
├── components/ # Shared UI components
├── lib/ # Utility functions
├── public/ # Static assets
├── next.config.ts # Next.js configuration
├── tsconfig.json # TypeScript configuration
└── package.json # Dependencies
- Upload Video
  - Click "Upload Video" button on the home page
  - Select an MP4 file from your device
  - Wait for upload to complete (file is uploaded to S3)
- Generate Transcription
  - Navigate to the generator page
  - Click "Auto-generate captions" button
  - Wait for transcription to complete (this may take a few minutes)
- Select Caption Style
  - Choose from 4 available caption styles:
    - TikTok Style
    - Standard Subtitles
    - News Bar
    - Karaoke Highlight
  - Preview updates in real-time
- Preview Video
  - Use the Remotion Player to preview the video with captions
  - Scrub through the timeline to see captions at different times
  - Verify caption timing and appearance
- Render Video
  - Click "Render Video" button
  - Wait for rendering to complete (progress shown in button)
  - Rendered video appears below
- Download Video
  - Click "Download Video" button
  - File downloads as `Captify_by_Vishal.mp4`
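
The transcription step above involves starting a job and then waiting for it to finish. The sketch below shows one way a client might do that. It assumes the routes are reachable at `/api/transcription` and `/api/polling`, that both accept a POST with a JSON body, and that a failed job is reported with `status: "error"`; this README does not confirm any of those details. `startAndPoll`, `HttpPost`, and `PollOptions` are illustrative names.

```typescript
// Minimal POST helper signature so the sketch does not depend on a specific HTTP client.
type HttpPost = (url: string, body: unknown) => Promise<unknown>;

interface TranscriptWord {
  start: number;
  end: number;
  text: string;
  confidence: number;
  speaker?: string;
}

interface PollResult {
  status: string;
  text?: string;
  words?: TranscriptWord[];
}

interface PollOptions {
  intervalMs: number; // delay between polls
  maxAttempts: number; // give up after this many polls
  sleep: (ms: number) => Promise<void>; // injected so the loop is easy to test
}

// Starts a transcription for an uploaded file, then polls until it completes.
async function startAndPoll(
  post: HttpPost,
  audioUrl: string,
  options: PollOptions
): Promise<PollResult> {
  // Assumed route and a subset of the request fields shown in this README.
  const started = (await post("/api/transcription", {
    audio_url: audioUrl,
    speaker_labels: true,
    language_detection: true,
  })) as { id: string };

  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    const result = (await post("/api/polling", { id: started.id })) as PollResult;
    if (result.status === "completed") return result;
    if (result.status === "error") throw new Error("Transcription failed");
    await options.sleep(options.intervalMs);
  }
  throw new Error("Transcription did not complete in time");
}
```

In the browser, `post` could wrap `fetch` or axios, and `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. Capping the number of attempts keeps a stuck job from polling forever.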
- Space: Play/Pause (in Remotion Player)
- Arrow Keys: Seek forward/backward
- M: Mute/Unmute
The `upload-url` route generates a presigned URL for uploading videos to S3.

Request Body:

    {
      "fileName": "video.mp4",
      "fileType": "video/mp4"
    }

Response:

    {
      "uploadURL": "https://s3.amazonaws.com/...",
      "getURL": "https://s3.amazonaws.com/..."
    }

The `transcription` route initiates transcription with AssemblyAI.
Request Body:

    {
      "audio_url": "https://s3.amazonaws.com/...",
      "speaker_labels": true,
      "format_text": true,
      "punctuate": true,
      "speech_model": "universal",
      "language_detection": true,
      "language_detection_options": {
        "code_switching": true,
        "code_switching_confidence_threshold": 0.5
      }
    }

Response:

    {
      "id": "transcription_id"
    }

The `polling` route polls for transcription status and returns results when complete.
Request Body:

    {
      "id": "transcription_id"
    }

Response (Completed):

    {
      "text": "Full transcription text",
      "words": [
        {
          "start": 0.5,
          "end": 1.2,
          "text": "Hello",
          "confidence": 0.95,
          "speaker": "A"
        }
      ],
      "status": "completed"
    }

The `render` route renders video with captions (currently returns a placeholder).
Request Body:

    {
      "videoUrl": "https://s3.amazonaws.com/...",
      "words": [...],
      "duration": 25.78,
      "captionStyle": "tiktok"
    }

Response:

    {
      "message": "Rendering endpoint configured...",
      "videoUrl": null,
      "progress": 100
    }

TikTok Style:
- Position: Bottom center
- Text: Bold, large font
- Background: Transparent (text shadow for readability)
- Font: Noto Sans (supports Hinglish)
- Best for: Social media videos, short-form content

Standard Subtitles:
- Position: Bottom center
- Text: White with strong shadow
- Background: Transparent
- Font: Noto Sans (supports Hinglish)
- Best for: Professional videos, documentaries

News Bar:
- Position: Top of video
- Text: White text
- Background: Semi-transparent bar
- Font: Noto Sans (supports Hinglish)
- Best for: News videos, informational content

Karaoke Highlight:
- Position: Center
- Text: Words highlight as they are spoken
- Background: Transparent with glow effect
- Font: Noto Sans (supports Hinglish)
- Best for: Music videos, karaoke content
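
Karaoke highlighting comes down to finding which word is being spoken at the current playback time. The following is a minimal sketch, assuming word timings in seconds as in the polling response example and words that are sorted and do not overlap. `TimedWord` and `activeWordIndex` are hypothetical names, and this is not necessarily how `KaraokeCaption.tsx` does it.

```typescript
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Returns the index of the word spoken at `time`, or -1 during silence.
// Uses binary search, so `words` must be sorted by start time.
// A word is active on the half-open interval [start, end).
function activeWordIndex(words: TimedWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid];
    if (time < word.start) {
      hi = mid - 1;
    } else if (time >= word.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

A caption component could call this once per frame with the current time and style the word at the returned index differently from its neighbors.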
The project uses webpack (not Turbopack) for Remotion compatibility:

    webpack: (config, { isServer }) => {
      // Exclude server-only packages from the client bundle
      if (!isServer) {
        config.resolve.alias = {
          ...config.resolve.alias, // keep the aliases Next.js already defines
          "@remotion/bundler": false,
          "@remotion/renderer": false,
          "esbuild": false,
        };
      }
      // ... additional webpack config
      return config; // the webpack hook must return the modified config
    }

Remotion compositions are defined in `remotion/Root.tsx`:
- Resolution: 1080x1920 (vertical video)
- Frame Rate: 30 fps
- Duration: Dynamic (based on video length)
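
With a fixed frame rate, the dynamic duration and the caption timing both reduce to converting seconds into frame numbers. The sketch below shows that conversion at 30 fps; `FPS`, `durationInFrames`, and `wordFrameRange` are illustrative names rather than code taken from `remotion/Root.tsx`.

```typescript
const FPS = 30;

// Total frames for a video of `durationSeconds`.
// Rounds up so a trailing partial frame is kept, and never returns less than 1,
// since Remotion expects a positive integer frame count.
function durationInFrames(durationSeconds: number, fps: number = FPS): number {
  return Math.max(1, Math.ceil(durationSeconds * fps));
}

// Frame range during which a word is visible: `frames` frames starting at `from`.
function wordFrameRange(
  startSeconds: number,
  endSeconds: number,
  fps: number = FPS
): { from: number; frames: number } {
  const from = Math.floor(startSeconds * fps);
  // Guarantee at least one visible frame even for very short words.
  const to = Math.max(from + 1, Math.ceil(endSeconds * fps));
  return { from, frames: to - from };
}
```

Rounding the start down and the end up errs on the side of showing a word slightly longer rather than dropping it.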
Hinglish support is configured in `remotion/fonts.ts`:

    export const HINGLISH_FONT_FAMILY =
      '"Noto Sans Devanagari", "Noto Sans", sans-serif';

Fonts are loaded in `app/layout.tsx` using Next.js font optimization.
Problem: AssemblyAI transcription fails to start
Solutions:
- Verify `ASSEMBLYAI_API_KEY` is set correctly
- Check that the video URL is accessible
- Ensure video format is supported (MP4 recommended)
Problem: Captions don't show in preview or rendered video
Solutions:
- Verify transcription completed successfully
- Check that the `words` array is not empty
- Ensure caption style is selected
- Check browser console for errors
Problem: Video upload to S3 fails
Solutions:
- Verify AWS credentials are correct
- Check S3 bucket permissions
- Ensure bucket name is correct
- Verify CORS configuration on S3 bucket
Problem: Client-side rendering is slow
Solutions:
- Use shorter videos for testing
- Close other browser tabs
- Consider implementing server-side rendering
- Check browser performance
Problem: Download button doesn't trigger download
Solutions:
- Check browser console for errors
- Verify `renderedVideoUrl` is set
- Ensure browser allows downloads
- Check file size (very large files may fail)
Problem: Hindi text appears as boxes or doesn't render
Solutions:
- Verify fonts are loaded (check Network tab)
- Ensure `Noto Sans Devanagari` is available
- Check text encoding in transcription response
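
When checking the text encoding in a transcription response, it can help to look for Devanagari code points and for the Unicode replacement character, which usually signals a decoding problem upstream rather than a missing font. This is a small sketch; `inspectCaptionText` and `TextReport` are hypothetical helpers, not part of the project.

```typescript
interface TextReport {
  hasDevanagari: boolean;
  hasReplacementChar: boolean;
}

// The Devanagari block spans U+0900 to U+097F.
// U+FFFD is the Unicode replacement character inserted when bytes fail to decode.
function inspectCaptionText(text: string): TextReport {
  return {
    hasDevanagari: /[\u0900-\u097F]/.test(text),
    hasReplacementChar: text.indexOf("\uFFFD") !== -1,
  };
}
```

If `hasDevanagari` is true and `hasReplacementChar` is false, the text itself is likely intact and the font loading is the more probable cause.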
- Chrome/Edge: Full support
- Firefox: Full support
- Safari: Supported, but audio capture may be limited
- Mobile Browsers: Preview works; rendering support is limited
- Server-Side Rendering
  - Implement full Remotion server-side rendering
  - Generate true MP4 files (not WebM)
  - Background job processing
- Additional Caption Styles
  - Custom font selection
  - Color customization
  - Position adjustment
  - Animation effects
- Batch Processing
  - Upload multiple videos
  - Queue system for rendering
  - Progress tracking for multiple videos
- User Accounts
  - Save projects
  - History of rendered videos
  - Cloud storage integration
- Advanced Features
  - Subtitle file export (SRT, VTT)
  - Translation support
  - Multiple language captions
  - Custom timing adjustments
- Performance Optimizations
  - Video compression options
  - Rendering quality presets
  - Caching for faster previews
- UI/UX Improvements
  - Drag-and-drop upload
  - Timeline editor for captions
  - Real-time caption editing
  - Preview thumbnails
- `app/generator/assembly.tsx`: Main caption generation and rendering logic
- `remotion/VideoWithCaptions.tsx`: Remotion composition for video with captions
- `app/store/uploadStore.ts`: Global state management
- `app/api/transcription/route.ts`: AssemblyAI integration
- TypeScript strict mode enabled
- ESLint for code quality
- Prettier for formatting (if configured)
- Component-based architecture
Currently, manual testing is used. Consider adding:
- Unit tests for utility functions
- Integration tests for API routes
- E2E tests for user workflows