Last Updated: October 12, 2025
Current Phase: Phase 1 (MVP)
Overall Progress: 95% Complete
Status: ✅ Backend Complete | ✅ Frontend Complete | ✅ Docker Complete | ❌ Testing Pending
This project has multiple documentation files that should be kept updated:
- PROJECT_OUTLINE.md (this file) - Complete task breakdown with checkboxes
- docs/API.md - Complete REST API documentation (✅ Complete - update when adding endpoints)
- docs/SETUP.md - Development setup guide (✅ Complete - update when adding setup steps)
- README.md - Project overview (✅ Complete - update with new features)
- docs/DOCKER-QUICKSTART.md - Docker deployment quick start guide (✅ Complete)
- docs/DOCKER.md - Docker detailed reference (✅ Complete)
Note: When adding new features, update relevant documentation files.
Project Name: Hard Word Extractor
Purpose: A web application that processes audio/video files to extract and classify vocabulary words by CEFR language levels (A1-C2), providing transcription and vocabulary analysis for language learners.
- Backend: Django 5.2+ with Django REST Framework
- Frontend: React 18+ with TypeScript, Material-UI
- AI/ML: OpenAI Whisper (local), Groq API (Phase 1), Local LLM (Phase 3)
- Task Queue: Celery with Redis
- Database: PostgreSQL (production), SQLite (development)
- Deployment: Docker, Docker Compose
- Server: Gunicorn + Nginx
- ✅ Audio transcription with word-level timestamps
- ✅ CEFR level classification (A1-C2)
- ✅ Word context extraction
- ✅ Vocabulary statistics and analytics
- ✅ Interactive results display
- ❌ Video support (Phase 2)
- ❌ User authentication (Phase 2)
- ❌ Local LLM (Phase 3)
- ✅ Backend (100%)
- ✅ API (100%)
- ✅ Frontend (95%)
- ✅ Docker (100%)
- ❌ Testing (0%)
- Not started
- Not started
- Not started
Goal: Create a functional prototype with core features using external APIs where necessary.
Completion: 88% ✅✅✅✅✅✅✅✅▒▒
- 1.1.1 Create main project directory structure
    HardWordExtractor/
    ├── backend/       ✅
    ├── frontend/      ✅
    ├── docker/        ✅ (empty)
    ├── docs/          ✅
    ├── scripts/       ✅ (empty)
    ├── .gitignore     ✅
    ├── .env.example   ✅
    └── README.md      ✅
- 1.1.2 Initialize Git repository
  - Create .gitignore for Python, Node.js, and Docker
  - Create initial commits with project structure
- 1.1.3 Create documentation structure
  - API documentation (docs/API.md)
  - Development setup guide (docs/SETUP.md)
  - README.md with project overview
Status: ✅ COMPLETE
- 1.2.1 Create Django project
  - Created config/ Django project
  - Created transcription/ Django app
  - Created requirements.txt with all dependencies:
      Django>=4.2.0
      djangorestframework>=3.14.0
      django-cors-headers>=4.3.0
      python-dotenv>=1.0.0
      openai-whisper>=20231117
      groq>=0.4.0
      celery>=5.3.0
      redis>=5.0.0
      psycopg2-binary>=2.9.9
      gunicorn>=21.2.0
      spacy>=3.7.0
      pytest>=7.4.0
      pytest-django>=4.5.0
- 1.2.2 Configure Django settings
  - Created config/settings/ directory
  - Split settings into: base.py, development.py, production.py, __init__.py
  - Configured CORS settings
  - Set up media file handling for uploads
  - Configured REST framework
  - Configured logging
- 1.2.3 Set up environment variables
  - Created .env.example file with all required variables
  - Configured environment-based settings
Status: ✅ COMPLETE
- 1.3.1 Create database models
  - AudioFile model - File uploads and processing status
    - File field with validation
    - Status tracking (pending → processing → transcribing → analyzing → completed/failed)
    - Timestamps and duration
    - Error message storage
  - Transcription model - Transcription results
    - One-to-one with AudioFile
    - Full text storage
    - Language detection
    - Word counts
  - Word model - Unique words database
    - CEFR level classification (A1-C2)
    - Lemmatized form
    - Global frequency counter
    - Indexed for fast lookups
  - ExtractedWord model - Word occurrences in transcriptions
    - Links words to transcriptions
    - Context storage (surrounding sentence)
    - Timestamp and position
    - Frequency per transcription
  - WordStatistics model - Aggregated statistics
    - Counts by CEFR level
    - Distribution percentages
    - One-to-one with Transcription
- 1.3.2 Create and run migrations
  - Run: python manage.py makemigrations
  - Run: python manage.py migrate
  - Create superuser for admin access
- 1.3.3 Create database indexes
  - Added indexes on frequently queried fields
  - Optimized for status queries and lookups
Status: ✅ COMPLETE
Files Created:
- backend/transcription/models.py (5 models, ~200 lines)
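The status flow listed under 1.3.1 (pending → processing → transcribing → analyzing → completed/failed) can be sketched as a small transition table. This is a minimal illustration in plain Python, not the actual AudioFile model: the names Status, ALLOWED, and can_transition are hypothetical, and the rule that any non-terminal stage may move to failed is an assumption read from the arrow notation.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    TRANSCRIBING = "transcribing"
    ANALYZING = "analyzing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition table: each stage moves forward one step,
# and any non-terminal stage may fail. Terminal stages allow nothing.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING, Status.FAILED},
    Status.PROCESSING: {Status.TRANSCRIBING, Status.FAILED},
    Status.TRANSCRIBING: {Status.ANALYZING, Status.FAILED},
    Status.ANALYZING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if moving from `current` to `new` follows the assumed flow."""
    return new in ALLOWED[current]
```

A check like this could guard status updates in a task before saving, so a late retry cannot move a completed file back to processing.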
- 1.4.1 Create Celery configuration
  - Created config/celery.py
  - Configured Celery with Redis broker
  - Set up task routing and time limits
- 1.4.2 Integrate with Django
  - Updated config/__init__.py to load Celery
  - Configured auto-discovery of tasks
- 1.4.3 Configure Celery settings in Django
  - Set task time limits (30 minutes)
  - Configure result backend
  - Set up task serialization
Status: ✅ COMPLETE
Files Created:
- backend/config/celery.py
- backend/config/__init__.py (updated)
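The Celery settings named in 1.4.3 might look roughly like the fragment below. It is a sketch under assumptions, not the project's real settings file: it assumes Celery reads Django settings with the CELERY namespace, and the Redis URLs and the JSON serializer are illustrative values that the outline does not state. Only the 30-minute limit comes from the list above.

```python
# Hypothetical excerpt from a Django settings module. Assumes config/celery.py calls
# app.config_from_object("django.conf:settings", namespace="CELERY").

CELERY_BROKER_URL = "redis://localhost:6379/0"      # assumed Redis location
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"  # assumed; the outline only says a backend is configured

CELERY_TASK_TIME_LIMIT = 30 * 60                    # hard limit of 30 minutes, in seconds

CELERY_TASK_SERIALIZER = "json"                     # assumed serializer choice
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ["json"]
```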
- 1.5.1 Register models in admin
  - AudioFileAdmin - File management interface
    - Custom list display with file size in MB
    - Status filtering
    - Processing time calculation
    - Organized fieldsets
  - TranscriptionAdmin - Transcription management
    - Language filtering
    - Word count display
    - Full text search
  - WordAdmin - Word database management
    - CEFR level filtering with color badges
    - Frequency display
    - Search by word or lemma
  - ExtractedWordAdmin - Word occurrence management
    - Context display
    - Timestamp and position tracking
  - WordStatisticsAdmin - Statistics overview
    - Level distribution display
    - Total word calculations
Status: ✅ COMPLETE
Files Created:
- backend/transcription/admin.py (~150 lines)
- 2.1.1 Create Whisper service module
  - Created transcription/services/whisper_service.py
  - Implemented WhisperTranscriber class
  - Model initialization with device selection (GPU/CPU)
  - Transcription method with word-level timestamps
- 2.1.2 Handle audio file preprocessing
  - Audio format validation
  - Support for MP3, WAV, M4A formats
  - Duration extraction
- 2.1.3 Implement transcription features
  - Load and transcribe audio files
  - Extract word-level timestamps
  - Language detection
  - Segment extraction
- 2.1.4 Add error handling
  - Handle unsupported formats
  - Handle corrupted files
  - Handle timeout scenarios
  - Comprehensive logging
- 2.1.5 Memory management
  - Model loading/unloading
  - GPU cache clearing
Status: ✅ COMPLETE
Files Created:
- backend/transcription/services/whisper_service.py (~150 lines)
- 2.2.1 Create Groq service module
  - Created transcription/services/groq_service.py
  - Implemented GroqClassifier class
  - Configured API client with retry logic
- 2.2.2 Design prompt for word classification
  - Created prompt template with CEFR descriptions
  - Included A1-C2 level guidelines
  - Request structured JSON output
  - Example format in prompt
- 2.2.3 Implement word classification
  - Extract unique words from transcription
  - Batch words for API efficiency (50 words per batch)
  - Call Groq API with classification prompt
  - Parse and validate JSON response
  - Handle invalid classifications
- 2.2.4 Implement caching mechanism
  - Check cache before API calls
  - Store classifications in database
  - Reduce redundant API calls
- 2.2.5 Add rate limiting and error handling
  - Implement exponential backoff
  - Handle API rate limits
  - Retry logic (3 attempts)
  - Fallback to 'unknown' on failure
Status: ✅ COMPLETE
Files Created:
- backend/transcription/services/groq_service.py (~200 lines)
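The batching and retry behaviour described in 2.2.3 and 2.2.5 (50 words per batch, 3 attempts, exponential backoff, fallback to 'unknown') can be sketched without any API client. The classify callable below stands in for the real Groq call and is hypothetical, as are the function names and the 1-second base delay. A real client would likely catch its specific rate-limit exception rather than a bare Exception.

```python
import time
from typing import Callable, Dict, Iterator, List

BATCH_SIZE = 50    # from the outline: 50 words per batch
MAX_ATTEMPTS = 3   # from the outline: 3 attempts


def batches(words: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` words."""
    for start in range(0, len(words), size):
        yield words[start:start + size]


def classify_with_retry(
    batch: List[str],
    classify: Callable[[List[str]], Dict[str, str]],
    base_delay: float = 1.0,                      # assumed base delay in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> Dict[str, str]:
    """Call `classify`, backing off exponentially between failed attempts."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return classify(batch)
        except Exception:  # placeholder for the client's specific error types
            if attempt < MAX_ATTEMPTS - 1:
                sleep(base_delay * 2 ** attempt)  # waits 1s, then 2s
    # Every attempt failed: fall back to 'unknown' for each word in the batch.
    return {word: "unknown" for word in batch}


def classify_all(words: List[str], classify, **kwargs) -> Dict[str, str]:
    """Deduplicate words, classify them batch by batch, and merge the results."""
    results: Dict[str, str] = {}
    for batch in batches(sorted(set(words))):
        results.update(classify_with_retry(batch, classify, **kwargs))
    return results
```

Passing sleep as a parameter keeps the backoff testable: a test can record the requested delays instead of actually sleeping.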
- 2.3.1 Create word processing service
  - Created transcription/services/word_processor.py
  - Implemented text tokenization
  - Implemented lemmatization (basic + spaCy support)
- 2.3.2 Implement filtering logic
  - Remove punctuation and special characters
  - Convert to lowercase
  - Filter stop words (comprehensive English list)
  - Remove numbers
  - Keep only alphabetic words
  - Minimum word length validation
- 2.3.3 Extract word context
  - Extract surrounding sentences for each word
  - Store context with word reference
  - Link to timestamp in audio
- 2.3.4 Calculate word statistics
  - Count word frequency in transcription
  - Calculate unique words
  - Group by CEFR level
  - Generate text statistics
Status: ✅ COMPLETE
Files Created:
- backend/transcription/services/word_processor.py (~300 lines)
- backend/transcription/services/__init__.py
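The filtering and context steps in 2.3.2 and 2.3.3 can be illustrated with a few lines of standard-library Python. This is a simplified sketch rather than the WordProcessor implementation: the stop-word set is a tiny placeholder for the comprehensive list, the minimum length of 2 is an assumption, and the regex tokenizer splits contractions such as "o'clock" into separate pieces. Lemmatization is left out.

```python
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word set; the project's comprehensive list is not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on", "was"}
MIN_LENGTH = 2  # assumed minimum word length


def extract_words(text: str) -> List[str]:
    """Lowercase the text, keep alphabetic runs only, then drop short and stop words."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())  # digits and punctuation never match
    return [t for t in tokens if len(t) >= MIN_LENGTH and t not in STOP_WORDS]


def word_frequencies(text: str) -> Counter:
    """Count how often each kept word occurs in the text."""
    return Counter(extract_words(text))


def word_contexts(text: str) -> Dict[str, str]:
    """Map each kept word to the first sentence in which it appears."""
    contexts: Dict[str, str] = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for word in extract_words(sentence):
            contexts.setdefault(word, sentence)
    return contexts
```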
- 2.4.1 Create tasks module
  - Created transcription/tasks.py
- 2.4.2 Implement main orchestration task
  - process_audio_file - Main task
    - Chains transcription and word extraction
    - Updates processing status
    - Error handling with retries (3 attempts)
    - Status tracking at each step
- 2.4.3 Implement transcription task
  - transcribe_audio - Whisper processing
    - Loads audio file
    - Runs Whisper transcription
    - Extracts duration
    - Creates Transcription record
    - Returns transcription ID
- 2.4.4 Implement word extraction task
  - extract_and_classify_words - Word analysis
    - Extracts words using WordProcessor
    - Checks cache for existing classifications
    - Classifies new words with Groq
    - Creates Word records
    - Creates ExtractedWord records
    - Stores context for each word
    - Atomic transactions
- 2.4.5 Implement statistics generation
  - create_word_statistics - Analytics
    - Counts words by CEFR level
    - Calculates distribution
    - Creates/updates WordStatistics record
Status: ✅ COMPLETE
Files Created:
- backend/transcription/tasks.py (~300 lines)
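The statistics step in 2.4.5 comes down to counting words per CEFR level and turning the counts into percentages. The sketch below shows that arithmetic on a plain list of level labels. The function name, the returned dictionary layout, and the rounding to one decimal place are assumptions for illustration, not the create_word_statistics task itself.

```python
from collections import Counter
from typing import Dict, Iterable

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2", "Unknown"]


def level_statistics(levels: Iterable[str]) -> Dict[str, object]:
    """Count words per CEFR level and compute each level's share in percent."""
    # Any label outside the known levels is counted as 'Unknown'.
    counts = Counter(level if level in LEVELS else "Unknown" for level in levels)
    total = sum(counts.values())
    distribution = {
        level: round(100 * counts[level] / total, 1) if total else 0.0
        for level in LEVELS
    }
    return {
        "counts": {level: counts[level] for level in LEVELS},
        "total_words": total,
        "level_distribution": distribution,
    }
```

For example, four words labelled A1, A1, B2, and C1 give A1 a 50.0 percent share and B2 and C1 25.0 percent each.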
- 3.1.1 Create serializers module
  - Created transcription/serializers.py
- 3.1.2 Implement model serializers
  - AudioFileSerializer - File uploads and status
    - File validation (format and size)
    - Calculated fields (file_size_mb, processing_time)
    - Read-only fields for metadata
  - TranscriptionSerializer - Basic transcription
    - Nested audio file data
    - Statistics included
  - TranscriptionDetailSerializer - With extracted words
    - Extends TranscriptionSerializer
    - Includes all extracted words
  - WordSerializer - Word details
    - CEFR level display
    - Global frequency
  - ExtractedWordSerializer - Word occurrences
    - Nested word data
    - Context and timestamps
  - WordStatisticsSerializer - Analytics
    - Level distribution (counts and percentages)
    - Total word calculation
  - AudioFileUploadSerializer - Upload validation
    - File size validation
    - Format validation
Status: ✅ COMPLETE
Files Created:
- backend/transcription/serializers.py (~160 lines)
- 3.2.1 Create views module
  - Updated transcription/views.py
- 3.2.2 Implement ViewSets
  - AudioFileViewSet - File management
    - List/retrieve audio files
    - Create (upload) with async processing
    - Status filtering
    - Custom status action for progress
  - TranscriptionViewSet - Transcription access (read-only)
    - List/retrieve transcriptions
    - Prefetch related objects for performance
    - Custom words action with CEFR filtering
    - Custom statistics action
  - WordViewSet - Word database (read-only)
    - List/retrieve words
    - Filter by CEFR level
    - Search by word text or lemma
- 3.2.3 Implement function-based views
  - api_root - API overview endpoint
  - upload_audio - Simple upload endpoint
  - get_status - Processing status endpoint
- 3.2.4 Add request validation
  - File size validation (100MB max)
  - File format validation
  - CEFR level parameter validation
- 3.2.5 Implement response formatting
  - Consistent JSON structure
  - Error messages
  - Pagination for lists
Status: ✅ COMPLETE
Files Created:
- backend/transcription/views.py (~270 lines)
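The request validation listed in 3.2.4 can be sketched as two small pure functions. These are hypothetical helpers, not the project's serializers or views: the extension set reuses the formats named in 2.1.2, and treating 100MB as 100 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import List, Optional

MAX_UPLOAD_BYTES = 100 * 1024 * 1024           # 100MB max; binary megabytes assumed
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}  # formats listed under 2.1.2
CEFR_LEVELS = {"A1", "A2", "B1", "B2", "C1", "C2"}


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file is acceptable."""
    errors = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("Unsupported format. Allowed: MP3, WAV, M4A.")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append("File too large. Maximum size is 100MB.")
    return errors


def parse_cefr_param(value: Optional[str]) -> Optional[str]:
    """Normalise an optional CEFR query parameter, rejecting unknown levels."""
    if value is None or value.strip() == "":
        return None  # no filter requested
    level = value.strip().upper()
    if level not in CEFR_LEVELS:
        raise ValueError(f"Invalid CEFR level: {value!r}")
    return level
```

Returning a list of errors rather than raising on the first one lets the API report a wrong format and an oversized file in a single response.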
- 3.3.1 Create app URLs
  - Created transcription/urls.py
  - Registered ViewSets with router
  - Added custom endpoints
- 3.3.2 Configure main URLs
  - Updated config/urls.py
  - Included app URLs under /api/
  - Added admin URLs
  - Configured media file serving for development
- 3.3.3 URL structure created:
    /api/                                  - API root
    /api/upload/                           - Upload audio
    /api/status/<id>/                      - Check status
    /api/audio/                            - List audio files
    /api/audio/<id>/                       - Audio details
    /api/audio/<id>/status/                - Status action
    /api/transcriptions/                   - List transcriptions
    /api/transcriptions/<id>/              - Transcription details
    /api/transcriptions/<id>/words/        - Get words (with CEFR filter)
    /api/transcriptions/<id>/statistics/   - Get statistics
    /api/words/                            - List all words
    /api/words/<id>/                       - Word details
    /admin/                                - Django admin
Status: ✅ COMPLETE
Files Created:
- backend/transcription/urls.py
- backend/config/urls.py (updated)
- 3.4.1 Create API documentation
  - Created docs/API.md
  - Document all endpoints with examples
  - Request/response formats
  - Error handling guide
  - cURL examples
  - JavaScript/Axios examples
  - Query parameters documentation
  - Status codes and error messages
  - Complete workflow examples
  - Pagination documentation
Status: ✅ COMPLETE
Note: Keep API.md updated when adding new endpoints.
- 4.1.1 Create React app
  - Created React app with TypeScript template
  - Cleaned up boilerplate code
- 4.1.2 Install dependencies
  - Installed Axios for API calls
  - Installed React Router for navigation
  - Installed Material-UI (@mui/material, @emotion/react, @emotion/styled)
  - Installed MUI Icons (@mui/icons-material)
- 4.1.3 Configure environment
  - Created .env file with REACT_APP_API_URL
  - Configured for development
- 4.1.4 Set up project structure
    src/
    ├── components/   ✅ - Reusable components
    ├── pages/        ✅ - Page components
    ├── services/     ✅ - API services
    ├── hooks/        ✅ - Custom React hooks (directory created)
    ├── types/        ✅ - TypeScript interfaces
    ├── utils/        ✅ - Utility functions
    ├── theme/        ✅ - MUI theme configuration
    └── App.tsx       ✅ - Main app component
Status: ✅ COMPLETE
Files Created:
- frontend/.env
- frontend/src/types/
- frontend/src/services/
- frontend/src/components/
- frontend/src/pages/
- frontend/src/utils/
- frontend/src/theme/
- 4.2.1 Create types file
  - Created src/types/index.ts
- 4.2.2 Define TypeScript interfaces
    interface AudioFile {
      id: number;
      original_filename: string;
      file_size_mb: number;
      duration?: number;
      status: 'pending' | 'processing' | 'transcribing' | 'analyzing' | 'completed' | 'failed';
      error_message?: string;
      uploaded_at: string;
      processing_time?: number;
    }

    interface Transcription {
      id: number;
      audio_file: AudioFile;
      text: string;
      language: string;
      word_count: number;
      unique_word_count: number;
      statistics?: WordStatistics;
    }

    interface Word {
      id: number;
      text: string;
      lemma: string;
      cefr_level: string;
      cefr_level_display: string;
      global_frequency: number;
    }

    interface ExtractedWord {
      id: number;
      word: Word;
      context: string;
      timestamp?: number;
      position: number;
      frequency: number;
    }

    interface WordStatistics {
      id: number;
      a1_count: number;
      a2_count: number;
      b1_count: number;
      b2_count: number;
      c1_count: number;
      c2_count: number;
      unknown_count: number;
      total_words: number;
      level_distribution: {
        A1: number;
        A2: number;
        B1: number;
        B2: number;
        C1: number;
        C2: number;
        Unknown: number;
      };
    }

    interface ProcessingStatus {
      id: number;
      status: string;
      progress: number;
      error_message?: string;
      has_transcription: boolean;
      transcription_id?: number;
    }
Status: ✅ COMPLETE
Files Created:
- frontend/src/types/index.ts (~90 lines with all interfaces including AudioFile, Transcription, Word, ExtractedWord, WordStatistics, ProcessingStatus, UploadResponse, ApiError)
- 4.3.1 Create API client
  - Created src/services/api.ts
  - Configured Axios instance with base URL
  - Added request/response interceptors
  - Added comprehensive error handling
  - Configured 30-second timeout
- 4.3.2 Create audio service
  - Created src/services/audioService.ts
  - Implemented all API methods:
    - uploadAudio() - Upload file with FormData
    - getAudioStatus() - Get processing status
    - getAudioFile() - Get audio file details
    - getTranscription() - Get transcription
    - getWords() - Get words with optional CEFR filtering
    - getStatistics() - Get word statistics
    - pollStatus() - Auto-polling utility function
- 4.3.3 Add error handling
  - Handle network errors
  - Handle API errors
  - Parse error messages
  - User-friendly error messages with ApiError interface
Status: ✅ COMPLETE
Files Created:
- frontend/src/services/api.ts (~45 lines)
- frontend/src/services/audioService.ts (~105 lines)
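The auto-polling idea behind pollStatus() is independent of the frontend language, so it is sketched here in Python to show the control flow only. The real utility lives in audioService.ts and may differ. The fetch_status callable, the cap of 900 polls, and the function name are hypothetical, and the 2-second default interval is borrowed from the StatusIndicator description in 4.4.2.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def poll_status(
    fetch_status: Callable[[], Dict],             # stands in for a status API call
    interval: float = 2.0,                        # seconds between polls (assumed)
    max_polls: int = 900,                         # assumed cap, about 30 minutes at 2s
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> Dict:
    """Poll until the status is terminal, or raise once the polling budget is spent."""
    for _ in range(max_polls):
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError("processing did not finish within the polling budget")
```

Capping the number of polls keeps a stuck job from polling forever; the caller can surface the timeout as an error state.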
Priority: HIGH - These are the main UI components
- 4.4.1 Create AudioUpload component
  - Created src/components/AudioUpload.tsx
  - Implemented drag-and-drop zone
  - Added file input button
  - Show file preview (name, size)
  - File validation (format, size)
  - Upload progress indicator
  - Error display with Alert
  - Success callback to parent
- 4.4.2 Create StatusIndicator component
  - Created src/components/StatusIndicator.tsx
  - Progress bar with percentage
  - Status messages (pending, processing, transcribing, analyzing, completed, failed)
  - Error display
  - Completion notification
  - Auto-refresh/polling logic (2 second intervals)
  - Status icons (CheckCircle, Error, Hourglass)
- 4.4.3 Create TranscriptionView component
  - Created src/components/TranscriptionView.tsx
  - Display full transcription text
  - Scrollable text area (max 400px height)
  - Copy to clipboard button
  - Download as text file button
  - Search functionality with highlighting
  - Shows language, word count, unique word count
- 4.4.4 Create WordList component
  - Created src/components/WordList.tsx
  - Display words grouped by CEFR level with Accordions
  - Show word frequency with chips
  - Display word context in tooltips
  - Filter by CEFR level with ToggleButtons
  - Sort options (frequency, alphabetical)
  - Search within words (text and lemma)
  - Expandable sections per level
  - Shows lemma when different from text
- 4.4.5 Create Statistics component
  - Created src/components/Statistics.tsx
  - Display word count by level with color-coded boxes
  - Show distribution percentages
  - Total words summary
  - Visual representation with color boxes
  - CEFR level legend
- 4.4.6 Create Layout components
  - Created src/components/Layout.tsx - Main layout with flex column
  - Created src/components/Header.tsx - App header with logo and title
  - Created src/components/Footer.tsx - App footer with copyright
Status: ✅ COMPLETE
Files Created:
- frontend/src/components/AudioUpload.tsx (~180 lines)
- frontend/src/components/StatusIndicator.tsx (~140 lines)
- frontend/src/components/TranscriptionView.tsx (~135 lines)
- frontend/src/components/WordList.tsx (~245 lines)
- frontend/src/components/Statistics.tsx (~90 lines)
- frontend/src/components/Layout.tsx (~35 lines)
- frontend/src/components/Header.tsx (~30 lines)
- frontend/src/components/Footer.tsx (~30 lines)
- 4.5.1 Create Home page
  - Created src/pages/Home.tsx
  - Includes AudioUpload component
  - Includes StatusIndicator component
  - Handles file upload with state management
  - Navigates to results on completion
  - Error handling display
- 4.5.2 Create Results page
  - Created src/pages/Results.tsx
  - Fetches transcription data from API
  - Displays TranscriptionView component
  - Displays WordList component
  - Displays Statistics component
  - Handles loading states with CircularProgress
  - Handles error states with Alerts
  - "New Analysis" button to return home
  - Parallel data fetching (transcription, words, statistics)
- 4.5.3 Create NotFound page
  - Created src/pages/NotFound.tsx
  - 404 error message with large typography
  - "Go to Home" button
Status: ✅ COMPLETE
Files Created:
- frontend/src/pages/Home.tsx (~55 lines)
- frontend/src/pages/Results.tsx (~145 lines)
- frontend/src/pages/NotFound.tsx (~45 lines)
- 4.6.1 Set up React Router
  - Updated src/App.tsx
  - Defined routes:
    - / - Home page (upload)
    - /results/:id - Results page
    - * - 404 page
  - Wrapped app with Layout component
  - Added ThemeProvider and CssBaseline
- 4.6.2 Create MUI theme
  - Created src/theme/theme.ts
  - Defined color palette for CEFR levels:
    - A1: Green (#4CAF50)
    - A2: Light Green (#8BC34A)
    - B1: Yellow/Amber (#FFC107)
    - B2: Orange (#FF9800)
    - C1: Deep Orange (#FF5722)
    - C2: Red (#F44336)
    - Unknown: Grey (#9E9E9E)
  - Configured typography with system fonts
  - Configured breakpoints (xs, sm, md, lg, xl)
  - Customized component styles (Button, Card, Chip)
- 4.6.3 Add responsive design
  - Used MUI Grid with responsive breakpoints
  - Implemented responsive layouts in components
  - Container maxWidth for different pages
  - Mobile-friendly component designs
Status: ✅ COMPLETE
Files Created/Updated:
- frontend/src/App.tsx (updated, ~27 lines)
- frontend/src/theme/theme.ts (~100 lines)
- frontend/src/utils/helpers.ts (~150 lines with utility functions)
- 4.7.1 Test complete workflow
  - Upload audio file (needs backend running)
  - Watch processing status (needs backend running)
  - View results (needs backend running)
  - Filter words by level (needs backend running)
  - Test all interactions (needs backend running)
- 4.7.2 Add loading states
  - CircularProgress in Results page
  - LinearProgress in upload and status
  - Progress indicators with percentages
- 4.7.3 Add error handling
  - Network error messages with Alerts
  - API error messages with ApiError type
  - User-friendly error display
  - Error callbacks in components
- 4.7.4 Polish UI
  - Consistent spacing with sx props
  - Smooth transitions on drag-drop
  - Hover effects on buttons
  - Focus states
  - Mobile-friendly with responsive Grid
Status: ⏳ NEEDS INTEGRATION TESTING (with backend running)
Priority: HIGH - Needed for deployment
Completion Date: October 11-12, 2025
- 5.1.1 Create backend Dockerfile
  - Create backend/Dockerfile (production, 450MB optimized)
  - Use Python 3.11-slim base image
  - Install system dependencies (ffmpeg, libsndfile1)
  - Copy requirements and install Python packages
  - Multi-stage build for size optimization
  - Set up working directory
  - Expose port 8000
  - Set entrypoint for Gunicorn
- 5.1.2 Create development Dockerfile
  - Create backend/Dockerfile.dev (development, ~3.5GB with ML stack)
  - Include full requirements.txt (PyTorch, CUDA, Whisper, spaCy)
  - Support for local Whisper transcription
  - Development tools and debugging support
- 5.1.3 Create .dockerignore
  - Exclude: __pycache__, *.pyc, .env, db.sqlite3, media/, staticfiles/, venv/
- 5.1.4 Optimize image size
  - Use multi-stage build (production: 450MB from 3.5GB)
  - Clean up apt cache
  - Remove unnecessary files
  - Non-root user for security
Status: ✅ COMPLETE
Files Created:
- backend/Dockerfile (production)
- backend/Dockerfile.dev (development)
- backend/.dockerignore
- 5.2.1 Create frontend Dockerfile
  - Create frontend/Dockerfile (production, 25MB optimized)
  - Use Node.js 18-alpine for build stage
  - Use nginx:alpine for production stage
  - Build React app with production optimizations
  - Copy build to nginx html directory
  - Expose port 80
- 5.2.2 Create nginx configuration
  - Create frontend/nginx.conf
  - Configure reverse proxy to backend
  - Set up client_max_body_size for uploads (100MB)
  - Enable gzip compression
  - Configure SPA routing with fallback to index.html
  - Security headers configured
- 5.2.3 Create .dockerignore
  - Exclude: node_modules/, build/, .env, .git/
Status: ✅ COMPLETE
Files Created:
- frontend/Dockerfile (production)
- frontend/nginx.conf
- frontend/.dockerignore
- 5.3.1 Create docker-compose.yml
  - Create docker-compose.yml in root directory (production)
  - Define services: backend, frontend, db, redis, celery
  - Configure networks for service isolation
  - Set up volumes for persistence:
    - PostgreSQL data
    - Redis data with AOF persistence
    - Media files
    - Static files
  - Define health checks for all services
  - Set environment variables via .env file
- 5.3.2 Create development docker-compose
  - Create docker-compose.dev.yml for development
  - Hot reload for backend (runserver instead of gunicorn)
  - Volume mounts for live code changes
  - Django-filter auto-installation on startup
  - Development environment variables
- 5.3.3 Create environment file for Docker
  - Environment variables documented in README
  - Use Docker service names for hosts (db, redis)
  - Groq API key configuration
  - Database credentials
  - Debug settings
- 5.3.4 Configure service dependencies
  - Backend depends on db and redis (with health checks)
  - Celery depends on backend and redis
  - Frontend depends on backend
  - All services start in correct order (30-40s startup time)
Status: ✅ COMPLETE
Files Created:
- docker-compose.yml (production)
- docker-compose.dev.yml (development)
- Environment variables documented in docs/DOCKER-QUICKSTART.md
Performance:
- Startup time: 30-40 seconds (all services)
- Resource usage: 3.7GB RAM idle, ~5GB during transcription
- Build time: 706s clean build, 30s with cache
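The dependency rules in 5.3.4 (backend after db and redis, celery after backend and redis, frontend after backend) define a small dependency graph, and a valid startup order is any topological ordering of it. The sketch below derives one with Python's standard graphlib module (Python 3.9+). It only illustrates the ordering constraint; Docker Compose resolves this itself, and the DEPENDS_ON mapping is transcribed from the list above rather than read from docker-compose.yml.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each service maps to the services it depends on (taken from section 5.3.4).
DEPENDS_ON: Dict[str, Set[str]] = {
    "db": set(),
    "redis": set(),
    "backend": {"db", "redis"},
    "celery": {"backend", "redis"},
    "frontend": {"backend"},
}


def startup_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid start order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(startup_order(DEPENDS_ON)))
```

Several orders are valid (db and redis can start in either order), so checks should assert relative positions rather than one exact sequence.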
- 5.4.1 Docker Compose commands as deployment interface
  - Start services: docker compose up -d
  - Stop services: docker compose down
  - Rebuild: docker compose build
  - View logs: docker compose logs -f
  - Automatic migrations on startup via entrypoint scripts
  - Automatic static file collection
- 5.4.2 Health checks integrated
  - PostgreSQL health check (pg_isready)
  - Redis health check (redis-cli ping)
  - Backend health check (HTTP endpoint)
  - Frontend health check (nginx status)
  - All services report health status via docker compose ps
- 5.4.3 Automatic dependency handling
  - Django-filter auto-installed on backend/celery startup
  - Services wait for dependencies via health checks
  - Graceful degradation on service failures
Status: ✅ COMPLETE (Integrated into Docker Compose)
Implementation:
- Health checks defined in docker-compose.yml
- Startup scripts integrated into Dockerfile entrypoints
- Django migrations run automatically on backend startup
- Static files collected automatically
Note: Traditional shell scripts replaced with Docker Compose orchestration and container entrypoints for better reliability and portability.
- 5.5.1 Create Docker deployment documentation
  - Create docs/DOCKER-QUICKSTART.md (400+ lines comprehensive guide)
  - Server requirements documented (CPU, RAM, disk)
  - Docker installation instructions for Linux/macOS/Windows
  - Quick Start guide (3 simple steps)
  - Environment configuration with examples
  - Architecture diagram included
  - Resource management guidelines
- 5.5.2 Update existing DOCKER.md
  - Update docs/DOCKER.md (600+ lines detailed reference)
  - Added redirect to DOCKER-QUICKSTART.md
  - Marked as complete and operational
  - Last updated: October 12, 2025
- 5.5.3 Document operations
  - Common commands (start, stop, logs, rebuild)
  - Troubleshooting section (7 common issues with solutions):
    - Port conflicts
    - Permission errors
    - Build failures
    - Service crashes
    - Database connection issues
    - Frontend/backend communication issues
    - Whisper transcription issues
  - Performance tips (4 optimization strategies)
  - Resource usage table (idle vs transcription)
  - Whisper model options (tiny/base/small/medium)
  - Image sizes documented
  - Verification checklist
Status: ✅ COMPLETE
Documentation Created:
- docs/DOCKER-QUICKSTART.md (comprehensive, user-friendly guide)
- docs/DOCKER.md (updated detailed reference)
Tested & Verified:
- ✅ Complete clean restart tested (docker compose down -v)
- ✅ All 5 services running healthy
- ✅ Whisper transcription working (version 20250625)
- ✅ Resource usage measured and documented
- ✅ Startup time: 30-40 seconds
- ✅ Build time: 706s clean, 30s cached
Priority: MEDIUM - Important but can be done in parallel
- 6.1.1 Set up test configuration
  - Create backend/conftest.py for pytest
  - Configure test database
  - Create test fixtures
- 6.1.2 Write model tests
  - Create transcription/tests/test_models.py
  - Test AudioFile model
  - Test Transcription model
  - Test Word model
  - Test ExtractedWord model
  - Test WordStatistics model
  - Test model relationships
  - Test model methods
- 6.1.3 Write service tests
  - Create transcription/tests/test_services.py
  - Mock Whisper service
  - Mock Groq service
  - Test WordProcessor logic
  - Test error handling
- 6.1.4 Write API tests
  - Create transcription/tests/test_api.py
  - Test file upload endpoint
  - Test status endpoint
  - Test transcription endpoints
  - Test words endpoints
  - Test error responses
- 6.1.5 Run tests and check coverage
  - Run: pytest
  - Run: pytest --cov
  - Aim for >80% code coverage
Status: ❌ NOT STARTED
Files to Create:
- backend/conftest.py
- backend/transcription/tests/test_models.py
- backend/transcription/tests/test_services.py
- backend/transcription/tests/test_api.py
- 6.2.1 Set up testing environment
  - React Testing Library is included with CRA
  - Install additional tools if needed
- 6.2.2 Write component tests
  - Test AudioUpload component
  - Test StatusIndicator component
  - Test TranscriptionView component
  - Test WordList component
  - Test Statistics component
  - Mock API calls
- 6.2.3 Write service tests
  - Test API service methods
  - Test error handling
  - Mock axios requests
- 6.2.4 Run tests
  - Run: npm test
  - Check coverage: npm test -- --coverage
Status: ❌ NOT STARTED
- 6.3.1 Test complete upload workflow
  - Upload audio file
  - Verify processing
  - Check transcription result
  - Verify word extraction
  - Check statistics
- 6.3.2 Test error scenarios
  - Invalid file format
  - File too large
  - Network errors
  - API failures
  - Processing failures
- 6.3.3 Test performance
  - Upload large files (50MB+)
  - Process multiple files concurrently
  - Monitor memory usage
  - Check response times
- 6.3.4 Create test data
  - Prepare sample audio files
  - Create test cases document
Status: ❌ NOT STARTED
- 7.1.1 Write README.md
  - Project description
  - Features list
  - Tech stack
  - Quick start guide
  - Project structure overview
- 7.1.2 Create SETUP.md
  - Prerequisites
  - Local development setup
  - Environment variables
  - Database setup
  - Running tests
  - Common issues and solutions
- 7.1.3 Create API documentation
  - Created docs/API.md
  - Document all endpoints
  - Include request/response examples
  - Document error codes
  - Example workflows
Status: ✅ COMPLETE
Note: Keep README.md and SETUP.md updated with new features.
- 7.2.1 Write DEPLOYMENT.md
  - Server requirements (CPU, RAM, disk)
  - Docker installation
  - Docker Compose setup
  - Environment configuration
  - SSL/TLS setup (using Let's Encrypt)
  - Domain and DNS configuration
  - Nginx configuration
  - Security best practices
- 7.2.2 Create operations guide
  - Monitoring and logging setup
  - Backup and restore procedures
  - Scaling guidelines
  - Troubleshooting common issues
  - Log locations and analysis
- 7.2.3 Create upgrade guide
  - Version update procedures
  - Database migration steps
  - Rollback procedures
  - Zero-downtime deployment
Status: ❌ NOT STARTED
Files to Create:
- docs/DEPLOYMENT.md
- 7.3.1 Create user guide
  - Create docs/USER_GUIDE.md
  - How to upload audio
  - How to interpret results
  - CEFR level explanations
  - FAQ section
  - Tips for best results
- 7.3.2 Create screenshots
  - Take screenshots of all major features
  - Add to documentation
  - Create demo video (optional)
Status: ❌ NOT STARTED
Files to Create:
- docs/USER_GUIDE.md
- 8.1.1 Run all tests
  - Backend unit tests
  - Frontend unit tests
  - Integration tests
  - Fix any failing tests
- 8.1.2 Manual testing
  - Test on different browsers (Chrome, Firefox, Safari)
  - Test on mobile devices
  - Test with various audio files
  - Test error scenarios
  - Test edge cases
- 8.1.3 Performance testing
  - Load testing with multiple users
  - Test with large audio files (50MB+)
  - Monitor resource usage
  - Optimize if needed
- 8.1.4 Security testing
  - Test file upload restrictions
  - Check for SQL injection vulnerabilities
  - Verify CORS configuration
  - Test error messages (no sensitive info)
Status: ❌ NOT STARTED
-
8.2.1 Prepare server
- Set up Linux server (Ubuntu 22.04 recommended)
- Install Docker and Docker Compose
- Configure firewall (allow ports 80, 443)
- Set up domain name (optional)
-
8.2.2 Clone repository
- SSH into server
- Clone Git repository
- Checkout main branch
-
8.2.3 Configure environment
- Copy
.env.exampleto.env - Set production values
- Set strong SECRET_KEY
- Configure GROQ_API_KEY
- Set ALLOWED_HOSTS
- Configure database credentials
- Copy
-
8.2.4 Build and start services
- Run:
docker-compose build - Run:
docker-compose up -d - Check service status:
docker-compose ps
- Run:
-
8.2.5 Run migrations and setup
- Run:
docker-compose exec backend python manage.py migrate - Create superuser:
docker-compose exec backend python manage.py createsuperuser - Collect static files:
docker-compose exec backend python manage.py collectstatic
- Run:
-
8.2.6 Configure SSL (optional but recommended)
- Install Certbot
- Generate Let's Encrypt certificate
- Update nginx configuration
- Enable HTTPS redirect
- 8.2.7 Set up monitoring
- Configure log aggregation
- Set up alerts for errors
- Monitor resource usage
Status: ❌ NOT STARTED
- 8.3.1 Monitor application
- Check application logs
- Check Celery logs
- Check database logs
- Monitor for errors
- 8.3.2 Monitor resource usage
- Check CPU usage
- Check memory usage
- Check disk space
- Check network traffic
- 8.3.3 Test core functionality
- Upload test audio file
- Verify transcription
- Check word extraction
- Verify results display
- 8.3.4 Set up automated backups
- Configure daily database backups
- Set up backup retention policy (keep 7 days)
- Test restore procedure
- 8.3.5 Document any issues
- Create issue tracker (GitHub Issues)
- Document bugs and feature requests
- Prioritize fixes
Status: ❌ NOT STARTED
Overall: 88% Complete
Backend Setup: ████████████████████████████ 100% ✅
Database Models: ████████████████████████████ 100% ✅
Backend Services: ████████████████████████████ 100% ✅
Celery Tasks: ████████████████████████████ 100% ✅
REST API: ████████████████████████████ 100% ✅
Admin Interface: ████████████████████████████ 100% ✅
Frontend Setup: ████████████████████████████ 100% ✅
Frontend Dev: ███████████████████████████▒ 95% ✅
Docker Config: ████████████████████████████ 100% ✅
Testing: ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 0% ❌
Documentation: ███████████████████████████▒ 95% ✅
- Project Foundation (100%)
- Directory structure
- Git repository
- Environment configuration
- Documentation structure
- Backend (100%)
- Django project with split settings
- 5 database models
- Celery configuration
- Admin interface
- 3 backend services (Whisper, Groq, WordProcessor)
- 4 Celery tasks
- Complete REST API (11 endpoints)
- Serializers and views
- URL routing
- Documentation (95%)
- README.md (updated with Docker)
- docs/API.md (complete)
- docs/SETUP.md (complete)
- docs/DOCKER-QUICKSTART.md (complete, 400+ lines)
- docs/DOCKER.md (complete, 600+ lines)
- docs/ARCHITECTURE.md (complete)
- docs/GROQ_SETUP.md (complete)
- Frontend (95%)
- React app initialized with TypeScript
- All dependencies installed (Axios, React Router, MUI)
- TypeScript types and interfaces
- API service layer with error handling
- MUI theme with CEFR colors
- Utility helpers
- 8 components (AudioUpload, StatusIndicator, TranscriptionView, WordList, Statistics, Layout, Header, Footer)
- 3 pages (Home, Results, NotFound)
- Routing configured
- Responsive design implemented
- Frontend (5%)
- Integration testing with backend (needs backend + Celery + Redis running)
- Minor UI refinements
- Testing (0%)
- Backend unit tests
- Frontend unit tests
- Integration tests
- End-to-end tests
- Testing (100%)
- Backend unit tests
- Frontend unit tests
- Integration tests
- Final Launch (100%)
- Final testing
- Server deployment
- Monitoring setup
- Frontend Integration Testing: 30 minutes (with backend running)
- Docker Configuration: 1-2 hours
- Testing: 2-3 hours
- Deployment & Polish: 1-2 hours
Total Remaining: ~4-8 hours to complete Phase 1 MVP
Goal: Add video support, user authentication, and improved UX
Estimated Time: 2-3 weeks
- Video upload and audio extraction (FFmpeg)
- User authentication (JWT)
- User dashboard with processing history
- Enhanced UI/UX with animations
- Export functionality (PDF, CSV)
- Word cloud visualization
- Statistics charts
Status: ❌ NOT STARTED (will be detailed when Phase 1 is complete)
Goal: Replace external APIs with local solutions
Estimated Time: 2-3 weeks
- Local LLM integration (replace Groq)
- Performance optimizations
- Advanced caching
- GPU acceleration
- Model optimization
Status: ❌ NOT STARTED (will be detailed when Phase 2 is complete)
Goal: Production-grade features and scaling
Estimated Time: 3-4 weeks
- Multi-language support
- Kubernetes deployment
- Advanced analytics
- Public API with authentication
- Webhook support
- CI/CD pipeline
- Monitoring (Prometheus, Grafana)
Status: ❌ NOT STARTED (will be detailed when Phase 3 is complete)
If you need to resume this project, share this instruction:
"Read PROJECT_OUTLINE.md and continue with the unchecked tasks. Start with Section 6 (Testing & Quality Assurance), as the backend, frontend, and Docker configuration are complete."
Current Focus:
- Section 6: Testing & Quality Assurance
- Section 8: Final Integration & Launch
Priority Order:
- Integration Testing with Backend Running (Section 4.7.1)
- Testing & QA (Section 6)
- Final Launch (Section 8)
- Backend can transcribe audio files
- Words are classified by CEFR level
- REST API is functional
- Frontend displays results
- Application is dockerized
- Deployment documentation is complete
- Proper error handling
- Comprehensive logging
- Type hints (Python)
- Docstrings
- Type safety (TypeScript)
- Test coverage >80%
- Whisper Model: Using 'base' model for MVP (good speed/accuracy balance)
- LLM: Groq API for Phase 1 (fast), will replace with local LLM in Phase 3
- Word Processing: Basic for MVP, optional spaCy for advanced features
- Caching: Word classifications cached in database to reduce API calls
- Batch Size: 50 words per Groq API call (balances efficiency and token limits)
- Frontend: React + TypeScript + Material-UI for modern, type-safe UI
- Deployment: Docker Compose for easy deployment and scaling
- TODO in tasks.py: Calculate actual word position in transcription (line 228)
- Missing spaCy model: Need to download `en_core_web_sm` if using spaCy
- No authentication: Phase 2 will add JWT authentication
- No rate limiting: Phase 2 will add rate limiting
- Single language: English only, Phase 4 for multi-language
- External API dependency: Groq API, Phase 3 for local LLM
- API Documentation: `docs/API.md`
- Setup Guide: `docs/SETUP.md`
- Django Docs: https://docs.djangoproject.com/
- React Docs: https://react.dev/
- Material-UI: https://mui.com/
Backend:
cd backend
source venv/bin/activate
python manage.py runserver # Start Django
celery -A config worker -l info # Start Celery
python manage.py shell # Django shell
pytest # Run tests
Frontend:
cd frontend
npm start # Start dev server
npm test # Run tests
npm run build # Build for production
Docker:
docker-compose up --build # Build and start all services
docker-compose down # Stop all services
docker-compose logs -f backend # View backend logs
docker-compose ps # Check service status
Last Updated: October 8, 2025
Version: 1.1
Maintainer: Project Team
Status: ✅ Ready for Frontend Development!
END OF PROJECT OUTLINE
```
Django>=4.2.0
djangorestframework>=3.14.0
django-cors-headers>=4.3.0
python-dotenv>=1.0.0
openai-whisper>=20231117
groq>=0.4.0
celery>=5.3.0
redis>=5.0.0
psycopg2-binary>=2.9.9
gunicorn>=21.2.0
```
- 1.2.2 Create Django app for core functionality
- Run: `python manage.py startapp transcription`
- Register app in `config/settings.py`
- 1.2.3 Configure Django settings
- Create `config/settings/` directory
- Split settings into: `base.py`, `development.py`, `production.py`
- Configure CORS settings
- Set up media file handling for uploads
- Configure REST framework
- 1.2.4 Set up environment variables
- Create `.env.example` file
- Add: `SECRET_KEY`, `DEBUG`, `ALLOWED_HOSTS`, `GROQ_API_KEY`, `DATABASE_URL`
Acceptance Criteria:
- Django project runs successfully
- Settings are properly configured
- Environment variables are set up
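The variables listed in step 1.2.4 could be read along these lines. This is a minimal sketch using only the standard library: the helper name `load_env_settings`, the truthy strings accepted for `DEBUG`, the comma-separated `ALLOWED_HOSTS` format, and the SQLite fallback URL are all assumptions, not the project's actual settings code.

```python
import os
from typing import Mapping


def load_env_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the variables named in .env.example from a mapping (hypothetical helper)."""
    secret_key = env.get("SECRET_KEY", "")
    if not secret_key:
        # Fail early instead of starting with an empty key.
        raise ValueError("SECRET_KEY must be set")
    # Assumption: DEBUG is enabled only by an explicit truthy string.
    debug = env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"}
    # Assumption: ALLOWED_HOSTS is a comma-separated list.
    hosts = [h.strip() for h in env.get("ALLOWED_HOSTS", "").split(",") if h.strip()]
    return {
        "SECRET_KEY": secret_key,
        "DEBUG": debug,
        "ALLOWED_HOSTS": hosts,
        "GROQ_API_KEY": env.get("GROQ_API_KEY", ""),
        # Assumption: fall back to a local SQLite URL for development.
        "DATABASE_URL": env.get("DATABASE_URL", "sqlite:///db.sqlite3"),
    }
```

Passing a plain dict instead of `os.environ` keeps the parsing rules easy to unit test.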
Task: Configure PostgreSQL database
Steps:
- 1.3.1 Create database models
- Create `transcription/models.py` with:
- `AudioFile` model (file, uploaded_at, status, user)
- `Transcription` model (audio_file, text, language, created_at)
- `Word` model (text, cefr_level, frequency)
- `ExtractedWord` model (transcription, word, timestamp, context)
- 1.3.2 Create and run migrations
- Run: `python manage.py makemigrations`
- Run: `python manage.py migrate`
- 1.3.3 Create database indexes
- Add indexes for frequently queried fields
- Add full-text search indexes
Acceptance Criteria:
- Models are created and migrated
- Database schema is properly indexed
Task: Set up Celery for asynchronous task processing
Steps:
- 1.4.1 Create Celery configuration
- Create `config/celery.py`
- Configure Celery with Redis broker
- Set up task routing
- 1.4.2 Create tasks module
- Create `transcription/tasks.py`
- Define task: `process_audio_file`
- Define task: `transcribe_audio`
- Define task: `extract_and_classify_words`
- 1.4.3 Configure Celery settings
- Set task time limits
- Configure result backend
- Set up task queues
Acceptance Criteria:
- Celery is properly configured
- Tasks can be queued and executed
- Redis connection works
Task: Integrate local Whisper for audio transcription
Steps:
- 2.1.1 Create Whisper service module
- Create `transcription/services/whisper_service.py`
- Implement `WhisperTranscriber` class
- Add model initialization (use 'base' model for MVP)
- Implement transcription method with word-level timestamps
- 2.1.2 Handle audio file preprocessing
- Install ffmpeg for audio conversion
- Create audio format validation
- Implement audio file conversion to WAV
- 2.1.3 Implement transcription task
- Load audio file
- Run Whisper transcription
- Extract word-level timestamps
- Store transcription in database
- 2.1.4 Add error handling
- Handle unsupported audio formats
- Handle corrupted files
- Handle timeout scenarios
- Log errors properly
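The format validation and WAV conversion in step 2.1.2 might look roughly like this. It is a sketch, not the project's code: `build_wav_command` and `convert_to_wav` are hypothetical names, the accepted extensions are taken from the upload validation step (mp3, wav, m4a), and the 16 kHz mono output and 300-second timeout are assumptions.

```python
import subprocess
from pathlib import Path

SUPPORTED_FORMATS = {".mp3", ".wav", ".m4a"}  # formats named in the upload validation step


def build_wav_command(src: str, dst: str) -> list[str]:
    """Return an ffmpeg argument list that converts src to a mono 16 kHz WAV file."""
    suffix = Path(src).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {suffix or '(none)'}")
    # -y overwrites dst, -ar sets the sample rate, -ac sets the channel count.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def convert_to_wav(src: str, dst: str, timeout_s: int = 300) -> None:
    """Run ffmpeg; requires the ffmpeg binary to be installed."""
    # check=True raises CalledProcessError on a non-zero exit (e.g. a corrupted file);
    # timeout raises TimeoutExpired, which covers the timeout scenario in step 2.1.4.
    subprocess.run(
        build_wav_command(src, dst),
        check=True,
        capture_output=True,
        timeout=timeout_s,
    )
```

Building the command separately from running it lets the validation be tested without ffmpeg present.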
Acceptance Criteria:
- Whisper successfully transcribes audio files
- Word-level timestamps are captured
- Errors are handled gracefully
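Once a transcription result is available, the word-level timestamps from step 2.1.3 need to be flattened before they can be stored. The sketch below assumes the result is a dict shaped like the output of openai-whisper's `transcribe(..., word_timestamps=True)`, that is, a `segments` list whose entries carry `text` and a `words` list with `word`, `start`, and `end`. The helper name `flatten_word_timestamps` and the row layout are hypothetical.

```python
from typing import Any


def flatten_word_timestamps(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a segment-based transcription result into one row per word."""
    rows = []
    for segment in result.get("segments", []):
        for entry in segment.get("words", []):
            rows.append(
                {
                    "word": entry["word"].strip(),  # tokens may carry a leading space
                    "start": float(entry["start"]),
                    "end": float(entry["end"]),
                    # The enclosing segment text doubles as the word's context.
                    "context": segment.get("text", "").strip(),
                }
            )
    return rows
```

Each row could then map onto the `ExtractedWord` fields (word, timestamp, context) listed in step 1.3.1.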
Task: Integrate Groq API for word classification
Steps:
- 2.2.1 Create Groq service module
- Create `transcription/services/groq_service.py`
- Implement `GroqClassifier` class
- Configure API client with retry logic
- 2.2.2 Design prompt for word classification
- Create prompt template for CEFR classification
- Include context about CEFR levels (A1-C2)
- Request structured JSON output
- Example prompt:
Classify the following words by CEFR level (A1, A2, B1, B2, C1, C2). Return JSON format: {"word": "level"} Words: [list of words]
- 2.2.3 Implement word extraction and classification
- Extract unique words from transcription
- Filter out common stop words
- Batch words for API efficiency (max 50 words per request)
- Call Groq API with classification prompt
- Parse and validate JSON response
- 2.2.4 Implement caching mechanism
- Cache classified words in database
- Check cache before API calls
- Update cache with new classifications
- 2.2.5 Add rate limiting and error handling
- Implement exponential backoff
- Handle API rate limits
- Log API errors
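The exponential backoff in step 2.2.5 can be sketched independently of any API client. `call_with_backoff` is a hypothetical helper; the attempt count, delays, and jitter range are assumptions, and in practice `retry_on` would be narrowed to the client's rate-limit and connection errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on failure with exponentially growing delays plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each retry (1s, 2s, 4s, ...) and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from concurrent workers.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so tests can record delays instead of waiting.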
Acceptance Criteria:
- Words are successfully classified by CEFR level
- Caching reduces API calls
- Rate limiting prevents API errors
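Steps 2.2.2 through 2.2.4 (prompt, batching, response validation, caching) fit together roughly as follows. This sketch assumes the model call is wrapped in a function `ask_model(prompt) -> str` that returns the raw JSON text. That wrapper, the dict used as a cache (the plan calls for a database-backed cache), and the function names are hypothetical stand-ins.

```python
import json
from typing import Callable, Iterable

CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")
BATCH_SIZE = 50  # maximum words per request, from step 2.2.3


def build_prompt(words: list[str]) -> str:
    """Render the example prompt from step 2.2.2 for one batch of words."""
    return (
        "Classify the following words by CEFR level (A1, A2, B1, B2, C1, C2). "
        'Return JSON format: {"word": "level"} '
        f"Words: {json.dumps(words)}"
    )


def parse_classification(raw: str, requested: Iterable[str]) -> dict[str, str]:
    """Keep only requested words whose value is a valid CEFR level."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping word to level")
    wanted = set(requested)
    return {
        word: level.upper()
        for word, level in data.items()
        if word in wanted and isinstance(level, str) and level.upper() in CEFR_LEVELS
    }


def classify_words(
    words: list[str],
    cache: dict[str, str],
    ask_model: Callable[[str], str],
) -> dict[str, str]:
    """Classify words, consulting the cache first and batching the rest."""
    known = {w: cache[w] for w in words if w in cache}
    missing = sorted(set(words) - set(known))
    for start in range(0, len(missing), BATCH_SIZE):
        batch = missing[start:start + BATCH_SIZE]
        parsed = parse_classification(ask_model(build_prompt(batch)), batch)
        cache.update(parsed)  # later uploads reuse these results
        known.update(parsed)
    return known
```

Words the model omits or labels with an invalid level are simply left out of the result, so the caller can decide whether to retry them.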
Task: Implement word extraction and filtering logic
Steps:
- 2.3.1 Create word processing service
- Create `transcription/services/word_processor.py`
- Implement text tokenization
- Implement lemmatization (use spaCy or NLTK)
- 2.3.2 Implement filtering logic
- Remove punctuation and special characters
- Convert to lowercase
- Filter stop words
- Remove numbers
- Keep only alphabetic words
- 2.3.3 Extract word context
- For each word, extract surrounding sentence
- Store context with word reference
- Link to timestamp in audio
- 2.3.4 Calculate word statistics
- Count word frequency in transcription
- Calculate unique words
- Group by CEFR level
Acceptance Criteria:
- Words are properly extracted and cleaned
- Context is captured for each word
- Statistics are calculated correctly
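The filtering rules in step 2.3.2 and the statistics in step 2.3.4 can be illustrated with the standard library alone. This is a sketch under assumptions: the stop-word list is a tiny placeholder (a real one would come from spaCy or NLTK), lemmatization is omitted, and `extract_words` and `word_statistics` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the project's actual list.
STOP_WORDS = frozenset(
    {"a", "an", "the", "and", "or", "but", "is", "are", "was", "to", "of", "in", "on", "it"}
)


def extract_words(text: str) -> list[str]:
    """Lowercase, split on non-word characters, keep alphabetic non-stop words."""
    tokens = re.findall(r"\w+", text.lower())  # drops punctuation and special characters
    # isalpha() removes numbers and mixed tokens such as "3rd".
    return [t for t in tokens if t.isalpha() and t not in STOP_WORDS]


def word_statistics(words: list[str], levels: dict[str, str]) -> dict:
    """Count frequencies and group unique words by CEFR level."""
    frequency = Counter(words)
    by_level: dict[str, list[str]] = {}
    for word in frequency:
        # Words without a classification are grouped under "unknown".
        by_level.setdefault(levels.get(word, "unknown"), []).append(word)
    return {
        "total_words": len(words),
        "unique_words": len(frequency),
        "frequency": dict(frequency),
        "by_level": by_level,
    }
```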
Task: Create REST API endpoints for frontend
Steps:
- 2.4.1 Create serializers
- Create `transcription/serializers.py`
- `AudioFileSerializer`
- `TranscriptionSerializer`
- `ExtractedWordSerializer`
- `WordStatisticsSerializer`
- 2.4.2 Create API views
- Create `transcription/views.py`
- `AudioFileUploadView` (POST)
- `TranscriptionDetailView` (GET)
- `WordsByLevelView` (GET)
- `ProcessingStatusView` (GET)
- 2.4.3 Configure URL routing
- Create `transcription/urls.py`
- Routes:
- `POST /api/upload/` - Upload audio file
- `GET /api/transcription/<id>/` - Get transcription
- `GET /api/words/<id>/?level=<cefr>` - Get words by level
- `GET /api/status/<id>/` - Get processing status
- 2.4.4 Add request validation
- Validate file size (max 100MB for MVP)
- Validate file format (mp3, wav, m4a)
- Validate CEFR level parameter
- 2.4.5 Implement response formatting
- Return consistent JSON structure
- Include error messages
- Add pagination for word lists
Acceptance Criteria:
- All endpoints are functional
- Request validation works
- Response format is consistent
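The request validation in step 2.4.4 reduces to a few checks that can be written without the web framework. The sketch below is framework-free on purpose. The function names, the error wording, the interpretation of 100MB as 100 * 1024 * 1024 bytes, and the `{"success": ..., "errors": ...}` envelope are assumptions.

```python
from pathlib import PurePath

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100MB limit for the MVP (binary megabytes assumed)
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
CEFR_LEVELS = {"A1", "A2", "B1", "B2", "C1", "C2"}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of validation errors; an empty list means the upload is acceptable."""
    errors = []
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"Unsupported file format: {extension or '(none)'}")
    if size_bytes <= 0:
        errors.append("File is empty.")
    elif size_bytes > MAX_UPLOAD_BYTES:
        errors.append("File exceeds the 100MB limit.")
    return errors


def validate_level(level: str) -> str:
    """Normalize the ?level= query parameter or raise ValueError."""
    normalized = level.strip().upper()
    if normalized not in CEFR_LEVELS:
        raise ValueError(f"Invalid CEFR level: {level!r}")
    return normalized


def error_response(errors: list[str]) -> dict:
    """Build the same JSON structure for every failed request."""
    return {"success": False, "errors": errors}
```

Note that checking the extension alone does not prove the file is valid audio, so a decode attempt on the server side is still needed.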
Task: Initialize React frontend application
Steps:
- 3.1.1 Create React app
- Navigate to `frontend/` directory
- Run: `npx create-react-app . --template typescript`
- Clean up boilerplate code
- 3.1.2 Install dependencies
- Install Axios: `npm install axios`
- Install React Router: `npm install react-router-dom`
- Install UI library: `npm install @mui/material @emotion/react @emotion/styled`
- Install icons: `npm install @mui/icons-material`
- 3.1.3 Configure proxy for development
- Add proxy in `package.json`: `"proxy": "http://backend:8000"`
- Create `.env` file with `REACT_APP_API_URL`
- 3.1.4 Set up project structure
src/
├── components/
├── pages/
├── services/
├── hooks/
├── types/
├── utils/
└── App.tsx
Acceptance Criteria:
- React app runs successfully
- Dependencies are installed
- Project structure is organized
Task: Create API service for backend communication
Steps:
- 3.2.1 Create API client
- Create `src/services/api.ts`
- Configure Axios instance with base URL
- Add request/response interceptors
- 3.2.2 Create TypeScript interfaces
- Create `src/types/index.ts`
- Define: `AudioFile`, `Transcription`, `Word`, `WordStatistics`
- 3.2.3 Implement API methods
- `uploadAudio(file: File): Promise<AudioFile>`
- `getTranscription(id: string): Promise<Transcription>`
- `getWordsByLevel(id: string, level: string): Promise<Word[]>`
- `getProcessingStatus(id: string): Promise<Status>`
- 3.2.4 Add error handling
- Handle network errors
- Handle API errors
- Parse error messages
Acceptance Criteria:
- API service communicates with backend
- TypeScript types are defined
- Error handling works
Task: Create audio file upload interface
Steps:
- 3.3.1 Create upload component
- Create `src/components/AudioUpload.tsx`
- Implement drag-and-drop zone
- Add file input button
- Show file preview
- 3.3.2 Implement file validation
- Check file format (mp3, wav, m4a)
- Check file size (max 100MB)
- Show validation errors
- 3.3.3 Add CEFR level selector
- Create dropdown with A1-C2 options
- Allow multiple level selection
- Set default to all levels
- 3.3.4 Implement upload progress
- Show upload progress bar
- Display processing status
- Handle upload cancellation
- 3.3.5 Add loading states
- Show spinner during upload
- Disable submit during processing
- Show success/error messages
Acceptance Criteria:
- File upload works smoothly
- Validation prevents invalid uploads
- Progress is visible to user
Task: Create component to display transcription and word results
Steps:
- 3.4.1 Create results page
- Create `src/pages/Results.tsx`
- Fetch data on component mount
- Handle loading state
- 3.4.2 Create transcription display
- Create `src/components/TranscriptionView.tsx`
- Show full transcription text
- Make text scrollable
- Add copy button
- 3.4.3 Create word list component
- Create `src/components/WordList.tsx`
- Display words grouped by CEFR level
- Show word frequency
- Display word context on hover
- 3.4.4 Add filtering and sorting
- Filter by CEFR level
- Sort by frequency or alphabetically
- Search within words
- 3.4.5 Implement word highlighting
- Highlight selected level words in transcription
- Color-code by CEFR level
- Add click-to-highlight functionality
Acceptance Criteria:
- Results are displayed clearly
- Filtering and sorting work
- User can interact with words
Task: Create app layout and navigation
Steps:
- 3.5.1 Create header component
- Create `src/components/Header.tsx`
- Add app logo and title
- Add navigation links (for future phases)
- 3.5.2 Create main layout
- Create `src/components/Layout.tsx`
- Include header
- Add main content area
- Add footer with credits
- 3.5.3 Set up routing
- Create `src/App.tsx` with routes
- Route: `/` - Upload page
- Route: `/results/:id` - Results page
- Route: `*` - 404 page
- 3.5.4 Add responsive design
- Make layout mobile-friendly
- Test on different screen sizes
- Use Material-UI breakpoints
Acceptance Criteria:
- Navigation works correctly
- Layout is responsive
- UI is consistent across pages
Task: Create Dockerfile for Django backend
Steps:
- 4.1.1 Create backend Dockerfile
- Create `backend/Dockerfile`
- Use Python 3.11 base image
- Install system dependencies (ffmpeg, libsndfile1)
- Copy requirements and install Python packages
- Download Whisper model during build
- Set up working directory
- Expose port 8000
- 4.1.2 Create .dockerignore
- Exclude: `__pycache__`, `*.pyc`, `.env`, `db.sqlite3`, `media/`, `staticfiles/`
- 4.1.3 Optimize image size
- Use multi-stage build
- Clean up apt cache
- Remove unnecessary files
Dockerfile Structure:
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Download Whisper model
RUN python -c "import whisper; whisper.load_model('base')"
# Copy application
COPY . .
# Expose the Gunicorn port
EXPOSE 8000
# Start Gunicorn (migrations and collectstatic are run separately, not by this image)
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000"]
Acceptance Criteria:
- Backend Docker image builds successfully
- Image includes all dependencies
- Whisper model is pre-downloaded
Task: Create Dockerfile for React frontend
Steps:
- 4.2.1 Create frontend Dockerfile
- Create `frontend/Dockerfile`
- Use Node.js 18 for build stage
- Use nginx for production stage
- Build React app
- Copy build to nginx html directory
- 4.2.2 Create nginx configuration
- Create `frontend/nginx.conf`
- Configure reverse proxy to backend
- Set up client_max_body_size for uploads
- Enable gzip compression
- 4.2.3 Create .dockerignore
- Exclude: `node_modules/`, `build/`, `.env`
Dockerfile Structure:
# Build stage
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Acceptance Criteria:
- Frontend Docker image builds successfully
- Nginx serves React app
- API proxy works correctly
Task: Create docker-compose.yml for orchestration
Steps:
- 4.3.1 Create docker-compose.yml
- Define services: `backend`, `frontend`, `db`, `redis`, `celery`, `celery-beat`
- Configure networks
- Set up volumes for persistence
- Define health checks
- 4.3.2 Configure environment variables
- Create `.env.docker` file
- Set database credentials
- Set Groq API key
- Set Django secret key
- 4.3.3 Set up volumes
- PostgreSQL data volume
- Redis data volume
- Media files volume
- Static files volume
- 4.3.4 Configure service dependencies
- Backend depends on db and redis
- Celery depends on backend and redis
- Frontend depends on backend
Docker Compose Structure:
version: '3.8'
services:
  db:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=${DB_NAME}
      - POSTGRES_USER=${DB_USER}
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  backend:
    build: ./backend
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000 --workers 4
    volumes:
      - ./backend:/app
      - media_files:/app/media
      - static_files:/app/staticfiles
    environment:
      - DATABASE_URL=postgresql://${DB_USER}:${DB_PASSWORD}@db:5432/${DB_NAME}
      - REDIS_URL=redis://redis:6379/0
      - GROQ_API_KEY=${GROQ_API_KEY}
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      - "8000:8000"
  celery:
    build: ./backend
    command: celery -A config worker -l info
    volumes:
      - ./backend:/app
      - media_files:/app/media
    environment:
      - DATABASE_URL=postgresql://${DB_USER}:${DB_PASSWORD}@db:5432/${DB_NAME}
      - REDIS_URL=redis://redis:6379/0
      - GROQ_API_KEY=${GROQ_API_KEY}
    depends_on:
      - backend
      - redis
  frontend:
    build: ./frontend
    ports:
      - "80:80"
    depends_on:
      - backend
volumes:
  postgres_data:
  redis_data:
  media_files:
  static_files:
Acceptance Criteria:
- All services start successfully
- Services can communicate with each other
- Data persists in volumes
Task: Create deployment and management scripts
Steps:
- 4.4.1 Create startup script
- Create `scripts/start.sh`
- Check environment variables
- Run migrations
- Collect static files
- Start services
- 4.4.2 Create deployment script
- Create `scripts/deploy.sh`
- Pull latest changes
- Build Docker images
- Run database migrations
- Restart services with zero downtime
- 4.4.3 Create backup script
- Create `scripts/backup.sh`
- Backup PostgreSQL database
- Backup media files
- Create timestamped archives
- 4.4.4 Create health check script
- Create `scripts/healthcheck.sh`
- Check service status
- Verify database connection
- Test API endpoints
- 4.4.5 Make scripts executable
- Run: `chmod +x scripts/*.sh`
Acceptance Criteria:
- Scripts execute without errors
- Services start and stop correctly
- Backups are created successfully
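The timestamped archives in step 4.4.3 imply a naming scheme and, eventually, a pruning rule. The planned script is a shell script, so the following Python sketch only illustrates the logic: the `prefix_YYYYMMDD_HHMMSS.tar.gz` naming, the helper names, and the 7-day default (borrowed from the retention policy mentioned in the post-launch checklist) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

STAMP_FORMAT = "%Y%m%d_%H%M%S"
SUFFIX = ".tar.gz"


def backup_name(prefix: str, now: Optional[datetime] = None) -> str:
    """Build an archive name such as db_20251012_030000.tar.gz (UTC timestamp)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.strftime(STAMP_FORMAT)}{SUFFIX}"


def backup_time(name: str) -> datetime:
    """Recover the UTC timestamp embedded in an archive name."""
    stamp = name.removesuffix(SUFFIX)[-15:]  # "YYYYMMDD_HHMMSS" is 15 characters
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def expired_backups(names: Iterable[str], now: datetime, keep_days: int = 7) -> list[str]:
    """Return the archive names older than the retention window."""
    cutoff = now - timedelta(days=keep_days)
    return sorted(n for n in names if backup_time(n) < cutoff)
```

Sortable timestamps in the file name mean the retention check never has to rely on file modification times, which can change when archives are copied.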
Task: Write unit tests for backend services
Steps:
- 5.1.1 Set up test configuration
- Create `backend/conftest.py` for pytest
- Configure test database
- Create test fixtures
- 5.1.2 Write model tests
- Create `transcription/tests/test_models.py`
- Test model creation
- Test model relationships
- Test model methods
- 5.1.3 Write service tests
- Create `transcription/tests/test_services.py`
- Mock Whisper service
- Mock Groq service
- Test word processing logic
- 5.1.4 Write API tests
- Create `transcription/tests/test_api.py`
- Test file upload endpoint
- Test retrieval endpoints
- Test error handling
- 5.1.5 Run tests
- Install pytest: `pip install pytest pytest-django`
- Run: `pytest`
- Aim for >80% code coverage
Acceptance Criteria:
- All tests pass
- Code coverage is >80%
- Edge cases are tested
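Mocking the external services, as step 5.1.3 calls for, might follow the pattern below. It uses `unittest.mock` from the standard library (pytest also collects `unittest.TestCase` classes). `classify_with_client` and the client's `classify` method are hypothetical stand-ins for the real service code, which this outline does not show.

```python
import json
import unittest
from unittest.mock import Mock


def classify_with_client(client, words):
    """Hypothetical code under test: ask the client and decode its JSON reply."""
    if not words:
        return {}  # nothing to classify, so no API call is made
    return json.loads(client.classify(words))


class ClassifierTests(unittest.TestCase):
    def test_returns_parsed_levels(self):
        client = Mock()
        client.classify.return_value = '{"ubiquitous": "C1"}'
        result = classify_with_client(client, ["ubiquitous"])
        self.assertEqual(result, {"ubiquitous": "C1"})
        client.classify.assert_called_once_with(["ubiquitous"])

    def test_empty_input_skips_api_call(self):
        client = Mock()
        self.assertEqual(classify_with_client(client, []), {})
        client.classify.assert_not_called()

    def test_api_error_propagates(self):
        client = Mock()
        # side_effect makes the mock raise instead of returning a value.
        client.classify.side_effect = TimeoutError("API timed out")
        with self.assertRaises(TimeoutError):
            classify_with_client(client, ["word"])
```

Because the client is injected, the tests never touch the network or need an API key.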
Task: Write unit tests for React components
Steps:
- 5.2.1 Set up testing environment
- React Testing Library is included with CRA
- Install additional tools if needed
- 5.2.2 Write component tests
- Test `AudioUpload` component
- Test `TranscriptionView` component
- Test `WordList` component
- Mock API calls
- 5.2.3 Write service tests
- Test API service methods
- Test error handling
- Mock axios requests
- 5.2.4 Run tests
- Run: `npm test`
- Check coverage: `npm test -- --coverage`
Acceptance Criteria:
- Component tests pass
- User interactions are tested
- API mocking works correctly
Task: Test end-to-end workflows
Steps:
- 5.3.1 Test complete upload workflow
- Upload audio file
- Verify processing
- Check transcription result
- Verify word extraction
- 5.3.2 Test error scenarios
- Invalid file format
- File too large
- Network errors
- API failures
- 5.3.3 Test performance
- Upload large files
- Process multiple files concurrently
- Monitor memory usage
- 5.3.4 Create test data
- Prepare sample audio files
- Create test cases document
Acceptance Criteria:
- End-to-end workflows work
- Error handling is robust
- Performance is acceptable
Task: Create comprehensive development documentation
Steps:
- 6.1.1 Write README.md
- Project description
- Features list
- Tech stack
- Quick start guide
- Project structure overview
- 6.1.2 Create SETUP.md
- Prerequisites
- Local development setup
- Environment variables
- Database setup
- Running tests
- 6.1.3 Create API documentation
- Create `docs/API.md`
- Document all endpoints
- Include request/response examples
- Document error codes
- 6.1.4 Create architecture documentation
- Create `docs/ARCHITECTURE.md`
- System architecture diagram
- Data flow diagram
- Technology decisions
Acceptance Criteria:
- Documentation is complete
- Examples are accurate
- New developers can follow setup
Task: Create deployment and operations documentation
Steps:
- 6.2.1 Write DEPLOYMENT.md
- Server requirements (CPU, RAM, disk)
- Docker installation
- Docker Compose setup
- Environment configuration
- SSL/TLS setup (using Let's Encrypt)
- 6.2.2 Create operations guide
- Create `docs/OPERATIONS.md`
- Monitoring and logging
- Backup and restore procedures
- Scaling guidelines
- Troubleshooting common issues
- 6.2.3 Create upgrade guide
- Version update procedures
- Database migration steps
- Rollback procedures
- 6.2.4 Security guidelines
- Create `docs/SECURITY.md`
- Environment variable security
- API key management
- File upload security
- Rate limiting
Acceptance Criteria:
- Deployment steps are clear
- Operations procedures are documented
- Security guidelines are comprehensive
Task: Create end-user documentation
Steps:
- 6.3.1 Create user guide
- Create `docs/USER_GUIDE.md`
- How to upload audio
- How to interpret results
- CEFR level explanations
- FAQ section
- 6.3.2 Create video tutorial
- Record screen capture of workflow
- Add narration explaining steps
- Upload to YouTube (optional)
- 6.3.3 Create troubleshooting guide
- Common errors and solutions
- Supported file formats
- File size limitations
Acceptance Criteria:
- User guide is easy to follow
- Screenshots/videos are included
- FAQ covers common questions
Task: Perform comprehensive testing before launch
Steps:
- 7.1.1 Run all tests
- Backend unit tests
- Frontend unit tests
- Integration tests
- Fix any failing tests
- 7.1.2 Manual testing
- Test on different browsers
- Test on mobile devices
- Test with various audio files
- Test error scenarios
- 7.1.3 Performance testing
- Load testing with multiple users
- Test with large audio files (50MB+)
- Monitor resource usage
- Optimize if needed
- 7.1.4 Security testing
- Test file upload restrictions
- Test API authentication (for future)
- Check for SQL injection vulnerabilities
- Verify CORS configuration
Acceptance Criteria:
- All tests pass
- No critical bugs found
- Performance is acceptable
- Security is verified
Task: Deploy application to production server
Steps:
- 7.2.1 Prepare server
- Set up Linux server (Ubuntu 22.04 recommended)
- Install Docker and Docker Compose
- Configure firewall (allow ports 80, 443)
- Set up domain name (optional)
- 7.2.2 Clone repository
- SSH into server
- Clone Git repository
- Checkout main branch
- 7.2.3 Configure environment
- Copy `.env.example` to `.env`
- Set production values
- Set strong SECRET_KEY
- Configure GROQ_API_KEY
- Set ALLOWED_HOSTS
- 7.2.4 Build and start services
- Run: `docker-compose build`
- Run: `docker-compose up -d`
- Check service status: `docker-compose ps`
- 7.2.5 Run migrations
- Run: `docker-compose exec backend python manage.py migrate`
- Create superuser: `docker-compose exec backend python manage.py createsuperuser`
- 7.2.6 Configure SSL (optional but recommended)
- Install Certbot
- Generate Let's Encrypt certificate
- Update nginx configuration
- Enable HTTPS redirect
- 7.2.7 Set up monitoring
- Install monitoring tools (optional)
- Configure log aggregation
- Set up alerts for errors
Acceptance Criteria:
- Application is accessible via web browser
- All services are running
- SSL/HTTPS is working (if configured)
- Logs are being collected
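For the "Set strong SECRET_KEY" item in step 7.2.3, one way to produce a suitable value is Python's `secrets` module. This is a sketch: the helper name and the 50-byte default are assumptions, and any cryptographically secure random string of similar length would serve.

```python
import secrets


def generate_secret_key(nbytes: int = 50) -> str:
    """Return a URL-safe random string built from nbytes of OS-level randomness."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed value into .env as SECRET_KEY; never commit it to Git.
    print(generate_secret_key())
```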
Task: Monitor application after launch
Steps:
- 7.3.1 Monitor application logs
- Backend logs: `docker-compose logs -f backend`
- Celery logs: `docker-compose logs -f celery`
- Database logs: `docker-compose logs -f db`
- Check for errors
- 7.3.2 Monitor resource usage
- Check CPU usage
- Check memory usage
- Check disk space
- Check network traffic
- 7.3.3 Test core functionality
- Upload test audio file
- Verify transcription
- Check word extraction
- Verify results display
- 7.3.4 Set up automated backups
- Configure daily database backups
- Set up backup retention policy
- Test restore procedure
- 7.3.5 Document any issues
- Create issue tracker (GitHub Issues)
- Document bugs and feature requests
- Prioritize fixes
Acceptance Criteria:
- Application runs stably
- No critical errors in logs
- Resources are within limits
- Backups are running
Task: Add video file upload and audio extraction
Steps:
- 1.1.1 Update backend models
- Add `VideoFile` model
- Add video format field
- Link to extracted audio
- 1.1.2 Implement audio extraction
- Use ffmpeg to extract audio from video
- Support formats: mp4, avi, mov, mkv
- Convert to WAV for processing
- 1.1.3 Update API endpoints
- Modify upload endpoint to accept video
- Add video validation
- Update file size limit
- 1.1.4 Update frontend
- Update upload component for video
- Add video preview
- Show extraction progress
Acceptance Criteria:
- Video files can be uploaded
- Audio is successfully extracted
- Processing continues as with audio
Task: Implement user authentication system
Steps:
- 2.1.1 Set up Django authentication
- Install: `djangorestframework-simplejwt`
- Create custom User model (if needed)
- Configure JWT authentication
- 2.1.2 Create authentication endpoints
- POST `/api/auth/register/`
- POST `/api/auth/login/`
- POST `/api/auth/logout/`
- POST `/api/auth/refresh/`
- 2.1.3 Update models with user relationships
- Add user ForeignKey to AudioFile
- Add user permissions
- 2.1.4 Create frontend authentication
- Create login page
- Create registration page
- Store JWT token
- Add authentication to API calls
- Implement protected routes
Acceptance Criteria:
- Users can register and login
- JWT authentication works
- Files are associated with users
Task: Create dashboard to view processing history
Steps:
- 3.1.1 Create history API endpoint
- GET `/api/history/` - List user's files
- Add pagination
- Add filtering by date, status
- 3.1.2 Create dashboard page
- Create `src/pages/Dashboard.tsx`
- Display list of processed files
- Show processing status
- Add search functionality
- 3.1.3 Add file management
- View details
- Delete files
- Re-download results
Acceptance Criteria:
- Users can view their history
- Files can be managed
- Dashboard is responsive
Task: Enhance user interface and experience
Steps:
- 4.1.1 Implement better styling
- Create consistent theme
- Add color scheme for CEFR levels
- Improve typography
- 4.1.2 Add animations
- Upload progress animations
- Loading spinners
- Smooth transitions
- 4.1.3 Improve results visualization
- Add word cloud
- Add statistics charts (using Chart.js)
- Add exportable reports
- 4.1.4 Add download functionality
- Export transcription as TXT
- Export words as CSV
- Export full report as PDF
Acceptance Criteria:
- UI is polished and professional
- User experience is smooth
- Results can be exported
Task: Implement local language model for word classification
Steps:
- 1.1.1 Choose and set up local LLM
- Options: Llama 3, Mistral, or smaller model
- Use Ollama or llama.cpp
- Download model during Docker build
- 1.1.2 Create local LLM service
- Create `transcription/services/local_llm_service.py`
- Implement model loading
- Implement inference method
- 1.1.3 Update classification logic
- Replace Groq calls with local LLM
- Optimize prompts for local model
- Implement batching for efficiency
- 1.1.4 Update Docker configuration
- Add GPU support (optional)
- Increase memory allocation
- Download model in Dockerfile
Acceptance Criteria:
- Local LLM runs successfully
- Classification accuracy is maintained
- Performance is acceptable
Task: Optimize application performance
Steps:
- 2.1.1 Implement advanced caching
- Cache transcriptions
- Cache word classifications
- Use Redis for caching
- 2.1.2 Optimize database queries
- Add database indexes
- Use select_related and prefetch_related
- Implement query optimization
- 2.1.3 Optimize Whisper processing
- Use a smaller, faster Whisper model (tiny or small)
- Implement GPU acceleration
- Optimize audio preprocessing
- 2.1.4 Implement rate limiting
- Limit uploads per user
- Limit API requests
- Add queue management
Acceptance Criteria:
- Response times are improved
- Database queries are optimized
- Rate limiting prevents abuse
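To make the rate-limiting step (2.1.4) concrete, here is a minimal in-memory sliding-window limiter. It is a sketch under stated assumptions: the class name, the per-user keying, and the injectable `clock` are choices made for this example, and state lives in a single process. A deployment with several workers would need shared storage such as Redis, or could rely on Django REST Framework's built-in throttle classes instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per user within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # user id -> timestamps of accepted requests

    def allow(self, user_id):
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

An upload view would call `allow(user_id)` before queuing work and answer with HTTP 429 when it returns `False`. Only accepted requests are recorded, so a client that keeps retrying while blocked does not extend its own lockout.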
- Multi-language support (Spanish, French, German, etc.)
- Subtitle generation with CEFR-colored words
- Vocabulary flashcard generation
- Progress tracking for language learners
- Spaced repetition system integration
- Public REST API with authentication
- Webhook support
- Third-party integrations (Anki, Notion, etc.)
- Usage statistics dashboard
- Word difficulty trends
- Learning recommendations
- A/B testing framework
- Kubernetes deployment
- Horizontal scaling
- Load balancing
- CDN integration
- Advanced monitoring (Prometheus, Grafana)
- Automated CI/CD pipeline
- Application successfully transcribes audio files
- Words are correctly classified by CEFR level
- Application is fully dockerized
- Deployment documentation is complete
- Basic UI is functional and responsive
- Video processing works correctly
- User authentication is secure
- Users can manage their processing history
- UI is polished and professional
- Full local processing (no external APIs)
- Performance is optimized
- Application runs efficiently on modest hardware
- Production-ready with all advanced features
- Public API is documented and functional
- Application is scalable and monitored
- Framework: Django 5.2+
- API: Django REST Framework
- Task Queue: Celery + Redis
- Database: PostgreSQL
- AI/ML: Whisper (local), Groq API (Phase 1), Local LLM (Phase 3)
- Server: Gunicorn + Nginx
- Framework: React 18+ with TypeScript
- UI Library: Material-UI (MUI)
- HTTP Client: Axios
- Routing: React Router
- State Management: React Context (Phase 1), Redux (later phases)
- Containerization: Docker, Docker Compose
- Orchestration (Phase 4): Kubernetes
- CI/CD: GitHub Actions (Phase 4)
- Monitoring: Prometheus + Grafana (Phase 4)
- Phase 1 (MVP): 3-4 weeks
- Phase 2 (Enhanced Features): 2-3 weeks
- Phase 3 (Full Local): 2-3 weeks
- Phase 4 (Production Ready): 3-4 weeks
Total: 10-14 weeks (2.5-3.5 months)
- Review and approve this project outline
- Set up development environment
- Start with Phase 1, Step 1.1: Project Setup and Architecture
- Follow each task sequentially, checking off completed items
- Regularly commit changes to Git
- Document any deviations or issues encountered
Document Version: 1.0
Last Updated: October 8, 2025
Status: Ready for Implementation