
Architecture Overview

System Architecture

Hard Word Extractor uses a client-server architecture. A React frontend calls a Django REST API, and long-running work such as transcription is handed off through Redis to Celery workers.

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   React     │────▶│   Django     │────▶│ PostgreSQL  │
│   Frontend  │◀────│   Backend    │◀────│  Database   │
└─────────────┘     └──────────────┘     └─────────────┘
                          │ │
                          │ └──────────▶ ┌────────────┐
                          │              │   Redis    │
                          └─────────────▶│   Cache    │
                                         └────────────┘
                                              │
                                              ▼
                                        ┌────────────┐
                                        │   Celery   │
                                        │   Worker   │
                                        └────────────┘
                                              │
                          ┌──────────────────┼────────────────────┐
                          ▼                  ▼                    ▼
                    ┌─────────┐        ┌──────────┐        ┌─────────┐
                    │ Whisper │        │ Groq API │        │  spaCy  │
                    │   AI    │        │   LLM    │        │   NLP   │
                    └─────────┘        └──────────┘        └─────────┘
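The request/worker split above can be made concrete with a stdlib-only sketch. It uses `queue.Queue` in place of the Redis broker and a thread in place of a Celery worker. This is not Celery's API. The function names and the fake transcript are illustrative assumptions, and the status strings are borrowed from Celery's task states.

```python
import queue
import threading
import uuid

# Stand-in for the Redis broker: a thread-safe FIFO of (task_id, audio_path) messages.
broker: queue.Queue = queue.Queue()
# Stand-in for task-state tracking: task_id -> (status, result).
results: dict = {}
results_lock = threading.Lock()
STOP = "__stop__"


def enqueue_transcription(audio_path: str) -> str:
    """Roughly what a view does: record the task, publish a message, return at once."""
    task_id = str(uuid.uuid4())
    with results_lock:
        results[task_id] = ("PENDING", None)
    broker.put((task_id, audio_path))
    return task_id  # the client polls with this id rather than waiting on the request


def worker() -> None:
    """Roughly what a worker does: consume messages until a stop sentinel arrives."""
    while True:
        task_id, audio_path = broker.get()
        try:
            if task_id == STOP:
                return
            with results_lock:
                results[task_id] = ("STARTED", None)
            # A real worker would run Whisper here; this only fakes the output.
            transcript = f"transcript of {audio_path}"
            with results_lock:
                results[task_id] = ("SUCCESS", transcript)
        finally:
            broker.task_done()


if __name__ == "__main__":
    t = threading.Thread(target=worker)
    t.start()
    tid = enqueue_transcription("lecture.mp3")
    broker.put((STOP, ""))
    t.join()
    print(results[tid])  # ('SUCCESS', 'transcript of lecture.mp3')
```

The HTTP request only pays for `enqueue_transcription()`. The CPU-heavy work happens in the worker, and the frontend learns about completion by polling.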

Backend Architecture

Layer Organization

The backend is organized in layers with a clear separation of concerns: domain models, an API layer (serializers and views), a service layer for business logic, and shared utilities:

backend/
├── config/                 # Django settings & configuration
│   ├── settings.py
│   ├── urls.py
│   └── celery.py
└── transcription/         # Main Django app
    ├── models/            # Data models (Domain Layer)
    │   ├── audio.py
    │   ├── transcription.py
    │   ├── word.py
    │   ├── statistics.py
    │   └── processing.py
    ├── serializers/       # API serialization (API Layer)
    │   ├── audio.py
    │   ├── transcription.py
    │   ├── word.py
    │   ├── statistics.py
    │   └── processing.py
    ├── views/             # API endpoints (API Layer)
    │   ├── audio.py
    │   ├── transcription.py
    │   ├── word.py
    │   ├── processing.py
    │   └── status.py
    ├── services/          # Business logic (Service Layer)
    │   ├── audio/         # Audio processing services
    │   │   ├── storage_service.py
    │   │   ├── validation_service.py
    │   │   └── transcription_service.py
    │   ├── transcription/ # Transcription services
    │   │   ├── whisper_service.py
    │   │   └── processing_service.py
    │   ├── words/         # Word extraction & analysis
    │   │   ├── extraction_service.py
    │   │   ├── context_service.py
    │   │   └── statistics_service.py
    │   └── ai/            # AI/LLM services
    │       ├── groq_service.py
    │       └── classification_service.py
    ├── utils/             # Shared utilities
    │   ├── constants.py   # Constants & configuration
    │   ├── exceptions.py  # Custom exceptions
    │   ├── validators.py  # Validation helpers
    │   ├── responses.py   # Response formatters
    │   └── pagination.py  # Pagination utilities
    └── tests/             # Test suite
        ├── test_models/
        ├── test_serializers/
        ├── test_views/
        └── test_services/

Key Design Patterns

1. Service Layer Pattern

Business logic is separated into service classes in services/:

  • Audio Services: Handle file storage, validation, and transcription initiation
  • Transcription Services: Run Whisper transcription and process its results
  • Word Services: Extract words, contexts, and generate statistics
  • AI Services: Interface with Groq LLM for classification
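Here is a minimal sketch of what a word-service class in this layer might look like. It assumes a "hard word" is any token of at least `min_length` letters that is not on a small common-word list. It also uses a regex tokenizer where the real service presumably uses spaCy. The class and method names come from the data flow below; the constructor, the `extract()` signature and `ExtractedWord` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a word-frequency list; the real service presumably uses NLP data.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "in", "it", "we", "this"}
TOKEN_RE = re.compile(r"[A-Za-z']+")


@dataclass(frozen=True)
class ExtractedWord:
    word: str      # lowercased form
    context: str   # surrounding tokens from the transcript


class WordExtractionService:
    """Service-layer class: pure business logic, no HTTP or ORM code."""

    def __init__(self, min_length: int = 7, window: int = 3) -> None:
        self.min_length = min_length  # hypothetical "hard word" threshold
        self.window = window          # tokens of context kept on each side

    def extract(self, text: str) -> list[ExtractedWord]:
        tokens = TOKEN_RE.findall(text)
        seen: set[str] = set()
        found: list[ExtractedWord] = []
        for i, token in enumerate(tokens):
            word = token.lower()
            # Skip repeats, common words and short words.
            if word in seen or word in COMMON_WORDS or len(word) < self.min_length:
                continue
            seen.add(word)
            lo, hi = max(0, i - self.window), i + self.window + 1
            found.append(ExtractedWord(word, " ".join(tokens[lo:hi])))
        return found
```

Because the class has no Django imports, it can be unit-tested directly. A view or Celery task only has to pass it the transcript text.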

2. Repository Pattern

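Models in models/ serve as the data-access layer. Strictly, Django models follow the Active Record pattern (queries go through each model's manager) rather than using separate repository classes, but they fill the same role here: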

  • AudioFile: Uploaded audio files
  • Transcription: Transcription results with timestamps
  • Word: Extracted vocabulary words
  • Statistics: Aggregated word statistics
  • ProcessingTask: Async task tracking
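Tracking async jobs usually comes down to a small status state machine. The sketch below shows that idea in plain Python. The status names, the allowed transitions and the `ProcessingTaskRecord` class are assumptions, not the project's actual model fields.

```python
from enum import Enum
from typing import Optional


class TaskStatus(str, Enum):
    # Hypothetical status values; the real ProcessingTask fields may differ.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Statuses each status may move to; terminal states allow none.
ALLOWED = {
    TaskStatus.PENDING: {TaskStatus.RUNNING, TaskStatus.FAILED},
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
}


class ProcessingTaskRecord:
    """Plain-Python stand-in for a ProcessingTask row tracking one async job."""

    def __init__(self, task_type: str) -> None:
        self.task_type = task_type  # e.g. "transcription", "extraction", "classification"
        self.status = TaskStatus.PENDING
        self.error: Optional[str] = None

    def transition(self, new_status: TaskStatus, error: Optional[str] = None) -> None:
        # Reject illegal moves such as COMPLETED -> RUNNING.
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status
        if new_status is TaskStatus.FAILED:
            self.error = error

    @property
    def is_finished(self) -> bool:
        return not ALLOWED[self.status]
```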

3. API Layer Separation

  • Serializers: Handle data validation and transformation
  • Views: HTTP request handling and routing
  • Utils: Cross-cutting concerns (pagination, responses)

4. Dependency Injection

Services are injected into views, making testing easier and reducing coupling.
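The following framework-free sketch shows constructor injection. It assumes a view depends only on some object with a `transcribe()` method. `Transcriber`, `TranscribeView` and `FakeTranscriber` are hypothetical names. The point is that a test can pass in a fake without loading Whisper.

```python
from typing import Protocol


class Transcriber(Protocol):
    def transcribe(self, audio_path: str) -> str: ...


class TranscribeView:
    """Stand-in for a DRF view that receives its service instead of constructing it."""

    def __init__(self, transcriber: Transcriber) -> None:
        self.transcriber = transcriber

    def post(self, audio_path: str) -> dict:
        text = self.transcriber.transcribe(audio_path)
        return {"status": 200, "data": {"text": text}}


class FakeTranscriber:
    """Test double: records calls and returns a fixed transcript."""

    def __init__(self) -> None:
        self.calls: list = []

    def transcribe(self, audio_path: str) -> str:
        self.calls.append(audio_path)
        return "hello world"
```

In Django, `View.as_view(**initkwargs)` only accepts keyword arguments that match existing class attributes. One way to apply the same idea to a real view is therefore a class attribute holding the default service, which a URL pattern or test can override.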

API Design

The API follows REST principles:

  • /api/audio/ - Audio file management
  • /api/transcriptions/ - Transcription results
  • /api/words/ - Word queries with filtering
  • /api/statistics/ - Aggregated statistics
  • /api/processing/ - Task status and control

Features:

  • Pagination on all list endpoints
  • Filtering and ordering on word lists
  • Consistent APIResponse wrapper format
  • Comprehensive error handling
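A minimal sketch of page-number pagination combined with a uniform response envelope follows. The envelope keys (`success`, `data`, `error`) and the pagination fields are assumptions about the APIResponse format, not its documented shape. For comparison, DRF's own `PageNumberPagination` returns `count`, `next`, `previous` and `results`.

```python
import math
from typing import Any, Optional, Sequence


def paginate(items: Sequence[Any], page: int = 1, page_size: int = 20) -> dict:
    """Page-number pagination over an in-memory sequence (a queryset slices the same way)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total_pages = max(1, math.ceil(len(items) / page_size))
    start = (page - 1) * page_size
    return {
        "count": len(items),
        "page": page,
        "total_pages": total_pages,
        "results": list(items[start:start + page_size]),
    }


def api_response(data: Any = None, error: Optional[str] = None) -> dict:
    """Hypothetical uniform envelope: every endpoint returns the same three keys."""
    return {"success": error is None, "data": data, "error": error}
```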

Frontend Architecture

Component Organization

The frontend follows a feature-based architecture:

frontend/src/
├── components/          # Reusable UI components
│   ├── audio/          # Audio-related components
│   │   └── AudioUpload.tsx
│   ├── transcription/  # Transcription display
│   │   └── TranscriptionView.tsx
│   ├── words/          # Word display & stats
│   │   ├── WordList.tsx
│   │   └── Statistics.tsx
│   ├── layout/         # Layout components
│   │   ├── Header.tsx
│   │   ├── Footer.tsx
│   │   └── Layout.tsx
│   └── common/         # Shared components
│       └── StatusIndicator.tsx
├── features/           # Feature modules (hooks + logic)
│   ├── audio/
│   │   └── hooks/
│   │       ├── useAudioUpload.ts
│   │       └── useAudioStatus.ts
│   ├── transcription/
│   │   └── hooks/
│   │       └── useTranscription.ts
│   └── words/
│       └── hooks/
│           └── useWords.ts
├── hooks/              # Global custom hooks
│   ├── useDebounce.ts
│   ├── useLocalStorage.ts
│   ├── useCache.ts
│   └── useCachedApi.ts
├── services/           # API communication
│   └── audioService.ts
├── pages/              # Page components (routes)
│   ├── Home.tsx
│   └── Results.tsx
├── types/              # TypeScript types
│   └── api.ts
├── theme/              # MUI theme configuration
│   └── theme.ts
└── utils/              # Utility functions
    └── formatters.ts

Key Design Patterns

1. Feature-Based Structure

Related components, hooks, and logic are grouped by feature:

  • Audio: Upload, status tracking
  • Transcription: Display and search
  • Words: Filtering, sorting, statistics

2. Custom Hooks Pattern

Business logic is extracted into reusable hooks:

  • useAudioUpload: File upload with progress tracking
  • useAudioStatus: Polling for processing status
  • useTranscription: Fetch and manage transcription data
  • useWords: Word filtering and pagination
  • useDebounce: Debounced search inputs
  • useCache: API response caching
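The behaviour `useDebounce` relies on can be stated as a pure function: a value is emitted only if no newer value arrives within the delay. The sketch below simulates that over timestamped keystrokes. It is written in Python to match the other examples here and models the timing semantics, not the hook's actual code. The event format is an assumption.

```python
from typing import List, Tuple


def debounced_emissions(events: List[Tuple[int, str]], delay_ms: int) -> List[Tuple[int, str]]:
    """Return the (emit_time_ms, value) pairs a debounce of `delay_ms` would produce.

    `events` are (timestamp_ms, value) pairs sorted by time, e.g. search-box keystrokes.
    An event arriving exactly at the fire time is treated as too late to cancel it
    (an assumption about tie-breaking).
    """
    emitted = []
    for i, (t, value) in enumerate(events):
        fire_at = t + delay_ms
        next_t = events[i + 1][0] if i + 1 < len(events) else None
        # Emit only if nothing newer arrived before this value's timer fired.
        if next_t is None or next_t >= fire_at:
            emitted.append((fire_at, value))
    return emitted
```

With a 300 ms delay, typing "c", "ca", "cat" 100 ms apart triggers a single search for "cat" instead of three requests.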

3. Service Layer

API calls are centralized in services/audioService.ts:

  • Consistent error handling
  • Response formatting
  • Token management (future)

4. Performance Optimizations

  • Code Splitting: React.lazy() for route-based splitting
  • Memoization: React.memo() on expensive components
  • Caching: useCachedApi hook for repeated requests
  • Debouncing: useDebounce for search inputs
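The caching idea behind `useCache` and `useCachedApi` is a time-to-live cache keyed by request. The sketch below is language-agnostic in spirit and written in Python like the other examples. The TTL policy and the injectable clock are assumptions, since the hooks' actual expiry rules are not described here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache where each entry expires `ttl` seconds after it was stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fresh entry: skip the network call
        value = fetch()            # missing or stale: call the API again
        self._store[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)
```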

Data Flow

Upload & Processing Flow

1. User uploads file
   ├─▶ Frontend: AudioUpload component
   └─▶ Backend: POST /api/audio/

2. File stored & validated
   ├─▶ AudioStorageService.save()
   └─▶ AudioValidationService.validate()

3. User triggers transcription
   ├─▶ Frontend: POST /api/audio/{id}/transcribe/
   └─▶ Backend: Celery task created

4. Celery worker processes
   ├─▶ WhisperService.transcribe()
   └─▶ TranscriptionProcessingService.process()

5. User triggers word extraction
   ├─▶ POST /api/transcription/{id}/extract-words/
   └─▶ WordExtractionService.extract()

6. User triggers classification
   ├─▶ POST /api/words/classify/
   ├─▶ GroqService.classify_batch()
   └─▶ ClassificationService.apply()

7. User views results
   ├─▶ GET /api/words/?filters
   └─▶ GET /api/statistics/{id}/
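Step 6 sends words to the LLM in batches. The following is a minimal sketch of that batching. It assumes the classifier is any callable that takes a list of words and returns a word-to-label dict. `classify_in_batches`, the default batch size of 50 and the "unknown" label are hypothetical, and the real `GroqService.classify_batch()` signature may differ.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def classify_in_batches(
    words: Sequence[str],
    classify_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 50,
) -> Dict[str, str]:
    """Call the classifier once per batch and merge the labels.

    Words missing from a batch response are marked "unknown" rather than dropped.
    """
    labels: Dict[str, str] = {}
    for batch in chunked(words, batch_size):
        response = classify_batch(batch)
        for word in batch:
            labels[word] = response.get(word, "unknown")
    return labels
```

Batching trades fewer API round trips against longer prompts. The batch size is the knob to tune against the model's context limit and rate limits.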

State Management

Frontend State:

  • React hooks for local component state
  • useLocalStorage for persistent UI preferences
  • useCache for API response caching
  • No global state management (Redux/MobX) needed yet

Backend State:

  • PostgreSQL for persistent data
  • Redis for:
    • Celery task queue
    • API caching (future)
    • Session storage (future)

Technology Choices & Rationale

Backend

Technology  Purpose                 Why?
----------  ----------------------  ---------------------------------------------
Django 5.2  Web framework           Robust, batteries included, excellent ORM
DRF         REST API                Best-in-class REST framework for Django
Celery      Async tasks             Industry standard for background processing
Redis       Message broker & cache  Fast, reliable, simple
PostgreSQL  Database                ACID compliant, excellent for structured data
Whisper     Transcription           Best open-source speech-to-text
Groq        LLM inference           Ultra-fast, cost-effective

Frontend

Technology    Purpose        Why?
------------  -------------  ---------------------------------
React 18      UI framework   Component-based, large ecosystem
TypeScript    Type safety    Catches errors early, better DX
MUI           UI components  Professional look, comprehensive
React Router  Routing        Standard routing solution
Axios         HTTP client    Simple API, good error handling

Deployment Architecture (Future)

┌─────────────────────────────────────────────────────┐
│                  Nginx (Reverse Proxy)               │
│              SSL Termination (Let's Encrypt)         │
└────────────┬──────────────────────────┬─────────────┘
             │                          │
      ┌──────▼──────┐           ┌──────▼──────┐
      │   Static    │           │  Gunicorn   │
      │   Files     │           │  (Django)   │
      │  (React)    │           │  Workers    │
      └─────────────┘           └──────┬──────┘
                                       │
                        ┌──────────────┼──────────────┐
                        │              │              │
                  ┌─────▼────┐   ┌────▼────┐   ┌────▼────┐
                  │PostgreSQL│   │  Redis  │   │ Celery  │
                  │          │   │         │   │ Workers │
                  └──────────┘   └─────────┘   └─────────┘

Deployment Options:

  1. Docker Compose (recommended for simple deployments)
  2. Kubernetes (for scaling)
  3. VPS (DigitalOcean, Linode, etc.)

Security Considerations

Current (Phase 1 - MVP)

  • File size limits (100MB)
  • File type validation
  • CORS configuration
  • SQL injection prevention (Django ORM)
  • XSS prevention (React escaping)
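Below is a minimal sketch of the size and type checks, using the 100 MB limit above. The extension whitelist, the binary-megabyte interpretation and `AudioValidationError` are assumptions; the real rules live in the project's validation service.

```python
from pathlib import PurePath

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # the 100 MB limit, assuming binary megabytes
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}  # hypothetical whitelist


class AudioValidationError(ValueError):
    """Hypothetical exception type; the project's own would live in utils/exceptions.py."""


def validate_upload(filename: str, size_bytes: int) -> None:
    """Reject uploads that are empty, too large, or have an unexpected extension."""
    if size_bytes <= 0:
        raise AudioValidationError("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise AudioValidationError(f"file exceeds {MAX_UPLOAD_BYTES // (1024 * 1024)} MB limit")
    suffix = PurePath(filename).suffix.lower()  # ".MP3" and ".mp3" are treated alike
    if suffix not in ALLOWED_EXTENSIONS:
        raise AudioValidationError(f"unsupported file type: {suffix or '(none)'}")
```

Extensions and client-supplied content types can be spoofed. Inspecting the file's header bytes or probing it with a decoder is a stronger check than the suffix alone.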

Future (Phase 2+)

  • JWT authentication
  • Rate limiting
  • API key management
  • User data isolation
  • File encryption at rest
  • HTTPS only
  • CSP headers

Scalability Considerations

Current Bottlenecks

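  1. Whisper Processing: CPU-intensive; each transcription ties up a Celery worker for its full duration
  2. File Storage: Local filesystem, which is not shared between servers and so does not scale horizontally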
  3. Database: Single PostgreSQL instance

Future Improvements

  1. Processing:

    • GPU-enabled Whisper processing
    • Multiple Celery workers
    • Task priority queues
  2. Storage:

    • S3/MinIO for file storage
    • CDN for static files
  3. Database:

    • Read replicas for queries
    • Connection pooling
    • Query optimization
  4. Caching:

    • Redis for API responses
    • Browser caching for static content
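Task priority queues mean more urgent jobs are picked up before backlog work. In Celery this is typically done by routing task types to separate queues served by dedicated workers. The stdlib sketch below only illustrates the ordering semantics with `heapq`; the task names and priority values are made up.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class PriorityTaskQueue:
    """Lower priority number runs first; equal priorities run in submission order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker that preserves FIFO order

    def push(self, task_name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), task_name))

    def pop(self) -> Optional[str]:
        if not self._heap:
            return None
        _, _, task_name = heapq.heappop(self._heap)
        return task_name

    def __len__(self) -> int:
        return len(self._heap)
```

Short, interactive jobs (classification, extraction) can then jump ahead of long Whisper runs without starving them, because equal-priority work still drains in order.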

Testing Strategy

Backend Tests

  • Unit Tests: Models, serializers, services
  • Integration Tests: API endpoints
  • Task Tests: Celery tasks
  • Coverage Target: 80%+

Frontend Tests

  • Unit Tests: Hooks, utilities
  • Component Tests: React Testing Library
  • Integration Tests: User flows
  • Coverage Target: 70%+

Testing Tools

  • Backend: pytest, pytest-django, factory_boy
  • Frontend: Jest, React Testing Library, MSW

Development Workflow

Local Development

  1. Redis running in Docker
  2. Celery worker in terminal
  3. Django dev server
  4. React dev server (hot reload)

Code Quality

  • Linting: ESLint (frontend), pylint/flake8 (backend)
  • Formatting: Prettier (frontend), black (backend)
  • Type Checking: TypeScript, mypy (future)
  • Pre-commit Hooks: lint-staged (future)

Documentation Standards

Code Documentation

  • Docstrings for all Python functions/classes
  • JSDoc comments for complex TypeScript functions
  • README in each major directory

API Documentation

  • OpenAPI/Swagger (future)
  • Postman collection (future)
  • docs/API.md (current)

User Documentation

  • Setup guides
  • API documentation
  • User guides
  • Architecture overview (this file)

Future Architecture Plans

Phase 2: Authentication & Multi-user

  • JWT token authentication
  • User models and permissions
  • User-specific data isolation
  • Usage quotas and limits

Phase 3: Local LLM

  • Use a local LLM (Ollama/LLaMA) instead of Groq as the primary classifier
  • GPU acceleration
  • Model caching and optimization
  • Fall back to Groq if the local model fails
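A local-first setup with a Groq fallback amounts to a simple try/fallback wrapper. The sketch below assumes both backends are callables with the same word-list-to-labels signature. `FallbackClassifier` is a hypothetical name, and catching every `Exception` is a deliberate simplification.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

Classifier = Callable[[List[str]], Dict[str, str]]


class FallbackClassifier:
    """Try the primary (local) classifier first; on any error, use the fallback (remote)."""

    def __init__(self, primary: Classifier, fallback: Classifier) -> None:
        self.primary = primary
        self.fallback = fallback

    def classify(self, words: List[str]) -> Dict[str, str]:
        try:
            return self.primary(words)
        except Exception:  # e.g. model not loaded, GPU out of memory, timeout
            logger.warning("local LLM failed; falling back to remote provider", exc_info=True)
            return self.fallback(words)
```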

Phase 4: Advanced Features

  • Video processing support
  • Real-time collaboration
  • Export functionality (CSV, PDF)
  • Study mode and flashcards
  • Progress tracking and analytics

Resources