Development Setup Guide

Prerequisites

Python 3.11+
Node.js 18+
PostgreSQL 15+ (for production)
Redis 7+ (for Celery)
FFmpeg (for audio processing)
Git

Note: This project has been refactored with clean architecture patterns. See ARCHITECTURE.md for design details.

Project Structure Overview

The project follows a clean architecture with clear separation of concerns:

Backend Structure (Refactored)

backend/transcription/
├── models/              # Domain models (6 files)
│   ├── audio.py        # AudioFile model
│   ├── transcription.py # Transcription model
│   ├── word.py         # Word model
│   ├── statistics.py   # Statistics model
│   └── processing.py   # ProcessingTask model
├── views/               # API endpoints (6 files)
│   ├── audio.py        # Audio upload & management
│   ├── transcription.py # Transcription endpoints
│   ├── word.py         # Word queries & filtering
│   ├── processing.py   # Manual processing triggers
│   └── status.py       # Task status checking
├── serializers/         # Data serialization (6 files)
│   └── ... (matching views structure)
├── services/            # Business logic layer
│   ├── audio/          # Audio processing services
│   │   ├── storage_service.py
│   │   ├── validation_service.py
│   │   └── transcription_service.py
│   ├── transcription/  # Transcription services
│   │   ├── whisper_service.py
│   │   └── processing_service.py
│   ├── words/          # Word extraction & analysis
│   │   ├── extraction_service.py
│   │   ├── context_service.py
│   │   └── statistics_service.py
│   └── ai/             # AI/LLM services
│       ├── groq_service.py
│       └── classification_service.py
├── utils/               # Shared utilities (6 files)
│   ├── constants.py    # Constants & config
│   ├── exceptions.py   # Custom exceptions
│   ├── validators.py   # Validation helpers
│   ├── responses.py    # Response formatters
│   └── pagination.py   # Pagination utilities
└── tests/               # Test suite (organized by layer)
    ├── test_models/
    ├── test_views/
    ├── test_serializers/
    └── test_services/

Frontend Structure (Refactored)

frontend/src/
├── components/          # UI components (feature-based)
│   ├── audio/          # AudioUpload.tsx
│   ├── transcription/  # TranscriptionView.tsx (React.memo)
│   ├── words/          # WordList.tsx, Statistics.tsx (React.memo)
│   ├── layout/         # Header, Footer, Layout
│   └── common/         # StatusIndicator, etc.
├── features/            # Feature modules with hooks
│   ├── audio/hooks/    # useAudioUpload, useAudioStatus
│   ├── transcription/hooks/  # useTranscription
│   └── words/hooks/    # useWords
├── hooks/               # Global custom hooks
│   ├── useDebounce.ts  # Debounced inputs
│   ├── useLocalStorage.ts # Persistent state
│   ├── useCache.ts     # Caching utility
│   └── useCachedApi.ts # API caching hook
├── services/            # API communication
│   └── audioService.ts
├── pages/               # Route components (lazy-loaded)
│   ├── Home.tsx
│   └── Results.tsx
└── types/               # TypeScript types
    └── api.ts

Key Improvements:

✅ Clean separation of concerns (models, views, serializers, services)
✅ Service layer for business logic
✅ Reusable utilities and validation
✅ Feature-based frontend organization
✅ Custom hooks for reusable logic
✅ Performance optimizations (code splitting, memoization, caching)

Backend Setup

1. Clone Repository

git clone https://github.com/yourusername/HardWordExtractor.git
cd HardWordExtractor

2. Set Up Python Virtual Environment

cd backend
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Python Dependencies

pip install -r requirements.txt

4. Download spaCy Model (Optional)

python -m spacy download en_core_web_sm

5. Set Up Environment Variables

cp ../.env.example .env

Edit .env and set:

SECRET_KEY - Generate with: python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
GROQ_API_KEY - Get from https://groq.com
DEBUG=True for development
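To catch a missing key before Django starts, a small stdlib-only script can read .env and report which of the variables above are unset or empty. This is a sketch, not part of the project: the file name check_env.py is hypothetical, and it assumes .env uses simple KEY=VALUE lines with optional quotes and # comments.

```python
"""check_env.py (hypothetical helper): report required keys missing from a .env file."""
from pathlib import Path

# Keys this guide asks you to set; extend the tuple if .env.example lists more.
REQUIRED_KEYS = ("SECRET_KEY", "GROQ_API_KEY", "DEBUG")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def main(path: str = ".env") -> bool:
    """Print a report for the given .env file and return True if it is complete."""
    env_path = Path(path)
    if not env_path.exists():
        print(f"{path} not found; create it with: cp ../.env.example .env")
        return False
    missing = missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
    if missing:
        print("Missing or empty: " + ", ".join(missing))
        return False
    print("All required keys are set.")
    return True


if __name__ == "__main__":
    main()
```

Run it from backend/ with python check_env.py after editing .env.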

6. Run Database Migrations

For local development (SQLite):

python manage.py makemigrations
python manage.py migrate

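For production (PostgreSQL), make sure PostgreSQL is running and the database has been created (see "PostgreSQL connection error" under Common Issues). Apply the migrations committed to the repository rather than generating new ones on the server:

python manage.py migrate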

7. Create Superuser

python manage.py createsuperuser

8. Create Required Directories

mkdir -p logs media static

9. Start Development Server

python manage.py runserver

Visit:

API: http://localhost:8000/api/
Admin: http://localhost:8000/admin/

10. Start Redis (Separate Terminal)

redis-server

Or using Docker:

docker run -d -p 6379:6379 redis:7-alpine

11. Start Celery Worker (Separate Terminal)

Celery uses Redis as its broker, so start Redis first.

cd backend
source venv/bin/activate
celery -A config worker -l info

Frontend Setup

1. Install Dependencies

cd frontend
npm install

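2. Install Additional Packages

Skip this step if these packages are already listed in frontend/package.json; npm install above has already installed them.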

npm install axios @mui/material @emotion/react @emotion/styled @mui/icons-material react-router-dom
npm install --save-dev @types/node

3. Configure Environment

Create .env in frontend directory:

REACT_APP_API_URL=http://localhost:8000

4. Start Development Server

npm start

Visit: http://localhost:3000

Testing Backend Manually

Using Django Shell

python manage.py shell

# Test Word Processor
from transcription.services import WordProcessor

processor = WordProcessor()
text = "Hello world! This is a test with difficult paradigms."
words = processor.extract_words(text)
print(words)

# Test Groq Classifier (requires API key)
from transcription.services import GroqClassifier

classifier = GroqClassifier()
result = classifier.classify_words(['hello', 'paradigm', 'difficult'])
print(result)

# Test Whisper (requires audio file)
# from transcription.services import WhisperTranscriber
# transcriber = WhisperTranscriber()
# result = transcriber.transcribe('/path/to/audio.mp3')
# print(result['text'])

Using cURL

# Upload audio file
curl -X POST http://localhost:8000/api/upload/ \
  -F "file=@/path/to/audio.mp3"

# Check status
curl http://localhost:8000/api/status/1/

# Get transcription
curl http://localhost:8000/api/transcriptions/1/

# Get words by level
curl "http://localhost:8000/api/transcriptions/1/words/?level=B1,B2,C1,C2"
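The GET requests above can also be scripted, for example to wait for processing to finish before fetching words. The sketch below uses only the standard library and leaves the multipart upload to cURL. It assumes the status endpoint returns JSON with a status field whose terminal values are completed or failed; those names are hypothetical, so check them against the status serializer before relying on this.

```python
import json
import time
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"
# Hypothetical terminal states; confirm against the real status payload.
TERMINAL_STATES = {"completed", "failed"}


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def words_url(transcription_id: int, levels: list[str], base: str = BASE_URL) -> str:
    """Build the words-by-level URL, keeping commas unescaped as in the cURL example."""
    query = urlencode({"level": ",".join(levels)}, safe=",")
    return f"{base}/api/transcriptions/{transcription_id}/words/?{query}"


def wait_for_status(
    task_id: int,
    fetch: Callable[[str], dict] = fetch_json,
    base: str = BASE_URL,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Poll /api/status/<id>/ until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout
    url = f"{base}/api/status/{task_id}/"
    while True:
        payload = fetch(url)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} still {payload.get('status')!r} after {timeout}s")
        time.sleep(interval)
```

Example: call wait_for_status(1), then fetch_json(words_url(1, ["B1", "B2", "C1", "C2"])). The fetch parameter is injectable so the polling logic can be tested without a running server.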

Using Postman

Import the API documentation as a collection
Set base URL to http://localhost:8000
Test endpoints manually

Common Issues

Issue: "Unable to import 'celery'"

Solution: Activate the virtual environment, then install Celery and the Redis client library:

pip install celery redis

Issue: "Unable to import 'whisper'"

Solution: Install OpenAI Whisper:

pip install openai-whisper

Issue: "No module named 'groq'"

Solution: Install Groq:

pip install groq
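To check all of these imports at once, a short script run inside the activated virtual environment can report which modules are missing. It uses importlib.util.find_spec, which locates a top-level module without importing it. The module list is an assumption based on this guide: whisper is the import name of openai-whisper, and en_core_web_sm is the package installed by python -m spacy download.

```python
import importlib.util

# Import names for the backend dependencies discussed in this guide.
BACKEND_MODULES = ("celery", "redis", "whisper", "groq", "spacy", "en_core_web_sm")


def check_modules(names=BACKEND_MODULES) -> dict[str, bool]:
    """Map each module name to whether it can be found in the active environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{'ok' if found else 'MISSING':8} {name}")
```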

Issue: "Connection refused" when connecting to Redis

Solution: Make sure Redis is running:

# Check if Redis is running
redis-cli ping

# If not running, start it
redis-server
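If redis-cli is not installed locally (for example, when Redis runs in the Docker container shown earlier), the same check can be done from Python. Redis accepts inline commands, so sending PING followed by CRLF over a plain TCP socket returns +PONG when the server is up. This is a sketch that assumes no password is configured; with requirepass set, Redis replies with an error and the function returns False.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING command and return True if Redis answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, or name resolution failure: not reachable.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis is up" if redis_ping() else "Redis is not reachable on localhost:6379")
```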

Issue: FFmpeg not found

Solution: Install FFmpeg:

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Issue: PostgreSQL connection error

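Solution: Check that PostgreSQL is running and that the database exists. On PostgreSQL 15+, GRANT ALL PRIVILEGES ON DATABASE no longer lets a user create tables in the public schema, so migrate fails with "permission denied for schema public". Making the application user the owner of the database avoids this:

# Check if PostgreSQL is running
sudo systemctl status postgresql

# Create the user, then a database owned by it
sudo -u postgres psql
CREATE USER hardwordextractor WITH PASSWORD 'password';
CREATE DATABASE hardwordextractor OWNER hardwordextractor;
\q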
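This guide does not show how Django is pointed at that database. One common pattern, sketched here under assumptions, builds DATABASES from environment variables and falls back to SQLite for local development. The variable names (POSTGRES_DB, POSTGRES_USER, and so on) and the function name are hypothetical; match them to whatever .env.example actually defines. The PostgreSQL backend also needs a driver such as psycopg installed.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def database_settings(env: Mapping[str, str] = os.environ, base_dir: Path = Path(".")) -> dict:
    """Return a Django DATABASES dict: PostgreSQL if POSTGRES_DB is set, else SQLite."""
    if env.get("POSTGRES_DB"):
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env["POSTGRES_DB"],
                "USER": env.get("POSTGRES_USER", ""),
                "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
                "HOST": env.get("POSTGRES_HOST", "localhost"),
                "PORT": env.get("POSTGRES_PORT", "5432"),
            }
        }
    # Local development default: a SQLite file next to manage.py.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": base_dir / "db.sqlite3",
        }
    }


# In settings.py (hypothetical): DATABASES = database_settings(base_dir=BASE_DIR)
```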

Development Workflow

1. Start All Services

Terminal 1 - Redis:

redis-server

Terminal 2 - Django:

cd backend
source venv/bin/activate
python manage.py runserver

Terminal 3 - Celery:

cd backend
source venv/bin/activate
celery -A config worker -l info

Terminal 4 - React:

cd frontend
npm start
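Four terminals is the simplest setup. If you prefer one command, a small launcher along these lines can start everything and shut it all down together. It is a hypothetical dev.py at the repository root, not part of the project; it assumes a POSIX environment with the backend virtual environment already activated, so that python and celery resolve to the venv. Process managers that read a Procfile do the same job more robustly.

```python
import subprocess
import time

# (name, command, working directory relative to the repository root).
SERVICES = [
    ("redis", ["redis-server"], "."),
    ("django", ["python", "manage.py", "runserver"], "backend"),
    ("celery", ["celery", "-A", "config", "worker", "-l", "info"], "backend"),
    ("react", ["npm", "start"], "frontend"),
]


def run_all(services=SERVICES, poll_interval: float = 0.5) -> dict[str, int]:
    """Start every service, then stop all of them when any one exits or on Ctrl-C.

    Returns the exit code of each service, keyed by name.
    """
    procs: dict[str, subprocess.Popen] = {}
    try:
        for name, cmd, cwd in services:
            procs[name] = subprocess.Popen(cmd, cwd=cwd)
        # Wait until the first process exits.
        while procs and all(proc.poll() is None for proc in procs.values()):
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        pass
    finally:
        # Ask the remaining processes to stop, then reap all of them.
        for proc in procs.values():
            if proc.poll() is None:
                proc.terminate()
        for proc in procs.values():
            proc.wait()
    return {name: proc.returncode for name, proc in procs.items()}
```

To use it, run python -c "import dev; dev.run_all()" from the repository root.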

2. Make Changes

Edit backend code in backend/transcription/
Edit frontend code in frontend/src/
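Django's runserver and the React dev server reload automatically when files change. The Celery worker does not, so restart it after changing task or service code.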

3. Run Tests

Backend:

cd backend
python manage.py test

Frontend:

cd frontend
npm test

4. Commit Changes

git add .
git commit -m "Description of changes"
git push

Database Management

Create Migration

python manage.py makemigrations

Apply Migration

python manage.py migrate

Reset Database (Development Only!)

rm db.sqlite3
python manage.py migrate
python manage.py createsuperuser

Export Data

python manage.py dumpdata transcription > data.json

Import Data

python manage.py loaddata data.json

Running Tests

Backend Tests

The backend uses Django's test framework. Run these commands from backend/ with the virtual environment activated:

cd backend
source venv/bin/activate

# Run all tests
python manage.py test

# Run one test package (here, the model tests)
python manage.py test transcription.tests.test_models

# Run with verbose output
python manage.py test --verbosity=2

# Run with coverage (requires pytest, pytest-django and pytest-cov; the HTML report goes to htmlcov/)
pytest --cov=transcription --cov-report=html

Test Structure:

backend/transcription/tests/
├── test_models/         # Model tests
│   ├── test_audio.py
│   ├── test_transcription.py
│   └── test_word.py
├── test_views/          # API endpoint tests
│   ├── test_audio_views.py
│   └── test_word_views.py
├── test_serializers/    # Serializer tests
│   └── test_serializers.py
└── test_services/       # Service layer tests
    └── ... (to be added)
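test_services/ is still empty. Service tests are easiest to write when the external client (Groq, Whisper) is passed in, so a test can substitute a mock and never touch the network. The sketch below shows that pattern with unittest.mock. classify_words here is a hypothetical stand-in, and client.complete is an invented method name, not the project's real service or the Groq SDK's API. Django's test runner also discovers plain unittest.TestCase classes.

```python
import json
import unittest
from unittest.mock import Mock


def classify_words(words: list[str], client) -> dict[str, str]:
    """Hypothetical service: ask an LLM client for CEFR levels and parse its JSON reply."""
    prompt = "Return a JSON object mapping each word to a CEFR level: " + ", ".join(words)
    reply = client.complete(prompt)  # hypothetical client method
    levels = json.loads(reply)
    # Keep only the words that were asked about.
    return {word: levels[word] for word in words if word in levels}


class ClassifyWordsTests(unittest.TestCase):
    def test_parses_reply_and_ignores_unrequested_words(self):
        client = Mock()
        client.complete.return_value = '{"hello": "A1", "paradigm": "C1", "extra": "B2"}'
        result = classify_words(["hello", "paradigm"], client)
        self.assertEqual(result, {"hello": "A1", "paradigm": "C1"})
        client.complete.assert_called_once()

    def test_invalid_json_raises(self):
        client = Mock()
        client.complete.return_value = "not json"
        with self.assertRaises(json.JSONDecodeError):
            classify_words(["hello"], client)
```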

Frontend Tests

The frontend uses Jest and React Testing Library:

cd frontend

# Run all tests
npm test

# Run with coverage
npm test -- --coverage

# Run in watch mode
npm test -- --watch

# Run specific test file
npm test -- AudioUpload.test.tsx