Development Setup Guide

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • PostgreSQL 15+ (for production)
  • Redis 7+ (for Celery)
  • FFmpeg (for audio processing)
  • Git
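
The list above can be checked with a short script before starting. This is a minimal sketch, not part of the project: the executable names (`node`, `ffmpeg`, `redis-server`, `psql`) are assumptions based on typical Linux/macOS installs, and only the Python version is actually compared; the other tools are only checked for presence on `PATH`.

```python
import shutil
import sys

# Executable names are assumptions (typical Linux/macOS installs);
# adjust them for your platform.
REQUIRED_TOOLS = ["node", "ffmpeg", "git", "redis-server", "psql"]


def check_prerequisites(tools=REQUIRED_TOOLS, min_python=(3, 11)):
    """Return a dict mapping each requirement to True (found) or False."""
    report = {"python": sys.version_info >= min_python}
    for tool in tools:
        # shutil.which returns the executable's path, or None if not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


def format_report(report):
    """Render one 'name: ok' or 'name: MISSING' line per requirement."""
    return "\n".join(
        f"{name}: {'ok' if found else 'MISSING'}" for name, found in report.items()
    )

# Usage: print(format_report(check_prerequisites()))
```

A `MISSING` entry only means the executable was not found on `PATH`; a tool running in Docker (for example Redis) would still be reported as missing.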

Note: This project has been refactored with clean architecture patterns. See ARCHITECTURE.md for design details.

Project Structure Overview

The project follows a clean architecture with clear separation of concerns:

Backend Structure (Refactored)

backend/transcription/
├── models/              # Domain models (6 files)
│   ├── audio.py        # AudioFile model
│   ├── transcription.py # Transcription model
│   ├── word.py         # Word model
│   ├── statistics.py   # Statistics model
│   └── processing.py   # ProcessingTask model
├── views/               # API endpoints (6 files)
│   ├── audio.py        # Audio upload & management
│   ├── transcription.py # Transcription endpoints
│   ├── word.py         # Word queries & filtering
│   ├── processing.py   # Manual processing triggers
│   └── status.py       # Task status checking
├── serializers/         # Data serialization (6 files)
│   └── ... (matching views structure)
├── services/            # Business logic layer
│   ├── audio/          # Audio processing services
│   │   ├── storage_service.py
│   │   ├── validation_service.py
│   │   └── transcription_service.py
│   ├── transcription/  # Transcription services
│   │   ├── whisper_service.py
│   │   └── processing_service.py
│   ├── words/          # Word extraction & analysis
│   │   ├── extraction_service.py
│   │   ├── context_service.py
│   │   └── statistics_service.py
│   └── ai/             # AI/LLM services
│       ├── groq_service.py
│       └── classification_service.py
├── utils/               # Shared utilities (6 files)
│   ├── constants.py    # Constants & config
│   ├── exceptions.py   # Custom exceptions
│   ├── validators.py   # Validation helpers
│   ├── responses.py    # Response formatters
│   └── pagination.py   # Pagination utilities
└── tests/               # Test suite (organized by layer)
    ├── test_models/
    ├── test_views/
    ├── test_serializers/
    └── test_services/

Frontend Structure (Refactored)

frontend/src/
├── components/          # UI components (feature-based)
│   ├── audio/          # AudioUpload.tsx
│   ├── transcription/  # TranscriptionView.tsx (React.memo)
│   ├── words/          # WordList.tsx, Statistics.tsx (React.memo)
│   ├── layout/         # Header, Footer, Layout
│   └── common/         # StatusIndicator, etc.
├── features/            # Feature modules with hooks
│   ├── audio/hooks/    # useAudioUpload, useAudioStatus
│   ├── transcription/hooks/  # useTranscription
│   └── words/hooks/    # useWords
├── hooks/               # Global custom hooks
│   ├── useDebounce.ts  # Debounced inputs
│   ├── useLocalStorage.ts # Persistent state
│   ├── useCache.ts     # Caching utility
│   └── useCachedApi.ts # API caching hook
├── services/            # API communication
│   └── audioService.ts
├── pages/               # Route components (lazy-loaded)
│   ├── Home.tsx
│   └── Results.tsx
└── types/               # TypeScript types
    └── api.ts

Key Improvements:

  • ✅ Clean separation of concerns (models, views, serializers, services)
  • ✅ Service layer for business logic
  • ✅ Reusable utilities and validation
  • ✅ Feature-based frontend organization
  • ✅ Custom hooks for reusable logic
  • ✅ Performance optimizations (code splitting, memoization, caching)
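
To illustrate what "service layer for business logic" can look like, here is a minimal sketch of a validation service that a view could call instead of embedding the checks itself. The class name, the allowed extensions, and the result type are hypothetical and are not taken from the project's `validation_service.py`; the size limit reuses the `MAX_UPLOAD_SIZE` value from the environment reference in this guide.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical values for illustration; the project's real limits may differ.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
MAX_UPLOAD_SIZE = 100 * 1024 * 1024  # 104857600 bytes, as in the sample .env


@dataclass
class ValidationResult:
    ok: bool
    error: str = ""


class AudioValidationService:
    """Business rule kept out of the view: is this upload acceptable?"""

    def __init__(self, allowed=ALLOWED_EXTENSIONS, max_size=MAX_UPLOAD_SIZE):
        self.allowed = allowed
        self.max_size = max_size

    def validate(self, filename: str, size_bytes: int) -> ValidationResult:
        # Compare extensions case-insensitively
        suffix = Path(filename).suffix.lower()
        if suffix not in self.allowed:
            return ValidationResult(False, f"unsupported file type: {suffix or 'none'}")
        if size_bytes > self.max_size:
            return ValidationResult(False, "file exceeds the upload size limit")
        return ValidationResult(True)
```

Because the service has no dependency on Django's request or response objects, it can be unit-tested on its own and reused from views, Celery tasks, or management commands.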

Backend Setup

1. Clone Repository

git clone https://github.com/yourusername/HardWordExtractor.git
cd HardWordExtractor

2. Set Up Python Virtual Environment

cd backend
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Python Dependencies

pip install -r requirements.txt

4. Download spaCy Model (Optional)

python -m spacy download en_core_web_sm

5. Set Up Environment Variables

cp ../.env.example .env

Edit .env and set:

  • SECRET_KEY - Generate with: python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
  • GROQ_API_KEY - Get from https://groq.com
  • DEBUG=True for development
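
A quick way to confirm that the three variables above are present is a small stdlib-only check. This is a sketch under the assumption that `.env` contains plain `KEY=VALUE` lines; the helper names are illustrative, and the project itself may load the file differently.

```python
from pathlib import Path

REQUIRED_KEYS = ("SECRET_KEY", "GROQ_API_KEY", "DEBUG")


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path=".env"):
    """Read a .env file and list the required keys it does not set."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))

# Usage (from the backend directory): print(check_env_file())
```

The check only detects missing or empty values; a placeholder such as `your-secret-key` still passes.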

6. Run Database Migrations

For local development (SQLite):

python manage.py makemigrations
python manage.py migrate

For production (PostgreSQL):

# Make sure PostgreSQL is running and the database has been created
python manage.py makemigrations
python manage.py migrate

7. Create Superuser

python manage.py createsuperuser

8. Create Required Directories

mkdir -p logs media static

9. Start Development Server

python manage.py runserver

Visit: http://localhost:8000

10. Start Celery Worker (Separate Terminal)

cd backend
source venv/bin/activate
celery -A config worker -l info

11. Start Redis (Separate Terminal)

The Celery worker in step 10 uses Redis as its broker, so start Redis first.

redis-server

Or using Docker:

docker run -d -p 6379:6379 redis:7-alpine

Frontend Setup

1. Install Dependencies

cd frontend
npm install

2. Install Additional Packages

npm install axios @mui/material @emotion/react @emotion/styled @mui/icons-material react-router-dom
npm install --save-dev @types/node

3. Configure Environment

Create .env in the frontend directory:

REACT_APP_API_URL=http://localhost:8000

4. Start Development Server

npm start

Visit: http://localhost:3000


Testing Backend Manually

Using Django Shell

python manage.py shell

Then, inside the shell:

# Test Word Processor
from transcription.services import WordProcessor

processor = WordProcessor()
text = "Hello world! This is a test with difficult paradigms."
words = processor.extract_words(text)
print(words)

# Test Groq Classifier (requires API key)
from transcription.services import GroqClassifier

classifier = GroqClassifier()
result = classifier.classify_words(['hello', 'paradigm', 'difficult'])
print(result)

# Test Whisper (requires audio file)
# from transcription.services import WhisperTranscriber
# transcriber = WhisperTranscriber()
# result = transcriber.transcribe('/path/to/audio.mp3')
# print(result['text'])

Using cURL

# Upload audio file
curl -X POST http://localhost:8000/api/upload/ \
  -F "file=@/path/to/audio.mp3"

# Check status
curl http://localhost:8000/api/status/1/

# Get transcription
curl http://localhost:8000/api/transcriptions/1/

# Get words by level
curl "http://localhost:8000/api/transcriptions/1/words/?level=B1,B2,C1,C2"
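
The same requests can be made from Python with only the standard library. This is a minimal sketch: the endpoint path and the comma-separated `level` parameter come from the cURL calls above, while the helper names are illustrative and the shape of the JSON response is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # same host and port as the cURL calls


def words_url(transcription_id, levels, base_url=BASE_URL):
    """Build the words endpoint URL with a comma-separated level filter."""
    # safe="," keeps the commas literal, matching the cURL example
    query = urlencode({"level": ",".join(levels)}, safe=",")
    return f"{base_url}/api/transcriptions/{transcription_id}/words/?{query}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body; raises on HTTP or network errors."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

# Usage (needs the development server running):
#   data = fetch_json(words_url(1, ["B1", "B2", "C1", "C2"]))
```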

Using Postman

  1. Import the API documentation as a collection
  2. Set base URL to http://localhost:8000
  3. Test endpoints manually

Common Issues

Issue: "Unable to import 'celery'"

Solution: Make sure Celery is installed and the virtual environment is activated:

pip install celery redis

Issue: "Unable to import 'whisper'"

Solution: Install OpenAI Whisper:

pip install openai-whisper

Issue: "No module named 'groq'"

Solution: Install Groq:

pip install groq

Issue: "Connection refused" when connecting to Redis

Solution: Make sure Redis is running:

# Check if Redis is running
redis-cli ping

# If not running, start it
redis-server
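
The same check can be scripted without `redis-cli`, for example in a startup script. The sketch below sends an inline `PING` over a plain socket and looks for `+PONG`; the function name is illustrative, and the default host and port match the `REDIS_URL` used in this guide.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=2.0):
    """Send an inline PING and report whether the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers connection refused, timeouts, and DNS failures
        return False

# Usage: print("Redis reachable:", redis_is_up())
```

A Redis instance that requires a password answers with an error instead of `+PONG`, so this function reports `False` for it even though the server is running.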

Issue: FFmpeg not found

Solution: Install FFmpeg:

# Ubuntu/Debian
sudo apt-get install ffmpeg

# macOS
brew install ffmpeg

# Windows
# Download from https://ffmpeg.org/download.html

Issue: PostgreSQL connection error

Solution: Check that PostgreSQL is running and the database exists:

# Check if PostgreSQL is running
sudo systemctl status postgresql

# Create the database and user (the SQL statements run inside the psql prompt)
sudo -u postgres psql
CREATE DATABASE hardwordextractor;
CREATE USER hardwordextractor WITH PASSWORD 'password';
GRANT ALL PRIVILEGES ON DATABASE hardwordextractor TO hardwordextractor;
\q

Development Workflow

1. Start All Services

Terminal 1 - Redis:

redis-server

Terminal 2 - Django:

cd backend
source venv/bin/activate
python manage.py runserver

Terminal 3 - Celery:

cd backend
source venv/bin/activate
celery -A config worker -l info

Terminal 4 - React:

cd frontend
npm start

2. Make Changes

  • Edit backend code in backend/transcription/
  • Edit frontend code in frontend/src/
  • The Django and React development servers reload automatically; restart the Celery worker after changing task code

3. Run Tests

Backend:

cd backend
pytest

Frontend:

cd frontend
npm test

4. Commit Changes

git add .
git commit -m "Description of changes"
git push

Database Management

Create Migration

python manage.py makemigrations

Apply Migration

python manage.py migrate

Reset Database (Development Only!)

rm db.sqlite3
python manage.py migrate
python manage.py createsuperuser

Export Data

python manage.py dumpdata transcription > data.json

Import Data

python manage.py loaddata data.json
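
Before importing a fixture, it can help to see what it contains. `dumpdata` writes a JSON list of objects, each with `model`, `pk`, and `fields` keys, so a short script can count records per model. This is a sketch; the function name is illustrative.

```python
import json
from collections import Counter


def summarize_fixture(fixture_text):
    """Count objects per model label in a dumpdata JSON export."""
    records = json.loads(fixture_text)
    return Counter(record["model"] for record in records)

# Usage:
#   from pathlib import Path
#   for model, count in summarize_fixture(Path("data.json").read_text()).items():
#       print(model, count)
```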

Running Tests

Backend Tests

The backend uses Django's test framework, with tests organized by layer:

cd backend
source venv/bin/activate

# Run all tests
python manage.py test

# Run a specific test package (test_models is a directory)
python manage.py test transcription.tests.test_models

# Run with verbosity
python manage.py test --verbosity=2

# Run with coverage (requires pytest and pytest-cov; pytest-django is typically needed as well)
pytest --cov=transcription --cov-report=html

Test Structure:

backend/transcription/tests/
├── test_models/         # Model tests
│   ├── test_audio.py
│   ├── test_transcription.py
│   └── test_word.py
├── test_views/          # API endpoint tests
│   ├── test_audio_views.py
│   └── test_word_views.py
├── test_serializers/    # Serializer tests
│   └── test_serializers.py
└── test_services/       # Service layer tests
    └── ... (to be added)

Frontend Tests

The frontend uses Jest and React Testing Library:

cd frontend

# Run all tests
npm test

# Run with coverage
npm test -- --coverage

# Run in watch mode
npm test -- --watch

# Run specific test file
npm test -- AudioUpload.test.tsx

Test Structure:

frontend/src/
├── components/
│   └── audio/
│       └── AudioUpload.test.tsx
├── features/
│   └── audio/hooks/
│       └── useAudioUpload.test.ts
└── hooks/
    └── useDebounce.test.ts

Test Examples:

  • Component Tests: User interactions, rendering, state changes
  • Hook Tests: Custom hook logic, state management
  • Integration Tests: API calls, data flow

Production Deployment

See DEPLOYMENT.md for detailed production deployment instructions.


Useful Commands

Django

# Run server
python manage.py runserver

# Create superuser
python manage.py createsuperuser

# Open Django shell
python manage.py shell

# Check for errors
python manage.py check

# Collect static files
python manage.py collectstatic

Celery

# Start worker
celery -A config worker -l info

# Start worker with concurrency
celery -A config worker -l info --concurrency=4

# Monitor tasks
celery -A config events

# Purge all queued tasks (cannot be undone)
celery -A config purge

Git

# Check status
git status

# View changes
git diff

# Stage files
git add .

# Commit
git commit -m "Message"

# Push
git push origin main

# Pull
git pull origin main

IDE Setup

VS Code Extensions

Recommended extensions:

  • Python
  • Pylance
  • Django
  • ESLint
  • Prettier
  • ES7+ React/Redux/React-Native snippets

PyCharm Configuration

  1. Set Python interpreter to virtual environment
  2. Mark backend as sources root
  3. Enable Django support
  4. Configure run configurations for Django and Celery

Environment Variables Reference

Backend (.env)

# Django
SECRET_KEY=your-secret-key
DEBUG=True
ALLOWED_HOSTS=localhost,127.0.0.1

# Database
DB_NAME=hardwordextractor
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=localhost
DB_PORT=5432

# Redis
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# APIs
GROQ_API_KEY=your-groq-api-key

# Whisper
WHISPER_MODEL=base

# File Upload
MAX_UPLOAD_SIZE=104857600

# CORS
CORS_ALLOWED_ORIGINS=http://localhost:3000,http://localhost:80
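
For orientation, here is one way the `DB_*` variables above could map onto Django's `DATABASES` setting. This is a sketch under the assumption that the settings module reads these variables directly and falls back to SQLite when `DB_NAME` is unset; the function name and the fallback rule are illustrative and may not match the project's actual settings.

```python
import os


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict from DB_* variables.

    Falls back to SQLite when DB_NAME is unset, mirroring the local
    development path in this guide (an assumed rule, not the project's).
    """
    if not env.get("DB_NAME"):
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": "db.sqlite3",
            }
        }
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env["DB_NAME"],
            "USER": env.get("DB_USER", "postgres"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }

# Usage in a settings module: DATABASES = database_settings()
```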

Frontend (.env)

REACT_APP_API_URL=http://localhost:8000

Last Updated: October 8, 2025