Skip to content

aliarabbasi5155/HardWordExtractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

36 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Hard Word Extractor

A web application that processes audio and video files to extract and classify vocabulary words by CEFR language levels (A1-C2), providing transcription and vocabulary analysis for language learners.

🎯 Features

  • Manual Step-by-Step Workflow: Full control over each processing step
    • Upload audio/video files
    • Manually trigger transcription with Whisper AI
    • Extract and review words
    • Classify with Groq AI (see raw API responses)
    • Save and analyze results
  • CEFR Classification: Words are automatically classified by difficulty level (A1-C2)
  • Word Context: See each word in context with timestamps
  • Full Transcription: View complete transcription of your audio
  • Vocabulary Statistics: Get insights about word frequency and difficulty distribution
  • Transparency: See exactly what's happening at each step

New in v2.0: We've redesigned the workflow to give you complete control! Instead of automatic processing, you now manually trigger each step and can review intermediate results. See MANUAL_WORKFLOW.md for details.

πŸ› οΈ Tech Stack

Backend

  • Django 4.2+
  • Django REST Framework
  • Celery + Redis for async processing
  • PostgreSQL database
  • Whisper AI for transcription
  • Groq API for LLM processing

Frontend

  • React 18+ with TypeScript
  • Material-UI (MUI)
  • Axios for API calls
  • React Router

DevOps

  • Docker & Docker Compose
  • Gunicorn + Nginx
  • Let's Encrypt SSL (optional)

πŸš€ Quick Start

Docker Deployment (Recommended)

Prerequisites:

  • Docker and Docker Compose installed
  • Groq API key (get one at groq.com)

3 Simple Steps:

  1. Clone and configure

    git clone https://github.com/yourusername/HardWordExtractor.git
    cd HardWordExtractor
    # Set your GROQ_API_KEY in docker-compose.dev.yml
  2. Start all services

    docker compose -f docker-compose.dev.yml up --build
  3. Access the application

See docs/DOCKER-QUICKSTART.md for detailed Docker deployment guide.

Manual Development Setup

For development without Docker, see QUICKSTART.md for running services manually.

πŸ“– Documentation

Deployment & Setup

API & Architecture

Configuration

πŸ—οΈ Architecture Highlights

Backend (Refactored)

backend/transcription/
β”œβ”€β”€ models/              # Data models (6 files)
β”‚   β”œβ”€β”€ audio.py, transcription.py, word.py
β”‚   β”œβ”€β”€ statistics.py, processing.py
β”œβ”€β”€ serializers/         # API serialization (6 files)
β”œβ”€β”€ views/               # API endpoints (6 files)
β”œβ”€β”€ services/            # Business logic (organized by domain)
β”‚   β”œβ”€β”€ audio/          # Audio processing
β”‚   β”œβ”€β”€ transcription/  # Whisper & processing
β”‚   β”œβ”€β”€ words/          # Extraction & context
β”‚   └── ai/             # Groq & classification
β”œβ”€β”€ utils/               # Shared utilities (6 files)
β”‚   β”œβ”€β”€ constants.py, exceptions.py
β”‚   β”œβ”€β”€ validators.py, responses.py, pagination.py
└── tests/               # Comprehensive test suite

Frontend (Refactored)

frontend/src/
β”œβ”€β”€ components/          # UI components (organized by feature)
β”‚   β”œβ”€β”€ audio/, transcription/, words/
β”‚   β”œβ”€β”€ layout/, common/
β”œβ”€β”€ features/            # Feature modules with hooks
β”‚   β”œβ”€β”€ audio/hooks/    # useAudioUpload, useAudioStatus
β”‚   β”œβ”€β”€ transcription/hooks/  # useTranscription
β”‚   └── words/hooks/    # useWords
β”œβ”€β”€ hooks/               # Global hooks
β”‚   β”œβ”€β”€ useDebounce, useLocalStorage
β”‚   β”œβ”€β”€ useCache, useCachedApi
β”œβ”€β”€ services/            # API communication
└── pages/               # Route components (lazy-loaded)

Performance Features:

  • βœ… Code splitting (44% bundle size reduction)
  • βœ… React.memo on expensive components
  • βœ… API caching with custom hooks
  • βœ… Debounced search inputs

See docs/ARCHITECTURE.md for detailed design patterns and data flow.

πŸ§ͺ Development

Local Development Setup

See SETUP.md for detailed instructions.

Running Tests

# Backend tests
cd backend
python manage.py test

# Frontend tests
cd frontend
npm test

# Test coverage
cd backend && pytest --cov
cd frontend && npm test -- --coverage

πŸ“¦ Project Structure

HardWordExtractor/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ config/              # Django settings & Celery
β”‚   └── transcription/       # Main app (refactored)
β”‚       β”œβ”€β”€ models/          # 6 model files
β”‚       β”œβ”€β”€ views/           # 6 view files
β”‚       β”œβ”€β”€ serializers/     # 6 serializer files
β”‚       β”œβ”€β”€ services/        # Business logic (4 domains)
β”‚       β”œβ”€β”€ utils/           # Shared utilities (6 files)
β”‚       └── tests/           # Test suite (organized by layer)
β”œβ”€β”€ frontend/
β”‚   └── src/
β”‚       β”œβ”€β”€ components/      # UI components (5 domains)
β”‚       β”œβ”€β”€ features/        # Feature hooks (3 domains)
β”‚       β”œβ”€β”€ hooks/           # Global hooks (4 files)
β”‚       β”œβ”€β”€ services/        # API services
β”‚       β”œβ”€β”€ pages/           # Route components
β”‚       └── types/           # TypeScript types
β”œβ”€β”€ docker/                  # Docker configurations
β”œβ”€β”€ docs/                    # Documentation
β”‚   β”œβ”€β”€ ARCHITECTURE.md     # System architecture
β”‚   β”œβ”€β”€ API.md              # API documentation
β”‚   β”œβ”€β”€ SETUP.md            # Setup guide
β”‚   └── GROQ_SETUP.md       # Groq API guide
β”œβ”€β”€ scripts/                 # Utility scripts
β”œβ”€β”€ docker-compose.yml       # Docker orchestration
└── README.md

πŸ—ΊοΈ Roadmap

  • Phase 1: MVP (βœ… 100% Complete)

    • Audio transcription with Whisper
    • CEFR word classification with Groq API
    • Manual step-by-step workflow
    • React frontend with TypeScript
    • Docker deployment
    • Complete documentation
  • Phase 2: Video support and user authentication

  • Phase 3: Full local LLM processing

  • Phase 4: Production features and scaling

Current Status: Phase 1 MVP is complete and production-ready! Docker deployment tested and verified with comprehensive documentation.

See PROJECT_OUTLINE.md and PROJECT_STATUS.md for detailed progress tracking.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

πŸ“§ Contact

For questions or support, please open an issue on GitHub.

πŸ™ Acknowledgments

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors