This repository was archived by the owner on Jul 19, 2025. It is now read-only.

Swecha-Foundation/corpus-te


This project has migrated to https://code.swecha.org/corpus/corpus-backend


Telugu Corpus Collections API

A FastAPI-based backend service for managing Telugu corpus collections, supporting text, audio, video, and image submissions with PostgreSQL database and JWT authentication.

Features

  • Multi-media Support: Handle text, audio, video, and image submissions
  • User Management: Many-to-many role-based user system (admin/user/reviewer)
  • Authentication: JWT-based authentication and authorization
  • OTP Authentication: SMS-based OTP verification for secure phone number authentication
  • Category Management: Organize submissions by categories
  • Record Review System: Support for content review workflows
  • Geolocation & PostGIS: Advanced geographic data handling with spatial queries and indexing
  • PostgreSQL Database: Robust database with proper foreign key constraints
  • File Storage: Support for local and MinIO/S3 storage
  • RESTful API: Full CRUD operations with OpenAPI documentation

Project Structure

corpus-te/
├── app/
│   ├── main.py              # FastAPI application
│   ├── core/
│   │   ├── config.py        # Settings and configuration
│   │   ├── auth.py          # JWT authentication utilities
│   │   ├── exceptions.py    # Custom exceptions
│   │   ├── logging_config.py # Logging setup
│   │   └── rbac_fastapi.py  # Role-based access control utilities
│   ├── db/
│   │   └── session.py       # Database session
│   ├── models/
│   │   ├── __init__.py      # Database models
│   │   ├── associations.py  # Many-to-many association tables
│   │   ├── user.py          # User model
│   │   ├── role.py          # Role model
│   │   ├── category.py      # Category model
│   │   ├── record.py        # Record model
│   │   └── otp.py           # OTP model for authentication
│   ├── schemas/
│   │   ├── __init__.py      # Pydantic schemas
│   │   ├── geo_schemas.py   # Geographic coordinate schemas
│   │   └── otp.py           # OTP request/response schemas
│   ├── api/
│   │   ├── __init__.py
│   │   ├── auth.py          # Legacy auth endpoints
│   │   └── v1/
│   │       ├── api.py       # API router
│   │       ├── __init__.py
│   │       └── endpoints/
│   │           ├── __init__.py
│   │           ├── auth.py      # Authentication endpoints (with OTP)
│   │           ├── users.py     # User management endpoints
│   │           ├── roles.py     # Role management endpoints
│   │           ├── categories.py # Category endpoints
│   │           ├── records.py   # Record endpoints
│   │           └── system_rbac.py # RBAC system endpoints
│   ├── services/            # Business logic services
│   │   └── otp_service.py   # OTP authentication service
│   └── utils/               # Utility modules
│       ├── __init__.py
│       ├── cleanup_storage.py      # Storage cleanup utilities
│       ├── hetzner_storage.py      # Hetzner object storage integration
│       ├── postgis_utils.py        # PostGIS geographic utilities
│       └── record_file_generator.py # Record file generation utilities
├── alembic/                 # Database migrations
│   ├── versions/            # Migration files
│   ├── alembic.ini          # Alembic configuration
│   └── env.py              # Migration environment
├── docs/                    # Documentation and guides
│   ├── demo_rbac_optimization.py            # RBAC demo script
│   ├── example_hetzner_storage.py           # Hetzner storage examples
│   ├── example_record_file_generator.py     # Record generator examples
│   ├── generate_record_files.py             # File generation script
│   ├── otp_demo.py                          # OTP demo script
│   ├── HETZNER_STORAGE_GUIDE.md            # Hetzner storage setup guide
│   ├── OTP_AUTHENTICATION_GUIDE.md         # OTP authentication guide
│   ├── OTP_IMPLEMENTATION_SUMMARY.md       # OTP implementation summary
│   ├── OTP_TESTING_RESULTS.md              # OTP testing results
│   ├── Plan.md                             # Project development plan
│   ├── POSTGIS_INTEGRATION_SUMMARY.md      # PostGIS integration summary
│   ├── POSTGRESQL_SETUP.md                 # PostgreSQL setup guide
│   ├── RBAC_GUIDE.md                       # Role-based access control guide
│   ├── RBAC_OPTIMIZATION_SUMMARY.md        # RBAC optimization summary
│   ├── RECORD_FILE_GENERATOR_GUIDE.md      # Record file generator guide
│   └── RECORD_FILE_GENERATOR_COMPLETION_SUMMARY.md # Generator completion summary
├── tests/                   # Test files
│   ├── create_test_data.py              # Test data creation script
│   ├── test_hetzner_storage.py          # Hetzner storage tests
│   ├── test_otp_api.py                  # OTP API tests
│   ├── test_postgis_api.py              # PostGIS API tests
│   ├── test_postgis_integration.py      # PostGIS integration tests
│   ├── test_updated_api_endpoints.py    # Updated API endpoint tests
│   └── verify_test_data.py              # Test data verification
├── logs/                    # Application logs
├── main.py                  # Application entry point
├── setup_postgresql.py      # PostgreSQL setup automation script
├── pyproject.toml           # Project dependencies
├── uv.lock                  # UV dependency lock file
├── LICENSE                  # License file
└── README.md               # This file

Key Features

🔐 OTP Authentication System

Complete SMS-based One-Time Password authentication system for secure phone number verification:

  • SMS Integration: Real SMS delivery via Ozonetel API
  • Security: HMAC-SHA256 OTP hashing with salts and time-based expiry
  • Rate Limiting: Built-in protection against spam and abuse
  • Phone Validation: International phone number format validation
  • JWT Integration: Seamless token generation after successful verification

📖 Detailed OTP Authentication Guide
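
The hashing and expiry ideas listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's `otp_service.py`: the function names, the 6-digit length, and the 300-second expiry window are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time

OTP_TTL_SECONDS = 300  # assumed expiry window, not taken from the project


def generate_otp(digits: int = 6) -> str:
    """Return a random numeric OTP, zero-padded to `digits` characters."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def hash_otp(otp: str, salt: bytes, secret_key: bytes) -> str:
    """HMAC-SHA256 over salt + OTP, keyed with a server-side secret."""
    return hmac.new(secret_key, salt + otp.encode(), hashlib.sha256).hexdigest()


def verify_otp(candidate, salt, secret_key, stored_hash, issued_at, now=None) -> bool:
    """Accept the OTP only if it has not expired and its hash matches."""
    now = time.time() if now is None else now
    if now - issued_at > OTP_TTL_SECONDS:
        return False
    expected = hash_otp(candidate, salt, secret_key)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, stored_hash)
```

Only the salt, the hash, and the issue time would need to be stored; the plain OTP is sent to the user and never persisted.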

🌍 PostGIS Geographic Integration

Advanced spatial data handling with PostGIS for location-based features:

  • Spatial Queries: Efficient geographic data operations and indexing
  • Location Services: Precise coordinate handling and validation
  • Performance Optimization: Specialized indexes for geographic queries
  • Data Integrity: Robust validation for coordinate formats and ranges

📖 PostGIS Integration Guide
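
As a rough illustration of coordinate range validation, the sketch below checks latitude and longitude and formats them as an EWKT point, a text format PostGIS accepts. The function names and the default SRID of 4326 (WGS 84) are assumptions for the example and are not taken from `postgis_utils.py`.

```python
def validate_coordinates(latitude: float, longitude: float) -> None:
    """Raise ValueError if either coordinate is outside its valid range."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")


def to_ewkt_point(latitude: float, longitude: float, srid: int = 4326) -> str:
    """Format a validated coordinate pair as EWKT.

    WKT points are written as (x y), so longitude comes first.
    """
    validate_coordinates(latitude, longitude)
    return f"SRID={srid};POINT({longitude} {latitude})"
```

For example, `to_ewkt_point(17.385, 78.4867)` returns `SRID=4326;POINT(78.4867 17.385)`.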

👥 Role-Based Access Control (RBAC)

Comprehensive user management system with flexible permissions:

  • Multi-Role Support: Admin, User, and Reviewer roles with granular permissions
  • Performance Optimized: Efficient user-role queries and caching strategies
  • Scalable Architecture: Designed for large-scale user management

📖 RBAC Performance Optimization Guide
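
A many-to-many role model boils down to a membership check on the set of role names a user holds. The sketch below shows that check in plain Python; the `User` class and helper names are hypothetical and do not mirror `rbac_fastapi.py`.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical in-memory user; the real model lives in the database."""
    name: str
    roles: set[str] = field(default_factory=set)


def has_any_role(user: User, allowed) -> bool:
    """True if the user holds at least one of the allowed roles."""
    return not user.roles.isdisjoint(allowed)


def require_roles(user: User, *allowed: str) -> None:
    """Raise PermissionError unless the user holds one of the allowed roles."""
    if not has_any_role(user, allowed):
        raise PermissionError(f"{user.name} needs one of: {', '.join(allowed)}")
```

A reviewer-only operation could then start with `require_roles(current_user, "admin", "reviewer")`.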

Quick Start

  1. Clone the repository and navigate to the project:

    git clone <repository-url>
    cd corpus-te
  2. Create a virtual environment:

    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    uv pip install -e .
  4. Install development dependencies:

    uv pip install -e ".[dev]"

    Alternative: Install all dependencies with uv sync (if using uv.lock):

    uv sync --dev
  5. Set up PostgreSQL database:

    See POSTGRESQL_SETUP.md for detailed PostgreSQL installation and setup instructions.

  6. Set up environment variables:

    cp .env.example .env
    # Edit .env with your configuration (see Configuration section below)
  7. Generate a secure secret key for JWT. Using Bash:

    openssl rand -hex 32

    Or using Python:

          import secrets
          print(secrets.token_urlsafe(32))

    Update the APP_SECRET_KEY in your .env file with the generated key.

    Example:

    APP_SECRET_KEY="your-generated-secret-key"
  8. Set up PostgreSQL database with automated script (Recommended):

    Use the provided setup script to automatically create the database and run initial setup:

    # Run complete database setup (recommended for first-time setup)
    python setup_postgresql.py --all

    Or run individual steps:

    # Test PostgreSQL connection
    python setup_postgresql.py --test-connection
    
    # Create database if it doesn't exist
    python setup_postgresql.py --create-db
    
    # Run database migrations
    python setup_postgresql.py --migrate
    
    # Seed initial data (roles)
    python setup_postgresql.py --seed

    Alternative: Manual database setup:

    If you prefer manual setup, see POSTGRESQL_SETUP.md for detailed PostgreSQL installation and setup instructions, then run:

    alembic upgrade head
  9. Run the application:

    python main.py
  10. Access the API documentation:

    • Swagger UI: http://localhost:8000/docs
    • ReDoc: http://localhost:8000/redoc

Development

Installing additional dependencies:

uv pip install package-name

Installing development dependencies:

uv pip install -e ".[dev]"

Running with auto-reload:

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

PostgreSQL Setup Script

The setup_postgresql.py script provides automated database setup and testing functionality:

# Show current database configuration
python setup_postgresql.py

# Run complete setup (creates DB, runs migrations, seeds data)
python setup_postgresql.py --all

# Individual operations:
python setup_postgresql.py --test-connection    # Test PostgreSQL server connection
python setup_postgresql.py --create-db          # Create database if missing
python setup_postgresql.py --migrate            # Run Alembic migrations
python setup_postgresql.py --seed               # Seed initial roles data

What the script does:

  • Connection Testing: Verifies PostgreSQL server accessibility
  • Database Creation: Creates the target database if it doesn't exist
  • Migration Execution: Runs all pending Alembic migrations
  • Data Seeding: Creates initial roles (admin, user, reviewer)
  • Error Handling: Provides clear feedback on setup status

Prerequisites:

  • PostgreSQL server running and accessible
  • Correct database credentials in .env file
  • psycopg2-binary installed (included in project dependencies)

Database migrations:

Before running migrations, check POSTGRESQL_SETUP.md for PostgreSQL setup instructions.

# Create new migration
alembic revision --autogenerate -m "description"

# Apply migrations
alembic upgrade head

# Check migration status
alembic current

# Downgrade to previous migration
alembic downgrade -1

Running tests:

pytest

Code formatting with uv:

uv pip install black isort
black .
isort .

Updating dependencies:

uv pip install --upgrade package-name

Configuration

Key environment variables in .env file:

Database Configuration

  • DATABASE_URL: PostgreSQL connection string
  • DB_HOST: PostgreSQL host (default: localhost)
  • DB_PORT: PostgreSQL port (default: 5432)
  • DB_NAME: Database name
  • DB_USER: PostgreSQL username
  • DB_PASSWORD: PostgreSQL password

Application Settings

  • PROJECT_NAME: Application name
  • LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR)

Authentication

  • APP_SECRET_KEY: JWT secret key (change in production)
  • ALGORITHM: JWT algorithm (default: HS256)
  • ACCESS_TOKEN_EXPIRE_MINUTES: Token expiration time (default: 30)

CORS Configuration

  • BACKEND_CORS_ORIGINS: Comma-separated list of allowed origins
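
Since BACKEND_CORS_ORIGINS is a single comma-separated string, it has to be split into a list before it can be handed to a CORS middleware. The helper below is one possible way to do that; its name is hypothetical, and how `config.py` actually parses the value is not shown in this README.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origins string, dropping blanks and padding."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

With the example value used later in this README, `parse_cors_origins("http://localhost:3000,http://localhost:8080")` yields a two-element list.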

Object Storage (Optional)

  • HZ_OBJ_ACCESS_KEY: Object storage access key
  • HZ_OBJ_SECRET_KEY: Object storage secret key
  • HZ_OBJ_API_TOKEN: Object storage API token
  • HZ_OBJ_ENDPOINT: Object storage endpoint
  • HZ_OBJ_BUCKET_NAME: Object storage bucket name

File Upload Settings

  • MAX_FILE_SIZE: Maximum file size in bytes (default: 104857600 = 100MB)

Example .env file:

# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=corpus_te
DB_USER=postgres
DB_PASSWORD=your_password_here
DATABASE_URL="postgresql://postgres:your_password_here@localhost:5432/corpus_te"

# Application Settings
PROJECT_NAME="Telugu Corpus Collections API"
LOG_LEVEL="WARNING"

# JWT Configuration
APP_SECRET_KEY="your-secret-key-here"
ALGORITHM="HS256"
ACCESS_TOKEN_EXPIRE_MINUTES=30

# CORS Origins
BACKEND_CORS_ORIGINS="http://localhost:3000,http://localhost:8080"

# File Upload Settings
MAX_FILE_SIZE=104857600
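
DATABASE_URL repeats the values of the individual DB_* variables, so the two can drift apart. One way to avoid that is to derive the URL from the parts, as in this sketch. The function name is hypothetical, the defaults follow the ones listed above, and the password is percent-encoded so that characters such as `@` or `/` do not break the URL.

```python
import os
from urllib.parse import quote


def build_database_url(env=None) -> str:
    """Assemble a PostgreSQL URL from DB_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")  # documented default
    port = env.get("DB_PORT", "5432")       # documented default
    name = env["DB_NAME"]
    user = env["DB_USER"]
    # Percent-encode the password so reserved characters stay unambiguous
    password = quote(env["DB_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching the real environment.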

API Endpoints

Core Endpoints

  • GET /: Welcome message
  • GET /health: Health check
  • GET /docs: Swagger UI API documentation
  • GET /redoc: ReDoc API documentation

Authentication (/api/v1/auth/)

  • POST /api/v1/auth/login: User login (returns JWT token)
  • POST /api/v1/auth/register: User registration

User Management (/api/v1/users/)

  • GET /api/v1/users/: List all users (with pagination)
  • POST /api/v1/users/: Create a new user
  • GET /api/v1/users/{user_id}: Get user by ID
  • PUT /api/v1/users/{user_id}: Update user
  • DELETE /api/v1/users/{user_id}: Delete user
  • GET /api/v1/users/{user_id}/with-roles: Get user with roles populated
  • GET /api/v1/users/phone/{phone}: Get user by phone number

User Role Management (/api/v1/users/{user_id}/roles/)

  • GET /api/v1/users/{user_id}/roles: Get user's roles
  • POST /api/v1/users/{user_id}/roles: Assign roles to user (replace all)
  • PUT /api/v1/users/{user_id}/roles/add: Add a role to user
  • DELETE /api/v1/users/{user_id}/roles/{role_id}: Remove role from user

Role Management (/api/v1/roles/)

  • GET /api/v1/roles/: List all roles
  • POST /api/v1/roles/: Create a new role
  • GET /api/v1/roles/{role_id}: Get role by ID
  • PUT /api/v1/roles/{role_id}: Update role
  • DELETE /api/v1/roles/{role_id}: Delete role

Category Management (/api/v1/categories/)

  • GET /api/v1/categories/: List all categories
  • POST /api/v1/categories/: Create a new category
  • GET /api/v1/categories/{category_id}: Get category by ID
  • PUT /api/v1/categories/{category_id}: Update category
  • DELETE /api/v1/categories/{category_id}: Delete category

Record Management (/api/v1/records/)

  • GET /api/v1/records/: List all records
  • POST /api/v1/records/: Create a new record
  • GET /api/v1/records/{record_id}: Get record by ID
  • PUT /api/v1/records/{record_id}: Update record
  • DELETE /api/v1/records/{record_id}: Delete record

Database Setup

This application uses PostgreSQL as the primary database. For detailed setup instructions, see POSTGRESQL_SETUP.md.

Quick PostgreSQL Setup

  1. Install PostgreSQL on your system
  2. Create a database and user:
    CREATE DATABASE corpus_te;
    CREATE USER corpus_user WITH PASSWORD 'your_password';
    GRANT ALL PRIVILEGES ON DATABASE corpus_te TO corpus_user;
  3. Update your .env file with the database credentials
  4. Run migrations: alembic upgrade head

Database Schema

The application includes:

  • Users: User accounts with many-to-many role relationships
  • Roles: System roles (admin, user, reviewer)
  • Categories: Content organization categories
  • Records: Media submissions with metadata
  • User-Role Association: Many-to-many relationship table
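
To illustrate the user-role association, the sketch below builds a tiny many-to-many schema in an in-memory SQLite database using Python's standard library. It is only an illustration: the project itself uses PostgreSQL with its own models and Alembic migrations, and the table and column names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Association table: one row per (user, role) pair
CREATE TABLE user_roles (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role_id INTEGER NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, role_id)
);
""")
conn.executemany(
    "INSERT INTO roles (id, name) VALUES (?, ?)",
    [(1, "admin"), (2, "user"), (3, "reviewer")],
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'John')")
conn.executemany(
    "INSERT INTO user_roles (user_id, role_id) VALUES (?, ?)",
    [(1, 1), (1, 3)],
)


def roles_for_user(connection, user_id):
    """Return the role names assigned to a user, sorted alphabetically."""
    rows = connection.execute(
        "SELECT r.name FROM roles r "
        "JOIN user_roles ur ON ur.role_id = r.id "
        "WHERE ur.user_id = ? ORDER BY r.name",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

The composite primary key on the association table prevents the same role from being assigned to a user twice.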

Authentication

The API uses JWT (JSON Web Tokens) for authentication:

  1. Register/Login: Use /api/v1/auth/register or /api/v1/auth/login
  2. Get Token: Login returns an access token
  3. Use Token: Include in Authorization header: Bearer <token>
  4. Protected Endpoints: Most endpoints require authentication
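
To make the token in step 3 less opaque, here is a standard-library sketch of how an HS256-signed JWT is built and checked. It is for illustration only: the project's `auth.py` is not reproduced here and most likely relies on a maintained JWT library, and the function names and the `sub` claim layout are assumptions. The 30-minute default matches ACCESS_TOKEN_EXPIRE_MINUTES.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, secret: str, expires_in: int = 30 * 60, now=None) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": now + expires_in}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now=None) -> dict:
    """Return the payload if the signature is valid and the token is unexpired."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

The verifier always recomputes the signature with HS256 rather than trusting the `alg` value in the header.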

Example authentication flow:

# Register a new user
# role_ids 1 for admin
curl -X 'POST' \
  'http://localhost:8000/api/v1/users/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "phone": "1234567890",
  "name": "John",
  "email": "[email protected]",
  "gender": "male",
  "date_of_birth": "2025-07-01",
  "place": "Telangana",
  "password": "password",
  "role_ids": [
    1
  ],
  "has_given_consent": true
}'

# Login to get token
curl -X POST "http://localhost:8000/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"phone": "1234567890", "password": "password"}'

# Use token in requests
curl -X GET "http://localhost:8000/api/v1/users/" \
  -H "Authorization: Bearer <your-token-here>"
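
The login and authenticated requests above can also be prepared from Python. The sketch below only builds the requests with the standard library and does not send them; the helper names are hypothetical and the base URL is the one used in the curl examples. The name of the token field in the login response is not documented here, so extracting the token is left out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_login_request(phone: str, password: str) -> urllib.request.Request:
    """Prepare the POST /api/v1/auth/login request with a JSON body."""
    body = json.dumps({"phone": phone, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_users_request(token: str) -> urllib.request.Request:
    """Prepare the GET /api/v1/users/ request with a Bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/users/",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Either request can be sent with `urllib.request.urlopen(request)` once the server is running.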

Troubleshooting

PostgreSQL Setup Issues

Database Connection Fails:

# Check if PostgreSQL is running
sudo systemctl status postgresql

# Test connection manually
psql -h localhost -U postgres -d postgres
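
When neither `systemctl` nor `psql` is available, a plain TCP check can at least tell whether anything is listening on the PostgreSQL port. This is a small sketch with a hypothetical function name; it confirms reachability only and says nothing about credentials or the database itself.

```python
import socket


def can_reach(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

A `False` result points to a stopped server, a wrong host or port, or a firewall rule, which narrows down where to look next.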

Setup Script Issues:

# Check database configuration
python setup_postgresql.py

# Test the PostgreSQL server connection
python setup_postgresql.py --test-connection

Common Error Solutions:

  1. "PostgreSQL server connection failed"

    • Ensure PostgreSQL is installed and running
    • Check credentials in .env file
    • Verify host and port settings
  2. "Database connection failed"

    • Run python setup_postgresql.py --create-db first
    • Check if database name matches .env configuration
  3. "Migration failed"

    • Ensure database exists and is accessible
    • Check for conflicting migrations with alembic current
    • Reset migrations if needed: alembic downgrade base
  4. "Permission denied"

    • Ensure PostgreSQL user has CREATE DATABASE privileges
    • Check PostgreSQL authentication settings in pg_hba.conf
  5. "password authentication failed"

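    • Verify your database password
    • On Unix-based systems such as Linux, replace peer authentication with scram-sha-256 in pg_hba.conf, then restart PostgreSQL (the path below is for PostgreSQL 17; adjust it for your version):
      sudo sed -i 's/peer/scram-sha-256/g' /etc/postgresql/17/main/pg_hba.conf
      sudo systemctl restart postgresql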

Database Reset (Development Only):

# Drop and recreate database
psql -h localhost -U postgres -c "DROP DATABASE IF EXISTS corpus_te;"
python setup_postgresql.py --all

Application Issues

Import Errors:

  • Ensure virtual environment is activated
  • Install dependencies: uv sync --dev

Port Already in Use:

# Find process using port 8000
lsof -i :8000

# Kill process
kill -9 <PID>

For more detailed troubleshooting, see POSTGRESQL_SETUP.md.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

This project is licensed under the terms specified in the LICENSE file.
