API Documentation

Hard Word Extractor API v1.0

Base URL: http://localhost:8000/api/

Table of Contents

Authentication
Audio File Upload
Processing Status
Transcriptions
Words
Error Handling
Rate Limiting
CORS
Example Workflows
Pagination
Notes

Authentication

Phase 1 (MVP): No authentication required. All endpoints are publicly accessible.

Phase 2+: JWT authentication will be added.

Audio File Upload

Upload Audio File

Upload an audio file for transcription and word extraction.

Endpoint: POST /api/upload/

Content-Type: multipart/form-data

Request Body:

file: <audio file>

Supported Formats:

MP3 (audio/mpeg)
WAV (audio/wav)
M4A (audio/x-m4a, audio/mp4)

File Size Limit: 100MB
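The format and size limits above can be checked in the client before uploading, which saves a round trip for files the server would reject. The following is a minimal sketch, not part of the API: `validateAudioFile` is a hypothetical helper, and the byte value of the limit assumes 1 MB = 1024 * 1024 bytes (inferred from the sample response, where a `file_size` of 5242880 is reported as a `file_size_mb` of 5.0). The server remains the authority on what is accepted.

```javascript
// Assumption: 1 MB = 1024 * 1024 bytes, inferred from the sample response
// where file_size 5242880 is reported as file_size_mb 5.0.
const MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024;

// MIME types listed under "Supported Formats".
const SUPPORTED_MIME_TYPES = ['audio/mpeg', 'audio/wav', 'audio/x-m4a', 'audio/mp4'];

// Hypothetical helper: returns a list of problems found in the file.
// An empty list means the file passes these client-side checks.
// `file` needs only `type` and `size`, as a browser File object has.
function validateAudioFile(file) {
  const problems = [];
  if (!SUPPORTED_MIME_TYPES.includes(file.type)) {
    problems.push(`Unsupported type: ${file.type || 'unknown'}`);
  }
  if (file.size >= MAX_FILE_SIZE_BYTES) {
    problems.push('File size must be less than 100MB');
  }
  return problems;
}
```

A caller would typically run this on the selected file and skip the upload when the returned list is not empty.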

Example Request (curl):

curl -X POST http://localhost:8000/api/upload/ \
  -F "file=@/path/to/audio.mp3"

Example Request (JavaScript):

// audioFile is a File object, e.g. from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('http://localhost:8000/api/upload/', {
  method: 'POST',
  body: formData
});

const data = await response.json();

Success Response (201 Created):

{
  "id": 1,
  "file": "/media/audio/2025/10/08/sample.mp3",
  "original_filename": "sample.mp3",
  "file_size": 5242880,
  "file_size_mb": 5.0,
  "duration": null,
  "status": "pending",
  "error_message": null,
  "uploaded_at": "2025-10-08T10:30:00Z",
  "processing_started_at": null,
  "processing_completed_at": null,
  "processing_time": null
}

Status Values:

pending - Queued for processing
processing - Processing started
transcribing - Running Whisper transcription
analyzing - Extracting and classifying words
completed - Processing finished successfully
failed - Processing failed (check error_message)

Error Response (400 Bad Request):

{
  "file": ["File size must be less than 100MB"]
}

Processing Status

Get Processing Status

Check the current status of an audio file's processing.

Endpoint: GET /api/status/<audio_id>/

Example Request:

curl http://localhost:8000/api/status/1/

Success Response (200 OK):

{
  "id": 1,
  "status": "analyzing",
  "progress": 70,
  "error_message": null,
  "original_filename": "sample.mp3",
  "uploaded_at": "2025-10-08T10:30:00Z",
  "processing_started_at": "2025-10-08T10:30:05Z",
  "processing_completed_at": null,
  "has_transcription": true,
  "transcription_id": 1
}

Progress Values:

0% - Pending or failed
20% - Processing started
40% - Transcribing
70% - Analyzing words
100% - Completed

Polling Recommendation: Poll this endpoint every 2-5 seconds until status is completed or failed.
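A polling client has to decide, after each status response, whether to stop, report an error, or ask again. The sketch below shows one way to make that decision explicit. `nextPollingStep` and the action names it returns are hypothetical and exist only for illustration; the field names come from the status response shown above.

```javascript
// Hypothetical helper: decides what a client should do with a parsed
// GET /api/status/<audio_id>/ response body.
function nextPollingStep(statusBody) {
  if (statusBody.status === 'completed') {
    // Processing is done; the transcription can now be fetched.
    return { action: 'fetch_transcription', transcriptionId: statusBody.transcription_id };
  }
  if (statusBody.status === 'failed') {
    // Processing stopped; error_message explains why.
    return { action: 'report_error', message: statusBody.error_message };
  }
  // pending, processing, transcribing, analyzing: not finished yet.
  return { action: 'poll_again', progress: statusBody.progress };
}
```

Keeping this decision in a pure function separates it from timers and network calls, so it can be tested without a running server.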

Transcriptions

List Transcriptions

Get a paginated list of all transcriptions.

Endpoint: GET /api/transcriptions/

Example Response:

{
  "count": 10,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": 1,
      "audio_file": {
        "id": 1,
        "original_filename": "sample.mp3",
        "status": "completed"
      },
      "text": "Full transcription text here...",
      "language": "en",
      "word_count": 150,
      "unique_word_count": 85,
      "statistics": {
        "id": 1,
        "a1_count": 45,
        "a2_count": 20,
        "b1_count": 10,
        "b2_count": 5,
        "c1_count": 3,
        "c2_count": 2,
        "unknown_count": 0,
        "total_words": 85,
        "level_distribution": {
          "A1": 52.9,
          "A2": 23.5,
          "B1": 11.8,
          "B2": 5.9,
          "C1": 3.5,
          "C2": 2.4,
          "Unknown": 0.0
        }
      },
      "created_at": "2025-10-08T10:31:00Z",
      "updated_at": "2025-10-08T10:31:00Z"
    }
  ]
}

Get Transcription Detail

Get detailed information about a specific transcription, including all extracted words.

Endpoint: GET /api/transcriptions/<id>/

Example Request:

curl http://localhost:8000/api/transcriptions/1/

Example Response:

{
  "id": 1,
  "audio_file": {
    "id": 1,
    "original_filename": "sample.mp3",
    "file_size_mb": 5.0,
    "duration": 180.5,
    "status": "completed"
  },
  "text": "Hello everyone, this is a sample transcription. We will learn about difficult paradigms and simple concepts.",
  "language": "en",
  "word_count": 17,
  "unique_word_count": 15,
  "statistics": {
    "id": 1,
    "a1_count": 8,
    "a2_count": 3,
    "b1_count": 2,
    "b2_count": 1,
    "c1_count": 1,
    "c2_count": 0,
    "unknown_count": 0,
    "total_words": 15,
    "level_distribution": {
      "A1": 53.3,
      "A2": 20.0,
      "B1": 13.3,
      "B2": 6.7,
      "C1": 6.7,
      "C2": 0.0,
      "Unknown": 0.0
    }
  },
  "extracted_words": [
    {
      "id": 1,
      "word": {
        "id": 1,
        "text": "hello",
        "lemma": "hello",
        "cefr_level": "A1",
        "cefr_level_display": "A1 - Beginner",
        "global_frequency": 1
      },
      "context": "Hello everyone, this is a sample transcription.",
      "timestamp": 0.5,
      "position": 0,
      "frequency": 1
    },
    {
      "id": 2,
      "word": {
        "id": 2,
        "text": "paradigm",
        "lemma": "paradigm",
        "cefr_level": "C1",
        "cefr_level_display": "C1 - Advanced",
        "global_frequency": 1
      },
      "context": "We will learn about difficult paradigms and simple concepts.",
      "timestamp": 8.2,
      "position": 12,
      "frequency": 1
    }
  ]
}

Get Words by CEFR Level

Get words from a transcription, filtered by CEFR level.

Endpoint: GET /api/transcriptions/<id>/words/?level=<levels>

Query Parameters:

level (optional) - Comma-separated CEFR levels (A1, A2, B1, B2, C1, C2)

Example Requests:

# Get all words
curl http://localhost:8000/api/transcriptions/1/words/

# Get only A1 words (URL quoted so the shell does not interpret "?")
curl "http://localhost:8000/api/transcriptions/1/words/?level=A1"

# Get B1 and B2 words
curl "http://localhost:8000/api/transcriptions/1/words/?level=B1,B2"

# Get advanced words (C1 and C2)
curl "http://localhost:8000/api/transcriptions/1/words/?level=C1,C2"

Example Response:

[
  {
    "id": 5,
    "word": {
      "id": 5,
      "text": "difficult",
      "lemma": "difficult",
      "cefr_level": "B1",
      "cefr_level_display": "B1 - Intermediate",
      "global_frequency": 1
    },
    "context": "We will learn about difficult paradigms.",
    "timestamp": 7.8,
    "position": 11,
    "frequency": 1
  },
  {
    "id": 6,
    "word": {
      "id": 6,
      "text": "concept",
      "lemma": "concept",
      "cefr_level": "B2",
      "cefr_level_display": "B2 - Upper Intermediate",
      "global_frequency": 1
    },
    "context": "difficult paradigms and simple concepts.",
    "timestamp": 9.5,
    "position": 14,
    "frequency": 1
  }
]
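When the level filter is built from user input, it can help to validate the levels before sending the request. The following sketch builds the URL in the comma-separated form shown in the example requests. `buildWordsUrl` is a hypothetical helper, not part of the API, and it assumes `baseUrl` is the documented base URL, with or without a trailing slash.

```javascript
// CEFR levels accepted by the `level` query parameter.
const CEFR_LEVELS = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2'];

// Hypothetical helper: builds the words URL for one transcription.
// An empty `levels` array yields the unfiltered URL.
function buildWordsUrl(baseUrl, transcriptionId, levels = []) {
  const unknown = levels.filter((level) => !CEFR_LEVELS.includes(level));
  if (unknown.length > 0) {
    throw new RangeError(`Unknown CEFR level(s): ${unknown.join(', ')}`);
  }
  // Drop trailing slashes so the path is joined with exactly one slash.
  const root = baseUrl.replace(/\/+$/, '');
  const path = `${root}/transcriptions/${encodeURIComponent(transcriptionId)}/words/`;
  return levels.length > 0 ? `${path}?level=${levels.join(',')}` : path;
}
```

For example, `buildWordsUrl('http://localhost:8000/api/', 1, ['B1', 'B2'])` produces the same URL as the "B1 and B2 words" request above.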

Get Statistics

Get word statistics for a transcription.

Endpoint: GET /api/transcriptions/<id>/statistics/

Example Response:

{
  "id": 1,
  "a1_count": 8,
  "a2_count": 3,
  "b1_count": 2,
  "b2_count": 1,
  "c1_count": 1,
  "c2_count": 0,
  "unknown_count": 0,
  "total_words": 15,
  "level_distribution": {
    "A1": 53.3,
    "A2": 20.0,
    "B1": 13.3,
    "B2": 6.7,
    "C1": 6.7,
    "C2": 0.0,
    "Unknown": 0.0
  },
  "created_at": "2025-10-08T10:31:00Z"
}
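The `level_distribution` values in the sample responses are consistent with each count expressed as a percentage of `total_words`, rounded to one decimal place. The sketch below reproduces that calculation on the client. It is an assumption inferred from the sample data, not a statement of how the server computes the field, and `levelDistribution` is a hypothetical helper.

```javascript
// Maps each level_distribution key to its count field in the statistics object.
const COUNT_FIELDS = {
  A1: 'a1_count',
  A2: 'a2_count',
  B1: 'b1_count',
  B2: 'b2_count',
  C1: 'c1_count',
  C2: 'c2_count',
  Unknown: 'unknown_count',
};

// Hypothetical helper: percentage of total_words per level, one decimal place.
// Assumption: rounding to one decimal, inferred from the sample responses.
function levelDistribution(stats) {
  const distribution = {};
  for (const [level, field] of Object.entries(COUNT_FIELDS)) {
    const share = stats.total_words > 0 ? (stats[field] / stats.total_words) * 100 : 0;
    distribution[level] = Math.round(share * 10) / 10;
  }
  return distribution;
}
```

Applied to the counts in the response above (8, 3, 2, 1, 1, 0, 0 out of 15), this yields the same percentages as the sample `level_distribution`.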

Words

List All Words

Get a paginated list of all words in the database.

Endpoint: GET /api/words/

Query Parameters:

level (optional) - Filter by CEFR level (A1, A2, B1, B2, C1, C2)
search (optional) - Search by word text or lemma

Example Requests:

# Get all words
curl http://localhost:8000/api/words/

# Get only C1 words (URL quoted so the shell does not interpret "?")
curl "http://localhost:8000/api/words/?level=C1"

# Search for words
curl "http://localhost:8000/api/words/?search=paradigm"

Example Response:

{
  "count": 100,
  "next": "http://localhost:8000/api/words/?page=2",
  "previous": null,
  "results": [
    {
      "id": 1,
      "text": "hello",
      "lemma": "hello",
      "cefr_level": "A1",
      "cefr_level_display": "A1 - Beginner",
      "global_frequency": 5,
      "created_at": "2025-10-08T10:31:00Z"
    },
    {
      "id": 2,
      "text": "paradigm",
      "lemma": "paradigm",
      "cefr_level": "C1",
      "cefr_level_display": "C1 - Advanced",
      "global_frequency": 2,
      "created_at": "2025-10-08T10:31:15Z"
    }
  ]
}

Get Word Detail

Get detailed information about a specific word.

Endpoint: GET /api/words/<id>/

Example Response:

{
  "id": 2,
  "text": "paradigm",
  "lemma": "paradigm",
  "cefr_level": "C1",
  "cefr_level_display": "C1 - Advanced",
  "global_frequency": 2,
  "created_at": "2025-10-08T10:31:15Z"
}

Error Handling

Error Response Format

Errors use one of two formats. General errors return a detail message:

{
  "detail": "Error message here"
}

Or for validation errors:

{
  "field_name": ["Error message for this field"]
}
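Because there are two error shapes, client code usually benefits from normalizing them before display. The following sketch flattens either shape into a list of strings. `extractErrorMessages` is a hypothetical helper, and the `field: message` output format is an illustrative choice, not something the API defines.

```javascript
// Hypothetical helper: flattens either documented error shape into a list
// of human-readable strings.
function extractErrorMessages(body) {
  if (body === null || typeof body !== 'object') {
    return ['Unknown error'];
  }
  // General errors: { "detail": "..." }
  if (typeof body.detail === 'string') {
    return [body.detail];
  }
  // Validation errors: { "field_name": ["...", "..."] }
  const messages = [];
  for (const [field, errors] of Object.entries(body)) {
    const list = Array.isArray(errors) ? errors : [String(errors)];
    for (const message of list) {
      messages.push(`${field}: ${message}`);
    }
  }
  return messages;
}
```

A caller would pass in the parsed JSON body of any response whose status code is 400 or higher.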

HTTP Status Codes

200 OK - Successful GET request
201 Created - Successful POST request (resource created)
400 Bad Request - Validation error or malformed request
404 Not Found - Resource not found
500 Internal Server Error - Server error

Common Errors

File Too Large:

{
  "file": ["File size must be less than 100MB"]
}

Invalid File Format:

{
  "file": ["File format not supported. Allowed formats: audio/mpeg, audio/wav, audio/x-m4a, audio/mp4"]
}

Resource Not Found:

{
  "detail": "Not found."
}

Processing Failed:

Check the error_message field in the audio file status:

{
  "id": 1,
  "status": "failed",
  "error_message": "Failed to transcribe audio: Invalid audio format"
}

Rate Limiting

Phase 1 (MVP): No rate limiting

Phase 2+: Rate limiting will be implemented:

100 requests per hour per IP
10 file uploads per hour per IP

CORS

CORS is configured to allow requests from:

http://localhost:3000 (React development server)
http://localhost:80 (Frontend in production)

Example Workflows

Complete Upload and Processing Workflow

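// 1. Upload file (audioFile is a File object, e.g. from an <input type="file">)
const formData = new FormData();
formData.append('file', audioFile);

const uploadResponse = await fetch('http://localhost:8000/api/upload/', {
  method: 'POST',
  body: formData
});
if (!uploadResponse.ok) {
  throw new Error(`Upload failed with status ${uploadResponse.status}`);
}

// Named uploadedAudio so it does not redeclare the audioFile input above
const uploadedAudio = await uploadResponse.json();
const audioId = uploadedAudio.id;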

// 2. Poll for status
const pollStatus = async () => {
  const statusResponse = await fetch(`http://localhost:8000/api/status/${audioId}/`);
  const status = await statusResponse.json();
  
  if (status.status === 'completed') {
    return status.transcription_id;
  } else if (status.status === 'failed') {
    throw new Error(status.error_message);
  } else {
    // Still processing, poll again
    await new Promise(resolve => setTimeout(resolve, 3000));
    return pollStatus();
  }
};

const transcriptionId = await pollStatus();

// 3. Get transcription and words
const transcriptionResponse = await fetch(
  `http://localhost:8000/api/transcriptions/${transcriptionId}/`
);
const transcription = await transcriptionResponse.json();

// 4. Get words by level (e.g., B1 and above)
const wordsResponse = await fetch(
  `http://localhost:8000/api/transcriptions/${transcriptionId}/words/?level=B1,B2,C1,C2`
);
const words = await wordsResponse.json();

console.log('Transcription:', transcription.text);
console.log('Advanced words:', words);

Pagination

List endpoints use pagination with a default page size of 50 items.

Response Format:

{
  "count": 150,
  "next": "http://localhost:8000/api/words/?page=2",
  "previous": null,
  "results": [...]
}

Query Parameters:

page - Page number (default: 1)
page_size - Items per page (max: 100)
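To collect every item from a list endpoint, a client can follow the `next` link until it is `null`. The sketch below shows that loop. `fetchAllResults` is a hypothetical helper, and `fetchJson` is an assumed caller-supplied function that takes a URL and resolves to the parsed JSON body, which keeps the loop independent of any particular HTTP client.

```javascript
// Hypothetical helper: follows `next` links and concatenates all `results`.
// fetchJson is any async function that takes a URL and resolves to the
// parsed JSON body of a paginated list response.
async function fetchAllResults(firstUrl, fetchJson) {
  const results = [];
  let url = firstUrl;
  while (url !== null && url !== undefined) {
    const page = await fetchJson(url);
    results.push(...page.results);
    url = page.next; // null on the last page
  }
  return results;
}
```

In a browser, `fetchJson` could be `(url) => fetch(url).then((response) => response.json())`. For large collections, processing each page as it arrives may be preferable to holding everything in memory.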

Notes

All timestamps are in UTC
All durations are in seconds
File paths are relative to MEDIA_ROOT
Audio files are stored under /media/audio/YYYY/MM/DD/, organized by upload date

API Version: 1.0
Last Updated: October 8, 2025
