API Documentation

Hard Word Extractor API v1.0

Base URL: http://localhost:8000/api/


Table of Contents

  1. Authentication
  2. Audio File Upload
  3. Processing Status
  4. Transcriptions
  5. Words
  6. Error Handling

Authentication

Phase 1 (MVP): No authentication required. All endpoints are publicly accessible.

Phase 2+: JWT authentication will be added.


Audio File Upload

Upload Audio File

Upload an audio file for transcription and word extraction.

Endpoint: POST /api/upload/

Content-Type: multipart/form-data

Request Body:

file: <audio file>

Supported Formats:

  • MP3 (audio/mpeg)
  • WAV (audio/wav)
  • M4A (audio/x-m4a, audio/mp4)

File Size Limit: 100MB

Example Request (curl):

curl -X POST http://localhost:8000/api/upload/ \
  -F "file=@/path/to/audio.mp3"

Example Request (JavaScript):

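// audioFile is a File object, e.g. from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch('http://localhost:8000/api/upload/', {
  method: 'POST',
  body: formData
});

// On success (201) this is the audio file record; on 400 it holds validation errors
const data = await response.json();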

Success Response (201 Created):

{
  "id": 1,
  "file": "/media/audio/2025/10/08/sample.mp3",
  "original_filename": "sample.mp3",
  "file_size": 5242880,
  "file_size_mb": 5.0,
  "duration": null,
  "status": "pending",
  "error_message": null,
  "uploaded_at": "2025-10-08T10:30:00Z",
  "processing_started_at": null,
  "processing_completed_at": null,
  "processing_time": null
}

Status Values:

  • pending - Queued for processing
  • processing - Processing started
  • transcribing - Running Whisper transcription
  • analyzing - Extracting and classifying words
  • completed - Processing finished successfully
  • failed - Processing failed (check error_message)

Error Response (400 Bad Request):

{
  "file": ["File size must be less than 100MB"]
}
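The format and size limits above can also be checked on the client before uploading, which avoids sending a large file only to receive a 400. The following is a minimal sketch, not part of the API: the helper name `validateAudioFile` is hypothetical, 1MB is taken as 1,048,576 bytes (consistent with the `file_size` and `file_size_mb` values in the success response), and rejecting a file of exactly 100MB is an assumption. The server remains the authority, and browsers may report MIME types differently from the server's own detection.

```javascript
// Limits copied from this document; the helper itself is a hypothetical client-side check.
const MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024; // 100MB, assuming 1MB = 1,048,576 bytes
const ALLOWED_MIME_TYPES = ['audio/mpeg', 'audio/wav', 'audio/x-m4a', 'audio/mp4'];

// Accepts anything shaped like a browser File: { size, type }.
// Returns a list of problems; an empty list means the file looks uploadable.
function validateAudioFile(file) {
  const errors = [];
  if (file.size >= MAX_FILE_SIZE_BYTES) {
    errors.push('File size must be less than 100MB');
  }
  if (!ALLOWED_MIME_TYPES.includes(file.type)) {
    errors.push(`File format not supported: ${file.type || 'unknown'}`);
  }
  return errors;
}
```

In a browser, this could be called with the selected `File` before building the `FormData`, and the upload skipped when the returned list is not empty.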

Processing Status

Get Processing Status

Check the current status of an audio file's processing.

Endpoint: GET /api/status/<audio_id>/

Example Request:

curl http://localhost:8000/api/status/1/

Success Response (200 OK):

{
  "id": 1,
  "status": "analyzing",
  "progress": 70,
  "error_message": null,
  "original_filename": "sample.mp3",
  "uploaded_at": "2025-10-08T10:30:00Z",
  "processing_started_at": "2025-10-08T10:30:05Z",
  "processing_completed_at": null,
  "has_transcription": true,
  "transcription_id": 1
}

Progress Values:

  • 0% - Pending or failed
  • 20% - Processing started
  • 40% - Transcribing
  • 70% - Analyzing words
  • 100% - Completed

Polling Recommendation: Poll this endpoint every 2-5 seconds until status is completed or failed.
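The polling recommendation can be sketched as a loop with a fixed interval and an upper bound on attempts, so that a stuck job does not keep the client polling forever. This is an illustrative helper rather than part of the API: the name `waitForProcessing`, the `maxAttempts` cap of 200, and the injectable `fetchImpl` and `sleep` options are all assumptions made here so the loop can be exercised without a running server.

```javascript
// Hypothetical polling helper; endpoint and status values come from this document.
async function waitForProcessing(audioId, options = {}) {
  const {
    baseUrl = 'http://localhost:8000/api',
    intervalMs = 3000,          // within the recommended 2-5 second range
    maxAttempts = 200,          // assumed cap (about 10 minutes at 3 seconds)
    fetchImpl = fetch,          // injectable for testing
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await fetchImpl(`${baseUrl}/status/${audioId}/`);
    if (!response.ok) {
      throw new Error(`Status request failed with HTTP ${response.status}`);
    }
    const body = await response.json();
    if (body.status === 'failed') {
      throw new Error(body.error_message || 'Processing failed');
    }
    if (body.status === 'completed') {
      return body; // includes transcription_id
    }
    await sleep(intervalMs); // pending, processing, transcribing, or analyzing
  }
  throw new Error(`Gave up after ${maxAttempts} status checks`);
}
```

Using a loop instead of recursion keeps the call stack flat for long-running jobs, and the attempt cap gives callers a definite failure to handle.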


Transcriptions

List Transcriptions

Get a list of all transcriptions.

Endpoint: GET /api/transcriptions/

Example Response:

{
  "count": 10,
  "next": null,
  "previous": null,
  "results": [
    {
      "id": 1,
      "audio_file": {
        "id": 1,
        "original_filename": "sample.mp3",
        "status": "completed"
      },
      "text": "Full transcription text here...",
      "language": "en",
      "word_count": 150,
      "unique_word_count": 85,
      "statistics": {
        "id": 1,
        "a1_count": 45,
        "a2_count": 20,
        "b1_count": 10,
        "b2_count": 5,
        "c1_count": 3,
        "c2_count": 2,
        "unknown_count": 0,
        "total_words": 85,
        "level_distribution": {
          "A1": 52.9,
          "A2": 23.5,
          "B1": 11.8,
          "B2": 5.9,
          "C1": 3.5,
          "C2": 2.4,
          "Unknown": 0.0
        }
      },
      "created_at": "2025-10-08T10:31:00Z",
      "updated_at": "2025-10-08T10:31:00Z"
    }
  ]
}

Get Transcription Detail

Get detailed information about a specific transcription, including all extracted words.

Endpoint: GET /api/transcriptions/<id>/

Example Request:

curl http://localhost:8000/api/transcriptions/1/

Example Response:

{
  "id": 1,
  "audio_file": {
    "id": 1,
    "original_filename": "sample.mp3",
    "file_size_mb": 5.0,
    "duration": 180.5,
    "status": "completed"
  },
  "text": "Hello everyone, this is a sample transcription. We will learn about difficult paradigms and simple concepts.",
  "language": "en",
  "word_count": 17,
  "unique_word_count": 15,
  "statistics": {
    "id": 1,
    "a1_count": 8,
    "a2_count": 3,
    "b1_count": 2,
    "b2_count": 1,
    "c1_count": 1,
    "c2_count": 0,
    "unknown_count": 0,
    "total_words": 15,
    "level_distribution": {
      "A1": 53.3,
      "A2": 20.0,
      "B1": 13.3,
      "B2": 6.7,
      "C1": 6.7,
      "C2": 0.0,
      "Unknown": 0.0
    }
  },
  "extracted_words": [
    {
      "id": 1,
      "word": {
        "id": 1,
        "text": "hello",
        "lemma": "hello",
        "cefr_level": "A1",
        "cefr_level_display": "A1 - Beginner",
        "global_frequency": 1
      },
      "context": "Hello everyone, this is a sample transcription.",
      "timestamp": 0.5,
      "position": 0,
      "frequency": 1
    },
    {
      "id": 2,
      "word": {
        "id": 2,
        "text": "paradigm",
        "lemma": "paradigm",
        "cefr_level": "C1",
        "cefr_level_display": "C1 - Advanced",
        "global_frequency": 1
      },
      "context": "We will learn about difficult paradigms and simple concepts.",
      "timestamp": 8.2,
      "position": 12,
      "frequency": 1
    }
  ]
}

Get Words by CEFR Level

Get words from a transcription, filtered by CEFR level.

Endpoint: GET /api/transcriptions/<id>/words/?level=<levels>

Query Parameters:

  • level (optional) - Comma-separated CEFR levels (A1, A2, B1, B2, C1, C2)

Example Requests:

# Get all words
curl "http://localhost:8000/api/transcriptions/1/words/"

# Get only A1 words
curl "http://localhost:8000/api/transcriptions/1/words/?level=A1"

# Get B1 and B2 words
curl "http://localhost:8000/api/transcriptions/1/words/?level=B1,B2"

# Get advanced words (C1 and C2)
curl "http://localhost:8000/api/transcriptions/1/words/?level=C1,C2"

Example Response:

[
  {
    "id": 5,
    "word": {
      "id": 5,
      "text": "difficult",
      "lemma": "difficult",
      "cefr_level": "B1",
      "cefr_level_display": "B1 - Intermediate",
      "global_frequency": 1
    },
    "context": "We will learn about difficult paradigms.",
    "timestamp": 7.8,
    "position": 11,
    "frequency": 1
  },
  {
    "id": 6,
    "word": {
      "id": 6,
      "text": "concept",
      "lemma": "concept",
      "cefr_level": "B2",
      "cefr_level_display": "B2 - Upper Intermediate",
      "global_frequency": 1
    },
    "context": "difficult paradigms and simple concepts.",
    "timestamp": 9.5,
    "position": 15,
    "frequency": 1
  }
]
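When the level filter is built from user input, it can help to validate the levels before sending the request. The sketch below is a hypothetical client helper (`buildWordsUrl` is not part of the API). It normalizes levels to upper case because the documented values are upper case; whether the server also accepts lower-case levels is not stated here.

```javascript
// CEFR levels accepted by the level query parameter, as listed in this document.
const CEFR_LEVELS = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2'];

// Hypothetical helper: builds the words URL for a transcription,
// with an optional comma-separated level filter.
function buildWordsUrl(transcriptionId, levels = [], baseUrl = 'http://localhost:8000/api') {
  const normalized = levels.map((level) => String(level).trim().toUpperCase());
  const invalid = normalized.filter((level) => !CEFR_LEVELS.includes(level));
  if (invalid.length > 0) {
    throw new RangeError(`Unknown CEFR level(s): ${invalid.join(', ')}`);
  }
  const url = `${baseUrl}/transcriptions/${encodeURIComponent(transcriptionId)}/words/`;
  // Omitting the parameter returns all words, matching the first curl example.
  return normalized.length > 0 ? `${url}?level=${normalized.join(',')}` : url;
}
```

For example, `buildWordsUrl(1, ['B1', 'B2'])` produces the same URL as the "B1 and B2 words" request above.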

Get Statistics

Get word statistics for a transcription.

Endpoint: GET /api/transcriptions/<id>/statistics/

Example Response:

{
  "id": 1,
  "a1_count": 8,
  "a2_count": 3,
  "b1_count": 2,
  "b2_count": 1,
  "c1_count": 1,
  "c2_count": 0,
  "unknown_count": 0,
  "total_words": 15,
  "level_distribution": {
    "A1": 53.3,
    "A2": 20.0,
    "B1": 13.3,
    "B2": 6.7,
    "C1": 6.7,
    "C2": 0.0,
    "Unknown": 0.0
  },
  "created_at": "2025-10-08T10:31:00Z"
}
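The `level_distribution` values in the examples are consistent with each level's count divided by `total_words`, expressed as a percentage rounded to one decimal place (for instance, 8 / 15 gives 53.3). The sketch below reproduces that arithmetic on the client. It is an inference from the sample data, not a description of the server's implementation, and the function name `levelDistribution` is hypothetical.

```javascript
// Maps distribution labels to the count fields of the statistics object.
const LEVEL_FIELDS = {
  A1: 'a1_count',
  A2: 'a2_count',
  B1: 'b1_count',
  B2: 'b2_count',
  C1: 'c1_count',
  C2: 'c2_count',
  Unknown: 'unknown_count',
};

// Recomputes percentages from counts: count / total_words * 100, one decimal place.
function levelDistribution(stats) {
  const total = stats.total_words;
  const distribution = {};
  for (const [label, field] of Object.entries(LEVEL_FIELDS)) {
    const share = total > 0 ? (stats[field] / total) * 100 : 0;
    distribution[label] = Math.round(share * 10) / 10;
  }
  return distribution;
}
```

Because each percentage is rounded independently, the values may not sum to exactly 100.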

Words

List All Words

Get a list of all words in the database.

Endpoint: GET /api/words/

Query Parameters:

  • level (optional) - Filter by CEFR level (A1, A2, B1, B2, C1, C2)
  • search (optional) - Search by word text or lemma

Example Requests:

# Get all words
curl "http://localhost:8000/api/words/"

# Get only C1 words
curl "http://localhost:8000/api/words/?level=C1"

# Search for words
curl "http://localhost:8000/api/words/?search=paradigm"

Example Response:

{
  "count": 100,
  "next": "http://localhost:8000/api/words/?page=2",
  "previous": null,
  "results": [
    {
      "id": 1,
      "text": "hello",
      "lemma": "hello",
      "cefr_level": "A1",
      "cefr_level_display": "A1 - Beginner",
      "global_frequency": 5,
      "created_at": "2025-10-08T10:31:00Z"
    },
    {
      "id": 2,
      "text": "paradigm",
      "lemma": "paradigm",
      "cefr_level": "C1",
      "cefr_level_display": "C1 - Advanced",
      "global_frequency": 2,
      "created_at": "2025-10-08T10:31:15Z"
    }
  ]
}

Get Word Detail

Get detailed information about a specific word.

Endpoint: GET /api/words/<id>/

Example Response:

{
  "id": 2,
  "text": "paradigm",
  "lemma": "paradigm",
  "cefr_level": "C1",
  "cefr_level_display": "C1 - Advanced",
  "global_frequency": 2,
  "created_at": "2025-10-08T10:31:15Z"
}

Error Handling

Error Response Format

Errors use one of two formats. General errors return a single detail message:

{
  "detail": "Error message here"
}

Validation errors are keyed by field name, each with a list of messages:

{
  "field_name": ["Error message for this field"]
}
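Since error bodies come in two shapes, a client may want to flatten both into a single list of messages for display. The sketch below shows one way to do that. `extractErrorMessages` is a hypothetical helper, and the `Unknown error` fallback for unexpected bodies is an assumption.

```javascript
// Hypothetical helper: turns either error shape into a flat list of strings.
function extractErrorMessages(body) {
  if (body === null || typeof body !== 'object') {
    return ['Unknown error']; // assumed fallback for non-JSON or empty bodies
  }
  if (typeof body.detail === 'string') {
    return [body.detail]; // general error: { "detail": "..." }
  }
  const messages = [];
  // Validation error: { "field_name": ["message", ...] }
  for (const [field, fieldErrors] of Object.entries(body)) {
    const list = Array.isArray(fieldErrors) ? fieldErrors : [fieldErrors];
    for (const message of list) {
      messages.push(`${field}: ${message}`);
    }
  }
  return messages;
}
```

A caller would typically check `response.ok` first and pass the parsed JSON body to this helper only when the request failed.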

HTTP Status Codes

  • 200 OK - Successful GET request
  • 201 Created - Successful POST request (resource created)
  • 400 Bad Request - Validation error or malformed request
  • 404 Not Found - Resource not found
  • 500 Internal Server Error - Server error

Common Errors

File Too Large:

{
  "file": ["File size must be less than 100MB"]
}

Invalid File Format:

{
  "file": ["File format not supported. Allowed formats: audio/mpeg, audio/wav, audio/x-m4a, audio/mp4"]
}

Resource Not Found:

{
  "detail": "Not found."
}

Processing Failed:

Check the error_message field in the audio file status:

{
  "id": 1,
  "status": "failed",
  "error_message": "Failed to transcribe audio: Invalid audio format"
}

Rate Limiting

Phase 1 (MVP): No rate limiting

Phase 2+: Rate limiting will be implemented:

  • 100 requests per hour per IP
  • 10 file uploads per hour per IP

CORS

CORS is configured to allow requests from:

  • http://localhost:3000 (React development server)
  • http://localhost:80 (Frontend in production)

Example Workflows

Complete Upload and Processing Workflow

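// 1. Upload file (audioFile is a File, e.g. from an <input type="file"> element)
const formData = new FormData();
formData.append('file', audioFile);

const uploadResponse = await fetch('http://localhost:8000/api/upload/', {
  method: 'POST',
  body: formData
});

// Named uploadedAudio so it does not redeclare audioFile, which holds the File above
const uploadedAudio = await uploadResponse.json();
const audioId = uploadedAudio.id;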

// 2. Poll for status
const pollStatus = async () => {
  const statusResponse = await fetch(`http://localhost:8000/api/status/${audioId}/`);
  const status = await statusResponse.json();
  
  if (status.status === 'completed') {
    return status.transcription_id;
  } else if (status.status === 'failed') {
    throw new Error(status.error_message);
  } else {
    // Still processing, poll again
    await new Promise(resolve => setTimeout(resolve, 3000));
    return pollStatus();
  }
};

const transcriptionId = await pollStatus();

// 3. Get transcription and words
const transcriptionResponse = await fetch(
  `http://localhost:8000/api/transcriptions/${transcriptionId}/`
);
const transcription = await transcriptionResponse.json();

// 4. Get words by level (e.g., B1 and above)
const wordsResponse = await fetch(
  `http://localhost:8000/api/transcriptions/${transcriptionId}/words/?level=B1,B2,C1,C2`
);
const words = await wordsResponse.json();

console.log('Transcription:', transcription.text);
console.log('Words at B1 and above:', words);

Pagination

List endpoints (GET /api/transcriptions/ and GET /api/words/) are paginated, with a default page size of 50 items.

Response Format:

{
  "count": 150,
  "next": "http://localhost:8000/api/words/?page=2",
  "previous": null,
  "results": [...]
}

Query Parameters:

  • page - Page number (default: 1)
  • page_size - Items per page (default: 50, max: 100)
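To collect every item from a paginated endpoint, a client can follow the `next` link until it is `null`. The sketch below is illustrative only: `fetchAllResults` is a hypothetical helper, and the injectable `fetchImpl` parameter is an assumption added so the loop can be tested without a server.

```javascript
// Hypothetical helper: follows "next" links and concatenates all "results" arrays.
async function fetchAllResults(firstPageUrl, fetchImpl = fetch) {
  const results = [];
  let url = firstPageUrl;
  while (url) {
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error(`Request to ${url} failed with HTTP ${response.status}`);
    }
    const page = await response.json();
    results.push(...page.results);
    url = page.next; // null on the last page, which ends the loop
  }
  return results;
}
```

For example, `await fetchAllResults('http://localhost:8000/api/words/?page_size=100')` would request the largest documented page size and so make the fewest round trips.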

Notes

  • All timestamps are in UTC
  • All durations are in seconds
  • File paths are relative to MEDIA_ROOT
  • Audio files are stored under /media/audio/YYYY/MM/DD/ (upload date)

API Version: 1.0
Last Updated: October 8, 2025