An API that accepts audio files, transcribes them asynchronously, and notifies the server through a webhook when a transcription is ready.

vaidik-bajpai/Audio-Transcription-API

Audio Transcription API

Features

  • Audio upload
  • Transcription processing
  • Job status
  • Retrieve transcription
  • JWT authentication
  • Rate limiting
  • Webhook support to notify clients when transcription is complete
  • User management (register/login)
  • API documentation

Bonus Features

  • Download links for uploaded audio and generated transcript
  • Support for .flac file format

API Architecture

flowchart  TD
subgraph  Client  Layer
Client[Client  Application]
end

subgraph  Backend
A[HTTP  Server]
db[(PostgreSQL DB)]
queue["BullMQ  -  Redis  Queue"]
webhook[Webhook  Endpoint /sns-callback]
end

subgraph  Cloud  Services
s3[(Amazon S3 Bucket)]
SNS[Amazon  SNS]
transcribe[Amazon  Transcribe]
end

subgraph  Worker  Pool
worker1[Worker 1]
worker2[Worker 2]
worker3[Worker 3]
worker4[Worker 4]
worker5[Worker 5]
end

%% Client flow
Client  --> A
A  --> db
A  --> queue
A  --> s3

%% Workers
queue  --> worker1
queue  --> worker2
queue  --> worker3
queue  --> worker4
queue  --> worker5

worker1  --> transcribe
worker2  --> transcribe
worker3  --> transcribe
worker4  --> transcribe
worker5  --> transcribe

transcribe  --> s3

%% Notification & callback
s3  --> SNS
SNS  --> webhook
webhook  --> db
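The last leg of the diagram (S3 → SNS → /sns-callback → database) can be sketched as a small pure function that decides what to do with the body SNS posts to the webhook. This is a minimal sketch, not the repository's actual handler: `handleSnsBody` and `WebhookAction` are hypothetical names, and it assumes the SNS topic carries standard S3 event notifications. Signature verification of the SNS message is deliberately omitted.

```typescript
// Minimal shape of the JSON envelope Amazon SNS POSTs to an HTTP(S) endpoint.
interface SnsEnvelope {
  Type: string;
  Message: string;
  SubscribeURL?: string;
}

// Hypothetical result type, used only in this sketch.
type WebhookAction =
  | { kind: "confirm"; subscribeUrl: string }
  | { kind: "objects"; objects: { bucket: string; key: string }[] }
  | { kind: "ignore" };

function handleSnsBody(rawBody: string): WebhookAction {
  const envelope = JSON.parse(rawBody) as SnsEnvelope;

  // First contact: SNS asks the endpoint to confirm the subscription
  // by visiting SubscribeURL.
  if (envelope.Type === "SubscriptionConfirmation" && envelope.SubscribeURL) {
    return { kind: "confirm", subscribeUrl: envelope.SubscribeURL };
  }

  if (envelope.Type === "Notification") {
    let event: {
      Records?: { s3: { bucket: { name: string }; object: { key: string } } }[];
    };
    try {
      // For S3 events, Message is itself a JSON string.
      event = JSON.parse(envelope.Message);
    } catch {
      return { kind: "ignore" };
    }
    const objects = (event.Records ?? []).map((record) => ({
      bucket: record.s3.bucket.name,
      // S3 URL-encodes object keys in events and uses "+" for spaces.
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }));
    // S3 test events carry no Records, so there is nothing to act on.
    return objects.length > 0 ? { kind: "objects", objects } : { kind: "ignore" };
  }

  return { kind: "ignore" };
}
```

A handler built on this would fetch `subscribeUrl` for a `confirm` action and, for `objects`, look up the matching job and update its row in PostgreSQL. Note that SNS sends its POST body with a `text/plain` content type, so a JSON-only body parser may leave the body unparsed.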

📚 Documentation

🔧 AWS Setup Guide

See docs/aws-setup.md for full instructions on how to:

  • Configure S3 bucket with policy
  • Set up IAM user and permissions
  • Configure SNS topic and connect it to your backend
  • Use ngrok for local webhook testing

Setup Guide

Prerequisites


1. Clone the Repository

git clone https://github.com/vaidik-bajpai/Audio-Transcription-API.git
cd Audio-Transcription-API

2. Install Dependencies

npm install

3. Set Up Environment Variables

Create a .env file in the root directory of your project.

Example .env:
# Server Configuration
PORT=8080                         # Port your server will run on

# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key_id
AWS_SECRET_ACCESS_KEY=your_secret_access_key
AWS_REGION=your_aws_region        # e.g., us-east-1
AWS_S3_BUCKET=your_bucket_name    # e.g., audio-transcription-files

# Database Connection
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptiondb

# File Upload Limit
MAX_FILE_BYTES=10485760           # 10 MB in bytes

# JWT Authentication Secrets
ACCESS_TOKEN_SECRET=your_access_token_secret
REFRESH_TOKEN_SECRET=your_refresh_token_secret
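Because a missing or malformed variable tends to surface only at request time, it can help to validate the environment once at startup. The sketch below shows one way to do that for the variables listed above. `loadConfig` and `AppConfig` are hypothetical names that do not come from the repository, and treating every variable as required is an assumption.

```typescript
// Hypothetical typed view of the variables from the .env example.
interface AppConfig {
  port: number;
  awsRegion: string;
  s3Bucket: string;
  databaseUrl: string;
  maxFileBytes: number;
  accessTokenSecret: string;
  refreshTokenSecret: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Returns the trimmed value or fails fast with the variable's name.
  const required = (name: string): string => {
    const value = env[name]?.trim();
    if (!value) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  // Parses variables that must be positive whole numbers (port, byte limit).
  const positiveInt = (name: string): number => {
    const parsed = Number(required(name));
    if (!Number.isInteger(parsed) || parsed <= 0) {
      throw new Error(`${name} must be a positive integer`);
    }
    return parsed;
  };

  // Checked for presence only; the AWS SDK's default credential chain
  // can read these two directly from the environment.
  required("AWS_ACCESS_KEY_ID");
  required("AWS_SECRET_ACCESS_KEY");

  return {
    port: positiveInt("PORT"),
    awsRegion: required("AWS_REGION"),
    s3Bucket: required("AWS_S3_BUCKET"),
    databaseUrl: required("DATABASE_URL"),
    maxFileBytes: positiveInt("MAX_FILE_BYTES"),
    accessTokenSecret: required("ACCESS_TOKEN_SECRET"),
    refreshTokenSecret: required("REFRESH_TOKEN_SECRET"),
  };
}
```

In a Node.js entry point this would be called once as `loadConfig(process.env)` after the .env file has been loaded, so a misconfiguration stops the server before it accepts uploads.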

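4. Set Up PostgreSQL and Redis (via Docker)

Make sure PostgreSQL and Redis are running in Docker containers. You can start them with the commands below. If you keep these example credentials, the matching DATABASE_URL in your .env is postgresql://transcriber:secret123@localhost:5432/transcriptiondb.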

# PostgreSQL
docker run --name postgres-transcription \
  -e POSTGRES_USER=transcriber \
  -e POSTGRES_PASSWORD=secret123 \
  -e POSTGRES_DB=transcriptiondb \
  -p 5432:5432 \
  -d postgres:16

# Redis
docker run --name redis-transcription \
  -p 6379:6379 \
  -d redis:7

5. Initialize Prisma

Run the following Prisma commands to set up your database schema and generate the Prisma client:

npx prisma generate
npx prisma db push

6. Start the Development Server

Start your API server using:

npm run dev

Make sure the PostgreSQL container is running before starting the server. If startup succeeds, you should see output similar to:

> audio-transcription-api@1.0.0 dev
> tsx watch src/index.ts

Server running on port 8080

In a separate terminal, start the worker process:

npm run worker

Make sure the Redis container is running before starting the worker. If startup succeeds, you should see output similar to:

> audio-transcription-api@1.0.0 worker
> tsx src/jobs/worker.ts

worker started

7. Test the API

Use Postman, Hoppscotch, or curl to test the following API endpoints:

Method  Endpoint                        Description
POST    /api/users/signup               Register a new user
POST    /api/users/login                Log in and receive access/refresh tokens
POST    /api/users/logout               Log out and invalidate the refresh token
POST    /api/users/refresh              Refresh access token using refresh token
POST    /api/transcription/upload       Upload an audio file for transcription
GET     /api/transcription/status/:id   Check the status of a transcription job
GET     /api/transcription/result/:id   Retrieve the transcription result
GET     /api/transcription/links/:id    Retrieve the presigned download URLs
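When scripting against these endpoints instead of clicking through Postman, a small route table keeps the methods and paths in one place. The sketch below is illustrative only: `routes` and `buildRequest` are hypothetical helpers, and sending the access token as an `Authorization: Bearer` header is an assumption about how the JWT is expected. Check the Swagger documentation for the actual request and response shapes.

```typescript
type Method = "GET" | "POST";

// Methods and paths copied from the endpoint table above.
const routes = {
  signup: { method: "POST", path: "/api/users/signup" },
  login: { method: "POST", path: "/api/users/login" },
  logout: { method: "POST", path: "/api/users/logout" },
  refresh: { method: "POST", path: "/api/users/refresh" },
  upload: { method: "POST", path: "/api/transcription/upload" },
  status: { method: "GET", path: "/api/transcription/status/:id" },
  result: { method: "GET", path: "/api/transcription/result/:id" },
  links: { method: "GET", path: "/api/transcription/links/:id" },
} as const;

type RouteName = keyof typeof routes;

interface BuiltRequest {
  method: Method;
  url: string;
  headers: Record<string, string>;
}

function buildRequest(
  baseUrl: string,
  name: RouteName,
  opts: { id?: string; accessToken?: string } = {},
): BuiltRequest {
  const route = routes[name];
  const path: string = route.path;

  // Routes with an :id placeholder need a job id to be usable.
  if (path.includes(":id") && !opts.id) {
    throw new Error(`Route "${name}" requires an id`);
  }

  const headers: Record<string, string> = {};
  if (opts.accessToken) {
    // Assumption: the API expects the JWT access token as a bearer token.
    headers["Authorization"] = `Bearer ${opts.accessToken}`;
  }

  return {
    method: route.method,
    // Strip trailing slashes so the base URL and path join cleanly.
    url: baseUrl.replace(/\/+$/, "") + path.replace(":id", encodeURIComponent(opts.id ?? "")),
    headers,
  };
}
```

The returned object can be passed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers })`, with a request body added for the POST routes.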

Docs: https://app.swaggerhub.com/apis/vaidik-81d/Audio_Transcription/1.0.0
