- Audio upload
- Transcription processing
- Job status
- Retrieve transcription
- JWT authentication
- Rate limiting
- Webhook support to notify clients when transcription is complete
- User management (register/login)
- API documentation
- Download links for uploaded audio and generated transcript
- Support for .flac file format
flowchart TD
subgraph Client Layer
Client[Client Application]
end
subgraph Backend
A[HTTP Server]
db[(PostgreSQL DB)]
queue["BullMQ - Redis Queue"]
webhook[Webhook Endpoint /sns-callback]
end
subgraph Cloud Services
s3[(Amazon S3 Bucket)]
SNS[Amazon SNS]
transcribe[Amazon Transcribe]
end
subgraph Worker Pool
worker1[Worker 1]
worker2[Worker 2]
worker3[Worker 3]
worker4[Worker 4]
worker5[Worker 5]
end
%% Client flow
Client --> A
A --> db
A --> queue
A --> s3
%% Workers
queue --> worker1
queue --> worker2
queue --> worker3
queue --> worker4
queue --> worker5
worker1 --> transcribe
worker2 --> transcribe
worker3 --> transcribe
worker4 --> transcribe
worker5 --> transcribe
transcribe --> s3
%% Notification & callback
s3 --> SNS
SNS --> webhook
webhook --> db
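The notification path in the diagram (S3 to SNS to the /sns-callback endpoint) can be sketched as a small parsing step. This is a minimal illustration, not the project's actual handler: `parseSnsCallback` and the `CallbackAction` type are hypothetical names, and the field names follow the standard SNS HTTP(S) delivery envelope and S3 event notification format. SNS typically posts with a `text/plain` content type, so the sketch takes the raw body string rather than a pre-parsed object.

```typescript
// Illustrative sketch only: function and type names are hypothetical.

// Subset of the JSON envelope that SNS POSTs to an HTTP(S) subscriber.
type SnsEnvelope = { Type?: string; Message?: string; SubscribeURL?: string };

// Subset of an S3 event notification, delivered as a JSON string in `Message`.
type S3Event = {
  Records?: { s3?: { bucket?: { name?: string }; object?: { key?: string } } }[];
};

type CallbackAction =
  | { kind: "confirm"; subscribeUrl: string }
  | { kind: "objects"; objects: { bucket: string; key: string }[] }
  | { kind: "ignore" };

// Throws if rawBody is not valid JSON; the caller decides how to respond.
function parseSnsCallback(rawBody: string): CallbackAction {
  const envelope = JSON.parse(rawBody) as SnsEnvelope | null;

  // Sent once per subscription; it becomes active after a GET to SubscribeURL.
  if (envelope?.Type === "SubscriptionConfirmation" && envelope.SubscribeURL) {
    return { kind: "confirm", subscribeUrl: envelope.SubscribeURL };
  }

  if (envelope?.Type === "Notification" && envelope.Message) {
    let event: S3Event | null;
    try {
      event = JSON.parse(envelope.Message) as S3Event | null;
    } catch {
      return { kind: "ignore" }; // Message was not JSON, so not an S3 event
    }

    const objects: { bucket: string; key: string }[] = [];
    for (const record of event?.Records ?? []) {
      const bucket = record.s3?.bucket?.name;
      const key = record.s3?.object?.key;
      if (bucket && key) {
        // S3 URL-encodes object keys in events and uses "+" for spaces.
        objects.push({ bucket, key: decodeURIComponent(key.replace(/\+/g, " ")) });
      }
    }
    return objects.length > 0 ? { kind: "objects", objects } : { kind: "ignore" };
  }

  return { kind: "ignore" };
}
```

A handler built on this would confirm the subscription for `confirm`, update the matching job row in PostgreSQL for `objects`, and acknowledge everything else. How an object key maps back to a job is project-specific and not shown here.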
🔧 AWS Setup Guide
See docs/aws-setup.md for full instructions on how to:
- Configure S3 bucket with policy
- Set up IAM user and permissions
- Configure SNS topic and connect it to your backend
- Use Ngrok for local webhook testing
- Node.js v18+
- Redis 7+
- PostgreSQL 16+
- AWS credentials with access to:
  - S3 (read/write)
  - Transcribe
  - SNS
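Some of these prerequisites can be checked from code before the server starts. The sketch below is one possible preflight check, not part of the project: `checkPrerequisites` is a hypothetical helper, and the variable names are the ones used in the .env example in this README. It only confirms that the Node.js major version is at least 18 and that the AWS variables are set. It does not verify IAM permissions or that Redis and PostgreSQL are reachable.

```typescript
// Illustrative sketch only: checkPrerequisites is a hypothetical helper.

// AWS variable names as they appear in this README's .env example.
const REQUIRED_AWS_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_S3_BUCKET",
] as const;

// Returns a list of human-readable problems; an empty list means the checks passed.
function checkPrerequisites(
  nodeVersion: string,
  env: Record<string, string | undefined>,
): string[] {
  const problems: string[] = [];

  // Accepts "v18.19.0" as well as "18.19.0".
  const major = Number.parseInt(nodeVersion.replace(/^v/, "").split(".")[0] ?? "", 10);
  if (Number.isNaN(major) || major < 18) {
    problems.push(`Node.js v18+ is required, found ${nodeVersion}`);
  }

  for (const name of REQUIRED_AWS_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`Missing environment variable ${name}`);
    }
  }

  return problems;
}
```

In a Node.js entry point this could be called as `checkPrerequisites(process.version, process.env)`, exiting early if the returned list is not empty.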
git clone https://github.com/vaidik-bajpai/Audio-Transcription-API.git
cd Audio-Transcription-API
npm install
Create a .env file in the root directory of your project.
# Server Configuration
PORT=8080 # Port your server will run on
# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key_id
AWS_SECRET_ACCESS_KEY=your_secret_access_key
AWS_REGION=your_aws_region # e.g., us-east-1
AWS_S3_BUCKET=your_bucket_name # e.g., audio-transcription-files
# Database Connection
DATABASE_URL=postgresql://user:password@localhost:5432/transcriptiondb
# File Upload Limit
MAX_FILE_BYTES=10485760 # 10 MB in bytes
# JWT Authentication Secrets
ACCESS_TOKEN_SECRET=your_access_token_secret
REFRESH_TOKEN_SECRET=your_refresh_token_secret
Make sure PostgreSQL and Redis are running in Docker containers. You can spin them up using the following commands:
# PostgreSQL
docker run --name postgres-transcription \
-e POSTGRES_USER=transcriber \
-e POSTGRES_PASSWORD=secret123 \
-e POSTGRES_DB=transcriptiondb \
-p 5432:5432 \
-d postgres:16
# Redis
docker run --name redis-transcription \
-p 6379:6379 \
-d redis:7
Run the following Prisma commands to set up your database schema and generate the Prisma client:
npx prisma generate
npx prisma db push
Start your API server using:
npm run dev
Ensure the PostgreSQL container is running before starting the server. If successful, you should see something like:
> audio-transcription-api@1.0.0 dev
> tsx watch src/index.ts
Server running on port 8080
Start your worker processes:
npm run worker
Ensure the Redis container is running before starting the workers. If successful, you should see something like:
> audio-transcription-api@1.0.0 worker
> tsx src/jobs/worker.ts
worker started
Use Postman, Hoppscotch, or curl to test the following API endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/users/signup | Register a new user |
| POST | /api/users/login | Log in and receive access/refresh tokens |
| POST | /api/users/logout | Log out and invalidate the refresh token |
| POST | /api/users/refresh | Refresh access token using refresh token |
| POST | /api/transcription/upload | Upload an audio file for transcription |
| GET | /api/transcription/status/:id | Check the status of a transcription job |
| GET | /api/transcription/result/:id | Retrieve the transcription result |
| GET | /api/transcription/links/:id | Retrieve the presigned download URLs |
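
Because transcription runs asynchronously, a client usually polls the status endpoint and then fetches the result. The sketch below shows one way to do that against the endpoints in the table. Several things in it are assumptions: the access token is sent as a `Bearer` token in the `Authorization` header, the helper names (`transcriptionUrl`, `waitForResult`) are hypothetical, and the shape of the status response is not documented here, so the caller supplies an `isFinished` predicate instead of the sketch guessing a field name.

```typescript
// Illustrative sketch only: helper names and the Bearer header are assumptions.

type JsonResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<JsonResponse>;

// Builds the URL for one of the GET endpoints listed in the table.
function transcriptionUrl(
  baseUrl: string,
  action: "status" | "result" | "links",
  jobId: string,
): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/transcription/${action}/${encodeURIComponent(jobId)}`;
}

// Polls the status endpoint until isFinished returns true, then fetches the result.
async function waitForResult(options: {
  baseUrl: string;
  jobId: string;
  accessToken: string;
  fetchFn: FetchLike;
  isFinished: (statusBody: unknown) => boolean;
  intervalMs?: number;
  maxAttempts?: number;
}): Promise<unknown> {
  const { baseUrl, jobId, accessToken, fetchFn, isFinished } = options;
  const intervalMs = options.intervalMs ?? 2000;
  const maxAttempts = options.maxAttempts ?? 30;
  const headers = { Authorization: `Bearer ${accessToken}` };

  const getJson = async (url: string): Promise<unknown> => {
    const res = await fetchFn(url, { headers });
    if (!res.ok) throw new Error(`GET ${url} failed with HTTP ${res.status}`);
    return res.json();
  };

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const statusBody = await getJson(transcriptionUrl(baseUrl, "status", jobId));
    if (isFinished(statusBody)) {
      return getJson(transcriptionUrl(baseUrl, "result", jobId));
    }
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} status checks`);
}
```

On Node.js 18+ the built-in `fetch` can be passed as `fetchFn`, with `baseUrl` set to `http://localhost:8080` when the server runs on the port from the .env example. If a webhook is configured, polling can be replaced by waiting for the notification.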