Complete reference for Gosper's HTTP API endpoints.
- Overview
- Authentication
- Endpoints
- Request Format
- Response Format
- Error Handling
- Rate Limiting
- Client Examples
Base URL: http://your-server:8080
Content Type: multipart/form-data for file uploads
Response Format: application/json
Max Request Size: 200 MB (for MP3 files), unlimited for WAV
Currently, the Gosper API does not require authentication. For production deployments, put an authentication layer in front of it.
Recommended Approaches:
- API Gateway - Add auth layer (Kong, Traefik, nginx)
- VPN/Private Network - Restrict network access
- Cloudflare Tunnel - Use Cloudflare Access for authentication
- Custom Middleware - Add API key validation
Example with nginx (API key validation):
```nginx
location /api/ {
    if ($http_x_api_key != "your-secret-key") {
        return 401;
    }
    proxy_pass http://gosper-backend:8080;
}
```

Transcribe an audio file to text.
Request
```http
POST /api/transcribe HTTP/1.1
Host: localhost:8080
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary
```

Form Data Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `audio` | file | ✅ Yes | Audio file (WAV or MP3) |
| `model` | string | ❌ No | Model name (default: `ggml-tiny.en.bin`) |
| `lang` | string | ❌ No | Language code or `auto` (default: `auto`) |
Supported Audio Formats:
- WAV: `.wav`, `.Wave`, `.WAV`
- MP3: `.mp3`, `.MP3` (max 200 MB)

Supported Languages (multilingual models):
- `auto` - Automatic detection
- `en` - English
- `es` - Spanish
- `fr` - French
- `de` - German
- `ja` - Japanese
- `zh` - Chinese
- ...and 90+ more
Example Request (curl):
```bash
curl -X POST http://localhost:8080/api/transcribe \
  -F "audio=@recording.mp3" \
  -F "lang=auto"
```

Example Request (with specific model):
```bash
curl -X POST http://localhost:8080/api/transcribe \
  -F "audio=@meeting.wav" \
  -F "model=ggml-base.en.bin" \
  -F "lang=en"
```

Response (Success - 200 OK):
```json
{
  "text": "This is the complete transcribed text from your audio file.",
  "language": "en",
  "duration_ms": 5420,
  "segments": [
    {
      "start_ms": 0,
      "end_ms": 2800,
      "text": "This is the complete transcribed text"
    },
    {
      "start_ms": 2800,
      "end_ms": 5420,
      "text": " from your audio file."
    }
  ]
}
```

Response Fields:
| Field | Type | Description |
|---|---|---|
| `text` | string | Complete transcription text |
| `language` | string | Detected or specified language code |
| `duration_ms` | int | Processing time in milliseconds |
| `segments` | array | Individual speech segments with timestamps |
| `segments[].start_ms` | int | Segment start time (milliseconds) |
| `segments[].end_ms` | int | Segment end time (milliseconds) |
| `segments[].text` | string | Segment text |
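
The response fields map onto a pair of typed records. The sketch below is illustrative and not part of Gosper: the names `Segment`, `Transcript`, `parse_transcript`, and `format_timestamp` are hypothetical, and it assumes the response has exactly the shape documented here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    text: str


@dataclass
class Transcript:
    text: str
    language: str
    duration_ms: int
    segments: list = field(default_factory=list)


def parse_transcript(raw: str) -> Transcript:
    """Parse a /api/transcribe JSON body into typed records."""
    data = json.loads(raw)
    segments = [
        Segment(s["start_ms"], s["end_ms"], s["text"])
        for s in data.get("segments", [])
    ]
    return Transcript(data["text"], data["language"], data["duration_ms"], segments)


def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"
```

With these helpers, each segment can be printed as `format_timestamp(seg.start_ms)` followed by `seg.text` to get a simple timestamped transcript.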
Error Response (400 Bad Request):
```json
{
  "error": "audio file is required"
}
```

Error Response (500 Internal Server Error):
```json
{
  "error": "transcription failed: model not found"
}
```

Health check endpoint for monitoring and load balancers.
Request:
```http
GET /health HTTP/1.1
Host: localhost:8080
```

Response (200 OK):
```json
{
  "status": "healthy",
  "version": "1.0.0"
}
```

Example:
```bash
curl http://localhost:8080/health
```

Use Cases:
- Kubernetes liveness/readiness probes
- Load balancer health checks
- Monitoring systems
Audio files must be sent as multipart/form-data:
Structure:
```http
POST /api/transcribe
Content-Type: multipart/form-data; boundary=----Boundary

------Boundary
Content-Disposition: form-data; name="audio"; filename="recording.mp3"
Content-Type: audio/mpeg

[binary audio data]
------Boundary
Content-Disposition: form-data; name="lang"

auto
------Boundary--
```
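
To make the wire format concrete, here is a minimal sketch that assembles the same multipart body by hand using only the standard library. The function name `build_multipart` and its parameters are hypothetical. In practice an HTTP client library builds this for you, so treat it as an illustration of the structure rather than recommended client code.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes, file_type, boundary=None):
    """Return (content_type, body) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        # File part: part headers, a blank line, then the raw bytes.
        delimiter,
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {file_type}".encode(),
        b"",
        file_bytes,
    ]
    # Plain text fields such as lang or model.
    for name, value in fields.items():
        parts += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    # Closing delimiter carries two trailing dashes.
    parts += [delimiter + b"--", b""]
    body = b"\r\n".join(parts)
    return f"multipart/form-data; boundary={boundary}", body
```

Each delimiter line is the declared boundary prefixed with two dashes, which is why a boundary of `----Boundary` appears as `------Boundary` in the body.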
| Format | Maximum Size | Reason |
|---|---|---|
| WAV | Unlimited | Efficient streaming decode |
| MP3 | 200 MB | Memory protection (~600 MB decoded) |
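
A client can check these limits before uploading and avoid a wasted transfer. The following is a rough sketch under stated assumptions: the helper name `check_upload` is hypothetical, "200 MB" is taken to mean 200 MiB, and extension matching is assumed to be case-insensitive. The server remains the authority on what it accepts.

```python
import os

MP3_MAX_BYTES = 200 * 1024 * 1024  # assumption: "200 MB" means 200 MiB
SUPPORTED_EXTENSIONS = {".wav", ".wave", ".mp3"}  # assumption: matched case-insensitively


def check_upload(path, size_bytes=None):
    """Return None if the file looks uploadable, else a short reason string."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported audio format: {ext}"
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    # Only MP3 has a documented size cap; WAV is listed as unlimited.
    if ext == ".mp3" and size_bytes > MP3_MAX_BYTES:
        return "MP3 exceeds 200 MB; convert to WAV first"
    return None
```

Passing `size_bytes` explicitly is optional; when omitted, the size is read from disk.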
For Large Files:
```bash
# Convert MP3 > 200MB to WAV
ffmpeg -i large-audio.mp3 large-audio.wav
curl -F "audio=@large-audio.wav" http://localhost:8080/api/transcribe
```

Structure:
```
{
  "text": string,
  "language": string,
  "duration_ms": integer,
  "segments": [
    {
      "start_ms": integer,
      "end_ms": integer,
      "text": string
    }
  ]
}
```

Example (Short Audio):
```json
{
  "text": "Hello world.",
  "language": "en",
  "duration_ms": 856,
  "segments": [
    {
      "start_ms": 0,
      "end_ms": 856,
      "text": "Hello world."
    }
  ]
}
```

Example (Long Audio with Multiple Segments):
```json
{
  "text": "This is a longer transcription. It contains multiple sentences. Each sentence may be a separate segment.",
  "language": "en",
  "duration_ms": 12340,
  "segments": [
    {
      "start_ms": 0,
      "end_ms": 3200,
      "text": "This is a longer transcription."
    },
    {
      "start_ms": 3200,
      "end_ms": 6800,
      "text": " It contains multiple sentences."
    },
    {
      "start_ms": 6800,
      "end_ms": 12340,
      "text": " Each sentence may be a separate segment."
    }
  ]
}
```

Structure:
```
{
  "error": string
}
```

Common Error Messages:
| HTTP Status | Error Message | Cause | Solution |
|---|---|---|---|
| 400 | `audio file is required` | Missing `audio` form field | Include audio file in request |
| 400 | `unsupported audio format: .m4a` | Unsupported file extension | Convert to WAV or MP3 |
| 400 | `mp3: file too large (250 MB, max 200 MB)` | MP3 exceeds 200 MB | Convert to WAV or compress |
| 400 | `mp3: invalid format` | Corrupted or invalid MP3 | Verify file integrity |
| 500 | `model not found: ggml-xyz.bin` | Invalid model name | Use valid model name |
| 500 | `transcription failed` | Internal processing error | Check server logs |
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request successful |
| 400 | Bad Request | Invalid request (missing file, unsupported format) |
| 413 | Payload Too Large | File exceeds server limits |
| 500 | Internal Server Error | Server-side processing error |
| 503 | Service Unavailable | Server overloaded or starting up |
Recommended:
- 400 errors: Do not retry (client error)
- 500 errors: Retry with exponential backoff (server error)
- 503 errors: Retry after delay (service temporarily unavailable)
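
The recommendations above can be expressed as two small pure functions, which is handy when the HTTP client in use has no built-in retry support. This is a sketch only: `should_retry` and `backoff_delay` are hypothetical names, and the base delay and cap are arbitrary illustrative values.

```python
import random


def should_retry(status: int) -> bool:
    """Retry server-side failures (500, 503); never retry 4xx client errors."""
    return status in (500, 503)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, jitter: float = 0.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles on each attempt (base * 2**attempt) up to `cap`,
    plus an optional random jitter to spread out simultaneous clients.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter)
```

A caller would loop over attempts, sleep for `backoff_delay(attempt)` whenever `should_retry(response.status_code)` is true, and stop immediately on any 4xx response.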
Example (Python with retries):
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
    # Retry skips POST by default; opt in explicitly (urllib3 >= 1.26)
    allowed_methods=["POST"],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)

# Close the file once the upload has finished
with open('recording.mp3', 'rb') as audio_file:
    response = session.post(
        'http://localhost:8080/api/transcribe',
        files={'audio': audio_file},
        data={'lang': 'auto'},
    )
```

Current Behavior: No rate limiting implemented.
For Production:
- Use API gateway (Kong, Traefik) with rate limiting
- Use nginx `limit_req` module
- Implement application-level throttling
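
For the application-level option, one common technique is a token bucket. The sketch below is a generic illustration and not Gosper code: the class name `TokenBucket` and its parameters are hypothetical, and it is neither thread-safe nor shared across processes.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Consume one token if available and report whether the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A wrapper in front of the transcription handler could call `allow()` per client and reject the request (for example with HTTP 429) when it returns `False`.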
Example (nginx rate limiting):
```nginx
http {
    limit_req_zone $binary_remote_addr zone=transcribe:10m rate=10r/m;

    server {
        location /api/transcribe {
            limit_req zone=transcribe burst=5;
            proxy_pass http://gosper-backend:8080;
        }
    }
}
```

Basic Transcription:
```bash
curl -X POST http://localhost:8080/api/transcribe \
  -F "audio=@recording.mp3" \
  -F "lang=auto"
```

With Custom Model:
```bash
curl -X POST http://localhost:8080/api/transcribe \
  -F "audio=@meeting.wav" \
  -F "model=ggml-medium.en.bin" \
  -F "lang=en"
```

Save Response to File:
```bash
curl -X POST http://localhost:8080/api/transcribe \
  -F "audio=@audio.mp3" \
  -F "lang=auto" \
  -o transcript.json
```

Parse with jq:
```bash
curl -X POST http://localhost:8080/api/transcribe \
  -F "audio=@audio.mp3" \
  -F "lang=auto" | jq -r '.text'
```

Using requests:
```python
import requests

# Basic transcription
with open('recording.mp3', 'rb') as audio_file:
    files = {'audio': audio_file}
    data = {'lang': 'auto'}
    response = requests.post(
        'http://localhost:8080/api/transcribe',
        files=files,
        data=data
    )

if response.status_code == 200:
    result = response.json()
    print(f"Transcription: {result['text']}")
    print(f"Language: {result['language']}")
    print(f"Duration: {result['duration_ms']}ms")
else:
    print(f"Error: {response.json()['error']}")
```

With Custom Model:
```python
import requests

data = {
    'model': 'ggml-base.en.bin',
    'lang': 'en'
}

# Open the file in a context manager so it is closed after the upload
with open('meeting.wav', 'rb') as audio_file:
    response = requests.post(
        'http://localhost:8080/api/transcribe',
        files={'audio': audio_file},
        data=data
    )

result = response.json()

# Print each segment with its start and end time in seconds
for segment in result['segments']:
    start = segment['start_ms'] / 1000
    end = segment['end_ms'] / 1000
    text = segment['text']
    print(f"[{start:.2f}s - {end:.2f}s] {text}")
```

Using axios:
```javascript
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

async function transcribe(audioPath) {
  const form = new FormData();
  form.append('audio', fs.createReadStream(audioPath));
  form.append('lang', 'auto');

  try {
    const response = await axios.post(
      'http://localhost:8080/api/transcribe',
      form,
      { headers: form.getHeaders() }
    );
    console.log('Transcription:', response.data.text);
    console.log('Language:', response.data.language);
    console.log('Duration:', response.data.duration_ms, 'ms');
    return response.data;
  } catch (error) {
    console.error('Error:', error.response?.data?.error || error.message);
    throw error;
  }
}

transcribe('recording.mp3');
```

Using Fetch API:
```javascript
async function transcribeAudio(audioFile) {
  const formData = new FormData();
  formData.append('audio', audioFile);
  formData.append('lang', 'auto');

  try {
    const response = await fetch('http://localhost:8080/api/transcribe', {
      method: 'POST',
      body: formData
    });
    if (!response.ok) {
      const error = await response.json();
      throw new Error(error.error);
    }
    const result = await response.json();
    console.log('Transcription:', result.text);
    return result;
  } catch (error) {
    console.error('Transcription failed:', error.message);
    throw error;
  }
}

// Usage with file input
document.getElementById('audioInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const result = await transcribeAudio(file);
    document.getElementById('output').textContent = result.text;
  }
});
```

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
)

type TranscriptResponse struct {
	Text       string    `json:"text"`
	Language   string    `json:"language"`
	DurationMs int       `json:"duration_ms"`
	Segments   []Segment `json:"segments"`
}

type Segment struct {
	StartMs int    `json:"start_ms"`
	EndMs   int    `json:"end_ms"`
	Text    string `json:"text"`
}

func transcribe(audioPath string) (*TranscriptResponse, error) {
	// Open audio file
	file, err := os.Open(audioPath)
	if err != nil {
		return nil, err
	}
	defer file.Close()

	// Create multipart form
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)

	// Add audio file (send only the base name, not the local path)
	part, err := writer.CreateFormFile("audio", filepath.Base(audioPath))
	if err != nil {
		return nil, err
	}
	if _, err := io.Copy(part, file); err != nil {
		return nil, err
	}

	// Add language parameter
	if err := writer.WriteField("lang", "auto"); err != nil {
		return nil, err
	}
	// Close writes the final boundary; it must happen before sending
	if err := writer.Close(); err != nil {
		return nil, err
	}

	// Send request
	resp, err := http.Post(
		"http://localhost:8080/api/transcribe",
		writer.FormDataContentType(),
		body,
	)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	// Surface the API's error message on non-200 responses
	if resp.StatusCode != http.StatusOK {
		var apiErr struct {
			Error string `json:"error"`
		}
		_ = json.NewDecoder(resp.Body).Decode(&apiErr)
		return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, apiErr.Error)
	}

	// Parse response
	var result TranscriptResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	return &result, nil
}

func main() {
	result, err := transcribe("recording.mp3")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("Transcription:", result.Text)
	fmt.Println("Language:", result.Language)
	fmt.Printf("Duration: %dms\n", result.DurationMs)
}
```

```ruby
require 'net/http'
require 'json'

def transcribe(audio_path)
  uri = URI('http://localhost:8080/api/transcribe')
  request = Net::HTTP::Post.new(uri)

  # Keep the file open only for the duration of the request
  response = File.open(audio_path, 'rb') do |audio|
    request.set_form([['audio', audio], ['lang', 'auto']], 'multipart/form-data')
    Net::HTTP.start(uri.hostname, uri.port) do |http|
      http.request(request)
    end
  end

  if response.code == '200'
    result = JSON.parse(response.body)
    puts "Transcription: #{result['text']}"
    puts "Language: #{result['language']}"
    puts "Duration: #{result['duration_ms']}ms"
    result
  else
    error = JSON.parse(response.body)
    puts "Error: #{error['error']}"
    nil
  end
end

transcribe('recording.mp3')
```

Transcription time depends on:
- Audio duration - Longer audio = longer processing
- Model size - Larger models are slower but more accurate
- Server resources - CPU/memory availability
- Thread count - Set via `GOSPER_THREADS`
Typical Performance (with ggml-base.en.bin on 4-core CPU):
- 1 minute audio → ~20 seconds processing (~3x real-time)
- 10 minutes audio → ~3 minutes processing
- 1 hour audio → ~18 minutes processing
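
These figures can be turned into a rough estimate for sizing client timeouts. The sketch below assumes the ~3x real-time throughput quoted for `ggml-base.en.bin` on a 4-core CPU. The function names, the safety factor, and the minimum timeout are hypothetical choices, and actual throughput should be measured on your own hardware.

```python
def estimate_processing_seconds(audio_seconds: float, speedup: float = 3.0) -> float:
    """Rough processing time, assuming the server runs at `speedup` x real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def suggested_timeout(audio_seconds: float, safety: float = 2.0, floor: float = 30.0) -> float:
    """Client timeout in seconds: the estimate times a safety factor, with a minimum."""
    return max(floor, estimate_processing_seconds(audio_seconds) * safety)
```

At exactly 3x, one hour of audio comes out at 20 minutes instead of the ~18 minutes listed, which shows how approximate the published numbers are.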
Gosper processes requests sequentially (no built-in queuing).
For High Concurrency:
- Deploy multiple Gosper instances behind load balancer
- Use Kubernetes HPA (Horizontal Pod Autoscaler)
- Implement request queue (Redis, RabbitMQ)
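
The request-queue idea can be illustrated in-process with the standard library. This sketch is a stand-in for a real broker such as Redis or RabbitMQ, not a replacement for one: `run_worker` and `process_sequentially` are hypothetical names, and `handler` is whatever function uploads one file to a Gosper instance.

```python
import queue
import threading


def run_worker(jobs, handler, results):
    """Process jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        try:
            results.append((job, handler(job)))
        except Exception as exc:
            # Record the failure and keep draining the queue.
            results.append((job, exc))


def process_sequentially(paths, handler):
    """Feed paths through a single worker so only one request is in flight."""
    jobs, results = queue.Queue(), []
    worker = threading.Thread(target=run_worker, args=(jobs, handler, results))
    worker.start()
    for path in paths:
        jobs.put(path)
    jobs.put(None)  # sentinel: no more work
    worker.join()
    return results
```

Running one worker per Gosper instance keeps each backend busy with a single request at a time, which matches the sequential processing described above.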
Example (Kubernetes HPA):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gosper-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gosper-be
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

- Quick Start - Get started quickly
- Configuration - Environment variables and settings
- Deployment - Production deployment guide
- Troubleshooting - Common issues and solutions