This document defines the interface requirements for the Modal service that handles video clipping functionality.
Base URL: https://kaiber-ai--clip-video-fastapi-app.modal.run
Initiates a new video clipping job.
Endpoint: POST /start
Request Headers:
Content-Type: application/json
Request Body:

```json
{
  "url": "https://example.com/video.mp4",
  "width": 1280,
  "height": 720
}
```

Request Schema:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL of the video to clip |
| width | number | Yes | Video width in pixels |
| height | number | Yes | Video height in pixels |
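As a minimal client-side sketch, the hypothetical helper below checks a payload against this schema before it is sent. Requiring an absolute http(s) URL and positive dimensions are assumptions; the schema itself only specifies the field types.

```python
import json
from urllib.parse import urlparse


def build_start_payload(url: str, width: float, height: float) -> str:
    """Validate the /start fields and return the JSON request body.

    Hypothetical helper: the http(s) and positivity checks are assumptions,
    not documented server rules.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")
    for name, value in (("width", width), ("height", height)):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return json.dumps({"url": url, "width": width, "height": height})
```

For the example values, `build_start_payload("https://example.com/video.mp4", 1280, 720)` returns the same compact body used in the curl examples in this document.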
Response (Success - 200 OK):

```json
{
  "job_id": "uuid-string"
}
```

Response Schema:
| Field | Type | Required | Description |
|---|---|---|---|
| job_id | string | Yes | Unique identifier for the clipping job |
Error Response:
- Returns a non-200 status code on failure
- Error details should be included in the response body
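A minimal sketch of calling `/start` with Python's standard library, assuming the base URL above. The helper names (`make_start_request`, `parse_start_response`, `start_job`) are hypothetical; request construction and response parsing are kept separate from the network call so each piece can be checked on its own.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://kaiber-ai--clip-video-fastapi-app.modal.run"


def make_start_request(url: str, width: int, height: int) -> urllib.request.Request:
    """Build the POST /start request described above (no network I/O)."""
    body = json.dumps({"url": url, "width": width, "height": height}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_start_response(status: int, body: bytes) -> str:
    """Return job_id from a 200 response; surface the error body otherwise."""
    if status != 200:
        detail = body.decode("utf-8", "replace")
        raise RuntimeError(f"/start failed with HTTP {status}: {detail}")
    return json.loads(body)["job_id"]


def start_job(url: str, width: int, height: int, timeout: float = 30.0) -> str:
    """Send the request and return the job ID (performs network I/O)."""
    req = make_start_request(url, width, height)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_start_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; read the body so the error details are kept
        return parse_start_response(err.code, err.read())
```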
Retrieves the current status of a clipping job.
Endpoint: GET /status/{job_id}
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| job_id | string | Yes | The job ID returned from /start |
Request Headers:
Content-Type: application/json
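The sketch below builds and sends the status request. `status_url` and `get_status` are hypothetical names; percent-encoding the job ID is a defensive choice (a UUID string needs no encoding), and with `urlopen` any non-2xx response surfaces as `urllib.error.HTTPError`.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://kaiber-ai--clip-video-fastapi-app.modal.run"


def status_url(job_id: str) -> str:
    """Build /status/{job_id}, encoding the ID so it stays a single path segment."""
    return f"{BASE_URL}/status/{quote(job_id, safe='')}"


def get_status(job_id: str, timeout: float = 30.0) -> dict:
    """Fetch and decode the status payload (performs network I/O)."""
    req = urllib.request.Request(
        status_url(job_id),
        headers={"Content-Type": "application/json"},  # header listed by the spec
        method="GET",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```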
Response (Processing/Queued):

```json
{
  "status": "processing",
  "progress": 0.45
}
```

Response Schema (Processing):
| Field | Type | Required | Description |
|---|---|---|---|
| status | string | Yes | Must be "processing" or "queued" |
| progress | number | No | Progress value (0.0-1.0), defaults to 0.1 |
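Because a job keeps reporting `processing` or `queued` until it reaches `completed` or `failed`, clients poll this endpoint. The loop below is a sketch with an injected `fetch_status` callable (for example, a function that GETs `/status/{job_id}` and decodes the JSON). The name `wait_for_job`, the 2-second starting interval, the 1.5× backoff, the 15-second cap, and the 10-minute timeout are all assumptions, not documented service limits.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    poll_interval: float = 2.0,
    max_interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job leaves the processing/queued states.

    Returns the final payload (status "completed" or "failed") and raises
    TimeoutError if the next wait would pass the deadline.
    """
    deadline = clock() + timeout
    interval = poll_interval
    while True:
        payload = fetch_status(job_id)
        if payload.get("status") in ("completed", "failed"):
            return payload
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {payload.get('status')!r} after {timeout}s")
        sleep(interval)
        # back off gradually so long jobs are not polled too aggressively
        interval = min(interval * 1.5, max_interval)
```

Injecting `sleep` and `clock` keeps the loop deterministic in tests; in production the defaults use real time.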
Response (Completed):

```json
{
  "status": "completed",
  "scenes": [
    {
      "key": "user-videos/temp/hash1.mp4",
      "length": 5.5,
      "width": 1280,
      "height": 720,
      "url": "https://..."
    },
    {
      "key": "user-videos/temp/hash2.mp4",
      "length": 3.2,
      "width": 1280,
      "height": 720,
      "url": "https://..."
    }
  ],
  "progress": 1.0
}
```

Response Schema (Completed):
| Field | Type | Required | Description |
|---|---|---|---|
| status | string | Yes | Must be "completed" |
| scenes | array | Yes | Array of clipped video scenes (can be empty) |
| progress | number | No | Progress value (1.0 for completed) |
Scene Object Schema:
| Field | Type | Required | Description |
|---|---|---|---|
| key | string | No | R2 storage key for the clip |
| length | number | No | Duration of clip in seconds |
| width | number | No | Video width in pixels |
| height | number | No | Video height in pixels |
| url | string | No | Presigned R2 URL (valid for 24 hours) |
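A typed view of the scene object can be sketched as a dataclass. Since every field is optional, missing keys become `None`; the `Scene` class and the `total_length` helper are illustrative and not part of the service.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Scene:
    key: Optional[str] = None        # R2 storage key
    length: Optional[float] = None   # duration in seconds
    width: Optional[int] = None      # pixels
    height: Optional[int] = None     # pixels
    url: Optional[str] = None        # presigned R2 URL, valid for 24 hours

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Scene":
        # All fields are optional: absent keys become None, unknown keys are ignored.
        return cls(**{f.name: raw.get(f.name) for f in fields(cls)})


def total_length(scenes: List[Scene]) -> float:
    """Sum the durations of the scenes that report a length."""
    return sum(s.length for s in scenes if s.length is not None)
```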
Response (Failed):

```json
{
  "status": "failed",
  "error": "Error message describing what went wrong"
}
```

Response Schema (Failed):
| Field | Type | Required | Description |
|---|---|---|---|
| status | string | Yes | Must be "failed" |
| error | string | No | Error message, defaults to "Unknown error" |
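The response shapes can be told apart by `status`. This hypothetical `parse_status` applies the defaults documented in the schemas (progress 0.1 while processing, `"Unknown error"` for failures) and rejects unknown statuses. Assuming 1.0 when a completed response omits `progress`, and treating an empty `error` string like a missing one, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Pending:
    status: str      # "processing" or "queued"
    progress: float


@dataclass
class Completed:
    scenes: List[Dict[str, Any]] = field(default_factory=list)
    progress: float = 1.0


@dataclass
class Failed:
    error: str = "Unknown error"


JobStatus = Union[Pending, Completed, Failed]


def parse_status(payload: Dict[str, Any]) -> JobStatus:
    status = payload.get("status")
    if status in ("processing", "queued"):
        # progress is optional; the schema documents 0.1 as its default
        return Pending(status=status, progress=float(payload.get("progress", 0.1)))
    if status == "completed":
        # scenes is required (KeyError if absent) but may be an empty list
        return Completed(scenes=list(payload["scenes"]),
                         progress=float(payload.get("progress", 1.0)))
    if status == "failed":
        # empty or missing error text falls back to the documented default
        return Failed(error=payload.get("error") or "Unknown error")
    raise ValueError(f"unexpected status: {status!r}")
```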
Status lifecycle:

- Job Created → Status: `"processing"` with `progress: 0.1`
- Job In Progress → Status: `"processing"` with `progress` (0.0-1.0, e.g., 0.45)
- Job Complete → Status: `"completed"` with the `scenes` array and `progress: 1.0`
- Job Failed → Status: `"failed"` with an `error` message
The Kaiber server (`ClipVideoService.ts`) performs the following transformations:

- On Start:
  - Sends `{ url: source, width: ..., height: ... }` to Modal `/start`
  - Receives `{ job_id }` from Modal
  - Returns `{ jobId, createdAt }` to the client
- On Status Check:
  - Sends GET to Modal `/status/{jobId}`
  - Transforms the Modal response to the internal schema
  - On completed: creates MongoDB `Media` documents for each scene
  - Returns a standardized response to the client
- Media Creation:
  - Each scene becomes a `Media` document with:
    - New `mediaId` (UUID v4)
    - `type: MediaType.Video`
    - `path.key: scene.url` or `scene.video_url`
    - Thumbnail generation via `genThumbnailForMedia`
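`ClipVideoService.ts` is TypeScript and its code is not reproduced in this document; the Python sketch below only mirrors the scene-to-`Media` field mapping listed above, with a plain dict standing in for the MongoDB document. The `"video"` string for `MediaType.Video`, the nested `path` dict, raising when a scene has no URL, and leaving out thumbnail generation are all assumptions.

```python
import uuid
from typing import Any, Dict, List


def scene_to_media(scene: Dict[str, Any]) -> Dict[str, Any]:
    """Map one Modal scene to a Media-like dict (hypothetical shape)."""
    key = scene.get("url") or scene.get("video_url")
    if not key:
        raise ValueError("scene has neither 'url' nor 'video_url'")
    return {
        "mediaId": str(uuid.uuid4()),  # new UUID v4 per scene
        "type": "video",               # stands in for MediaType.Video
        "path": {"key": key},          # path.key = scene.url or scene.video_url
        # thumbnail generation (genThumbnailForMedia) is out of scope here
    }


def scenes_to_media(scenes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Create one Media-like dict per scene, preserving order."""
    return [scene_to_media(s) for s in scenes]
```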
Error handling:

- Non-200 responses throw `InternalServerErrorException`
- A missing `scenes` array defaults to an empty array (`[]`)
- Missing optional fields use these defaults:
  - `progress`: defaults to `0`
  - `error`: defaults to `"Unknown error"`
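These rules can be sketched as a small normalization step. `InternalServerError` is a hypothetical Python stand-in for the server's `InternalServerErrorException`, and applying the `error` default only to failed jobs is an assumption. Note that this server-side `progress` default (0) differs from the 0.1 documented for the Modal processing response.

```python
from typing import Any, Dict


class InternalServerError(Exception):
    """Hypothetical stand-in for the server's InternalServerErrorException."""


def normalize_modal_status(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the defaults listed above to a raw Modal /status response."""
    if status_code != 200:
        raise InternalServerError(f"Modal /status returned HTTP {status_code}")
    normalized = dict(body)                # shallow copy; leave the input untouched
    normalized.setdefault("progress", 0)   # server-side default is 0, not 0.1
    normalized.setdefault("scenes", [])    # missing scenes array -> []
    if normalized.get("status") == "failed":
        # assumption: the error default only matters for failed jobs
        normalized.setdefault("error", "Unknown error")
    return normalized
```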
The Modal app now includes a `process_video_with_gemini` function that:
- Takes a video URL, width, and height
- Streams the video at 1 FPS with scale 1280 using FFmpeg directly to Google's Gemini File API
- Downloads the high-resolution video in parallel for clipping
- Polls until the Gemini file is `ACTIVE`
- Uses `gemini-3-flash-preview` to analyze the video and extract scene timestamps
- Validates and filters scenes (minimum 2 seconds; trims 0.2 s from the start and 0.5 s from the end of each scene)
- Clips scenes in parallel using fan-out workers that upload to R2
- Returns structured JSON with scene boundaries, descriptions, and R2 URLs
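The validation rule above leaves one detail open: whether the 2-second minimum applies before or after trimming. The sketch below assumes after. `filter_scenes` is a hypothetical name, and dropping negative or inverted timestamps is an assumption about what "validates" covers.

```python
from typing import List, Tuple

MIN_SCENE_SECONDS = 2.0
TRIM_START = 0.2   # seconds removed from the start of each scene
TRIM_END = 0.5     # seconds removed from the end of each scene


def filter_scenes(raw: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Trim each (start, end) pair and drop scenes shorter than the minimum.

    Assumption: the 2-second minimum is checked after trimming.
    """
    kept: List[Tuple[float, float]] = []
    for start, end in raw:
        if start < 0 or end <= start:
            continue  # discard malformed timestamps from the model output
        start, end = start + TRIM_START, end - TRIM_END
        if end - start >= MIN_SCENE_SECONDS:
            # round to milliseconds so float artifacts do not leak into the output
            kept.append((round(start, 3), round(end, 3)))
    return kept
```

With these rules, a 0.0-5.5 s scene becomes 0.2-5.0 s, while a 1.5 s scene is dropped because only 0.8 s would remain after trimming.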
You can trigger the Gemini processing function through the HTTP `/start` endpoint (note: it requires `url`, `width`, and `height`):

```bash
curl -X POST https://kaiber-ai--clip-video-fastapi-app.modal.run/start \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/video.mp4", "width": 1280, "height": 720}'
```

The Gemini processing function returns JSON with scene data:
```json
[
  {
    "start_time": 0.0,
    "end_time": 5.5,
    "url": "https://...",
    "key": "user-videos/temp/hash.mp4",
    "filename": "hash.mp4",
    "width": 1280,
    "height": 720,
    "length": 5.5,
    "description": "Brief scene description"
  }
]
```

Dependencies:

- FFmpeg: Automatically installed in the Modal container image
- Google Generative AI SDK: Automatically installed via `requirements.txt`
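Since FFmpeg is available in the image, the low-frame-rate stream sent to Gemini could be produced by a command like the one built below. This is a sketch under assumptions: "scale 1280" is read as 1280 px wide with the aspect ratio kept, and the H.264 codec, fragmented-MP4 output to stdout, and dropped audio are illustrative choices. The function names are hypothetical, and the Gemini upload itself is not shown.

```python
import subprocess
from typing import List


def low_fps_stream_cmd(source_url: str, fps: int = 1, width: int = 1280) -> List[str]:
    """FFmpeg arguments for a low-frame-rate, downscaled copy written to stdout."""
    return [
        "ffmpeg",
        "-i", source_url,
        # keep one frame per second and scale to `width` px wide; -2 preserves
        # the aspect ratio while keeping the height even, as libx264 requires
        "-vf", f"fps={fps},scale={width}:-2",
        "-an",                 # drop audio (assumption)
        "-c:v", "libx264",
        # a regular MP4 needs a seekable output; fragmenting lets it go to a pipe
        "-movflags", "frag_keyframe+empty_moov",
        "-f", "mp4",
        "pipe:1",
    ]


def open_stream(source_url: str) -> subprocess.Popen:
    """Start FFmpeg; the caller reads encoded bytes from proc.stdout."""
    return subprocess.Popen(low_fps_stream_cmd(source_url), stdout=subprocess.PIPE)
```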
All processing steps are logged to Modal logs:
- FFmpeg stream progress (1 FPS with scale 1280)
- High-resolution video download progress
- Gemini File API upload status
- Polling for ACTIVE state
- Scene validation and filtering
- Parallel clip processing and R2 uploads
- Final scene data (printed to logs)
Access logs via the Modal dashboard or CLI:

```bash
modal logs clip-video
```

Example requests:

```bash
# Start a job
curl -X POST https://kaiber-ai--clip-video-fastapi-app.modal.run/start \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/video.mp4", "width": 1280, "height": 720}'

# Check status (replace {job_id} with the ID returned by /start)
curl -X GET https://kaiber-ai--clip-video-fastapi-app.modal.run/status/{job_id} \
  -H "Content-Type: application/json"
```