A TypeScript agent that listens for tasks via the A2A (Agent-to-Agent) protocol and automatically generates images and videos from text prompts using AI models. It supports real-time notifications over Server-Sent Events (SSE) and webhooks, and is designed for orchestration in multi-agent workflows.
The Image & Video Generation Agent is designed to:
- Receive prompts for image or video generation via the A2A protocol.
- Generate images or videos using state-of-the-art AI models, based on the provided prompt and parameters.
- Output the final artifact (image or video URL) and metadata.
- Support real-time updates and notifications via SSE and webhooks.
This agent implements the A2A protocol, enabling standard orchestration and communication between Nevermined agents and third-party systems.
This agent is part of an AI-powered multimedia creation ecosystem. See how it interacts with other agents:
- Music Video Orchestrator Agent: Orchestrates end-to-end workflows: collects prompts, splits tasks, pays agents, merges results.
- Script Generator Agent: Generates cinematic scripts, extracts scenes and characters, produces prompts for video.
- Song Generator Agent: Produces music using third-party APIs and AI models.
Workflow example:
[ User Prompt ] --> [Music Orchestrator] --> [Song Generation] --> [Script Generation] --> [Image/Video Generation] --> [Final Compilation]
- Features
- Prerequisites
- Installation
- Environment Variables
- Project Structure
- Architecture & Workflow
- A2A Protocol
- Skills
- Usage
- Examples & Scripts
- Development & Testing
- License
- A2A protocol: Full support for task orchestration, state transitions, SSE notifications, and webhooks.
- Image and video generation: Advanced AI models for text-to-image, image-to-image, and text-to-video.
- Real-time notifications: SSE and webhook support for task updates.
- Configurable: Customize prompts, models, and parameters.
- Logging and error management: Detailed logs via a custom Logger.
- Modular and SOLID architecture: Each class/function has a clear responsibility.
- Node.js (>= 18.0.0 recommended)
- TypeScript (^5.7.0 or higher)
- API keys for any third-party AI services used (if required)
- Clone the repository:
git clone https://github.com/nevermined-io/video-generation-agent-a2a.git
cd video-generation-agent-a2a
- Install dependencies:
yarn install
- Configure the environment:
cp .env.example .env
# Edit .env and add your keys
- Build the project (optional for production):
yarn build
Copy .env.example to .env and set the required keys:
# Example
FAL_API_KEY=your_fal_key
PIAPI_KEY=your_piapi_key
DEMO_MODE=true
- FAL_API_KEY: Access to Fal.ai for image/video generation (if used).
- PIAPI_KEY: Access to TTapi for video generation (if used).
- DEMO_MODE: Set to true to use the demo video client, which simulates API responses without making external API calls (default: false).
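The variables above could be read into a typed configuration object roughly as follows. This is a minimal sketch, not the repository's own config module: the `AgentConfig` shape, the `loadConfig` name, and the rule that at least one provider key is needed outside demo mode are all assumptions made for illustration.

```typescript
// Hypothetical config shape; the real module under src/config may differ.
interface AgentConfig {
  falApiKey: string | undefined;
  piapiKey: string | undefined;
  demoMode: boolean;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AgentConfig {
  // DEMO_MODE defaults to false; "true" (case-insensitive) enables it.
  const demoMode = (env["DEMO_MODE"] ?? "false").trim().toLowerCase() === "true";

  const config: AgentConfig = {
    // Treat empty strings the same as missing values.
    falApiKey: env["FAL_API_KEY"] || undefined,
    piapiKey: env["PIAPI_KEY"] || undefined,
    demoMode,
  };

  // Assumption: without demo mode, at least one provider key must be present.
  if (!demoMode && !config.falApiKey && !config.piapiKey) {
    throw new Error("Set FAL_API_KEY or PIAPI_KEY, or enable DEMO_MODE=true");
  }
  return config;
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, so that a missing key fails fast instead of surfacing on the first generation request.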
video-generation-agent-a2a/
├── src/
│ ├── server.ts # Main entry point (Express)
│ │ └── a2aRoutes.ts # RESTful and A2A routes
│ ├── controllers/
│ │ ├── a2aController.ts # Main A2A protocol logic
│ │ ├── imageController.ts # Image generation logic
│ │ └── videoController.ts # Video generation logic
│ ├── core/
│ │ ├── taskProcessor.ts # Task processing
│ │ ├── taskStore.ts # Task storage and lifecycle
│ │ └── ...
│ ├── services/
│ │ ├── pushNotificationService.ts # SSE and webhook notifications
│ │ └── streamingService.ts # Real-time SSE streaming
│ ├── clients/ # API clients for third-party services
│ ├── interfaces/ # Types and A2A contracts
│ ├── models/ # Data models (Task, Artifact)
│ ├── utils/ # Utilities and logger
│ └── config/ # Configuration and environment variables
├── scripts/
│ ├── generate-image.ts
│ ├── generate-image-with-notifications.ts
│ ├── generate-image-with-webhook.ts
│ ├── generate-video.ts
│ ├── generate-video-with-notifications.ts
│ └── generate-video-with-webhook.ts
├── package.json
└── README.md
- Task reception: The agent exposes RESTful and A2A endpoints (/tasks/send, /tasks/sendSubscribe) to receive prompts and parameters.
- Image/video generation: The agent processes the task and invokes the appropriate AI model or API.
- Notifications: The agent emits status updates and results via SSE (/tasks/:taskId/notifications) or webhooks.
- Result delivery: The user receives the artifact URL and metadata as A2A artifacts.
Simplified flow diagram:
Client Agent AI Model/API
| | |
|--Task------>| |
| |--Generate---->|
| | image/video |
| |<--------------|
|<------------| SSE/Webhook |
|<------------| Final result|
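On the client side of this flow, each SSE event has to be decoded before its status can be acted on. The sketch below parses one raw event block and extracts the task state. It assumes the `data` field carries a JSON-RPC response shaped like the streaming example in the A2A Protocol section (`result.id`, `result.status.state`, `result.final`); the `StatusEvent` type and both function names are hypothetical.

```typescript
// Hypothetical summary of one streamed update.
interface StatusEvent {
  id: string;
  state: string;
  final: boolean;
}

// Collects the "data:" lines of a single SSE event block (without the blank-line separator).
function parseSseData(rawEvent: string): string | null {
  const dataLines: string[] = [];
  for (const line of rawEvent.split(/\r?\n/)) {
    if (line.startsWith("data:")) {
      // The SSE format allows one optional space after the colon.
      const value = line.slice(5);
      dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
    }
  }
  return dataLines.length > 0 ? dataLines.join("\n") : null;
}

// Returns null for comments or events without a usable payload.
// Throws if the data field is not valid JSON.
function toStatusEvent(rawEvent: string): StatusEvent | null {
  const data = parseSseData(rawEvent);
  if (data === null) return null;

  const payload = JSON.parse(data) as {
    result?: { id?: unknown; final?: unknown; status?: { state?: unknown } };
  } | null;
  const result = payload?.result;
  if (!result || typeof result.id !== "string") return null;

  const state = result.status?.state;
  return {
    id: result.id,
    state: typeof state === "string" ? state : "unknown",
    final: result.final === true,
  };
}
```

A client would stop listening once `final` is true or the state is one of the terminal states.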
The agent implements the A2A (Agent-to-Agent) protocol, which defines:
- Task states: submitted, working, input-required, completed, failed, cancelled.
- Messages: Standard structure with role, parts (text, image, video, file, etc.).
- Artifacts: Structured responses with parts (image, video, text, metadata).
- Notifications: Real-time updates via SSE or webhooks.
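These definitions map naturally onto TypeScript types. The following is a sketch based only on the names listed above, not a copy of the types in the repository's interfaces directory. Treating completed, failed, and cancelled as the terminal states, and giving non-text parts a `url` field, are assumptions.

```typescript
// Task states exactly as listed above.
type TaskState =
  | "submitted"
  | "working"
  | "input-required"
  | "completed"
  | "failed"
  | "cancelled";

// Assumption: non-text parts carry a URL, as in the artifact example in this section.
type Part =
  | { type: "text"; text: string }
  | { type: "image" | "video" | "file"; url: string };

interface Message {
  role: "user" | "agent";
  parts: Part[];
}

// Assumption: these three states end a task's lifecycle.
const TERMINAL_STATES: readonly TaskState[] = ["completed", "failed", "cancelled"];

function isTerminal(state: TaskState): boolean {
  return TERMINAL_STATES.includes(state);
}

// Joins all text parts of a message, e.g. to recover the prompt.
function textOf(message: Message): string {
  const texts: string[] = [];
  for (const part of message.parts) {
    if (part.type === "text") texts.push(part.text);
  }
  return texts.join("\n");
}
```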
A2A request example (JSON-RPC):
{
"jsonrpc": "2.0",
"id": 1,
"method": "tasks/sendSubscribe",
"params": {
"id": "unique-task-id",
"sessionId": "user-session-123",
"acceptedOutputModes": ["image/png"],
"message": {
"role": "user",
"parts": [
{ "type": "text", "text": "Generate a futuristic cityscape at night" }
]
},
"taskType": "text2image"
}
}
Note: The /tasks/send and /tasks/sendSubscribe endpoints require every request to be in JSON-RPC 2.0 format. The body must include the jsonrpc, id, method, and params fields, following the A2A standard.
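A client can build the envelope with a small helper and check a body before sending it. This is an illustrative sketch: `buildRequest` and `isJsonRpcRequest` are hypothetical names, and the check covers only the four required fields and the two methods named above, not the full A2A schema.

```typescript
type A2AMethod = "tasks/send" | "tasks/sendSubscribe";

interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  id: number | string;
  method: A2AMethod;
  params: P;
}

// Wraps task parameters in a JSON-RPC 2.0 envelope.
function buildRequest<P extends object>(
  id: number | string,
  method: A2AMethod,
  params: P
): JsonRpcRequest<P> {
  return { jsonrpc: "2.0", id, method, params };
}

// Structural check for the four required fields.
function isJsonRpcRequest(body: unknown): body is JsonRpcRequest<object> {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  const idOk = typeof b["id"] === "number" || typeof b["id"] === "string";
  const methodOk = b["method"] === "tasks/send" || b["method"] === "tasks/sendSubscribe";
  const paramsOk = typeof b["params"] === "object" && b["params"] !== null;
  return b["jsonrpc"] === "2.0" && idOk && methodOk && paramsOk;
}
```

For example, `buildRequest(1, "tasks/sendSubscribe", { id: "unique-task-id", taskType: "text2image", message })` reproduces the shape of the request shown above.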
Streaming SSE response example:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"id": "unique-task-id",
"status": {
"state": "working",
"timestamp": "2024-06-01T12:00:00Z",
"message": {
"role": "agent",
"parts": [
{ "type": "text", "text": "Generating image..." }
]
}
},
"final": false
}
}
Final artifact:
{
"parts": [
{ "type": "image", "url": "https://.../image.png" }
],
"metadata": {
"prompt": "Generate a futuristic cityscape at night"
},
"index": 0
}
Important: The taskType parameter is mandatory and determines the type of operation the agent will perform. Always specify taskType in your request. If it is omitted or incorrect, the agent will not know which skill to execute and will return an error.
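A server-side check for this rule could look like the sketch below. The error payload the agent actually returns is not documented here, so the use of code -32602 (the JSON-RPC 2.0 "Invalid params" code) and the `checkTaskType` helper are assumptions for illustration.

```typescript
type TaskType = "text2image" | "text2video";

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: number | string | null;
  error: { code: number; message: string };
}

// Standard JSON-RPC 2.0 code for invalid method parameters.
const INVALID_PARAMS = -32602;

function parseTaskType(value: unknown): TaskType | null {
  return value === "text2image" || value === "text2video" ? value : null;
}

// Returns the task type, or an error response when it is missing or unknown.
function checkTaskType(
  id: number | string | null,
  params: { taskType?: unknown }
): TaskType | JsonRpcErrorResponse {
  const taskType = parseTaskType(params.taskType);
  if (taskType !== null) return taskType;
  return {
    jsonrpc: "2.0",
    id,
    error: {
      code: INVALID_PARAMS,
      message: 'taskType is required and must be "text2image" or "text2video"',
    },
  };
}
```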
The agent exposes the following skills via the A2A protocol:
- Description: Generates an image from a text prompt.
- Input Modes: text/plain, application/json
- Output Modes: image/png, application/json
- Parameters:
  - taskType (string, required): Type of image generation task. Must be "text2image".
  - prompt (string, required): Text prompt for image generation.
Example: text2image
{
"jsonrpc": "2.0",
"id": 1,
"method": "tasks/send",
"params": {
"id": "task-123",
"sessionId": "session-abc",
"message": {
"role": "user",
"parts": [
{ "type": "text", "text": "A surreal landscape with floating islands" }
]
},
"taskType": "text2image"
}
}
- Description: Generates a video from a text prompt and one or more reference images.
- Input Modes: text/plain, application/json
- Output Modes: video/mp4, application/json
- Parameters:
  - taskType (string, required): Type of video generation task. Must be "text2video".
  - prompt (string, required): Text prompt for video generation.
  - imageUrls (string[], required): List of reference image URLs.
  - duration (number, optional): Video duration in seconds (5 or 10).
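The constraints on these parameters can be checked before a task is submitted. The sketch below returns a list of problems rather than throwing, so a caller can report all of them at once. `validateVideoParams` is a hypothetical helper, and requiring an http or https prefix on each URL is an assumption; the caller is expected to pass the prompt text it intends to send.

```typescript
interface VideoInput {
  prompt?: unknown;
  imageUrls?: unknown;
  duration?: unknown;
}

// Returns an empty array when the input satisfies the documented constraints.
function validateVideoParams(input: VideoInput): string[] {
  const problems: string[] = [];

  if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
    problems.push("prompt must be a non-empty string");
  }

  // imageUrls is required and needs at least one entry.
  const urls = input.imageUrls;
  const urlsOk =
    Array.isArray(urls) &&
    urls.length > 0 &&
    // Assumption: reference images are fetched over http(s).
    urls.every((u) => typeof u === "string" && /^https?:\/\//.test(u));
  if (!urlsOk) {
    problems.push("imageUrls must be a non-empty array of http(s) URLs");
  }

  // duration is optional, but only 5 or 10 seconds are accepted.
  if (input.duration !== undefined && input.duration !== 5 && input.duration !== 10) {
    problems.push("duration must be 5 or 10");
  }

  return problems;
}
```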
Example: text2video
{
"jsonrpc": "2.0",
"id": 2,
"method": "tasks/send",
"params": {
"id": "task-456",
"sessionId": "session-def",
"message": {
"role": "user",
"parts": [
{ "type": "text", "text": "A time-lapse of a flower blooming" }
]
},
"taskType": "text2video",
"imageUrls": [
"https://example.com/flower1.png",
"https://example.com/flower2.png"
],
"duration": 10
}
}
- Configure .env with your keys.
- Start the agent in development mode:
yarn dev
The agent will wait for A2A or REST tasks.
- Send a prompt using a compatible client (see examples below).
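A client that does not subscribe to SSE has to poll the task until it reaches a terminal state. The loop below is a generic sketch and not the code of the repository's scripts: `pollUntilDone` is a hypothetical helper, and how `getState` retrieves the state (for instance through a task lookup request) is left to the caller.

```typescript
// Assumption: these states end a task's lifecycle.
const DONE_STATES = new Set<string>(["completed", "failed", "cancelled"]);

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Calls getState repeatedly until the task finishes or attempts run out.
async function pollUntilDone(
  getState: () => Promise<string>,
  intervalMs = 2000,
  maxAttempts = 30
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await getState();
    if (DONE_STATES.has(state)) return state;
    // Wait before asking again so the agent is not flooded with requests.
    await sleep(intervalMs);
  }
  throw new Error(`Task still running after ${maxAttempts} attempts`);
}
```

The attempt limit keeps a stuck task from blocking the client forever; for long video jobs the interval and limit would need to be raised accordingly.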
The repository includes example scripts to interact with the agent:
1. Polling (scripts/generate-image.ts, scripts/generate-video.ts)
Launches a task and polls its status periodically until it finishes. The task is sent in JSON-RPC 2.0 format:
yarn ts-node scripts/generate-image.ts
yarn ts-node scripts/generate-video.ts
2. SSE notifications (scripts/generate-image-with-notifications.ts, scripts/generate-video-with-notifications.ts)
Launches a task and subscribes to SSE events to receive real-time updates. Uses the JSON-RPC 2.0 format:
yarn ts-node scripts/generate-image-with-notifications.ts "A futuristic cityscape"
yarn ts-node scripts/generate-video-with-notifications.ts "A time-lapse of a flower blooming"
3. Webhooks (scripts/generate-image-with-webhook.ts, scripts/generate-video-with-webhook.ts)
Launches a task and registers a local webhook to receive push notifications. Uses the JSON-RPC 2.0 format:
yarn ts-node scripts/generate-image-with-webhook.ts "A surreal landscape"
yarn ts-node scripts/generate-video-with-webhook.ts "A time-lapse of a flower blooming"
yarn dev
yarn build
yarn test
Apache License 2.0
(C) 2025 Nevermined AG
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions
and limitations under the License.
