Merged

23 commits
- `9675a14`: implemented heygen avatars (d3xvn, Oct 27, 2025)
- `4f3a6e4`: add lip-sync support by forwarding agent audio to heygen (d3xvn, Oct 28, 2025)
- `680f5d7`: switch avatar example to use gemini realtime for better lip-sync testing (d3xvn, Oct 28, 2025)
- `6eb638f`: WIP: audio track approach for lip-sync (audio flows but no lip movement) (d3xvn, Oct 29, 2025)
- `7f2983a`: Clean up HeyGen implementation and fix duplicate text sending (d3xvn, Oct 30, 2025)
- `aefdeda`: Merge main into feat/heygen - add vogent plugin alongside heygen (d3xvn, Nov 3, 2025)
- `96f1cc9`: PR cleanup (d3xvn, Nov 3, 2025)
- `c14b98c`: Auto-attach processors to agent (no more manual set_agent calls) (d3xvn, Nov 3, 2025)
- `6188ed3`: fixed audio duplication and sluggishness (d3xvn, Nov 3, 2025)
- `74aa6ff`: Fix video aspect ratio stretching - add letterboxing (d3xvn, Nov 3, 2025)
- `f54c372`: fixed and simplified both implementations (d3xvn, Nov 4, 2025)
- `a94b181`: Merge main into feat/heygen - added moondream plugin (d3xvn, Nov 4, 2025)
- `fad9f49`: Fix ruff linting - remove unused imports (d3xvn, Nov 4, 2025)
- `f03c81d`: Fix HeyGen plugin tests - import paths and mocking (d3xvn, Nov 4, 2025)
- `a5be206`: Fix mypy type errors in HeyGen plugin (d3xvn, Nov 4, 2025)
- `d6d66bf`: Allow reattaching to new HeyGen video tracks on renegotiation (d3xvn, Nov 4, 2025)
- `f7a2f37`: Migrate quality to enum (Nash0x7E2, Nov 4, 2025)
- `0b4894a`: Ruff and Mypy (Nash0x7E2, Nov 4, 2025)
- `30322b2`: Merge branch 'main' into feat/heygen (Nash0x7E2, Nov 4, 2025)
- `4bafa66`: More ruff issues (Nash0x7E2, Nov 4, 2025)
- `f5a1aaa`: Fix broken method sigs (Nash0x7E2, Nov 4, 2025)
- `3f5e203`: Unused var (Nash0x7E2, Nov 4, 2025)
- `12cad15`: final ruff error (Nash0x7E2, Nov 4, 2025)
2 changes: 2 additions & 0 deletions agents-core/pyproject.toml
@@ -42,6 +42,7 @@ deepgram = ["vision-agents-plugins-deepgram"]
 elevenlabs = ["vision-agents-plugins-elevenlabs"]
 gemini = ["vision-agents-plugins-gemini"]
 getstream = ["vision-agents-plugins-getstream"]
+heygen = ["vision-agents-plugins-heygen"]
 kokoro = ["vision-agents-plugins-kokoro"]
 krisp = ["vision-agents-plugins-krisp"]
 moonshine = ["vision-agents-plugins-moonshine"]
@@ -57,6 +58,7 @@ all-plugins = [
     "vision-agents-plugins-elevenlabs",
     "vision-agents-plugins-gemini",
     "vision-agents-plugins-getstream",
+    "vision-agents-plugins-heygen",
     "vision-agents-plugins-kokoro",
     "vision-agents-plugins-krisp",
     "vision-agents-plugins-moonshine",
15 changes: 14 additions & 1 deletion agents-core/vision_agents/core/agents/agents.py
@@ -215,6 +215,11 @@ def __init__(
 
         self.llm._attach_agent(self)
 
+        # Attach processors that need agent reference
+        for processor in self.processors:
+            if hasattr(processor, '_attach_agent'):
+                processor._attach_agent(self)
+
         self.events.subscribe(self._on_vad_audio)
         self.events.subscribe(self._on_agent_say)
         # Initialize state variables
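The auto-attach hook in the hunk above relies on duck typing: any processor that exposes a `_attach_agent` method receives the agent reference at construction time, while other processors are skipped. A minimal standalone sketch of the pattern (class names here are illustrative, not the framework's real types):

```python
class AvatarProcessor:
    """Processor that wants a back-reference to its agent (illustrative)."""
    def _attach_agent(self, agent):
        self.agent = agent

class StatelessProcessor:
    """No _attach_agent hook; the agent simply skips it."""

class Agent:
    def __init__(self, processors):
        self.processors = processors
        # Attach processors that need an agent reference, as in the diff
        for processor in self.processors:
            if hasattr(processor, "_attach_agent"):
                processor._attach_agent(self)

avatar = AvatarProcessor()
plain = StatelessProcessor()
agent = Agent([avatar, plain])
print(avatar.agent is agent)  # True
```

This is what removes the need for manual `set_agent` calls noted in commit `c14b98c`.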
@@ -1167,10 +1172,13 @@ def publish_audio(self) -> bool:
         """Whether the agent should publish an outbound audio track.
 
         Returns:
-            True if TTS is configured or when in Realtime mode.
+            True if TTS is configured, when in Realtime mode, or if there are audio publishers.
         """
         if self.tts is not None or self.realtime_mode:
             return True
+        # Also publish audio if there are audio publishers (e.g., HeyGen avatar)
+        if self.audio_publishers:
+            return True
         return False
 
     @property
@@ -1296,6 +1304,11 @@ def _prepare_rtc(self):
         if self.realtime_mode and isinstance(self.llm, Realtime):
             self._audio_track = self.llm.output_track
             self.logger.info("🎵 Using Realtime provider output track for audio")
+        elif self.audio_publishers:
+            # Get the first audio publisher to create the track
+            audio_publisher = self.audio_publishers[0]
+            self._audio_track = audio_publisher.publish_audio_track()
+            self.logger.info("🎵 Audio track initialized from audio publisher")
         else:
             # Default to WebRTC-friendly format unless configured differently
             framerate = 48000
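The branch added above slots audio publishers between the Realtime output track and the default WebRTC track when the agent picks its outbound audio source. A simplified standalone sketch of that precedence (function and class names here are hypothetical stand-ins, not the framework's API):

```python
def select_audio_track(realtime_track, audio_publishers, make_default_track):
    # Precedence mirrored from _prepare_rtc above, simplified:
    # Realtime output > first audio publisher > default WebRTC track.
    if realtime_track is not None:
        return realtime_track
    if audio_publishers:
        return audio_publishers[0].publish_audio_track()
    return make_default_track()

class FakePublisher:
    """Stand-in for an audio publisher such as the HeyGen avatar."""
    def publish_audio_track(self):
        return "avatar-audio-track"

print(select_audio_track(None, [FakePublisher()], lambda: "default-track"))
print(select_audio_track("realtime-track", [FakePublisher()], lambda: "default-track"))
print(select_audio_track(None, [], lambda: "default-track"))
```

Together with the `publish_audio` change, this is what lets an avatar publish the call's audio even when no TTS is configured on the agent itself.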
1 change: 0 additions & 1 deletion aiortc
Submodule aiortc deleted from f84800
4 changes: 3 additions & 1 deletion plugins/aws/example/uv.lock


189 changes: 189 additions & 0 deletions plugins/heygen/README.md
@@ -0,0 +1,189 @@
# HeyGen Avatar Plugin for Vision Agents

Add realistic avatar video to your AI agents using HeyGen's streaming avatar API.

## Features

- 🎭 **Realistic Avatars**: Use HeyGen's high-quality avatars with natural movements
- 🎤 **Automatic Lip-Sync**: Avatar automatically syncs with audio from any TTS provider
- 🚀 **WebRTC Streaming**: Low-latency real-time video streaming via WebRTC
- 🔌 **Easy Integration**: Works seamlessly with Vision Agents framework
- 🎨 **Customizable**: Configure avatar, quality, resolution, and more

## Installation

```bash
pip install vision-agents-plugins-heygen
```

Or with uv:

```bash
uv pip install vision-agents-plugins-heygen
```

## Quick Start

```python
import asyncio
from uuid import uuid4
from dotenv import load_dotenv

from vision_agents.core import User, Agent
from vision_agents.plugins import cartesia, deepgram, getstream, gemini, heygen
from vision_agents.plugins.heygen import VideoQuality

load_dotenv()

async def start_avatar_agent():
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="AI Assistant with Avatar", id="agent"),
        instructions="You're a friendly AI assistant.",
        llm=gemini.LLM("gemini-2.0-flash"),
        tts=cartesia.TTS(),
        stt=deepgram.STT(),
        # Add HeyGen avatar
        processors=[
            heygen.AvatarPublisher(
                avatar_id="default",
                quality=VideoQuality.HIGH,
            )
        ],
    )

    call = agent.edge.client.video.call("default", str(uuid4()))

    with await agent.join(call):
        await agent.edge.open_demo(call)
        await agent.simple_response("Hello! I'm your AI assistant with an avatar.")
        await agent.finish()

if __name__ == "__main__":
    asyncio.run(start_avatar_agent())
```

## Configuration

### Environment Variables

Set your HeyGen API key:

```bash
HEYGEN_API_KEY=your_heygen_api_key_here
```

### AvatarPublisher Options

```python
from vision_agents.plugins.heygen import VideoQuality

heygen.AvatarPublisher(
    avatar_id="default",        # HeyGen avatar ID
    quality=VideoQuality.HIGH,  # VideoQuality.LOW, VideoQuality.MEDIUM, or VideoQuality.HIGH
    resolution=(1920, 1080),    # Output resolution (width, height)
    api_key=None,               # Optional: override the HEYGEN_API_KEY env var
)
```

## Usage Examples

### With Realtime LLM

```python
from uuid import uuid4

from vision_agents.core import Agent, User
from vision_agents.plugins import gemini, heygen, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Realtime Avatar AI"),
    instructions="Be conversational and responsive.",
    llm=gemini.Realtime(fps=2),  # No separate TTS needed
    processors=[
        heygen.AvatarPublisher(avatar_id="professional_presenter")
    ],
)

call = agent.edge.client.video.call("default", str(uuid4()))

with await agent.join(call):
    await agent.finish()
```

### With Multiple Processors

```python
from vision_agents.core import Agent, User
from vision_agents.plugins import gemini, getstream, heygen, ultralytics

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Fitness Coach"),
    instructions="Analyze user poses and provide feedback.",
    llm=gemini.Realtime(fps=3),
    processors=[
        # Process incoming user video
        ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt"),
        # Publish avatar video
        heygen.AvatarPublisher(avatar_id="fitness_trainer"),
    ],
)
```

## How It Works

1. **Connection**: Establishes WebRTC connection to HeyGen's streaming API
2. **Audio Input**: Receives audio from your TTS provider or Realtime LLM
3. **Avatar Generation**: HeyGen generates avatar video with lip-sync
4. **Video Streaming**: Streams avatar video to call participants via GetStream Edge
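The audio leg of steps 2 and 3 can be sketched as a simple forwarding loop. The sender below is a stand-in for the plugin's HeyGen session, not its real API:

```python
import asyncio

async def forward_audio(frames, send_to_avatar):
    # Step 2 above: each PCM frame from the TTS provider (or Realtime LLM)
    # is forwarded to the avatar session; HeyGen then generates the
    # lip-synced avatar video that gets streamed back (steps 3 and 4).
    for frame in frames:
        await send_to_avatar(frame)

sent = []

async def fake_send(frame):
    # Stand-in for the plugin's actual send path into HeyGen's WebRTC session
    sent.append(frame)

asyncio.run(forward_audio([b"\x00\x01", b"\x02\x03"], fake_send))
print(len(sent))  # 2
```

Because the avatar consumes whatever audio the agent produces, any TTS provider (or a Realtime LLM) works without avatar-specific configuration.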

## Requirements

- Python 3.10+
- HeyGen API key (get one at [heygen.com](https://heygen.com))
- GetStream account for video calls
- TTS provider (Cartesia, ElevenLabs, etc.) or Realtime LLM

## Troubleshooting

### Connection Issues

If you experience connection problems:

1. Check that your HeyGen API key is valid
2. Ensure you have network access to HeyGen's servers
3. Check firewall settings for WebRTC traffic

### Video Quality

To optimize video quality:

- Use `quality=VideoQuality.HIGH` for best results
- Increase resolution if bandwidth allows
- Ensure stable internet connection

## API Reference

### AvatarPublisher

Main class for publishing HeyGen avatar video.

**Methods:**
- `publish_video_track()`: Returns the video track for streaming
- `state()`: Returns current state information
- `close()`: Cleans up resources
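A typical call order for these methods can be shown against a stand-in class. `FakeAvatarPublisher` below only mirrors the method names listed above so the lifecycle is runnable without a HeyGen session; the real `AvatarPublisher` manages a WebRTC connection and returns an actual video track:

```python
class FakeAvatarPublisher:
    """Stand-in mirroring the AvatarPublisher surface listed above."""
    def __init__(self):
        self._closed = False

    def publish_video_track(self):
        # The real class returns a streamable video track object
        return "video-track"

    def state(self):
        return {"closed": self._closed}

    def close(self):
        self._closed = True

pub = FakeAvatarPublisher()
track = pub.publish_video_track()
print(pub.state())  # {'closed': False}
pub.close()
print(pub.state())  # {'closed': True}
```

In normal use you never call these directly; the `Agent` invokes them when the publisher is passed in `processors=[...]`.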

## License

MIT

## Links

- [Documentation](https://visionagents.ai/)
- [GitHub](https://github.com/GetStream/Vision-Agents)
- [HeyGen API Docs](https://docs.heygen.com/docs/streaming-api)
