Releases: videosdk-live/agents
Agents v0.0.72
- Fix/int min words #233
- Bug Fixes & Improvements
Agents v0.0.71
- fix: transcription timestamp + guard transport transcripts #231
- Bug Fixes & Improvements
Agents v0.0.70
Agents v0.0.69
- Plugin update: fix interrupt in Sarvam TTS #226
Agents v0.0.68
- Inferencing common data type structure #221
- Improved VideoSDK Inference Gateway
- feat: add `no_participant_timeout_seconds` to auto-end session when no participant joins #224
  - When the agent joins a meeting but no participant connects within the configured timeout (default 90s), the session now ends automatically instead of staying connected indefinitely.
  - Add `no_participant_timeout_seconds` param to `RoomOptions`
- agent participant implementation for #215
- Force cleanup after jobctx is cleaned up #223
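The `no_participant_timeout_seconds` behavior from #224 can be pictured with a small standalone sketch. This is not the SDK's implementation: `wait_for_participant`, `run_session`, and the returned status strings are hypothetical names used only to illustrate the "end the session if nobody joins in time" logic with plain `asyncio`.

```python
import asyncio


async def wait_for_participant(joined: asyncio.Event, timeout_seconds: float) -> bool:
    """Return True if a participant joined before the timeout, else False."""
    try:
        await asyncio.wait_for(joined.wait(), timeout=timeout_seconds)
        return True
    except asyncio.TimeoutError:
        return False


async def run_session(joined: asyncio.Event, no_participant_timeout_seconds: float = 90) -> str:
    # Hypothetical session entry point: end early when nobody connects in time.
    if not await wait_for_participant(joined, no_participant_timeout_seconds):
        return "ended: no participant joined"
    return "running: participant connected"


async def demo() -> tuple:
    empty_room = asyncio.Event()   # nobody ever joins this room
    busy_room = asyncio.Event()
    busy_room.set()                # a participant is already present
    return (
        await run_session(empty_room, no_participant_timeout_seconds=0.05),
        await run_session(busy_room, no_participant_timeout_seconds=0.05),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the SDK itself the timeout is configured through `RoomOptions`, as noted above; the short 0.05 second value here only keeps the demo fast.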
Agents 1.0.0b1
This release introduces the new unified pipeline architecture for the VideoSDK Agents framework.
It is not backward compatible with v0.x (up to v0.0.67).
This beta release is intended for testing the new architecture before the stable 1.0.0 release.
Unified Pipeline Architecture
CascadingPipeline and RealtimePipeline have been replaced with a single Pipeline class.
Developers now configure one pipeline, and the SDK automatically determines the optimal execution mode based on the components provided.
Before (v0.0.67 and earlier)
from videosdk.agents import CascadingPipeline, RealtimePipeline
# Cascading pipeline
pipeline = CascadingPipeline(
    stt=...,
    llm=...,
    tts=...,
    vad=...,
    turn_detector=...
)
# Realtime pipeline
pipeline = RealtimePipeline(
    llm=OpenAIRealtime(...)
)
After (v1.0.0b1)
from videosdk.agents import Pipeline
# Full voice agent
pipeline = Pipeline(
    stt=...,
    llm=...,
    tts=...,
    vad=...,
    turn_detector=...
)
# Realtime voice agent
pipeline = Pipeline(
    llm=OpenAIRealtime(...)
)
Flexible Agent Composition
The new Pipeline allows you to build different types of agents depending on your use case.
Transcription Agent
Pipeline(stt=...)
Voice + Chat Agent
Pipeline(stt=..., llm=..., tts=...)
Full Voice Agent with Turn Detection
Pipeline(stt=..., llm=..., tts=..., vad=..., turn_detector=...)
Chatbot (Text Only)
Pipeline(llm=...)
Realtime Voice Agent
Pipeline(llm=OpenAIRealtime(...))
You simply include the components you need, and the SDK handles the rest.
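To make the component-to-agent mapping above concrete, here is a rough sketch of how a mode could be inferred from the supplied components. It is an illustration only, not the SDK's actual selection logic: `infer_mode`, `RealtimeModel`, and the mode labels are hypothetical names chosen to mirror the agent types listed above.

```python
class RealtimeModel:
    """Hypothetical marker class standing in for a realtime (speech-to-speech) LLM."""


def infer_mode(stt=None, llm=None, tts=None, vad=None, turn_detector=None) -> str:
    """Map the supplied components to an illustrative agent type."""
    has_stt = stt is not None
    has_llm = llm is not None
    has_tts = tts is not None

    if isinstance(llm, RealtimeModel):
        return "realtime"
    if has_stt and has_llm and has_tts:
        # VAD plus a turn detector upgrades a voice agent to full turn handling.
        if vad is not None and turn_detector is not None:
            return "voice+turn-detection"
        return "voice+chat"
    if has_stt and not has_llm and not has_tts:
        return "transcription"
    if has_llm and not has_stt and not has_tts:
        return "text"
    return "custom"


if __name__ == "__main__":
    part = object()  # placeholder for any concrete STT/LLM/TTS component
    print(infer_mode(stt=part))
    print(infer_mode(stt=part, llm=part, tts=part))
    print(infer_mode(llm=RealtimeModel()))
```

The point of the sketch is that the caller never names a mode; the combination of arguments alone decides it.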
Conversational Flow Removal
The previous Conversational Flow abstraction has been removed.
Instead of defining conversation flows separately, developers can now directly control behavior using Pipeline Hooks.
This gives full flexibility to intercept and modify data at any stage of the pipeline.
Pipeline Hooks
Customize pipeline behavior using:
@pipeline.on(...)
Available hook points include:
- stt
- tts
- llm
- vision_frame
- user_turn_start
- user_turn_end
- agent_turn_start
- agent_turn_end
- content_generated
Hooks let you preprocess input, modify outputs, and control LLM invocation directly inside the pipeline. Typical uses include:
- custom preprocessing
- business logic
- LLM routing
- speech formatting
- pronunciation correction
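The decorator-based hook pattern can be sketched with a tiny standalone registry. This is an assumption-laden illustration, not the SDK's `Pipeline`: `HookRegistry`, its `run` method, and the plain string-in, string-out hook signature are hypothetical, and the real hooks may well be async or receive different arguments.

```python
from collections import defaultdict
from typing import Callable


class HookRegistry:
    """Hypothetical stand-in for a pipeline that supports @pipeline.on(...)."""

    def __init__(self) -> None:
        self._hooks = defaultdict(list)  # hook point name -> list of callables

    def on(self, event: str) -> Callable:
        """Register the decorated function under the given hook point."""
        def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
            self._hooks[event].append(fn)
            return fn
        return decorator

    def run(self, event: str, value: str) -> str:
        """Pass the value through every hook registered for the event, in order."""
        for hook in self._hooks.get(event, []):
            value = hook(value)
        return value


pipeline = HookRegistry()


@pipeline.on("stt")
def normalize_transcript(text: str) -> str:
    # Custom preprocessing: collapse repeated whitespace in the transcript.
    return " ".join(text.split())


@pipeline.on("tts")
def fix_pronunciation(text: str) -> str:
    # Pronunciation correction: spell out an acronym before synthesis.
    return text.replace("VideoSDK", "Video S D K")


if __name__ == "__main__":
    print(pipeline.run("stt", "  hello   world "))
    print(pipeline.run("tts", "Welcome to VideoSDK"))
```

Each hook point is just a named list of functions, so preprocessing, routing, and formatting logic can be added without a separate flow abstraction.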
API Changes
| Previous (v0.x) | Replacement (v1.0.0b1) |
|---|---|
| `CascadingPipeline` | Use `Pipeline` |
| `RealtimePipeline` | Use `Pipeline` |
| `ConversationalFlow` | Use Pipeline Hooks |
Instead of defining conversational flows separately, you can now implement custom logic directly using pipeline hooks:
@pipeline.on("stt")
@pipeline.on("llm")
@pipeline.on("tts")
This provides more flexibility to control preprocessing, LLM invocation, and speech synthesis behavior directly within the pipeline.
Other features such as function tools, Agent lifecycle, AgentSession, and WorkerJob continue to work as before.
Migration Guide
Replace:
CascadingPipeline(...)
RealtimePipeline(...)
with:
Pipeline(...)
Constructor arguments remain the same. Pass your components and the SDK handles execution automatically.
Agents v0.0.67
Agents v0.0.66
Agents v0.0.65
- feat: add Anam virtual avatar support (#188)
  - Add support for Anam Virtual Avatar with VideoSDK AI Agents
- chore: update Sarvam plugin (#199)
  - Upgrade Sarvam AI plugin to the latest configuration provided by Sarvam (STT, LLM, TTS)
- feat: add Camb AI TTS plugin (#200)
  - Add support for Camb AI Text-to-Speech (TTS)
- fix: TURN-D handling in cascading pipeline (#203)
  - Fix issue where missing TURN-D caused unintended agent execution
- fix(rnnoise): resolve segfaults and init errors on ARM64/Linux (#202)
  - Fix rnnoise build issues on Linux ARM64 and resolve segmentation faults and initialization errors
  - Now supported across all platforms
Agents v0.0.64
- fix linux rnnoise issue #186
- fix: allow session.say inside tools without interrupting LLM #189
- feat: add keywords, keyterm prompting, and language param #193
- Feature/krisp deepgram #194
  - Add support for Denoise in VideoSDK inference for the providers Krisp, ai-coustics, and Sanas.
  - Add Deepgram TTS support in VideoSDK inference.
- fix(turn-detector): download model once and use cache #198