Releases: videosdk-live/agents

Agents v0.0.72

20 Mar 11:40
d02919d

  • Fix/int min words #233
    • Bug Fixes & Improvements

Agents v0.0.71

18 Mar 10:13
c3404af

  • fix: transcription timestamp + guard transport transcripts #231
    • Bug Fixes & Improvements

Agents v0.0.70

14 Mar 15:03
572cc45

  • Worker refactor, updates, and improvements #222
    • update: wait time added
  • feat: add support for screen sharing streams #229

Agents v0.0.69

13 Mar 07:55
af794d2

  • Plugin Update : Fix interrupt in sarvam tts #226

Agents v0.0.68

12 Mar 12:26
60225ae

  • Inferencing common data type structure #221
    • Improved VideoSDK Inference Gateway
  • feat: add no_participant_timeout_seconds to auto-end session when no participant joins #224
    • When the agent joins a meeting but no participant connects within the configured timeout (default 90s), the session now ends automatically instead of staying connected indefinitely.
    • Add no_participant_timeout_seconds param to RoomOptions
  • agent participant implementation for #215
  • Force cleanup after jobctx is cleaned up #223
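The auto-end behaviour of no_participant_timeout_seconds can be pictured as a watchdog that waits for a join event and gives up once the timeout elapses. The following is a minimal sketch in plain asyncio, not the SDK's implementation: `wait_for_participant` and `participant_joined` are hypothetical names, and only the 90-second default comes from the notes above.

```python
import asyncio

DEFAULT_NO_PARTICIPANT_TIMEOUT_SECONDS = 90  # default stated in the release notes


async def wait_for_participant(
    participant_joined: asyncio.Event,
    timeout_seconds: float = DEFAULT_NO_PARTICIPANT_TIMEOUT_SECONDS,
) -> bool:
    """Return True if a participant joined in time, False if the session should end.

    Hypothetical helper: the real SDK handles this internally via RoomOptions.
    """
    try:
        await asyncio.wait_for(participant_joined.wait(), timeout=timeout_seconds)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple:
    # Case 1: nobody joins, so the watchdog times out.
    nobody = await wait_for_participant(asyncio.Event(), timeout_seconds=0.05)

    # Case 2: a participant joins shortly before the timeout expires.
    joined = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, joined.set)
    somebody = await wait_for_participant(joined, timeout_seconds=1.0)
    return nobody, somebody
```

Running `asyncio.run(demo())` exercises both paths: the first wait times out, the second returns as soon as the event is set.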

Agents 1.0.0b1

05 Mar 11:07

Pre-release

⚠️ Beta Release

This release introduces the new unified pipeline architecture for the VideoSDK Agents framework.

It is not backward compatible with v0.x (up to v0.0.67).

This beta release is intended for testing the new architecture before the stable 1.0.0 release.

Unified Pipeline Architecture

CascadingPipeline and RealtimePipeline have been replaced with a single Pipeline class.

Developers now configure one pipeline, and the SDK automatically determines the optimal execution mode based on the components provided.


Before (v0.0.67 and earlier)

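from videosdk.agents import CascadingPipeline, RealtimePipeline
# Assumes the OpenAI plugin package (videosdk-plugins-openai) is installed
from videosdk.plugins.openai import OpenAIRealtime

# Cascading pipeline: separate STT, LLM, and TTS components
pipeline = CascadingPipeline(
    stt=...,
    llm=...,
    tts=...,
    vad=...,
    turn_detector=...
)

# Realtime pipeline: a single realtime model handles speech in and out
pipeline = RealtimePipeline(
    llm=OpenAIRealtime(...)
)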

After (v1.0.0b1)

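from videosdk.agents import Pipeline
# Assumes the OpenAI plugin package (videosdk-plugins-openai) is installed
from videosdk.plugins.openai import OpenAIRealtime

# Full voice agent: same components as the old CascadingPipeline
pipeline = Pipeline(
    stt=...,
    llm=...,
    tts=...,
    vad=...,
    turn_detector=...
)

# Realtime voice agent: same argument as the old RealtimePipeline
pipeline = Pipeline(
    llm=OpenAIRealtime(...)
)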

Flexible Agent Composition

The new Pipeline allows you to build different types of agents depending on your use case.

Transcription Agent

Pipeline(stt=...)

Voice + Chat Agent

Pipeline(stt=..., llm=..., tts=...)

Full Voice Agent with Turn Detection

Pipeline(stt=..., llm=..., tts=..., vad=..., turn_detector=...)

Chatbot (Text Only)

Pipeline(llm=...)

Realtime Voice Agent

Pipeline(llm=OpenAIRealtime(...))

You simply include the components you need, and the SDK handles the rest.
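To make the idea of component-driven configuration concrete, here is a small sketch that maps a set of supplied components to the agent types listed above. It is purely illustrative: `describe_agent` and the `llm_is_realtime` flag are hypothetical, and the SDK's actual selection logic is not documented in these notes.

```python
from typing import Optional


def describe_agent(
    stt: Optional[object] = None,
    llm: Optional[object] = None,
    tts: Optional[object] = None,
    llm_is_realtime: bool = False,
) -> str:
    """Return the agent type (as named in the release notes) for a component set."""
    if llm is not None and llm_is_realtime:
        return "Realtime Voice Agent"
    if stt is not None and llm is not None and tts is not None:
        return "Voice + Chat Agent"
    if stt is not None and llm is None and tts is None:
        return "Transcription Agent"
    if llm is not None and stt is None and tts is None:
        return "Chatbot (Text Only)"
    # Any other combination is outside the examples listed in the notes.
    return "Custom"
```

For example, `describe_agent(stt=object())` returns "Transcription Agent", while passing only an LLM returns "Chatbot (Text Only)".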


Conversational Flow Removal

The previous Conversational Flow abstraction has been removed.

Instead of defining conversation flows separately, developers can now directly control behavior using Pipeline Hooks.

This gives full flexibility to intercept and modify data at any stage of the pipeline.


Pipeline Hooks

Customize pipeline behavior using:

@pipeline.on(...)

Available hook points include:

  • stt
  • tts
  • llm
  • vision_frame
  • user_turn_start
  • user_turn_end
  • agent_turn_start
  • agent_turn_end
  • content_generated

Hooks let you preprocess input, modify outputs, and control LLM invocation directly inside the pipeline. Typical uses include:

  • custom preprocessing
  • business logic
  • LLM routing
  • speech formatting
  • pronunciation correction
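The decorator pattern behind `@pipeline.on(...)` can be sketched with a small stand-in class. `MiniPipeline` and `run_hooks` are hypothetical and exist only for this illustration; the real hook signatures in videosdk.agents may differ (for instance, they may be async or receive other arguments).

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class MiniPipeline:
    """Stand-in for illustration only; not the videosdk.agents Pipeline class."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callable[[str], str]]] = defaultdict(list)

    def on(self, hook_point: str) -> Callable:
        """Register the decorated function under the given hook point."""
        def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
            self._hooks[hook_point].append(func)
            return func
        return decorator

    def run_hooks(self, hook_point: str, value: str) -> str:
        """Pass the value through every hook registered for this point, in order."""
        for hook in self._hooks[hook_point]:
            value = hook(value)
        return value


pipeline = MiniPipeline()


@pipeline.on("stt")
def normalize_transcript(text: str) -> str:
    # Custom preprocessing: trim whitespace and collapse repeated spaces.
    return " ".join(text.split())


@pipeline.on("tts")
def fix_pronunciation(text: str) -> str:
    # Pronunciation correction: spell out an abbreviation before synthesis.
    return text.replace("SDK", "S D K")
```

Each hook point keeps its own ordered list of callables, so a hook point with no registered hooks passes data through unchanged.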


API Changes

Previous (v0.x)       Replacement (v1.0.0b1)
CascadingPipeline     Use Pipeline
RealtimePipeline      Use Pipeline
ConversationalFlow    Use Pipeline Hooks

Instead of defining conversational flows separately, you can now implement custom logic directly using pipeline hooks:

@pipeline.on("stt")
@pipeline.on("llm")
@pipeline.on("tts")

This provides more flexibility to control preprocessing, LLM invocation, and speech synthesis behavior directly within the pipeline.

Other features such as function tools, Agent lifecycle, AgentSession, and WorkerJob continue to work as before.


Migration Guide

Replace:

CascadingPipeline(...)
RealtimePipeline(...)

with:

Pipeline(...)

Constructor arguments remain the same: pass your components and the SDK selects the execution mode automatically.
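When migrating a larger codebase, it can help to locate every use of the removed names first. The sketch below is a hypothetical helper, not part of the SDK: it scans source text for the three v0.x names listed under API Changes and reports where they occur.

```python
import re
from typing import List, Tuple

# Names removed in v1.0.0b1, per the API Changes table.
LEGACY_NAMES = ("CascadingPipeline", "RealtimePipeline", "ConversationalFlow")
_LEGACY_PATTERN = re.compile(r"\b(" + "|".join(LEGACY_NAMES) + r")\b")


def find_legacy_usages(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, name) for each removed v0.x name found in the source."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in _LEGACY_PATTERN.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


sample = """from videosdk.agents import CascadingPipeline
pipeline = CascadingPipeline(stt=None, llm=None, tts=None)
"""
```

Calling `find_legacy_usages(sample)` reports one hit on each of the two lines; code that already imports only `Pipeline` produces no hits.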

Agents v0.0.67

03 Mar 10:53
d45b7c2

  • Fix/calculations #216
    • Metrics improvements and refinements
  • fix: prevent duplicate MCP tools by only extending with newly added tools #217
  • Updates Cambai Plugin #218
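The duplicate-tool fix in #217 amounts to extending a tool list only with entries that are not already present. A minimal sketch of that idea follows; `extend_with_new_tools` is a hypothetical helper, and identifying tools by name is an assumption made for the example rather than a detail taken from the SDK.

```python
from typing import List, Set


def extend_with_new_tools(registered: List[str], discovered: List[str]) -> List[str]:
    """Append only tools not already registered; return the ones that were added."""
    seen: Set[str] = set(registered)
    added = []
    for name in discovered:
        if name not in seen:
            registered.append(name)
            seen.add(name)
            added.append(name)
    return added
```

Calling the helper twice with the same discovered tools leaves the registered list unchanged the second time, which is the property the fix is after.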

Agents v0.0.66

26 Feb 13:51
9f90027

  • deprecated : remove the cli in version >= 0.0.65 #210
  • Remove unused parameter #205
  • feat: upgrade Simli AI plugin to SDK v2.0 #206
  • feat: enhance TTS and STT plugins with additional configuration options #212
    • Cartesia, Deepgram, ElevenLabs, and Google plugin updates.

Agents v0.0.65

23 Feb 13:14
c4d4964

  • feat: add Anam virtual avatar support (#188)
    • Add support for Anam Virtual Avatar with VideoSDK AI Agents
  • chore: update Sarvam plugin (#199)
    • Upgrade Sarvam AI plugin to the latest configuration provided by Sarvam (STT, LLM, TTS)
  • feat: add Camb AI TTS plugin (#200)
    • Add support for Camb AI Text-to-Speech (TTS)
  • fix: TURN-D handling in cascading pipeline (#203)
    • Fix issue where missing TURN-D caused unintended agent execution
  • fix(rnnoise): resolve segfaults and init errors on ARM64/Linux (#202)
    • Fix rnnoise build issues on Linux ARM64 and resolve segmentation faults and initialization errors
    • rnnoise is now supported across all platforms

Agents v0.0.64

17 Feb 06:10
b36d02a

  • fix linux rnnoise issue #186
  • fix: allow session.say inside tools without interrupting LLM #189
  • feat : add keywords, keyterm prompting, and language param #193
  • Feature/krisp deepgram #194
    • Add denoising support in VideoSDK inference for the providers Krisp, ai-coustics, and Sanas.
    • Add Deepgram TTS support in VideoSDK inference.
  • fix(turn-detector): download model once and use cache #198
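The "download once and use cache" behaviour from #198 follows a common pattern: check for a local copy before fetching. The sketch below shows that pattern under stated assumptions; `get_cached_model`, the cache path, and `fake_download` are hypothetical and do not reflect the turn detector's actual code or file layout.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_cached_model(cache_path: Path, download: Callable[[], bytes]) -> Path:
    """Download the model only if it is not already present at cache_path."""
    if not cache_path.exists():
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_bytes(download())
    return cache_path


download_calls = []


def fake_download() -> bytes:
    # Stand-in for a network fetch; records each call so reuse is observable.
    download_calls.append(1)
    return b"model-weights"


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "turn_detector" / "model.bin"
    get_cached_model(model_file, fake_download)
    get_cached_model(model_file, fake_download)  # second call reuses the file
    cached_bytes = model_file.read_bytes()
```

The second call finds the file on disk and skips the download, so `fake_download` runs only once.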