All notable changes to Pipecat will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added `LocalSmartTurnAnalyzerV2`, which supports local on-device inference with the new `smart-turn-v2` turn detection model.
- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's recommended model.
- Fixed an issue where, in some edge cases, the `EmulateUserStartedSpeakingFrame` could be created even if we didn't have a transcription.
- Fixed an issue in `GoogleLLMContext` where it would inject the `system_message` as a "user" message in cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly.
- Added `SpeechControlParamsFrame`, a new `SystemFrame` that notifies downstream processors of the VAD and turn analyzer params. This frame is pushed by the `BaseInputTransport` at start and any time a `VADParamsUpdateFrame` is received.
- Two package dependencies have been updated:
  - `numpy` now supports 1.26.0 and newer
  - `transformers` now supports 4.48.0 and newer
- Fixed an issue with RTVI's handling of `append-to-context`.
- Fixed an issue where using audio input with a sample rate requiring resampling could result in empty audio being passed to STT services, causing errors.
- Fixed the VAD analyzer to process the full audio buffer as long as it contains more than the minimum required bytes per iteration, instead of only analyzing the first chunk.
- Fixed an issue in `ParallelPipeline` that caused errors when attempting to drain the queues.
- Fixed an issue with emulated VAD timeout inconsistency in `LLMUserContextAggregator`. Previously, emulated VAD scenarios (where transcription is received without VAD detection) used a hardcoded `aggregation_timeout` (default 0.5s) instead of matching the VAD's `stop_secs` parameter (default 0.8s). This created different user experiences between real VAD and emulated VAD scenarios. Now, emulated VAD timeouts automatically synchronize with the VAD's `stop_secs` parameter.
- Fixed a pipeline freeze when using AWS Nova Sonic, which would occur if the user started early, while the bot was still working through `trigger_assistant_response()`.
- Added an `aggregate_sentences` arg in `CartesiaTTSService`, `ElevenLabsTTSService`, `NeuphonicTTSService` and `RimeTTSService`, where the default value is `True`. When `aggregate_sentences` is `True`, the `TTSService` aggregates the LLM's streamed tokens into sentences by default. Note: setting the value to `False` requires a custom processor before the `TTSService` to aggregate LLM tokens.
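To illustrate what sentence aggregation means here, the sketch below shows one way streamed LLM tokens can be grouped into sentences before being handed to TTS. This is an illustrative stand-in, not Pipecat's actual `TTSService` implementation; the splitting rule (sentence punctuation followed by whitespace or end of stream) is an assumption.

```python
import re

def aggregate_sentences(token_stream):
    """Group streamed LLM tokens into sentences (illustrative sketch,
    not Pipecat's actual implementation)."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Emit every complete sentence accumulated so far.
        while True:
            match = re.search(r"[.!?](\s|$)", buffer)
            if not match:
                break
            end = match.end()
            yield buffer[:end].strip()
            buffer = buffer[end:]
    if buffer.strip():
        yield buffer.strip()

print(list(aggregate_sentences(["Hel", "lo there. How are", " you? I'm fine."])))
# ["Hello there.", "How are you?", "I'm fine."]
```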
- Added `kwargs` to the `OLLamaLLMService` to allow for configuration args to be passed to Ollama.
- Added call hang-up error handling in `TwilioFrameSerializer`, which handles the case where the user has hung up before the `TwilioFrameSerializer` hangs up the call.
- Updated `RTVIObserver` and `RTVIProcessor` to match the new RTVI 1.0.0 protocol. This includes:
  - Deprecating support for all messages related to service configuration and actions.
  - Adding support for obtaining and logging data about the client, including its RTVI version and optionally included system information (OS/browser/etc.)
  - Adding support for handling the new `client-message` RTVI message through either an `on_client_message` event handler or listening for a new `RTVIClientMessageFrame`
  - Adding support for responding to a `client-message` with a `server-response` via either a direct call on the `RTVIProcessor` or via pushing a new `RTVIServerResponseFrame`
  - Adding built-in support for handling the new `append-to-context` RTVI message, which allows a client to add to the user or assistant LLM context. No extra code is required for supporting this behavior.
  - Updating all JavaScript and React client RTVI examples to use version 1.0.0 of the clients.

  Get started migrating to RTVI protocol 1.0.0 by following the migration guide: https://docs.pipecat.ai/client/migration-guide
- Refactored `AWSBedrockLLMService` and `AWSPollyTTSService` to work asynchronously using `aioboto3` instead of the `boto3` library.
- The `UserIdleProcessor` now handles the scenario where function calls take longer than the idle timeout duration. This allows you to use the `UserIdleProcessor` in conjunction with function calls that take a while to return a result.
- Updated the `NeuphonicTTSService` to work with the updated websocket API.
- Fixed an issue with `RivaSTTService` where the watchdog feature was causing an error on initialization.
- Removed unnecessary push task in each `FrameProcessor`.
- Added a new STT service, `SpeechmaticsSTTService`. This service provides real-time speech-to-text transcription using the Speechmatics API. It supports partial and final transcriptions, multiple languages, various audio formats, and speaker diarization.
- Added `normalize` and `model_id` to `FishAudioTTSService`.
- Added `http_options` argument to `GoogleLLMService`.
- Added `run_llm` field to `LLMMessagesAppendFrame` and `LLMMessagesUpdateFrame` frames. If true, a context frame will be pushed, triggering the LLM to respond.
- Added a new `SOXRStreamAudioResampler` for processing audio in chunks or streams. If you write your own processor and need to use an audio resampler, use the new `create_stream_resampler()`.
- Added new `DailyParams.audio_in_user_tracks` to allow receiving one track per user (default) or a single track from the room (all participants mixed).
- Added support for providing "direct" functions, which don't need an accompanying `FunctionSchema` or function definition dict. Instead, metadata (i.e. `name`, `description`, `properties`, and `required`) are automatically extracted from a combination of the function signature and docstring.

  Usage:
  ```python
  # "Direct" function
  # `params` must be the first parameter
  async def do_something(params: FunctionCallParams, foo: int, bar: str = ""):
      """Do something interesting.

      Args:
          foo (int): The foo to do something interesting with.
          bar (string): The bar to do something interesting with.
      """
      result = await process(foo, bar)
      await params.result_callback({"result": result})

  # ...

  llm.register_direct_function(do_something)

  # ...

  tools = ToolsSchema(standard_tools=[do_something])
  ```
-
user_idis now populated in theTranscriptionFrameandInterimTranscriptionFramewhen using a transport that provides auser_id, likeDailyTransportorLiveKitTransport. -
Added
watchdog_coroutine(). This is a watchdog helper for couroutines. So, if you have a coroutine that is waiting for a result and that takes a long time, you will need to wrap it withwatchdog_coroutine()so the watchdog timers are reset regularly. -
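The idea behind `watchdog_coroutine()` can be sketched with plain `asyncio`: await the wrapped coroutine in slices, resetting a watchdog between slices so the timer never expires during a long wait. This is a hypothetical reimplementation for illustration only; the `reset` callback stands in for Pipecat's internal timer reset.

```python
import asyncio

async def watchdog_coroutine(coro, reset, interval=1.0):
    """Await `coro`, calling `reset()` at least every `interval` seconds
    while it is still pending, so a watchdog timer is kept alive during
    long waits. Illustrative sketch, not Pipecat's actual helper."""
    task = asyncio.ensure_future(coro)
    try:
        while True:
            done, _ = await asyncio.wait({task}, timeout=interval)
            reset()  # reset the watchdog timer
            if done:
                return task.result()
    finally:
        if not task.done():
            task.cancel()

async def main():
    resets = []

    async def slow_result():
        await asyncio.sleep(0.25)  # stand-in for a long-running wait
        return 42

    value = await watchdog_coroutine(
        slow_result(), lambda: resets.append(1), interval=0.1
    )
    print(value, len(resets) >= 2)  # the watchdog was reset while waiting

asyncio.run(main())
```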
- Added `session_token` parameter to `AWSNovaSonicLLMService`.
- Added Gemini Multimodal Live File API for uploading, fetching, listing, and deleting files. See `26f-gemini-multimodal-live-files-api.py` for example usage.
- Updated all the services to use the new `SOXRStreamAudioResampler`, ensuring smooth transitions and eliminating clicks.
- Upgraded `daily-python` to 0.19.4.
- Updated `google` optional dependency to use `google-genai` version 1.24.0.
- Fixed an issue where audio would get stuck in the queue when an interruption occurs during Azure TTS synthesis.
- Fixed a race condition that occurs in Python 3.10+ where the task could miss the `CancelledError` and continue running indefinitely, freezing the pipeline.
- Fixed an `AWSNovaSonicLLMService` issue introduced in 0.0.72.
- In `FishAudioTTSService`, deprecated `model` and replaced it with `reference_id`. This change better aligns with Fish Audio's variable naming and reduces confusion about what functionality the variable controls.
- Fixed an issue introduced in 0.0.72 that would cause `ElevenLabsTTSService`, `GladiaSTTService`, `NeuphonicTTSService` and `OpenAIRealtimeBetaLLMService` to throw an error.
- Added logging and improved error handling to help diagnose and prevent potential pipeline freezes.
- Added `WatchdogQueue`, `WatchdogPriorityQueue`, `WatchdogEvent` and `WatchdogAsyncIterator`. These helper utilities reset watchdog timers appropriately before they expire. When watchdog timers are disabled, the utilities behave as their standard counterparts without side effects.
- Introduced task watchdog timers. Watchdog timers are used to detect if a Pipecat task is taking longer than expected (by default, 5 seconds). Watchdog timers are disabled by default and can be enabled globally by passing the `enable_watchdog_timers` argument to the `PipelineTask` constructor. It is possible to change the default watchdog timer timeout with the `watchdog_timeout` argument. You can also log how long it takes to reset the watchdog timers with `enable_watchdog_logging`. You can control all these settings per frame processor or even per task. That is, you can set `enable_watchdog_timers`, `enable_watchdog_logging` and `watchdog_timeout` when creating any frame processor through their constructor arguments, or when you create a task with `FrameProcessor.create_task()`. Note that watchdog timers only work with Pipecat tasks and will not work if you use `asyncio.create_task()` or similar.
- Added `lexicon_names` parameter to `AWSPollyTTSService.InputParams`.
- Added reconnection logic and audio buffer management to `GladiaSTTService`.
- The `TurnTrackingObserver` now ends a turn upon observing an `EndFrame` or `CancelFrame`.
- Added Polish support to `AWSTranscribeSTTService`.
- Added new frames `FrameProcessorPauseFrame` and `FrameProcessorResumeFrame`, which allow pausing and resuming frame processing for a given frame processor. These are control frames, so they are ordered. Pausing a frame processor will keep old frames in the internal queues until resume takes place. Frames pushed while a frame processor is paused will be added to the queues. When frame processing resumes, all queued frames will be processed in order. Also added `FrameProcessorPauseUrgentFrame` and `FrameProcessorResumeUrgentFrame`, which are system frames and therefore have high priority.
- Added a property called `has_function_calls_in_progress` in `LLMAssistantContextAggregator` that exposes whether a function call is in progress.
- Added `SambaNovaLLMService`, which provides LLM API integration with an OpenAI-compatible interface.
- Added `SambaNovaSTTService`, which provides speech-to-text functionality using SambaNova's (Whisper) API.
- Added foundational examples for function calling and transcription: `14s-function-calling-sambanova.py` and `13g-sambanova-transcription.py`.
- `HeartbeatFrame`s are now control frames. This will make it easier to detect pipeline freezes. Previously, heartbeat frames were system frames, which meant they were not queued with other frames, making it difficult to detect pipeline stalls.
- Updated `OpenAIRealtimeBetaLLMService` to accept `language` in the `InputAudioTranscription` class for all models.
- Updated the default model for `OpenAIRealtimeBetaLLMService` to `gpt-4o-realtime-preview-2025-06-03`.
- The `PipelineParams` arg `allow_interruptions` now defaults to `True`.
- `TavusTransport` and `TavusVideoService` now send audio to Tavus using WebRTC audio tracks instead of `app-message`s over WebSocket. This should improve the overall audio quality.
- Upgraded `daily-python` to 0.19.3.
- Fixed an issue that would cause heartbeat frames to be sent before processors were started.
- Fixed an event loop blocking issue when using `SentryMetrics`.
- Fixed an issue in `FastAPIWebsocketClient` to ensure proper disconnection when the websocket is already closed.
- Fixed an issue where the `UserStoppedSpeakingFrame` was not received if the transport was not receiving new audio frames.
- Fixed an edge case where, if the user interrupted the bot but no new aggregation was received, the bot would not resume speaking.
- Fixed an issue with `TelnyxFrameSerializer` where it would throw an exception when the user hung up the call.
- Fixed an issue with `ElevenLabsTTSService` where the context was not being closed.
- Fixed function calling in `AWSNovaSonicLLMService`.
- Fixed an issue that would cause multiple `PipelineTask.on_idle_timeout` events to be triggered repeatedly.
- Fixed an issue that was causing user and bot speech to not be synchronized during recordings.
- Fixed an issue where voice settings weren't applied to `ElevenLabsTTSService`.
- Fixed an issue with `GroqTTSService` where it was not properly parsing the WAV file header.
- Fixed an issue with `GoogleSTTService` where it was constantly reconnecting before starting to receive audio from the user.
- Fixed an issue where `GoogleLLMService`'s TTFB value was incorrect.
- `AudioBufferProcessor` parameter `user_continuous_stream` is deprecated.
- Renamed `14e-function-calling-gemini.py` to `14e-function-calling-google.py`.
- Added a parameter called `additional_span_attributes` to `PipelineTask` that lets you add any additional attributes you'd like to the conversation span.
- Fixed an issue with `CartesiaSTTService` initialization.
- Added `ExotelFrameSerializer` to handle telephony calls via Exotel.
- Added the option `informal` to `TranslationConfig` in the Gladia config, allowing you to force informal language forms when available.
- Added `CartesiaSTTService`, which is a websocket-based implementation to transcribe audio. Added a foundational example in `13f-cartesia-transcription.py`.
- Added a `websocket` example, showing how to use the new Pipecat client `WebsocketTransport` to connect with Pipecat's `FastAPIWebsocketTransport` or `WebsocketServerTransport`.
- Added language support to `RimeHttpTTSService`. Extended languages to include German and French for both `RimeTTSService` and `RimeHttpTTSService`.
- Upgraded `daily-python` to 0.19.2.
- Made `PipelineTask.add_observer()` synchronous. This allows callers to call it before doing the work of running the `PipelineTask` (i.e. without invoking `PipelineTask.set_event_loop()` first).
- Pipecat 0.0.69 forced the `uvloop` event loop on Linux and macOS. Unfortunately, this was causing issues on some systems, so `uvloop` is not enabled by default anymore. If you want to use `uvloop` you can just set the `asyncio` event loop policy before starting your agent with:

  ```python
  asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
  ```
- Fixed an issue with various TTS services that would cause audio glitches at the start of every bot turn.
- Fixed an `ElevenLabsTTSService` issue where a context warning was printed when pushing a `TTSSpeakFrame`.
- Fixed an `AssemblyAISTTService` issue that could cause unexpected behavior when yielding empty `Frame()`s.
- Fixed an issue where `OutputAudioRawFrame.transport_destination` was being reset to `None` instead of retaining its intended value before sending the audio frame to `write_audio_frame`.
- Fixed a typo in the LiveKit transport that prevented initialization.
- Added a new frame, `FunctionCallsStartedFrame`. This frame is pushed both upstream and downstream from the LLM service to indicate that one or more function calls are going to be executed.
- Added LLM services' `on_function_calls_started` event. This event will be triggered when the LLM service receives function calls from the model and is going to start executing them.
- Function calls can now be executed sequentially (in the order received in the completion) by passing `run_in_parallel=False` when creating your LLM service. By default, if the LLM completion returns 2 or more function calls, they run concurrently. In both cases, concurrently and sequentially, a new LLM completion will run when the last function call finishes.
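The difference between the two execution modes can be sketched with a hypothetical helper (not the Pipecat API): parallel execution gathers all calls at once, while sequential execution awaits each call only after the previous one finished.

```python
import asyncio

async def run_function_calls(calls, run_in_parallel=True):
    """Run a list of zero-argument async callables either concurrently or
    sequentially. Hypothetical helper illustrating `run_in_parallel`."""
    if run_in_parallel:
        # All function calls run concurrently; results keep the call order.
        return await asyncio.gather(*(call() for call in calls))
    results = []
    for call in calls:
        # Each call only starts after the previous one finished.
        results.append(await call())
    return results

async def fetch(n):
    await asyncio.sleep(0.01)  # stand-in for a real function call
    return n

print(asyncio.run(run_function_calls([lambda: fetch(1), lambda: fetch(2)])))
# [1, 2]
print(asyncio.run(run_function_calls([lambda: fetch(1), lambda: fetch(2)],
                                     run_in_parallel=False)))
# [1, 2]
```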
- Added OpenTelemetry tracing for `GeminiMultimodalLiveLLMService` and `OpenAIRealtimeBetaLLMService`.
- Added initial support for interruption strategies, which determine if the user should interrupt the bot while the bot is speaking. Interruption strategies can be based on factors such as audio volume or the number of words spoken by the user. These can be specified via the new `interruption_strategies` field in `PipelineParams`. A new `MinWordsInterruptionStrategy` strategy has been introduced, which triggers an interruption if the user has spoken a minimum number of words. If no interruption strategies are specified, the normal interruption behavior applies. If multiple strategies are provided, the first one that evaluates to true will trigger the interruption.
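Conceptually, a minimum-words strategy just counts the words the user has spoken so far during the bot's turn. The sketch below is illustrative only; the method names are assumptions, not Pipecat's actual interface.

```python
class MinWordsInterruptionStrategy:
    """Interrupt only after the user has spoken a minimum number of words.
    Illustrative sketch of the behavior described above, not Pipecat's
    actual class."""

    def __init__(self, min_words: int):
        self._min_words = min_words
        self._text = ""

    def append_text(self, text: str):
        """Accumulate the user's transcribed speech for this turn."""
        self._text += " " + text

    def should_interrupt(self) -> bool:
        return len(self._text.split()) >= self._min_words

strategy = MinWordsInterruptionStrategy(min_words=3)
strategy.append_text("stop")
print(strategy.should_interrupt())  # False: only one word so far
strategy.append_text("right there please")
print(strategy.should_interrupt())  # True: four words spoken
```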
- `BaseInputTransport` now handles `StopFrame`. When a `StopFrame` is received, the transport will pause sending frames downstream until a new `StartFrame` is received. This allows the transport to be reused (keeping the same connection) in a different pipeline.
- Updated the AssemblyAI STT service to support their latest streaming speech-to-text model with improved transcription latency and endpointing.
- You can now access STT service results through the new `TranscriptionFrame.result` and `InterimTranscriptionFrame.result` fields. This is useful in case you use some specific settings for the STT and you want to access the STT results.
- The examples runner is now public in the `pipecat.examples` package. This allows everyone to build their own examples and run them easily.
- It is now possible to push an `OutputDTMFFrame` or `OutputDTMFUrgentFrame` with `DailyTransport`. This will be sent properly if a Daily dial-out connection has been established.
- Added `OutputDTMFUrgentFrame` to send a DTMF keypress quickly. The previous `OutputDTMFFrame` queues the keypress with the rest of the data frames.
- Added `DTMFAggregator`, which aggregates keypad presses into `TranscriptionFrame`s. Aggregation occurs after a timeout, termination key press, or user interruption. You can specify the prefix of the `TranscriptionFrame`.
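A minimal sketch of the aggregation idea, flushing on a termination key only (the real aggregator also flushes on a timeout or user interruption, and the `prefix` and `#` defaults here are assumptions):

```python
class DTMFAggregator:
    """Aggregate DTMF keypresses into a transcription-like string.
    Illustrative sketch, not Pipecat's actual implementation."""

    def __init__(self, prefix="DTMF: ", termination_key="#"):
        self._prefix = prefix
        self._termination_key = termination_key
        self._digits = ""

    def press(self, key: str):
        """Return the aggregated text when the termination key is pressed,
        otherwise None while digits are still being collected."""
        if key == self._termination_key:
            text, self._digits = self._prefix + self._digits, ""
            return text
        self._digits += key
        return None

agg = DTMFAggregator()
agg.press("1")
agg.press("2")
print(agg.press("#"))  # DTMF: 12
```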
- Added new functions `DailyTransport.start_transcription()` and `DailyTransport.stop_transcription()` to be able to start and stop Daily transcription dynamically (maybe with different settings).
- Reverted the default model for `GeminiMultimodalLiveLLMService` back to `models/gemini-2.0-flash-live-001`. `gemini-2.5-flash-preview-native-audio-dialog` has inconsistent performance. You can opt in to using this model by setting the `model` arg.
- Function calls are now cancelled by default if there's an interruption. To disable this behavior you can set `cancel_on_interruption=False` when registering the function call. Since function calls are executed as tasks, you can tell if a function call has been cancelled by catching the `asyncio.CancelledError` exception (and don't forget to raise it again!).
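The recommended pattern from the entry above, sketched with plain `asyncio` (the slow sleep is a stand-in for a real API request): catch `asyncio.CancelledError` to clean up, and always re-raise it.

```python
import asyncio

async def my_function_call():
    """A function-call handler that cleans up when cancelled on interruption.
    Note the re-raise: swallowing CancelledError would hide the cancellation."""
    try:
        await asyncio.sleep(10)  # stand-in for a slow API request
        return "done"
    except asyncio.CancelledError:
        # ... release resources, abort the in-flight request, etc. ...
        raise  # always re-raise so the task is properly cancelled

async def main():
    task = asyncio.create_task(my_function_call())
    await asyncio.sleep(0.05)
    task.cancel()  # simulates an interruption with cancel_on_interruption=True
    try:
        await task
    except asyncio.CancelledError:
        print("function call cancelled")

asyncio.run(main())
```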
- Updated OpenTelemetry tracing attribute `metrics.ttfb_ms` to `metrics.ttfb`. The attribute reports TTFB in seconds.
- `DailyTransport.send_dtmf()` is deprecated; push an `OutputDTMFFrame` or an `OutputDTMFUrgentFrame` instead.
- Fixed an issue with `ElevenLabsTTSService` where long responses would continue generating output even after an interruption.
- Fixed an issue with the `OpenAILLMContext` where non-Roman characters were being incorrectly encoded as Unicode escape sequences. This was a logging issue and did not impact the actual conversation.
- In `AWSBedrockLLMService`, worked around a possible bug in AWS Bedrock where a `toolConfig` is required if there has been previous tool use in the messages array. The workaround injects a no-op factory function call to satisfy the requirement.
- Fixed `WebsocketClientTransport` to use `FrameProcessorSetup.task_manager` instead of `StartFrame.task_manager`.
- Use `uvloop` as the new event loop on Linux and macOS systems.
- Added `GoogleHttpTTSService`, which uses Google's HTTP TTS API.
- Added `TavusTransport`, a new transport implementation compatible with any Pipecat pipeline. When using the `TavusTransport`, the Pipecat bot will connect in the same room as the Tavus Avatar and the user.
- Added `PlivoFrameSerializer` to support Plivo calls. A full running example has also been added to `examples/plivo-chatbot`.
- Added `UserBotLatencyLogObserver`. This is an observer that logs the latency between when the user stops speaking and when the bot starts speaking. This gives you an initial idea of how quickly the AI services respond.
- Added `SarvamTTSService`, which implements Sarvam AI's TTS API: https://docs.sarvam.ai/api-reference-docs/text-to-speech/convert.
- Added `PipelineTask.add_observer()` and `PipelineTask.remove_observer()` to allow managing observers at runtime. This is useful for cases where the task is passed around to other code components that might want to observe the pipeline dynamically.
- Added `user_id` field to `TranscriptionMessage`. This allows identifying the user in a multi-user scenario. Note that this requires that `TranscriptionFrame` has the `user_id` properly set.
- Added new `PipelineTask` event handlers `on_pipeline_started`, `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled`, which correspond to the `StartFrame`, `StopFrame`, `EndFrame` and `CancelFrame` frames, respectively.
- Added additional languages to `LmntTTSService`. Languages include: `hi`, `id`, `it`, `ja`, `nl`, `pl`, `ru`, `sv`, `th`, `tr`, `uk`, `vi`.
- Added a `model` parameter to the `LmntTTSService` constructor, allowing switching between LMNT models.
- Added `MiniMaxHttpTTSService`, which implements MiniMax's T2A API for TTS. Learn more: https://www.minimax.io/platform_overview
- A new function, `FrameProcessor.setup()`, has been added to allow setting up frame processors before receiving a `StartFrame`. This is what happens internally: `FrameProcessor.setup()` is called, a `StartFrame` is pushed from the beginning of the pipeline, your regular pipeline operations run, an `EndFrame` or `CancelFrame` is pushed from the beginning of the pipeline, and finally `FrameProcessor.cleanup()` is called.
- Added support for OpenTelemetry tracing in Pipecat. This initial implementation includes:
  - A `setup_tracing` method where you can specify your OpenTelemetry exporter
  - Service decorators for STT (`@traced_stt`), LLM (`@traced_llm`), and TTS (`@traced_tts`) which trace the execution and collect properties and metrics (TTFB, token usage, character counts, etc.)
  - Class decorators that provide execution tracking; these are generic and can be used for service tracking as needed
  - Spans that help track traces on a per-conversation and per-turn basis:

    ```
    conversation-uuid
    ├── turn-1
    │   ├── stt_deepgramsttservice
    │   ├── llm_openaillmservice
    │   └── tts_cartesiattsservice
    ...
    └── turn-n
        └── ...
    ```

  By default, Pipecat has implemented service decorators to trace execution of STT, LLM, and TTS services. You can enable tracing by setting `enable_tracing` to `True` in the `PipelineTask`.
- Added `TurnTrackingObserver`, which tracks the start and end of a user/bot turn pair and emits events `on_turn_started` and `on_turn_stopped`, corresponding to the start and end of a turn, respectively.
- Allow passing observers to `run_test()` while running unit tests.
- Upgraded `daily-python` to 0.19.1.
- ⚠️ Updated `SmallWebRTCTransport` to align with how other transports handle `on_client_disconnected`. Now, when the connection is closed and no reconnection is attempted, `on_client_disconnected` is called instead of `on_client_close`. The `on_client_close` callback is no longer used; use `on_client_disconnected` instead.
- Check if `PipelineTask` has already been cancelled.
- Don't raise an exception if an event handler is not registered.
- Upgraded `deepgram-sdk` to 4.1.0.
- Updated `GoogleTTSService` to use Google's streaming TTS API. The default voice has also been updated, to `en-US-Chirp3-HD-Charon`.
- ⚠️ Refactored the `TavusVideoService` so it acts like a proxy, sending audio to Tavus and receiving both audio and video. This makes `TavusVideoService` usable with any Pipecat pipeline and with any transport. This is a breaking change; check `examples/foundational/21a-tavus-layer-small-webrtc.py` to see how to use it.
- `DailyTransport` now uses custom microphone audio tracks instead of virtual microphones. Now, multiple Daily transports can be used in the same process.
- `DailyTransport` now captures audio from individual participants instead of the whole room. This allows identifying audio frames per participant.
- Updated the default model for `AnthropicLLMService` to `claude-sonnet-4-20250514`.
- Updated the default model for `GeminiMultimodalLiveLLMService` to `models/gemini-2.5-flash-preview-native-audio-dialog`.
- `BaseTextFilter` methods `filter()`, `update_settings()`, `handle_interruption()` and `reset_interruption()` are now async.
- `BaseTextAggregator` methods `aggregate()`, `handle_interruption()` and `reset()` are now async.
- The API version for `CartesiaTTSService` and `CartesiaHttpTTSService` has been updated. Also, the `cartesia` dependency has been updated to 2.x.
- `CartesiaTTSService` and `CartesiaHttpTTSService` now support Cartesia's new `speed` parameter, which accepts values of `slow`, `normal`, and `fast`.
- `GeminiMultimodalLiveLLMService` now uses the user transcription and usage metrics provided by Gemini Live.
- `GoogleLLMService` has been updated to use `google-genai` instead of the deprecated `google-generativeai`.
- In `CartesiaTTSService` and `CartesiaHttpTTSService`, `emotion` has been deprecated by Cartesia. Pipecat is following suit and deprecating `emotion` as well.
- Since `GeminiMultimodalLiveLLMService` now transcribes its own audio, the `transcribe_user_audio` arg has been removed. Audio is now transcribed automatically.
- Removed the `SileroVAD` frame processor; just use `SileroVADAnalyzer` instead. Also removed the `07a-interruptible-vad.py` example.
- Fixed a `DailyTransport` issue that was not allowing video frames to be captured if the framerate was greater than zero.
- Fixed a `DeepgramSTTService` connection issue when the user provided their own `LiveOptions`.
- Fixed a `DailyTransport` issue that would cause images needing resize to block the event loop.
- Fixed an issue with `ElevenLabsTTSService` where changing the model or voice while the service is running wasn't working.
- Fixed an issue that would cause multiple instances of the same class to behave incorrectly if any of the given constructor arguments defaulted to a mutable value (e.g. lists, dictionaries, objects).
- Fixed an issue with `CartesiaTTSService` where `TTSTextFrame` messages weren't being emitted when the model was set to `sonic`. This resulted in the assistant context not being updated with assistant messages.
- `DailyTransport`: process audio, video and events in separate tasks.
- Don't create event handler tasks if no user event handlers have been registered.
- It is now possible to run all (or most) foundational examples with multiple transports. By default, they run with P2P (Peer-To-Peer) WebRTC so you can try everything locally. You can also run them with Daily or even with a Twilio phone number.
- Added foundational examples `07y-interruptible-minimax.py` and `07z-interruptible-sarvam.py` to show how to use the `MiniMaxHttpTTSService` and `SarvamTTSService`, respectively.
- Added an `open-telemetry-tracing` example, showing how to set up tracing. The example also includes Jaeger as an open source OpenTelemetry client to review traces from the example runs.
- Added foundational example `29-turn-tracking-observer.py` to show how to use the `TurnTrackingObserver`.
- Added `DebugLogObserver` for detailed frame logging with configurable filtering by frame type and endpoint. This observer automatically extracts and formats all frame data fields for debug logging.
- `UserImageRequestFrame.video_source` field has been added to request an image from the desired video source.
- Added support for the AWS Nova Sonic speech-to-speech model with the new `AWSNovaSonicLLMService`. See https://docs.aws.amazon.com/nova/latest/userguide/speech.html. Note that it requires Python >= 3.12 and `pip install pipecat-ai[aws-nova-sonic]`.
- Added new AWS services `AWSBedrockLLMService` and `AWSTranscribeSTTService`.
- Added `on_active_speaker_changed` event handler to the `DailyTransport` class.
- Added `enable_ssml_parsing` and `enable_logging` to `InputParams` in `ElevenLabsTTSService`.
- Added support to `RimeHttpTTSService` for the `arcana` model.
- Updated `ElevenLabsTTSService` to use the beta websocket API (multi-stream-input). This new API supports context IDs and cancelling those contexts, which greatly improves interruption handling.
- Observers' `on_push_frame()` now takes a single argument, `FramePushed`, instead of multiple arguments.
- Updated the default voice for `DeepgramTTSService` to `aura-2-helena-en`.
- `PollyTTSService` is now deprecated; use `AWSPollyTTSService` instead.
- Observer `on_push_frame(src, dst, frame, direction, timestamp)` is now deprecated; use `on_push_frame(data: FramePushed)` instead.
- Fixed a `DailyTransport` issue that was causing issues when multiple audio or video sources were being captured.
- Fixed an `UltravoxSTTService` issue that would cause the service to generate all tokens as one word.
- Fixed a `PipelineTask` issue that would cause tasks to not be cancelled if the task was cancelled from outside of Pipecat.
- Fixed a `TaskManager` issue that was causing dangling tasks to be reported.
- Fixed an issue that could cause data to be sent to the transports when they were still not ready.
- Remove custom audio tracks from `DailyTransport` before leaving.
- Removed `CanonicalMetricsService` as it's no longer maintained.
- Added two new input parameters to `RimeTTSService`: `pause_between_brackets` and `phonemize_between_brackets`.
- Added support for cross-platform local smart turn detection. You can use `LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport implementation supports it (e.g. Daily's custom tracks). With multiple destinations, it is possible to send different audio or video tracks with a single transport simultaneously. To do that, you need to set the new `Frame.transport_destination` field with your desired transport destination (e.g. a custom track name), tell the transport you want a new destination with `TransportParams.audio_out_destinations` or `TransportParams.video_out_destinations`, and the transport should take care of the rest.
- Similar to the new `Frame.transport_destination`, there's a new `Frame.transport_source` field which is set by the `BaseInputTransport` if the incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter. This parameter will be used to update the `Frame.transport_destination` field for each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and `DailyTransportParams.microphone_out_enabled`, which allow you to enable/disable the main output camera or microphone tracks. This is useful if you only want to use custom tracks and not send the main tracks. Note that you still need `audio_out_enabled=True` or `video_out_enabled`.
- Added `DailyTransport.capture_participant_audio()`, which allows you to capture an audio source (e.g. "microphone", "screenAudio" or a custom track name) from a remote participant.
- Added `DailyTransport.update_publishing()`, which allows you to update the call video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams`, which allows you to configure what RTVI messages are sent to the clients.
- Added a `context_window_compression` InputParam to `GeminiMultimodalLiveLLMService`, which allows you to enable a sliding context window for the session as well as set the token limit of the sliding window.
- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`, indicating when the VAD detected that the user started and stopped speaking. These events are helpful when using smart turn detection, as the user's stop time can differ from when their turn ends (signified by `UserStoppedSpeakingFrame`).
- Added `TranslationFrame`, a new frame type that contains a translated transcription.
- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming audio will be pushed downstream.
- Added `MCPClient`, a way to connect to MCP servers and use the MCP servers' tools.
- Added Mem0 OSS support; along with the existing Mem0 cloud support, the OSS version is now also available.
- `TransportParams.audio_mixer` now supports a string and also a dictionary, to provide a mixer per destination. For example:

  ```python
  audio_out_mixer={
      "track-1": SoundfileMixer(...),
      "track-2": SoundfileMixer(...),
      "track-N": SoundfileMixer(...),
  },
  ```
- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and `TranscriptionFrame`, which allows the `STTMuteFilter` to be used in conjunction with transports that generate transcripts, e.g. `DailyTransport`.
- Function calls now receive a single parameter, `FunctionCallParams`, instead of `(function_name, tool_call_id, args, llm, context, result_callback)`, which is now deprecated.
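As a sketch of the migration, the old positional arguments now arrive on a single params object. The dataclass below is illustrative only; the real `FunctionCallParams` lives in Pipecat, and the exact field names used here (e.g. `arguments`) are assumptions based on the deprecated signature.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

@dataclass
class FunctionCallParams:
    # Illustrative stand-in; field names mirror the deprecated signature.
    function_name: str
    tool_call_id: str
    arguments: dict
    llm: Any
    context: Any
    result_callback: Callable[[dict], Awaitable[None]]

# Old, deprecated style:
# async def fetch_weather(function_name, tool_call_id, args, llm, context, result_callback):
#     await result_callback({"conditions": "sunny"})

# New style: everything arrives on one object.
async def fetch_weather(params: FunctionCallParams):
    await params.result_callback(
        {"conditions": "sunny", "city": params.arguments.get("city")}
    )

results = []

async def _callback(result):
    results.append(result)

params = FunctionCallParams("fetch_weather", "call-1", {"city": "Rome"}, None, None, _callback)
asyncio.run(fetch_weather(params))
print(results)  # [{'conditions': 'sunny', 'city': 'Rome'}]
```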
- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s (`LLMUserAggregatorParams.aggregation_timeout`). Sometimes, the STT services might give us more than one transcription, which could come after the user stopped speaking. We still want to include these additional transcriptions with the first one because they're part of the user turn. This is what this timeout helps with.
- Short utterances not detected by VAD while the bot is speaking are now ignored. This reduces the amount of bot interruptions significantly, providing a more natural conversation experience.
- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a `translation` and `translation_config`.
- STT services now pass through audio frames by default. This allows you to add audio recording without worrying about what's wrong in your pipeline when it doesn't work the first time.
- Input transports now always push audio downstream unless disabled with `TransportParams.audio_in_passthrough`. After many Pipecat releases, we realized this is the common use case. There are use cases where the input transport already provides STT and you also don't want recordings, in which case there's no need to push audio to the rest of the pipeline, but this is not a very common case.
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such as "canary-1b-asr", to be used in Pipecat.
- Function calls with parameters `(function_name, tool_call_id, args, llm, context, result_callback)` are deprecated; use a single `FunctionCallParams` parameter instead.
- `TransportParams.camera_*` parameters are now deprecated; use `TransportParams.video_*` instead.
- `TransportParams.vad_enabled` parameter is now deprecated; use `TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.
- `TransportParams.vad_audio_passthrough` parameter is now deprecated; use `TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated; use `RivaSTTService` instead, which uses the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated; use `RivaTTSService` instead, which uses the model "magpie-tts-multilingual" by default.
-
- Fixed an issue with `SimliVideoService` where the bot was continuously outputting audio, which prevented the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context contained tokens instead of words.
- Fixed an issue with HTTP Smart Turn handling where the service returns a 500 error. Previously, this would cause an unhandled exception; now a 500 error is treated as an incomplete response.
- Fixed a TTS services issue that could cause assistant output not to be aggregated to the context when also using `TTSSpeakFrame`s.
- Fixed an issue where `SmartTurnMetricsData` was reporting 0ms for inference and processing time when using the `FalSmartTurnAnalyzer`.
- Added `examples/daily-custom-tracks` to show how to send and receive Daily custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple simultaneous translations with the same transport.
- Added 04 foundational examples for client/server transports. Also, renamed `29-livekit-audio-chat.py` to `04b-transports-livekit.py`.
- Added foundational example `13c-gladia-translation.py` showing how to use `TranscriptionFrame` and `TranslationFrame`.
- Added automatic hangup logic to the Telnyx serializer. This feature hangs up the Telnyx call when an `EndFrame` or `CancelFrame` is received. It is enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added a keepalive task to `GladiaSTTService` to prevent the websocket from disconnecting after 30 seconds of no audio input.
- The `InputParams` for `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` no longer require that `stability` and `similarity_boost` be set. You can set each param individually.
- In `TwilioFrameSerializer`, `call_sid` is Optional so as to avoid a breaking change. `call_sid` is required to automatically hang up.
- Fixed an issue where `TwilioFrameSerializer` would send two hang-up commands: one for the `EndFrame` and one for the `CancelFrame`.
- Added automatic hangup logic to the Twilio serializer. This feature hangs up the Twilio call when an `EndFrame` or `CancelFrame` is received. It is enabled by default and is configurable via the `auto_hang_up` `InputParam`.
- Added `SmartTurnMetricsData`, which contains end-of-turn prediction metrics, to the `MetricsFrame`. Using `MetricsFrame`, you can now retrieve prediction confidence scores and processing time metrics from the smart turn analyzers.
- Added support for Application Default Credentials in the Google services `GoogleSTTService`, `GoogleTTSService`, and `GoogleVertexLLMService`.
- Added support for Smart Turn Detection via the `turn_analyzer` transport parameter. You can now choose between `HttpSmartTurnAnalyzer()` or `FalSmartTurnAnalyzer()` for remote inference, or `LocalCoreMLSmartTurnAnalyzer()` for on-device inference using Core ML.
- `DeepgramTTSService` accepts the `base_url` argument again, allowing you to connect to an on-prem service.
- Added `LLMUserAggregatorParams` and `LLMAssistantAggregatorParams`, which allow you to control aggregator settings. You can now pass these arguments when creating aggregator pairs with `create_context_aggregator()`.
- Added `previous_text` context support to `ElevenLabsHttpTTSService`, improving speech consistency across sentences within an LLM response.
- Added word/timestamp pairs to `ElevenLabsHttpTTSService`.
- It is now possible to disable `SoundfileMixer` when created. You can then use `MixerEnableFrame` to dynamically enable it when necessary.
- Added `on_client_connected` and `on_client_disconnected` event handlers to the `DailyTransport` class. These handlers map to the same underlying Daily events as `on_participant_joined` and `on_participant_left`, respectively. This makes it easier to write a single bot pipeline that can also use other transports like `SmallWebRTCTransport` and `FastAPIWebsocketTransport`.
- `GrokLLMService` now uses `grok-3-beta` as its default model.
- Daily's REST helpers now include an `eject_at_token_exp` param, which ejects the user when their token expires. This new parameter defaults to False. Also, the default values for `enable_prejoin_ui` and `eject_at_room_exp` changed to False.
- `OpenAILLMService` and `OpenPipeLLMService` now use `gpt-4.1` as their default model.
- `SoundfileMixer` constructor arguments now need to be keywords.
- `DeepgramSTTService` parameter `url` is now deprecated; use `base_url` instead.
- Parameters `user_kwargs` and `assistant_kwargs` when creating a context aggregator pair using `create_context_aggregator()` have been removed. Use `user_params` and `assistant_params` instead.
- Fixed an issue that would cause websocket-based TTS services to not clean up resources properly when disconnecting.
- Fixed a `TavusVideoService` issue that was causing audio choppiness.
- Fixed an issue in `SmallWebRTCTransport` where an error was thrown if the client did not create a video transceiver.
- Fixed an issue where LLM input parameters were not being applied correctly in `GoogleVertexLLMService`, causing unexpected behavior during inference.
- Updated the `twilio-chatbot` example to use the auto-hangup feature.
- Added media resolution control to `GeminiMultimodalLiveLLMService` with the `GeminiMediaResolution` enum, allowing configuration of token usage for image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing with 256 tokens).
- Added Gemini's Voice Activity Detection (VAD) configuration to `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine control over speech detection sensitivity and timing, including:
  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)
- Added comprehensive language support to `GeminiMultimodalLiveLLMService`, supporting over 30 languages via the `language` parameter, with proper mapping between Pipecat's `Language` enum and Gemini's language codes.
- Added support in `SmallWebRTCTransport` to detect when remote tracks are muted.
- Added support for image capture from a video stream to the `SmallWebRTCTransport`.
- Added a new iOS client option to the `SmallWebRTCTransport` video-transform example.
- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The producer processor processes frames from the pipeline and decides whether the consumers should consume them. If so, the same frame received by the producer is sent to the consumers. There can be multiple consumers per producer. These processors can be useful to push frames from one part of a pipeline to a different one (e.g. when using `ParallelPipeline`).
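As an illustrative sketch only (not Pipecat's actual implementation), the producer/consumer idea can be modeled with a predicate and fan-out queues:

  ```python
  import asyncio

  class Producer:
      """Fan out frames matching a predicate to every registered consumer queue."""
      def __init__(self, predicate):
          self.predicate = predicate
          self.consumers: list[asyncio.Queue] = []

      def add_consumer(self) -> asyncio.Queue:
          queue = asyncio.Queue()
          self.consumers.append(queue)
          return queue

      async def process(self, frame):
          # Decide whether consumers should receive this frame.
          if self.predicate(frame):
              for queue in self.consumers:
                  await queue.put(frame)

  async def main():
      producer = Producer(lambda f: f.startswith("audio:"))
      c1 = producer.add_consumer()
      c2 = producer.add_consumer()
      for frame in ["audio:chunk-1", "text:hello", "audio:chunk-2"]:
          await producer.process(frame)
      # Both consumers received only the matching frames.
      print([c1.get_nowait(), c1.get_nowait()], c2.qsize())

  asyncio.run(main())
  ```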
- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.
- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio` parameter in favor of Gemini Live's native output transcription support. Text transcriptions are now produced directly by the model; no configuration is required.
- Updated `GeminiMultimodalLiveLLMService`'s default `model` to `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket URL.
- Updated `daily-python` to 0.17.0 to fix an issue that was preventing it from running on older platforms.
- Fixed an issue where `CartesiaTTSService`'s spell feature would result in the spelled word appearing in the context as "F,O,O,B,A,R" instead of "FOOBAR".
- Fixed an issue in the Azure TTS services where the language was being set incorrectly.
- Fixed `SmallWebRTCTransport` to support dynamic values for `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms chunks.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant context messages had no space between words.
- Fixed an issue where `LLMAssistantContextAggregator` would prevent a `BotStoppedSpeakingFrame` from moving through the pipeline.
Added
TransportParams.audio_out_10ms_chunksparameter to allow controlling the amount of audio being sent by the output transport. It defaults to 4, so 40ms audio chunks are sent. -
Added
QwenLLMServicefor Qwen integration with an OpenAI-compatible interface. Added foundational example14q-function-calling-qwen.py. -
Added
Mem0MemoryService. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/. -
Added
WhisperSTTServiceMLXfor Whisper transcription on Apple Silicon. See example inexamples/foundational/13e-whisper-mlx.py. Latency of completed transcription using Whisper large-v3-turbo on an M4 macbook is ~500ms. -
Added
SmallWebRTCTransport, a new P2P WebRTC transport.- Created two examples in
p2p-webrtc:- video-transform: Demonstrates sending and receiving audio/video with
SmallWebRTCTransportusingTypeScript. Includes video frame processing with OpenCV. - voice-agent: A minimal example of creating a voice agent with
SmallWebRTCTransport.
- video-transform: Demonstrates sending and receiving audio/video with
- Created two examples in
-
GladiaSTTServicenow have comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options. -
- Added support to `ProtobufFrameSerializer` to send the messages from `TransportMessageFrame` and `TransportMessageUrgentFrame`.
- Added support for a new TTS service, `PiperTTSService` (see https://github.com/rhasspy/piper/).
- It is now possible to tell whether `UserStartedSpeakingFrame` or `UserStoppedSpeakingFrame` have been generated because of emulation frames.
- `FunctionCallResultFrame`s are now system frames. This is to prevent function call results from being discarded during interruptions.
- Pipecat services have been reorganized into packages. Each package can have one or more of the following modules (in the future, new module names might be needed) depending on the services implemented:
  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services
- Base classes for AI services have been reorganized into modules. They can now be found in `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- `GladiaSTTService` now uses the `solaria-1` model by default. Other params use Gladia's default values. Added support for more language codes.
- All Pipecat services imports have been deprecated, and a warning will be shown when using an old import. The new import should be `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For example, `from pipecat.services.openai.llm import OpenAILLMService`.
- Importing AI services base classes from `pipecat.services.ai_services` is now deprecated; use one of `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in favor of `language_config`, which better aligns with Gladia's API.
- Deprecated using `GladiaSTTService.InputParams` directly. Use the new `GladiaInputParams` class instead.
- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that would cause the transport to be closed prematurely, preventing the internally queued audio from being sent. The same issue could also cause an infinite loop when using an output mixer and sending an `EndFrame`, preventing the bot from finishing.
- Fixed an issue that could cause the `TranscriptionUpdateFrame` pushed because of an interruption to be discarded.
- Fixed an issue that would cause `SegmentedSTTService`-based services (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing invalid transcriptions.
- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.
- Output transports now send 40ms audio chunks instead of 20ms. This should improve performance.
- `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio chunks are larger than 200ms, they will be sent at every audio chunk.
- Added foundational example `37-mem0.py` demonstrating how to use the `Mem0MemoryService`.
- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the `WhisperSTTServiceMLX`.
- Added a new frame, `LLMSetToolChoiceFrame`, which provides a mechanism for modifying the `tool_choice` in the context.
- Added `GroqTTSService`, which provides text-to-speech functionality using Groq's API.
- Added support in `DailyTransport` for updating remote participants' `canReceive` permission via the `update_remote_participants()` method, by bumping the daily-python dependency to >= 0.16.0.
- ElevenLabs TTS services now support a sample rate of 8000.
- Added support for `instructions` in `OpenAITTSService`.
- Added support for `base_url` in `OpenAIImageGenService` and `OpenAITTSService`.
- Fixed an issue in `RTVIObserver` that prevented handling of Google LLM context messages. The observer now processes both OpenAI-style and Google-style contexts.
- Fixed an issue in Daily involving switching virtual devices, by bumping the daily-python dependency to >= 0.16.1.
- Fixed a `GoogleAssistantContextAggregator` issue where function call placeholders were not being updated when the function call result was something other than a string.
- Fixed an issue that would cause `LLMAssistantContextAggregator` to block processing more frames while processing a function call result.
- Fixed an issue where the `RTVIObserver` would report two bot started and stopped speaking events for each bot turn.
- Fixed an issue in `UltravoxSTTService` that caused improper audio processing and incorrect LLM frame output.
- Added `examples/foundational/07x-interruptible-local.py` to show how a local transport can be used.
- Added the `default_headers` parameter to the `BaseOpenAILLMService` constructor.
- Rolled back to `deepgram-sdk` 3.8.0, since 3.10.1 was causing connection issues.
- Changed the default `InputAudioTranscription` model to `gpt-4o-transcribe` for `OpenAIRealtimeBetaLLMService`.
- Updated the `19-openai-realtime-beta.py` and `19a-azure-realtime-beta.py` examples to use the `FunctionSchema` format.
- When registering a function call, it is now possible to indicate whether the function call should be cancelled if there's a user interruption, via `cancel_on_interruption` (defaults to False). This is now possible because function calls are executed concurrently.
- Added support for detecting idle pipelines. By default, if no activity has been detected for 5 minutes, the `PipelineTask` will be automatically cancelled. It is possible to override this behavior by passing `cancel_on_idle_timeout=False`. It is also possible to change the default timeout with `idle_timeout_secs`, or the frames that prevent the pipeline from being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event handler will be triggered if the idle timeout is reached (whether or not the pipeline task is cancelled).
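A rough sketch of the idle-timeout behavior, reusing the parameter names above (`idle_timeout_secs`, `cancel_on_idle_timeout`) but otherwise independent of Pipecat's implementation:

  ```python
  import asyncio

  class IdleWatchdog:
      """Cancel a task once no activity is seen for idle_timeout_secs."""
      def __init__(self, idle_timeout_secs: float, cancel_on_idle_timeout: bool = True):
          self.idle_timeout_secs = idle_timeout_secs
          self.cancel_on_idle_timeout = cancel_on_idle_timeout
          self.timed_out = False
          self._activity = asyncio.Event()

      def activity(self):
          # Call this whenever a qualifying frame passes through the pipeline.
          self._activity.set()

      async def run(self, task: asyncio.Task):
          while not task.done():
              self._activity.clear()
              try:
                  await asyncio.wait_for(self._activity.wait(), self.idle_timeout_secs)
              except asyncio.TimeoutError:
                  self.timed_out = True  # an on_idle_timeout handler would fire here
                  if self.cancel_on_idle_timeout:
                      task.cancel()
                  return

  async def main():
      pipeline = asyncio.ensure_future(asyncio.sleep(10))  # stand-in pipeline task
      watchdog = IdleWatchdog(idle_timeout_secs=0.05)
      await watchdog.run(pipeline)
      try:
          await pipeline
      except asyncio.CancelledError:
          pass
      print(watchdog.timed_out, pipeline.cancelled())  # True True

  asyncio.run(main())
  ```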
- Added `FalSTTService`, which provides STT using Fal's Wizper API.
- Added a `reconnect_on_error` parameter to websocket-based TTS services, as well as an `on_connection_error` event handler. `reconnect_on_error` indicates whether the TTS service should reconnect on error. `on_connection_error` will always get called if there's any error, no matter the value of `reconnect_on_error`. This allows, for example, falling back to a different TTS provider if something goes wrong with the current one.
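The retry-and-fallback pattern this enables can be sketched as follows, with stand-in synthesize functions instead of real TTS providers:

  ```python
  import asyncio

  async def synthesize_with_fallback(text, primary, fallback,
                                     reconnect_on_error=True, max_retries=2):
      """Try the primary synthesizer, retrying on connection errors,
      then fall back to a secondary one."""
      errors = []
      for _attempt in range(max_retries if reconnect_on_error else 1):
          try:
              return await primary(text)
          except ConnectionError as exc:
              errors.append(exc)  # an on_connection_error handler would fire here
      return await fallback(text)

  async def main():
      calls = {"primary": 0}
      async def flaky_primary(text):
          calls["primary"] += 1
          raise ConnectionError("websocket dropped")
      async def backup(text):
          return f"[backup] {text}"
      out = await synthesize_with_fallback("hello", flaky_primary, backup)
      print(out, calls["primary"])  # [backup] hello 2

  asyncio.run(main())
  ```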
- Added a new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate text and skip end-of-sentence matching if the aggregated text is between start/end tags.
- Added a new `PatternPairAggregator` that extends `BaseTextAggregator` to identify content between matching pattern pairs in streamed text. This allows for detection and processing of structured content like XML-style tags that may span multiple text chunks or sentence boundaries.
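A simplified stand-in (not the actual `PatternPairAggregator` implementation) showing how pairs spanning chunk boundaries can be collected:

  ```python
  import re

  class PairAggregator:
      """Buffer streamed chunks and emit contents of completed start/end pairs."""
      def __init__(self, start: str, end: str):
          self.start, self.end = start, end
          self.buffer = ""

      def feed(self, chunk: str) -> list[str]:
          self.buffer += chunk
          pattern = re.escape(self.start) + r"(.*?)" + re.escape(self.end)
          matches = re.findall(pattern, self.buffer, flags=re.DOTALL)
          if matches:
              # Drop everything up to (and including) the last completed pair.
              self.buffer = self.buffer.rsplit(self.end, 1)[-1]
          return matches

  agg = PairAggregator("<voice>", "</voice>")
  found = []
  # The tag content arrives split across several streamed chunks.
  for chunk in ["Hello <voi", "ce>en-US-Ava", "</voice> world"]:
      found.extend(agg.feed(chunk))
  print(found)  # ['en-US-Ava']
  ```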
- Added a new `BaseTextAggregator`. Text aggregators are used by the TTS service to aggregate LLM tokens and decide when the aggregated text should be pushed to the TTS service. They also allow the text to be manipulated while it's being aggregated. A text aggregator can be passed via `text_aggregator` to the TTS service.
- Added a new `sample_rate` constructor parameter to `TavusVideoService` to allow changing the output sample rate.
- Added a new `NeuphonicTTSService` (see https://neuphonic.com).
- Added a new `UltravoxSTTService` (see https://github.com/fixie-ai/ultravox).
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event handlers to `PipelineTask`. Those events will be called when a frame reaches the beginning or end of the pipeline, respectively. Note that by default the event handlers will not be called unless a filter is set with `PipelineTask.set_reached_upstream_filter()` or `PipelineTask.set_reached_downstream_filter()`.
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method to `GoogleSTTService`, allowing you to set a single language. This is in addition to the `set_languages` method, which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to `AudioBufferProcessor`. This gives the ability to grab the audio of only that turn for both the user and the bot.
- Added a new base class, `BaseObject`, which is now the base class of `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The new `BaseObject` adds support for event handlers.
- Added support for a unified format for specifying function calling across all LLM services.

  ```python
  weather_function = FunctionSchema(
      name="get_current_weather",
      description="Get the current weather",
      properties={
          "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA",
          },
          "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user's location.",
          },
      },
      required=["location"],
  )
  tools = ToolsSchema(standard_tools=[weather_function])
  ```
- Added the `speech_threshold` parameter to `GladiaSTTService`.
- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context aggregator parameters when using `create_context_aggregator()`. The values are passed as a mapping that will then be converted to arguments.
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added a new `LLMFullResponseAggregator` to aggregate full LLM completions. At every completion, the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message, `RTVIServerMessage`, which provide a generic mechanism for sending custom messages from server to client. The `RTVIServerMessageFrame` is processed by the `RTVIObserver` and will be delivered to the client's `onServerMessage` callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an OpenAI-compatible interface. Added foundational example `14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API. Added foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI Gemini models. Added foundational example `14p-function-calling-gemini-vertex-ai.py`.
- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:
  - The `'gpt-4o-transcribe'` input audio transcription model, along with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        )
        # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated` events.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default transcription model.
- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.
- Function calls are now executed in tasks. This means the pipeline will not be blocked while a function call is being executed.
- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is happening in the pipeline. There are a few settings to configure this behavior; see the `PipelineTask` documentation for more details.
- All event handlers are now executed in separate tasks in order to prevent blocking the pipeline. Previously, if an event handler took a long time to execute, the pipeline would be blocked waiting for it to complete.
- Updated `TranscriptProcessor` to support text output from `OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push a `TTSTextFrame`.
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` to `sonic-2`.
- Passing a `start_callback` to `LLMService.register_function()` is now deprecated; simply move the code from the start callback into the function call.
- The `TTSService` parameter `text_filter` is now deprecated; use `text_filters` instead, which is now a list. This allows passing multiple filters that will be executed in order.
- Removed deprecated `audio.resample_audio()`; use `create_default_resampler()` instead.
- Removed the deprecated `stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors; use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`; use `PollyTTSService` instead.
- Removed the deprecated field `tier` from `DailyTranscriptionSettings`; use `model` instead.
- Removed the deprecated `pipecat.vad` package; use `pipecat.audio.vad` instead.
- Fixed an assistant aggregator issue that could cause assistant text to be split into multiple chunks during function calls.
- Fixed an assistant aggregator issue that was causing assistant text not to be added to the context during function calls. This could lead to duplications.
- Fixed a `SegmentedSTTService` issue that was causing audio to be sent prematurely to the STT service. Instead of analyzing the volume in this service, we now rely on VAD events, which use both VAD and volume.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be duplicated in the context when pushing `LLMMessagesAppendFrame` frames.
- Fixed an issue with `SegmentedSTTService`-based services (e.g. `GroqSTTService`) that was not allowing audio to pass through downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would consider text between spelling-out tags an end of sentence.
- Fixed a `match_endofsentence` issue that would result in floating-point numbers being considered an end of sentence.
- Fixed a `match_endofsentence` issue that would result in emails being considered an end of sentence.
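The kind of guards involved can be shown with a simplified matcher; this is an illustration of the idea, not Pipecat's `match_endofsentence` implementation:

  ```python
  import re

  def ends_sentence(text: str) -> bool:
      """Treat a trailing period as end of sentence, except inside a
      decimal number or an email address that is still being streamed."""
      text = text.rstrip()
      if not text.endswith((".", "!", "?")):
          return False
      if re.search(r"\d\.\d*$", text):                  # decimal like "3." or "3.14"
          return False
      if re.search(r"[\w.+-]+@[\w-]+\.[\w.]*$", text):  # email like "name@example."
          return False
      return True

  print(ends_sentence("The total is 3."))            # False
  print(ends_sentence("Reach me at name@example."))  # False
  print(ends_sentence("That will be all."))          # True
  ```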
- Fixed an issue where the RTVI message `disconnect-bot` was pushing an `EndFrame`, resulting in the pipeline not shutting down. It now pushes an `EndTaskFrame` upstream to shut down the pipeline.
- Fixed an issue with the `GoogleSTTService` where stream timeouts during periods of inactivity were causing connection failures. The service now properly detects timeout errors and handles reconnection gracefully, ensuring continuous operation even after periods of silence or when using an `STTMuteFilter`.
- Fixed an issue in `RimeTTSService` where the last line of text sent didn't result in audio output being generated.
- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message, which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.
- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.
- Added a new Ultravox example, `examples/foundational/07u-interruptible-ultravox.py`.
- Added new Neuphonic examples, `examples/foundational/07v-interruptible-neuphonic.py` and `examples/foundational/07v-interruptible-neuphonic-http.py`.
- Added a new example, `examples/foundational/36-user-email-gathering.py`, to show how to gather user emails. The example uses Cartesia's `<spell></spell>` tags and Rime's `spell()` function to spell out the emails for confirmation.
- Updated the `34-audio-recording.py` example to include an STT processor.
- Added foundational example `35-voice-switching.py` showing how to use the new `PatternPairAggregator`. This example shows how to encode information for the LLM to instruct TTS voice changes, but this can be used to encode any information into the LLM response that you want to parse and use in other parts of your application.
- Added a Pipecat Cloud deployment example to the `examples` directory.
- Removed foundational examples 28b and 28c, as the `TranscriptProcessor` no longer has an LLM dependency. Renamed foundational example 28a to `28-transcript-processor.py`.
- Added track-specific audio event `on_track_audio_data` to `AudioBufferProcessor` for accessing separate input and output audio tracks.
- The Pipecat version will now be logged on every application startup. This will help us identify which version is running in case of any issues.
- Added a new `StopFrame`, which can be used to stop a pipeline task while keeping the frame processors running. The frame processors could then be used in a different pipeline. The difference between a `StopFrame` and a `StopTaskFrame` is that, as with `EndFrame` and `EndTaskFrame`, the `StopFrame` is pushed from the task and the `StopTaskFrame` is pushed upstream inside the pipeline by any processor.
- Added a new `PipelineTask` parameter, `observers`, that replaces the previous `PipelineParams.observers`.
- Added a new `PipelineTask` parameter, `check_dangling_tasks`, to enable or disable checking for frame processors' dangling tasks when the pipeline finishes running.
- Added a new `on_completion_timeout` event for LLM services (all OpenAI-based services, Anthropic and Google). Note that this event will only get triggered if LLM timeouts are set up and the timeout was reached. It can be useful to retrigger another completion and see if the timeout was just a blip.
- Added new log observers `LLMLogObserver` and `TranscriptionLogObserver` that can be useful for debugging your pipelines.
- Added the `room_url` property to `DailyTransport`.
- Added the `addons` argument to `DeepgramSTTService`.
- Added `exponential_backoff_time()` to the `utils.network` module.
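In the spirit of this helper, a sketch of exponential backoff with a cap (the signature and defaults here are illustrative, not necessarily those of the real `exponential_backoff_time()`):

  ```python
  import random

  def exponential_backoff_time(attempt: int, *, min_wait: float = 1.0,
                               max_wait: float = 30.0, jitter: bool = False) -> float:
      """Return the wait time before retry number `attempt` (1-based),
      doubling each attempt and capped at max_wait."""
      wait = min(min_wait * (2 ** (attempt - 1)), max_wait)
      if jitter:
          # Randomize to avoid synchronized retry storms.
          wait *= random.uniform(0.5, 1.0)
      return wait

  print([exponential_backoff_time(a) for a in range(1, 7)])
  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
  ```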
- ⚠️ `PipelineTask` now requires keyword arguments (except for the first one, the pipeline).
- Updated `PlayHTHttpTTSService` to take `voice_engine` and `protocol` inputs in the constructor. The previous method of providing a `voice_engine` input that contains the engine and protocol is deprecated by PlayHT.
- The base `TTSService` class now strips leading newlines before sending text to the TTS provider. This change is to solve issues where some TTS providers, like Azure, would not output text due to newlines.
- `GrokLLMService` now uses `grok-2` as the default model.
- `AnthropicLLMService` now uses `claude-3-7-sonnet-20250219` as the default model.
- `RimeHttpTTSService` needs an `aiohttp.ClientSession` to be passed to the constructor, as with all the other HTTP-based services.
- `RimeHttpTTSService` doesn't use a default voice anymore.
- `DeepgramSTTService` now uses the new `nova-3` model by default. If you want to use the previous model, you can pass `LiveOptions(model="nova-2-general")` (see https://deepgram.com/learn/introducing-nova-3-speech-to-text-api).

  ```python
  stt = DeepgramSTTService(..., live_options=LiveOptions(model="nova-2-general"))
  ```

- `PipelineParams.observers` is now deprecated; use the new `PipelineTask` parameter `observers`.
- Removed `TransportParams.audio_out_is_live` since it was not being used at all.
- Fixed an issue that would cause undesired interruptions via `EmulateUserStartedSpeakingFrame`.
- Fixed a `GoogleLLMService` issue that was causing an exception when sending inline audio in some cases.
- Fixed an `AudioContextWordTTSService` issue that would cause an `EndFrame` to disconnect from the TTS service before audio from all the contexts was received. This affected services like Cartesia and Rime.
- Fixed an issue that was not allowing passing an `OpenAILLMContext` to create `GoogleLLMService`'s context aggregators.
- Fixed an `ElevenLabsTTSService`, `FishAudioTTSService`, `LMNTTTSService` and `PlayHTTTSService` issue that was resulting in audio requested before an interruption being played after an interruption.
- Fixed `match_endofsentence` support for ellipses.
- Fixed an issue where `EndTaskFrame` was not triggering `on_client_disconnected` or closing the WebSocket in FastAPI.
- Fixed an issue in `DeepgramSTTService` where the `sample_rate` passed to the `LiveOptions` was not being used, causing the service to use the pipeline's default sample rate.
- Fixed a context aggregator issue that would not append the LLM text response to the context if a function call happened in the same LLM turn.
- Fixed an issue that was causing HTTP TTS services to push `TTSStoppedFrame` more than once.
- Fixed a `FishAudioTTSService` issue where `TTSStoppedFrame` was not being pushed.
- Fixed an issue where `start_callback` was not invoked for some LLM services.
- Fixed an issue that would cause `DeepgramSTTService` to stop working after an error occurred (e.g. sudden network loss). If the network recovered, we would not reconnect.
- Fixed an `STTMuteFilter` issue where user audio frames were not muted, causing transcriptions to be generated by the STT service.
-
Added Gemini support to
examples/phone-chatbot. -
Added foundational example
34-audio-recording.pyshowing how to use the AudioBufferProcessor callbacks to save merged and track recordings.
-
Added new
AudioContextWordTTSService. This is a TTS base class for TTS services that handling multiple separate audio requests. -
Added new frames
EmulateUserStartedSpeakingFrameandEmulateUserStoppedSpeakingFramewhich can be used to emulated VAD behavior without VAD being present or not being triggered. -
Added a new
audio_in_stream_on_startfield toTransportParams. -
Added a new method
start_audio_in_streamingin theBaseInputTransport.- This method should be used to start receiving the input audio in case the
field
audio_in_stream_on_startis set tofalse.
- This method should be used to start receiving the input audio in case the
field
- Added support for the `RTVIProcessor` to handle buffered audio in `base64` format, converting it into `InputAudioRawFrame` for transport.
- Added support for the `RTVIProcessor` to trigger `start_audio_in_streaming` only after the `client-ready` message.
- Added new `MUTE_UNTIL_FIRST_BOT_COMPLETE` strategy to `STTMuteStrategy`. This strategy starts muted and remains muted until the first bot speech completes, ensuring the bot's first response cannot be interrupted. This complements the existing `FIRST_SPEECH` strategy, which only mutes during the first detected bot speech.
- Added support for Google Cloud Speech-to-Text V2 through `GoogleSTTService`.
- Added `RimeTTSService`, a new `WordTTSService`. Updated the foundational example `07q-interruptible-rime.py` to use `RimeTTSService`.
- Added support for Groq's Whisper API through the new `GroqSTTService` and OpenAI's Whisper API through the new `OpenAISTTService`. Introduced a new base class `BaseWhisperSTTService` to handle common Whisper API functionality.
- Added `PerplexityLLMService` for Perplexity NIM API integration, with an OpenAI-compatible interface. Also added foundational example `14n-function-calling-perplexity.py`.
- Added `DailyTransport.update_remote_participants()`. This allows you to update remote participants' settings, like their permissions or which of their devices are enabled. Requires that the local participant have participant admin permission.
- We no longer consider a colon `:` an end of sentence.
- Updated `DailyTransport` to respect the `audio_in_stream_on_start` field, ensuring it only starts receiving the audio input if it is enabled.
- Updated `FastAPIWebsocketOutputTransport` to send `TransportMessageFrame` and `TransportMessageUrgentFrame` to the serializer.
- Updated `WebsocketServerOutputTransport` to send `TransportMessageFrame` and `TransportMessageUrgentFrame` to the serializer.
- Enhanced `STTMuteConfig` to validate strategy combinations, preventing `MUTE_UNTIL_FIRST_BOT_COMPLETE` and `FIRST_SPEECH` from being used together, as they handle first bot speech differently.
- Updated foundational example `07n-interruptible-google.py` to use all Google services.
- `RimeHttpTTSService` now uses the `mistv2` model by default.
- Improved error handling in `AzureTTSService` to properly detect and log synthesis cancellation errors.
- Enhanced `WhisperSTTService` with full language support and improved model documentation.
- Updated foundational example `14f-function-calling-groq.py` to use `GroqSTTService` for transcription.
- Updated `GroqLLMService` to use `llama-3.3-70b-versatile` as the default model.
- `RTVIObserver` no longer handles `LLMSearchResponseFrame` frames. For now, to handle those frames you need to create a `GoogleRTVIObserver` instead.
- `STTMuteFilter` constructor's `stt_service` parameter is now deprecated and will be removed in a future version. The filter now manages mute state internally instead of querying the STT service.
- `RTVI.observer()` is now deprecated, instantiate an `RTVIObserver` directly instead.
- All RTVI frame processors (e.g. `RTVISpeakingProcessor`, `RTVIBotLLMProcessor`) are now deprecated, instantiate an `RTVIObserver` instead.
- Fixed a `FalImageGenService` issue that was causing the event loop to be blocked while loading the downloaded image.
- Fixed a `CartesiaTTSService` issue that would cause audio overlapping in some cases.
- Fixed a websocket-based service issue (e.g. `CartesiaTTSService`) that was preventing a reconnection after the server disconnected cleanly, causing an infinite loop instead.
- Fixed a `BaseOutputTransport` issue that was causing upstream frames to not be pushed upstream.
- Fixed multiple issues where user transcriptions were not being handled properly. It was possible for short utterances to not trigger VAD, which would cause user transcriptions to be ignored. It was also possible for one or more transcriptions to be generated after VAD, in which case they would also be ignored.
- Fixed an issue that was causing `BotStoppedSpeakingFrame` to be generated too late. This could then cause `STTMuteFilter` to be unblocked later than desired.
- Fixed an issue that was causing `AudioBufferProcessor` to not record synchronized audio.
- Fixed an `RTVI` issue that was causing `bot-tts-text` messages to be sent before being processed by the output transport.
- Fixed an issue [#1192] in ElevenLabs where we were trying to reconnect/disconnect the websocket connection even when the connection was already closed.
- Fixed an issue where the `has_regular_messages` condition was always true in `GoogleLLMContext` due to `Part` having `function_call` and `function_response` with `None` values.
- Added new `instant-voice` example. This example showcases how to enable instant voice communication as soon as a user connects.
- Added new `local-input-select-stt` example. This example allows you to play with local audio inputs by selecting them through a nice text interface.
- Use `gemini-2.0-flash-001` as the default model for `GoogleLLMService`.
- Improved foundational examples 22b, 22c, and 22d to support function calling. With these base examples, `FunctionCallInProgressFrame` and `FunctionCallResultFrame` will no longer be blocked by the gates.
- Fixed `TkLocalTransport` and `LocalAudioTransport` issues that were causing errors on cleanup.
- Fixed an issue that was causing the `tests.utils` import to fail because of logging setup.
- Fixed a `SentryMetrics` issue that was preventing any metrics from being sent to Sentry and also preventing metrics frames from being pushed to the pipeline.
- Fixed an issue in `BaseOutputTransport` where incoming audio would not be resampled to the desired output sample rate.
- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer` where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly initialized to `audio_in_sample_rate`. Those values currently default to 8000 and should be set manually from the serializer constructor if a different value is needed.
- Added a new `sentry-metrics` example.
- Added a new `start_metadata` field to `PipelineParams`. The provided metadata will be set on the initial `StartFrame` being pushed from the `PipelineTask`.
- Added new fields to `PipelineParams` to control audio input and output sample rates for the whole pipeline. This allows controlling sample rates from a single place instead of having to specify sample rates in each service. Setting a sample rate on a service is still possible and will override the value from `PipelineParams`.
- Introduced audio resamplers (`BaseAudioResampler`). This is just a base class to implement audio resamplers. Currently, two implementations are provided: `SOXRAudioResampler` and `ResampyResampler`. A new `create_default_resampler()` has been added (replacing the now deprecated `resample_audio()`).
- It is now possible to specify the asyncio event loop that a `PipelineTask` and all the processors should run on by passing it as a new argument to the `PipelineRunner`. This could allow running pipelines in multiple threads, each one with its own event loop.
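  The thread-per-event-loop pattern this enables can be sketched without any Pipecat imports; `run_pipeline_in_thread` and `fake_pipeline` below are hypothetical stand-ins for running a real `PipelineRunner` on a dedicated thread:

  ```python
  import asyncio
  import threading


  def run_pipeline_in_thread(results: list) -> threading.Thread:
      """Run a (stub) pipeline on its own thread with its own event loop."""

      async def fake_pipeline() -> str:
          # Stand-in for awaiting a real PipelineRunner.run(task) call.
          await asyncio.sleep(0)
          return "done"

      def worker() -> None:
          loop = asyncio.new_event_loop()  # each thread owns its event loop
          asyncio.set_event_loop(loop)
          try:
              results.append(loop.run_until_complete(fake_pipeline()))
          finally:
              loop.close()

      thread = threading.Thread(target=worker)
      thread.start()
      return thread
  ```

  Several such threads can run concurrently, each pipeline isolated on its own loop.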
- Added a new `utils.TaskManager`. Instead of a global task manager we now have a task manager per `PipelineTask`. In the previous version the task manager was global, so running multiple simultaneous `PipelineTask`s could result in dangling task warnings which were not actually true. In order for all the processors to know about the task manager, we pass it through the `StartFrame`. This means that processors should create tasks when they receive a `StartFrame`, but not before (because they don't have a task manager yet).
- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example has also been added to `examples/telnyx-chatbot`.
- Allow pushing silence audio frames before `TTSStoppedFrame`. This might be useful for testing purposes, for example, passing bot audio to an STT service, which usually needs additional audio data to detect that the utterance stopped.
- `TwilioSerializer` now supports transport message frames. With this we can create Twilio emulators.
- Added a new transport: `WebsocketClientTransport`.
- Added a `metadata` field to `Frame`, which makes it possible to pass custom data to all frames.
- Added `test/utils.py` inside of the pipecat package.
- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new `start_open` argument has been added to set the initial state of the gate.
- Added `organization` and `project` level authentication to `OpenAILLMService`.
- Improved the language checking logic in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` to properly handle language codes based on model compatibility, with appropriate warnings when language codes cannot be applied.
- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that contain a combination of function calls, function call responses, system messages, or just messages.
- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new `OutputDTMFFrame` frame.
- `resample_audio()` is now deprecated, use `create_default_resampler()` instead.
- `AudioBufferProcessor.reset_audio_buffers()` has been removed, use `AudioBufferProcessor.start_recording()` and `AudioBufferProcessor.stop_recording()` instead.
- Fixed an `AudioBufferProcessor` issue that would cause crackling in some recordings.
- Fixed an issue in `AudioBufferProcessor` where the user callback would not be called on task cancellation.
- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence padding in some cases.
- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009 websocket error, by increasing the max message size limit to 16MB.
- Fixed a `DailyTransport` issue that would cause events to be triggered before join finished.
- Fixed a `PipelineTask` issue that was preventing processors from being cleaned up after cancelling the task.
- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not cause the task to finish. However, using `PipelineTask.cancel()` is still the recommended way to cancel a task.
- Improved the unit test `run_test()` to use `PipelineTask` and `PipelineRunner`. There's now also some control around `StartFrame` and `EndFrame`. The `EndTaskFrame` has been removed, since it doesn't seem necessary with this new approach.
- Updated `twilio-chatbot` with a few new features: use an 8000 sample rate and avoid resampling, and a new client useful for stress testing and testing locally without the need to make phone calls. Also, added audio recording on both the client and the server to make sure the audio sounds good.
- Updated examples to use `task.cancel()` to immediately exit the example when a participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing an `EndFrame` causes the bot to run through everything that is internally queued (which could take some seconds). Note that using `task.cancel()` might not always be the best option and pushing an `EndFrame` could still be desirable to make sure the whole pipeline is flushed.
- In order to create tasks in Pipecat frame processors it is now recommended to use `FrameProcessor.create_task()` (which uses the new `utils.asyncio.create_task()`). It takes care of uncaught exceptions, task cancellation handling and task management. To cancel or wait for a task there are `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All of Pipecat's processors have been updated accordingly. Also, when a pipeline runner finishes, a warning about dangling tasks might appear, which indicates if any of the created tasks was never cancelled or awaited (using these new functions).
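  What such a managed `create_task()` buys you can be sketched in plain asyncio. This is a hypothetical illustration of the pattern (catch uncaught exceptions, track tasks so dangling ones can be reported), not Pipecat's actual implementation:

  ```python
  import asyncio


  def create_managed_task(coro, tasks: set) -> asyncio.Task:
      """Create a task that logs uncaught exceptions and tracks itself
      in `tasks` until it completes (so leftovers indicate dangling tasks)."""

      async def runner():
          try:
              await coro
          except asyncio.CancelledError:
              raise  # cancellation is expected; let it propagate
          except Exception as exc:
              print(f"Uncaught exception in task: {exc!r}")

      task = asyncio.ensure_future(runner())
      tasks.add(task)                        # track until completion
      task.add_done_callback(tasks.discard)  # untrack once done
      return task
  ```

  Any task still in `tasks` at shutdown was never cancelled or awaited, which is the condition the dangling-task warning reports.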
- It is now possible to specify the period of the `PipelineTask` heartbeat frames with `heartbeats_period_secs`.
- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models for meeting token creation in the `get_token` method of `DailyRESTHelper`.
- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.
- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings to a custom AWS bucket.
- Enhanced `UserIdleProcessor` with retry functionality and control over idle monitoring via the new callback signature `(processor, retry_count) -> bool`. Updated `17-detect-user-idle.py` to show how to use the `retry_count`.
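  Given the `(processor, retry_count) -> bool` signature above, a retry-limited idle callback might look like the following sketch. It is self-contained: `FakeProcessor` is a stand-in for the real `UserIdleProcessor` (which would push frames rather than collect strings), and the assumption is that returning `False` stops idle monitoring:

  ```python
  import asyncio


  class FakeProcessor:
      """Stand-in for UserIdleProcessor; just collects the prompts we'd push."""

      def __init__(self):
          self.prompts = []


  async def handle_user_idle(processor, retry_count: int) -> bool:
      if retry_count <= 2:
          processor.prompts.append("Are you still there?")  # nudge the user
          return True   # keep monitoring for further idle periods
      processor.prompts.append("Ending the conversation.")
      return False      # stop idle monitoring after two nudges
  ```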
- Added defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio truncation. Audio truncation errors during interruptions now log a warning and allow the session to continue instead of throwing an exception.
- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant transcripts. Assistant messages are now aggregated based on bot speaking boundaries rather than LLM context, providing better handling of interruptions and partial utterances.
- Updated foundational examples `28a-transcription-processor-openai.py`, `28b-transcript-processor-anthropic.py`, and `28c-transcription-processor-gemini.py` to use the updated `TranscriptProcessor`.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).
- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`, `ParallelPipeline` and `UserIdleProcessor`.
- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted in an error.
- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not being processed, in cases where the `_user_audio_buffer` was smaller than the buffer size.
- Replaced audio resampling library `resampy` with `soxr`. Resampling a 2:21s audio file from 24KHz to 16KHz took 1.41s with `resampy` and 0.031s with `soxr`, with similar audio quality.
- Added initial unit test infrastructure.
- Added `ElevenLabsHttpTTSService`, which uses ElevenLabs' HTTP API instead of the websocket one.
- Introduced pipeline frame observers. Observers can view all the frames that go through the pipeline without the need to inject processors in the pipeline. This can be useful, for example, to implement frame loggers or debuggers, among other things. The example `examples/foundational/30-observer.py` shows how to add an observer to a pipeline for debugging.
- Introduced heartbeat frames. The pipeline task can now push periodic heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are system frames that are supposed to make it all the way to the end of the pipeline. When a heartbeat frame is received, the traversal time (i.e. the time it took to go through the whole pipeline) will be displayed (with TRACE logging); otherwise a warning will be shown. The example `examples/foundational/31-heartbeats.py` shows how to enable heartbeats and forces warnings to be displayed.
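  The traversal-time measurement can be sketched without any Pipecat imports: a heartbeat records its creation time, and whoever receives it at the end of the pipeline computes the elapsed time. `HeartbeatFrame` here is illustrative, not Pipecat's actual class:

  ```python
  import time
  from dataclasses import dataclass, field


  @dataclass
  class HeartbeatFrame:
      # Monotonic clock avoids jumps from system clock adjustments.
      created_at: float = field(default_factory=time.monotonic)


  def traversal_time(frame: HeartbeatFrame) -> float:
      """Seconds elapsed since the heartbeat entered the pipeline."""
      return time.monotonic() - frame.created_at
  ```

  A slow processor anywhere in the pipeline shows up directly as a larger traversal time, which is what makes heartbeats useful for debugging stalls.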
- Added `LLMTextFrame` and `TTSTextFrame`, which should be pushed by LLM and TTS services respectively instead of `TextFrame`s.
- Added `OpenRouterLLMService` for OpenRouter integration with an OpenAI-compatible interface. Added foundational example `14m-function-calling-openrouter.py`.
- Added a new `WebsocketService` base class for TTS services, containing base functions and retry logic.
- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible interface. Added foundational example `14l-function-calling-deepseek.py`.
- Added `FunctionCallResultProperties` dataclass to provide a structured way to control function call behavior, including:
  - `run_llm`: Controls whether to trigger LLM completion.
  - `on_context_updated`: Optional callback triggered after the context update.
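  As a self-contained illustration, the dataclass described above can be mirrored like this. The field names come from the entry; the types and defaults are assumptions for the sketch, not Pipecat's actual definition:

  ```python
  from dataclasses import dataclass
  from typing import Awaitable, Callable, Optional


  @dataclass
  class FunctionCallResultProperties:
      run_llm: Optional[bool] = None  # None: let the service decide
      on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


  # Example: return a function call result without triggering an LLM completion.
  props = FunctionCallResultProperties(run_llm=False)
  ```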
- Added a new foundational example `07e-interruptible-playht-http.py` for easy testing of `PlayHTHttpTTSService`.
- Added support for Google TTS Journey voices in `GoogleTTSService`.
- Added `29-livekit-audio-chat.py` as a new foundational example for `LiveKitTransportLayer`.
- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params to `DailyRoomProperties`.
- Added `session_timeout` to `FastAPIWebsocketTransport` and `WebsocketServerTransport` for configuring session timeouts (in seconds). Triggers `on_session_timeout` for custom timeout handling. See `examples/websocket-server/bot.py`.
- Added the new modalities option and helper function to set Gemini output modalities.
- Added `examples/foundational/26d-gemini-multimodal-live-text.py`, which uses Gemini in TEXT modality and uses another TTS provider for the TTS process.
- Modified `UserIdleProcessor` to start monitoring only after the first conversation activity (`UserStartedSpeakingFrame` or `BotStartedSpeakingFrame`) instead of immediately.
- Modified `OpenAIAssistantContextAggregator` to support controlled completions and to emit context update callbacks via `FunctionCallResultProperties`.
- Added `aws_session_token` to the `PollyTTSService`.
- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.
- `api_key`, `aws_access_key_id` and `region` are no longer required parameters for the `PollyTTSService` (`AWSTTSService`).
- Added a `session_timeout` example in `examples/websocket-server/bot.py` to handle the session timeout event.
- Changed `InputParams` in `src/pipecat/services/gemini_multimodal_live/gemini.py` to support different modalities.
- Changed `DeepgramSTTService` to send a `finalize` event whenever VAD detects `UserStoppedSpeakingFrame`. This helps achieve faster transcriptions and clears the Deepgram audio buffer.
- Fixed an issue where `DeepgramSTTService` was not generating metrics when using the pipeline's VAD.
- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the pipeline.
- Fixed an issue where websocket-based TTS services could incorrectly terminate their connection due to a retry counter not resetting.
- Fixed a `PipelineTask` issue that would cause a dangling task after stopping the pipeline with an `EndFrame`.
- Fixed an import issue for `PlayHTHttpTTSService`.
- Fixed an issue where languages couldn't be used with the `PlayHTHttpTTSService`.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting an error when truncating audio content.
- Fixed an issue where setting the voice and model for `RimeHttpTTSService` wasn't working.
- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting initialized before the start of the pipeline.
- Added constructor arguments for `GoogleLLMService` to directly set `tools` and `tool_config`.
- Added a smart turn detection example (`22d-natural-conversation-gemini-audio.py`) that leverages Gemini 2.0 capabilities. (see https://x.com/kwindla/status/1870974144831275410)
- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.
- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to another address or number. For example, transfer a SIP call to a different SIP address or transfer a PSTN phone number to a different PSTN phone number.
- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from outside Daily to another SIP/PSTN address.
- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode` is set to `True` by default. Enabling this setting disables the chunk schedule and all buffers, which reduces latency.
- Added `KoalaFilter`, which implements on-device noise reduction using Koala Noise Suppression. (see https://picovoice.ai/platform/koala/)
- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible interface. Added foundational example `14k-function-calling-cerebras.py`.
- Pipecat now supports Python 3.13. We had a dependency on the `audioop` package, which was deprecated and is now removed in Python 3.13. We are now using `audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same functionality.
- Added timestamped conversation transcript support:
  - New `TranscriptProcessor` factory provides access to user and assistant transcript processors.
  - `UserTranscriptProcessor` processes user speech with timestamps from transcription.
  - `AssistantTranscriptProcessor` processes assistant responses with LLM context timestamps.
  - Messages emitted with ISO 8601 timestamps indicating when they were spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via standard message format.
  - New examples: `28a-transcription-processor-openai.py`, `28b-transcription-processor-anthropic.py`, and `28c-transcription-processor-gemini.py`.
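  A timestamped transcript message of the kind described above can be sketched like this; the class and field names are illustrative, not Pipecat's actual transcript types:

  ```python
  from dataclasses import dataclass
  from datetime import datetime, timezone


  @dataclass
  class TranscriptMessage:
      role: str        # "user" or "assistant"
      content: str
      timestamp: str   # ISO 8601, when the words were spoken


  def make_message(role: str, content: str) -> TranscriptMessage:
      return TranscriptMessage(
          role, content, datetime.now(timezone.utc).isoformat()
      )
  ```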
- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino, Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian, Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).
- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue where text inputted to the TTS didn't return audio.
- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.
- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the constructor.
- Updated the default model for the `OpenAIRealtimeBetaLLMService`.
- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by default instead of automatically setting a 5-minute expiration time. You must explicitly set an expiration time if desired.
- `AWSTTSService` is now deprecated, use `PollyTTSService` instead.
- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly (double-counted in many cases).
- Fixed an issue that could cause the bot to stop talking if there was a user interruption before getting any audio from the TTS service.
- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame` incorrectly, causing the main pipeline to not terminate or to terminate too early.
- Fixed an audio stuttering issue in `FastPitchTTSService`.
- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be processed before the previous audio frames were played. This will allow, for example, sending a frame `A` after a `TTSSpeakFrame`, and the frame `A` will only be pushed downstream after the audio generated from `TTSSpeakFrame` has been spoken.
- Fixed a `DeepgramSTTService` issue that was causing the language to be passed as an object instead of a string, resulting in the connection failing.
- Fixed an issue in websocket-based TTS services that was causing infinite reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).
- Added `GeminiMultimodalLiveLLMService`. This is an integration for Google's Gemini Multimodal Live API, supporting:
  - Real-time audio and video input processing
  - Streaming text responses with TTS
  - Audio transcription for both user and bot speech
  - Function calling
  - System instructions and context management
  - Dynamic parameter updates (temperature, top_p, etc.)
- Added `AudioTranscriber` utility class for handling audio transcription with Gemini models.
- Added new context classes for Gemini:
  - `GeminiMultimodalLiveContext`
  - `GeminiMultimodalLiveUserContextAggregator`
  - `GeminiMultimodalLiveAssistantContextAggregator`
  - `GeminiMultimodalLiveContextAggregatorPair`
- Added new foundational examples for `GeminiMultimodalLiveLLMService`:
  - `26-gemini-multimodal-live.py`
  - `26a-gemini-multimodal-live-transcription.py`
  - `26b-gemini-multimodal-live-video.py`
  - `26c-gemini-multimodal-live-video.py`
- Added `SimliVideoService`. This is an integration for Simli AI avatars. (see https://www.simli.com)
- Added NVIDIA Riva's `FastPitchTTSService` and `ParakeetSTTService`. (see https://www.nvidia.com/en-us/ai-data-science/products/riva/)
- Added `IdentityFilter`. This is the simplest frame filter that lets through all incoming frames.
- New `STTMuteStrategy` called `FUNCTION_CALL`, which mutes the STT service during LLM function calls.
- `DeepgramSTTService` now exposes two event handlers, `on_speech_started` and `on_utterance_end`, that could be used to implement interruptions. See new example `examples/foundational/07c-interruptible-deepgram-vad.py`.
- Added `GroqLLMService`, `GrokLLMService`, and `NimLLMService` for Groq, Grok, and NVIDIA NIM API integration, with an OpenAI-compatible interface.
- New examples demonstrating function calling with Groq, Grok, Azure OpenAI, Fireworks, and NVIDIA NIM: `14f-function-calling-groq.py`, `14g-function-calling-grok.py`, `14h-function-calling-azure.py`, `14i-function-calling-fireworks.py`, and `14j-function-calling-nvidia.py`.
- In order to obtain the audio stored by the `AudioBufferProcessor` you can now also register an `on_audio_data` event handler. The `on_audio_data` handler will be called every time `buffer_size` (a new constructor argument) is reached. If `buffer_size` is 0 (default) you need to manually get the audio as before, using `AudioBufferProcessor.merge_audio_buffers()`.

  ```python
  @audiobuffer.event_handler("on_audio_data")
  async def on_audio_data(processor, audio, sample_rate, num_channels):
      await save_audio(audio, sample_rate, num_channels)
  ```

- Added a new RTVI message called `disconnect-bot`, which when handled pushes an `EndFrame` to trigger the pipeline to stop.
- `STTMuteFilter` now supports multiple simultaneous muting strategies.
- `XTTSService` language now defaults to `Language.EN`.
- `SoundfileMixer` doesn't resample input files anymore, to avoid startup delays. The sample rate of the provided sound files now needs to match the sample rate of the output transport.
- Input frames (audio, image and transport messages) are now system frames. This means they are processed immediately by all processors instead of being queued internally.
- Expanded the `transcriptions.language` module to support a superset of languages.
- Updated STT and TTS services with language options that match the supported languages for each service.
- Updated the `AzureLLMService` to use the `OpenAILLMService`. Updated the `api_version` to `2024-09-01-preview`.
- Updated the `FireworksLLMService` to use the `OpenAILLMService`. Updated the default model to `accounts/fireworks/models/firefunction-v2`.
- Updated the `simple-chatbot` example to include a JavaScript and React client example, using RTVI JS and React.
- Removed `AppFrame`. This was used as a special user custom frame, but there's actually no use case for it.
- Fixed a `ParallelPipeline` issue that would cause system frames to be queued.
- Fixed `FastAPIWebsocketTransport` so it can work with binary data (e.g. using the protobuf serializer).
- Fixed an issue in `CartesiaTTSService` that could cause previous audio to be received after an interruption.
- Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket reconnection. Before, if an error occurred, no reconnection was happening.
- Fixed a `BaseOutputTransport` issue that was causing audio to be discarded after an `EndFrame` was received.
- Fixed an issue in `WebsocketServerTransport` and `FastAPIWebsocketTransport` that would cause a busy loop when using an audio mixer.
- Fixed a `DailyTransport` and `LiveKitTransport` issue where connections were being closed in the input transport prematurely. This was causing frames queued inside the pipeline to be discarded.
- Fixed an issue in `DailyTransport` that would cause some internal callbacks to not be executed.
- Fixed an issue where other frames were being processed while a `CancelFrame` was being pushed down the pipeline.
- `AudioBufferProcessor` now handles interruptions properly.
- Fixed a `WebsocketServerTransport` issue that would prevent interruptions with `TwilioSerializer` from working.
- `DailyTransport.capture_participant_video` now allows capturing the user's screen share by simply passing `video_source="screenVideo"`.
- Fixed Google Gemini message handling to properly convert appended messages to Gemini's required format.
- Fixed an issue with `FireworksLLMService` where chat completions were failing, by removing the `stream_options` from the chat completion options.
- Added RTVI `on_bot_started` event, which is useful in a single-turn interaction.
- Added `DailyTransport` events `dialin-connected`, `dialin-stopped`, `dialin-error` and `dialin-warning`. Needs daily-python >= 0.13.0.
- Added `RimeHttpTTSService` and the `07q-interruptible-rime.py` foundational example.
- Added `STTMuteFilter`, a general-purpose processor that combines STT muting and interruption control. When active, it prevents both transcription and interruptions during bot speech. The processor supports multiple strategies: `FIRST_SPEECH` (mute only during the bot's first speech), `ALWAYS` (mute during all bot speech), or `CUSTOM` (using a provided callback).
- Added `STTMuteFrame`, a control frame that enables/disables speech transcription in STT services.
- There's now an input queue in each frame processor. When you call `FrameProcessor.push_frame()`, this will internally call `FrameProcessor.queue_frame()` on the next processor (upstream or downstream) and the frame will be internally queued (except system frames). Then, the queued frames will get processed. With this input queue it is also possible for frame processors to block processing more frames by calling `FrameProcessor.pause_processing_frames()`. The way to resume processing frames is by calling `FrameProcessor.resume_processing_frames()`.
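  The queue-and-gate mechanism described above can be sketched in plain asyncio. This is illustrative only (the real `FrameProcessor` does this internally): frames are queued in order, and a cleared event blocks processing until resumed:

  ```python
  import asyncio


  class QueuedProcessor:
      """Sketch of a frame processor input queue with pause/resume."""

      def __init__(self):
          self._queue = asyncio.Queue()
          self._gate = asyncio.Event()
          self._gate.set()          # start unpaused
          self.processed = []

      def queue_frame(self, frame):
          self._queue.put_nowait(frame)

      def pause_processing_frames(self):
          self._gate.clear()        # block the processing loop

      def resume_processing_frames(self):
          self._gate.set()          # unblock the processing loop

      async def process_pending(self):
          while not self._queue.empty():
              await self._gate.wait()   # blocks while paused
              self.processed.append(self._queue.get_nowait())
  ```

  Pausing never drops frames; they stay queued in order and flow again once processing is resumed.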
- Added audio filter `NoisereduceFilter`.
- Introduced input transport audio filters (`BaseAudioFilter`). Audio filters can be used to remove background noise before audio is sent to VAD.
- Introduced output transport audio mixers (`BaseAudioMixer`). Output transport audio mixers can be used, for example, to add background sounds or any other audio mixing functionality before the output audio is actually written to the transport.
- Added `GatedOpenAILLMContextAggregator`. This aggregator keeps the last received OpenAI LLM context frame and doesn't let it through until the notifier is notified.
- Added `WakeNotifierFilter`. This processor expects a list of frame types and will execute a given callback predicate when a frame of any of those types is being processed. If the callback returns true, the notifier will be notified.
- Added `NullFilter`. A null filter doesn't push any frames upstream or downstream. This is usually used to disable one of the pipelines in `ParallelPipeline`.
- Added `EventNotifier`. This can be used as a very simple synchronization feature between processors.
- Added `TavusVideoService`. This is an integration for Tavus digital twins. (see https://www.tavus.io/)
- Added `DailyTransport.update_subscriptions()`. This allows you to have fine-grained control of what media subscriptions you want for each participant in a room.
- Added audio filter `KrispFilter`.
- The following `DailyTransport` functions are now `async`, which means they need to be awaited: `start_dialout`, `stop_dialout`, `start_recording`, `stop_recording`, `capture_participant_transcription` and `capture_participant_video`.
- Changed the default output sample rate to 24000. This changes all TTS services to output at 24000 and also the default output transport sample rate. This improves audio quality at the cost of some extra bandwidth.
- `AzureTTSService` now uses Azure websockets instead of HTTP requests.
- The previous `AzureTTSService` HTTP implementation is now `AzureHttpTTSService`.
- Websocket transports (FastAPI and Websocket) now synchronize with time before sending data. This allows interruptions to just work out of the box.
- Improved bot speaking detection for all TTS services by using actual bot audio.
- Fixed an issue that was generating constant bot started/stopped speaking frames for HTTP TTS services.
- Fixed an issue that was causing stuttering with the AWS TTS service.
- Fixed an issue with `PlayHTTTSService` where the TTFB metrics were reporting very small time values.
- Fixed an issue where `AzureTTSService` wasn't initializing the specified language.
- Added `23-bot-background-sound.py` foundational example.
- Added a new foundational example, `22-natural-conversation.py`. This example shows how to achieve a more natural conversation by detecting when the user ends a statement.
- Added `AssemblyAISTTService` and corresponding foundational examples `07o-interruptible-assemblyai.py` and `13d-assemblyai-transcription.py`.
- Added a foundational example for Gladia transcription: `13c-gladia-transcription.py`.
- Updated `GladiaSTTService` to use the V2 API.
- Changed the `DailyTransport` transcription model to `nova-2-general`.
- Fixed an issue that would cause an import error when importing `SileroVADAnalyzer` from the old package `pipecat.vad.silero`.
- Fixed `enable_usage_metrics` to control LLM/TTS usage metrics separately from `enable_metrics`.
- Added `audio_passthrough` parameter to `STTService`. If enabled, it allows audio frames to be pushed downstream in case other processors need them.
- Added input parameter options for `PlayHTTTSService` and `PlayHTHttpTTSService`.
- Changed the `DeepgramSTTService` model to `nova-2-general`.
- Moved the `SileroVAD` audio processor to `processors.audio.vad`.
- Module `utils.audio` is now `audio.utils`. A new `resample_audio` function has been added.
- `PlayHTTTSService` now uses PlayHT websockets instead of HTTP requests.
- The previous `PlayHTTTSService` HTTP implementation is now `PlayHTHttpTTSService`.
- `PlayHTTTSService` and `PlayHTHttpTTSService` now use a `voice_engine` of `PlayHT3.0-mini`, which allows for multi-lingual support.
- Renamed `OpenAILLMServiceRealtimeBeta` to `OpenAIRealtimeBetaLLMService` to match other services.
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` are mostly deprecated, use `OpenAILLMContext` instead.
- The `vad` package is now deprecated and `audio.vad` should be used instead. The `vad` package will get removed in a future release.
-
Fixed an issue that would cause an error if no VAD analyzer was passed to
LiveKitTransportparams. -
Fixed
SileroVADprocessor to support interruptions properly.
- Added
examples/foundational/07-interruptible-vad.py. This is the same as07-interruptible.pybut using theSileroVADprocessor instead of passing theVADAnalyzerin the transport.
- Metrics messages have moved out from the transport's base output into RTVI.
- Added support for the OpenAI Realtime API with the new `OpenAILLMServiceRealtimeBeta` processor. (see https://platform.openai.com/docs/guides/realtime/overview)
- Added `RTVIBotTranscriptionProcessor`, which will send the RTVI `bot-transcription` protocol message. These are TTS text messages aggregated into sentences.
- Added new input params to the `MarkdownTextFilter` utility. You can set `filter_code` to filter code from text and `filter_tables` to filter tables from text.
- Added `CanonicalMetricsService`. This processor uses the new `AudioBufferProcessor` to capture conversation audio and later send it to Canonical AI. (see https://canonical.chat/)
- Added `AudioBufferProcessor`. This processor can be used to buffer mixed user and bot audio. The buffer can later be saved into an audio file or processed by some audio analyzer.
- Added `on_first_participant_joined` event to `LiveKitTransport`.
- LLM text responses are now logged properly as unicode characters.
- `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, `BotStartedSpeakingFrame`, `BotStoppedSpeakingFrame`, `BotSpeakingFrame` and `UserImageRequestFrame` are now based on `SystemFrame`.
- Merged `RTVIBotLLMProcessor`/`RTVIBotLLMTextProcessor` and `RTVIBotTTSProcessor`/`RTVIBotTTSTextProcessor` to avoid out-of-order issues.
- Fixed an issue in the RTVI protocol that could cause a `bot-llm-stopped` or `bot-tts-stopped` message to be sent before a `bot-llm-text` or `bot-tts-text` message.
- Fixed `DeepgramSTTService` constructor settings not being merged with the default ones.
- Fixed an issue in the Daily transport that would cause tasks to hang if urgent transport messages were being sent from a transport event handler.
- Fixed an issue in `BaseOutputTransport` that would cause `EndFrame` to be pushed down too early and call `FrameProcessor.cleanup()` before letting the transport stop properly.
- Added a new util called `MarkdownTextFilter`, which is a subclass of a new base class called `BaseTextFilter`. This is a configurable utility intended to filter text received by TTS services.
- Added new `RTVIUserLLMTextProcessor`. This processor will send an RTVI `user-llm-text` message with the user content that was sent to the LLM.
- `TransportMessageFrame` doesn't have an `urgent` field anymore; instead there's now a `TransportMessageUrgentFrame`, which is a `SystemFrame` and therefore skips all internal queuing.
- For TTS services, convert inputted languages to match each service's language format.
- Fixed an issue where changing a language with the Deepgram STT service wouldn't apply the change. This was fixed by disconnecting and reconnecting when the language changes.
- `SentryMetrics` has been added to report frame processor metrics to Sentry. This is now possible because `FrameProcessorMetrics` can now be passed to `FrameProcessor`.
- Added Google TTS service and corresponding foundational example `07n-interruptible-google.py`.
- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.
- Added InputParams to the Azure TTS service.
- Added `LivekitTransport` (audio-only for now).
- RTVI 0.2.0 is now supported.
- All `FrameProcessor`s can now register event handlers.

```python
tts = SomeTTSService(...)

@tts.event_handler("on_connected")
async def on_connected(processor):
    ...
```
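Conceptually, this kind of decorator-based registration just records coroutines in a per-event map and awaits them when the event fires. The sketch below is illustrative only (the `EventEmitter` class and its internals are hypothetical, not Pipecat's actual implementation):

```python
import asyncio


class EventEmitter:
    """Illustrative decorator-based event registry, similar in spirit
    to the registration shown above."""

    def __init__(self):
        self._handlers = {}  # event name -> list of coroutine functions

    def event_handler(self, event_name):
        # Return a decorator that records the handler for event_name.
        def decorator(handler):
            self._handlers.setdefault(event_name, []).append(handler)
            return handler
        return decorator

    async def emit(self, event_name, *args):
        # Await every handler registered for this event, in order.
        for handler in self._handlers.get(event_name, []):
            await handler(self, *args)


emitter = EventEmitter()
calls = []

@emitter.event_handler("on_connected")
async def on_connected(processor):
    calls.append("connected")

asyncio.run(emitter.emit("on_connected"))
```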
- Added `AsyncGeneratorProcessor`. This processor can be used together with a `FrameSerializer` as an async generator. It provides a `generator()` function that returns an `AsyncGenerator` and yields serialized frames.
- Added `EndTaskFrame` and `CancelTaskFrame`. These are new frames that are meant to be pushed upstream to tell the pipeline task to stop nicely or immediately, respectively.
- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services along with corresponding setter functions.
- Added `sample_rate` as a constructor parameter for TTS services.

Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames.

To have deterministic behavior, the frames traveling through the pipeline should always be ordered, except system frames, which are out-of-band frames. To achieve that, each frame processor should only output frames from a single task.

- In this version all the frame processors have their own task to push frames. That is, when `push_frame()` is called, the given frame will be put into an internal queue (with the exception of system frames) and a frame processor task will push it out.
- Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional `pts` field (presentation timestamp). There's currently just one clock implementation, `SystemClock`, and the `pts` field is currently only used for `TextFrame`s (audio and image frames will be next).
- A clock can now be specified to `PipelineTask` (defaults to `SystemClock`). This clock will be passed to each frame processor via the `StartFrame`.
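The presentation-timestamp idea can be pictured as a monotonic clock that frames are compared against before being presented. This is a conceptual sketch only; the class and function names below are hypothetical, not Pipecat's actual `SystemClock` API:

```python
import time


class MonotonicClockSketch:
    """Illustrative pipeline clock: reports nanoseconds elapsed since start()."""

    def __init__(self):
        self._start = None

    def start(self):
        self._start = time.monotonic_ns()

    def get_time(self):
        return 0 if self._start is None else time.monotonic_ns() - self._start


def should_present(clock, pts_ns):
    # A frame without a pts is presented immediately; otherwise it waits
    # until the clock reaches the frame's presentation timestamp.
    return pts_ns is None or clock.get_time() >= pts_ns


clock = MonotonicClockSketch()
clock.start()
immediate = should_present(clock, None)   # no pts: present right away
future = should_present(clock, 10**12)    # pts far in the future: not yet
```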
- Added `CartesiaHttpTTSService`.
- `DailyTransport` now supports setting the audio bitrate to improve audio quality through the `DailyParams.audio_out_bitrate` parameter. The new default is 96kbps.
- `DailyTransport` now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed.
- Interruptions support has been added to `TwilioFrameSerializer` when using `FastAPIWebsocketTransport`.
- Added new `LmntTTSService` text-to-speech service. (see https://www.lmnt.com/)
- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`, and `STTLanguageUpdateFrame` frames to allow you to switch models, languages and voices in TTS and STT services.
- Added new `transcriptions.Language` enum.
- Context frames are now pushed downstream from assistant context aggregators.
- Removed Silero VAD torch dependency.
- Updated individual update settings frame classes into a single `ServiceUpdateSettingsFrame` class.
- We now distinguish between input and output audio and image frames. We introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame` and `OutputImageRawFrame` (and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline and they are allowed to be sent by the output transport.
- `ParallelTask` has been renamed to `SyncParallelPipeline`. A `SyncParallelPipeline` is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between a `SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame, the `SyncParallelPipeline` will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response).
- `StartFrame` is back to being a system frame to make sure it's processed immediately by all processors. `EndFrame` stays a control frame since it needs to be ordered, allowing the frames in the pipeline to be processed.
- Updated `MoondreamService` revision to `2024-08-26`.
- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because currently the audio frames don't have presentation timestamps, but they should be played at roughly the same time.
- `DailyTransport.on_joined` event now returns the full session data instead of just the participant.
- `CartesiaTTSService` is now a subclass of `TTSService`.
- `DeepgramSTTService` is now a subclass of `STTService`.
- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A `SegmentedSTTService` is an `STTService` where the provided audio is given in a big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continuous stream.
- Fixed OpenAI multiple function calls.
- Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering tasks (after receiving an `EndFrame`) before the internal queue was emptied, causing the pipeline to finish prematurely.
- `StartFrame` should be the first frame every processor receives, to avoid situations where things are not initialized (because initialization happens on `StartFrame`) and other frames come in, resulting in undesired behavior.
- `obj_id()` and `obj_count()` now use `itertools.count`, avoiding the need for `threading.Lock`.
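This works because `next()` on an `itertools.count` iterator is executed atomically by CPython's C implementation under the GIL, so no explicit lock is needed. A minimal sketch of the pattern (function and variable names here are illustrative):

```python
import itertools

# Shared counter for unique object ids; next() on an itertools.count
# object is thread-safe in CPython, so no threading.Lock is required.
_id_counter = itertools.count()

def obj_id() -> int:
    """Return a new unique, monotonically increasing id."""
    return next(_id_counter)

first = obj_id()
second = obj_id()
```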
- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).
- Added `LivekitFrameSerializer` audio frame serializer.
- Fixed a `FastAPIWebsocketOutputTransport` variable name clash with a subclass.
- Fixed an `AnthropicLLMService` issue with empty arguments in function calling.
- Fixed `studypal` example errors.
- VAD parameters can now be dynamically updated using the `VADParamsUpdateFrame`.
- `ErrorFrame` now has a `fatal` field to indicate the bot should exit if a fatal error is pushed upstream (false by default). A new `FatalErrorFrame` that sets this flag to true has been added.
- `AnthropicLLMService` now supports function calling and initial support for prompt caching. (see https://www.anthropic.com/news/prompt-caching)
- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as `output_format`.
- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample rates to use.
- Added new `on_participant_updated` event to `DailyTransport`.
- Added `DailyRESTHelper.delete_room_by_name()` and `DailyRESTHelper.delete_room_by_url()`.
- Added LLM and TTS usage metrics. Those are enabled when `PipelineParams.enable_usage_metrics` is True.
- `AudioRawFrame`s are now pushed downstream from the base output transport. This allows capturing the exact words the bot says by adding an STT service at the end of the pipeline.
- Added new `GStreamerPipelineSource`. This processor can generate image or audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP stream or anything supported by GStreamer).
- Added `TransportParams.audio_out_is_live`. This flag is False by default and it is useful to indicate we should not synchronize audio with sporadic images.
- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control frames. These frames are pushed upstream and they should wrap `BotSpeakingFrame`.
- Transports now allow you to register event handlers without decorators.
- Support RTVI message protocol 0.1. This includes new messages, support for message responses, support for actions, configuration, webhooks and a bunch of new cool stuff. (see https://docs.rtvi.ai/)
- `SileroVAD` dependency is now imported via pip's `silero-vad` package.
- `ElevenLabsTTSService` now uses the `eleven_turbo_v2_5` model by default.
- `BotSpeakingFrame` is now a control frame.
- `StartFrame` is now a control frame similar to `EndFrame`.
- `DeepgramTTSService` is now more customizable. You can adjust the encoding and sample rate.
- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and stops. This allows knowing when the bot starts and stops speaking even with asynchronous services (like Cartesia).
- Fixed `AzureSTTService` transcription frame timestamps.
- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would cause this function to stop working after the initial expiration elapsed.
- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things gracefully while a `CancelFrame` should cancel all running tasks as soon as possible.
- Fixed an issue in `AIService` that would cause a yielded `None` value to be processed.
- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and a first participant joins.
- Fixed a `BaseInputTransport` issue that was causing incoming system frames to be queued instead of being pushed immediately.
- Fixed a `BaseInputTransport` issue that was causing start/stop interruption incoming frames to not cancel tasks and be processed properly.
- Added `studypal` example (from the Cartesia folks!).
- Most examples now use Cartesia.
- Added examples `foundational/19a-tools-anthropic.py`, `foundational/19b-tools-video-anthropic.py` and `foundational/19a-tools-togetherai.py`.
- Added examples `foundational/18-gstreamer-filesrc.py` and `foundational/18a-gstreamer-videotestsrc.py` that show how to use `GStreamerPipelineSource`.
- Removed `requests` library usage.
- Cleaned up examples and use `DailyRESTHelper`.
- Fixed a regression introduced in 0.0.38 that would cause Daily transcription to stop the Pipeline.
- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and `SileroVADAnalyzer`. This allows caching and various GitHub repo validations.
- Added `send_initial_empty_metrics` flag to `PipelineParams` to request initial empty metrics (zero values). True by default.
- Fixed initial metrics format. It was using the wrong keys name/time instead of processor/value.
- STT services should be using ISO 8601 time format for transcription frames.
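ISO 8601 timestamps are straightforward to produce with the standard library. A hedged sketch (the helper name is illustrative, not the exact code used by the STT services):

```python
from datetime import datetime, timezone

def time_now_iso8601() -> str:
    # UTC timestamp in ISO 8601 format,
    # e.g. "2024-09-17T12:34:56.789012+00:00".
    return datetime.now(timezone.utc).isoformat()

stamp = time_now_iso8601()
```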
- Fixed an issue that would cause the Daily transport to show a stop transcription error when actually none occurred.
- Added `RTVIProcessor`, which implements the RTVI-AI standard. See https://github.com/rtvi-ai
- Added `BotInterruptionFrame`, which allows interrupting the bot while talking.
- Added `LLMMessagesAppendFrame`, which allows appending messages to the current LLM context.
- Added `LLMMessagesUpdateFrame`, which allows changing the LLM context for the one provided in this new frame.
- Added `LLMModelUpdateFrame`, which allows updating the LLM model.
- Added `TTSSpeakFrame`, which causes the bot to say some text. This text will not be part of the LLM context.
- Added `TTSVoiceUpdateFrame`, which allows updating the TTS voice.
- Removed the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These were added in the past to properly handle interruptions for the `LLMAssistantContextAggregator`. But the `LLMContextAggregator` is now based on `LLMResponseAggregator`, which handles interruptions properly by just processing the `StartInterruptionFrame`, so there's no need for these extra frames any more.
- Fixed an issue with `StatelessTextTransformer` where it was pushing a string instead of a `TextFrame`.
- `TTSService` end-of-sentence detection has been improved. It now works with acronyms, numbers, hours and others.
- Fixed an issue in `TTSService` that would not properly flush the current aggregated sentence if an `LLMFullResponseEndFrame` was found.
- `CartesiaTTSService` now uses websockets, which improves speed. It also leverages the new Cartesia contexts, which maintain generated audio prosody when multiple inputs are sent, therefore improving audio quality a lot.
- Added `GladiaSTTService`. See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition
- Added `XTTSService`. This is a local Text-To-Speech service. See https://github.com/coqui-ai/TTS
- Added `UserIdleProcessor`. This processor can be used to wait for any interaction with the user. If the user doesn't say anything within a given timeout, a provided callback is called.
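The idle-detection idea can be sketched with a plain asyncio timeout: reset a timer on each user activity and invoke a callback if the timer expires first. This is a conceptual sketch, not Pipecat's actual `UserIdleProcessor` implementation (the `IdleWatcher` class is hypothetical):

```python
import asyncio


class IdleWatcher:
    """Illustrative idle watcher: if no activity is reported within
    `timeout` seconds, the callback is invoked."""

    def __init__(self, timeout: float, callback):
        self._timeout = timeout
        self._callback = callback
        self._event = asyncio.Event()

    def activity(self):
        # Signal that the user interacted; the watcher resets its timer.
        self._event.set()

    async def run_once(self):
        try:
            await asyncio.wait_for(self._event.wait(), self._timeout)
            self._event.clear()  # activity seen: no callback
        except asyncio.TimeoutError:
            await self._callback()  # user was idle for `timeout` seconds


idle_calls = []

async def on_idle():
    idle_calls.append("idle")

async def main():
    watcher = IdleWatcher(0.05, on_idle)
    await watcher.run_once()  # no activity reported, so the callback fires

asyncio.run(main())
```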
- Added `IdleFrameProcessor`. This processor can be used to wait for frames within a given timeout. If no frame is received within the timeout, a provided callback is called.
- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed upstream while the bot is talking.
- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer` or `SileroVAD`.
- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services, like `DeepgramSTTService`, need to process things asynchronously. For example, audio is sent to Deepgram but transcriptions are not returned immediately. In these cases we still require all frames (except system frames) to be pushed downstream from a single task. That's what `AsyncFrameProcessor` is for. It creates a task and all frames should be pushed from that task. So, whenever a new Deepgram transcription is ready, that transcription will also be pushed from this internal task.
- The `MetricsFrame` now includes processing metrics if metrics are enabled. The processing metrics indicate the time a processor needs to generate all its output. Note that not all processors generate these kinds of metrics.
- `WhisperSTTService` model can now also be a string.
- Added missing `*` keyword separators in services.
- `WebsocketServerTransport` doesn't try to send frames anymore if the serializer returns `None`.
- Fixed an issue where exceptions that occurred inside frame processors were being swallowed and not displayed.
- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send data to the websocket after being closed.
- Added a Fly.io deployment example in `examples/deployment/flyio-example`.
- Added a new `17-detect-user-idle.py` example that shows how to use the new `UserIdleProcessor`.
- `FastAPIWebsocketParams` now require a serializer.
- `TwilioFrameSerializer` now requires a `streamSid`.
- The Silero VAD number of frames needs to be 512 for a 16000 sample rate or 256 for an 8000 sample rate.
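Both frame counts correspond to the same 32 ms analysis window. A quick sketch of the arithmetic (the helper function is illustrative):

```python
def silero_num_frames(sample_rate: int) -> int:
    # Silero VAD expects 512 samples at 16000 Hz and 256 samples at
    # 8000 Hz, i.e. a 32 ms window at either rate.
    if sample_rate == 16000:
        return 512
    if sample_rate == 8000:
        return 256
    raise ValueError(f"unsupported sample rate: {sample_rate}")

window_ms_16k = silero_num_frames(16000) / 16000 * 1000  # 32.0 ms
window_ms_8k = silero_num_frames(8000) / 8000 * 1000     # 32.0 ms
```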
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause interruptions to ignore transcriptions.
- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate shorter output.
- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now expects a voice ID instead of a voice name (you can get the voice ID from Cartesia's playground). You can also specify the audio `sample_rate` and `encoding` instead of the previous `output_format`.
- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could cause static audio issues and interruptions to not work properly when dealing with multiple LLM sentences.
- Fixed an issue that could mix new LLM responses with previous ones when handling interruptions.
- Fixed a Daily transport blocking situation that occurred while reading audio frames after a participant left the room. Needs daily-python >= 0.10.1.
- Allow specifying a `DeepgramSTTService` url, which allows using on-prem Deepgram.
- Added new `FastAPIWebsocketTransport`. This is a new websocket transport that can be integrated with FastAPI websockets.
- Added new `TwilioFrameSerializer`. This is a new serializer that knows how to serialize and deserialize audio frames from Twilio.
- Added Daily transport event: `on_dialout_answered`. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added new `AzureSTTService`. This allows you to use Azure Speech-To-Text.
- Converted `BaseInputTransport` and `BaseOutputTransport` to fully use asyncio and removed the use of threads.
- Added `twilio-chatbot`. This is an example that shows how to integrate Twilio phone numbers with a Pipecat bot.
- Updated `07f-interruptible-azure.py` to use `AzureLLMService`, `AzureSTTService` and `AzureTTSService`.
- Break long audio frames into 20ms chunks instead of 10ms.
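For 16-bit PCM, a 20 ms chunk works out to `sample_rate * channels * 2 bytes * 0.02`. A sketch of the chunking (illustrative, not the transport's actual code):

```python
def chunk_audio(data: bytes, sample_rate: int, channels: int,
                chunk_ms: int = 20):
    """Split raw 16-bit PCM audio into fixed-duration chunks."""
    # Bytes per chunk: samples per chunk * channels * 2 bytes per sample.
    chunk_size = int(sample_rate * channels * 2 * chunk_ms / 1000)
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# 100 ms of mono 16 kHz audio -> five 20 ms chunks of 640 bytes each.
audio = bytes(16000 * 2 // 10)
chunks = chunk_audio(audio, sample_rate=16000, channels=1)
```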
- Added `report_only_initial_ttfb` to `PipelineParams`. This will make it so only the initial TTFB metrics after the user stops talking are reported.
- Added `OpenPipeLLMService`. This service will let you run OpenAI through OpenPipe's SDK.
- Allow specifying frame processors' names through a new `name` constructor argument.
- Added `DeepgramSTTService`. This service has an ongoing websocket connection. To handle this, it subclasses `AIService` instead of `STTService`. The output of this service will be pushed from the same task, except system frames like `StartFrame`, `CancelFrame` or `StartInterruptionFrame`.
- `FrameSerializer.deserialize()` can now return `None` in case it is not possible to deserialize the given data.
- `daily_rest.DailyRoomProperties` now allows extra unknown parameters.
- Fixed an issue where `DailyRoomProperties.exp` always had the same old timestamp unless set by the user.
- Fixed a couple of issues with `WebsocketServerTransport`. It needed to use `push_audio_frame()` and also VAD was not working properly.
- Fixed an issue that would cause the LLM aggregator to fail with small `VADParams.stop_secs` values.
- Fixed an issue where `BaseOutputTransport` would send longer audio frames, preventing interruptions.
- Added new `07h-interruptible-openpipe.py` example. This example shows how to use OpenPipe to run OpenAI LLMs and get the logs stored in OpenPipe.
- Added new `dialin-chatbot` example. This example shows how to call the bot using a phone number.
- Added a new `FunctionFilter`. This filter will let you filter frames based on a given function, except system messages which should never be filtered.
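Conceptually, such a filter applies a user-supplied predicate to every frame while always letting system frames through. A self-contained sketch (the classes below are illustrative stand-ins for Pipecat's frame types, not its actual implementation):

```python
class Frame:
    pass

class SystemFrame(Frame):
    pass

class TextFrame(Frame):
    def __init__(self, text):
        self.text = text


def function_filter(frames, predicate):
    """Keep frames matching `predicate`; system frames always pass."""
    return [f for f in frames if isinstance(f, SystemFrame) or predicate(f)]


frames = [TextFrame("keep"), SystemFrame(), TextFrame("drop")]
kept = function_filter(
    frames, lambda f: isinstance(f, TextFrame) and f.text == "keep"
)
```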
- Added `FrameProcessor.can_generate_metrics()` method to indicate if a processor can generate metrics. In the future this might get an extra argument to ask for a specific type of metric.
- Added `BasePipeline`. All pipeline classes should be based on this class. All subclasses should implement a `processors_with_metrics()` method that returns a list of all the `FrameProcessor`s in the pipeline that can generate metrics.
- Added `enable_metrics` to `PipelineParams`.
- Added `MetricsFrame`. The `MetricsFrame` will report different metrics in the system. Right now, it can report TTFB (Time To First Byte) values for different services, that is, the time spent between the arrival of a `Frame` to the processor/service and the first `DataFrame` being pushed downstream. If metrics are enabled, an initial `MetricsFrame` with all the services in the pipeline will be sent.
- Added TTFB metrics and debug logging for TTS services.
- Moved `ParallelTask` to `pipecat.pipeline.parallel_task`.
- Fixed the PlayHT TTS service to work properly with async.
- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep growing indefinitely.
- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.
- Added `OpenAITTSService`.
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change the audio sample format and the model to use.
- Added `DailyRESTHelper`, which helps you create Daily rooms and tokens in an easy way.
- `PipelineTask` now has a `has_finished()` method to indicate if the task has completed. If a task was never run, `has_finished()` will return False.
- `PipelineRunner` now supports SIGTERM. If received, the runner will be cancelled.
- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were stopping push tasks before pushing `EndFrame` frames, which could cause the bots to get stuck.
- Fixed an error closing local audio transports.
- Fixed an issue with Deepgram TTS that was introduced in the previous release.
- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a `user` message could be appended after the previous `user` message. Anthropic does not allow that because it requires alternating `user` and `assistant` messages.
- The `BaseInputTransport` does not pull audio frames from sub-classes any more. Instead, sub-classes now push audio frames into a queue in the base class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead of 10ms.
- Removed the redundant camera input thread from `DailyInputTransport`. This should improve performance a little bit when processing participant videos.
- Load the Cartesia voice on startup.
- Added `WebsocketServerTransport`. This will create a websocket server and will read messages coming from a client. The messages are serialized/deserialized with protobufs. See `examples/websocket-server` for a detailed example.
- Added function calling (`LLMService.register_function()`). This will allow the LLM to call functions you have registered when needed. For example, if you register a function to get the weather in Los Angeles and ask the LLM about the weather in Los Angeles, the LLM will call your function. See https://platform.openai.com/docs/guides/function-calling
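The registration mechanism can be pictured as a name-to-callable map that is consulted when the LLM emits a function call. This is a hedged conceptual sketch (the `FunctionRegistry` class and `get_weather` handler are hypothetical, not Pipecat's `LLMService` API; see the linked OpenAI guide for the actual wire format):

```python
import asyncio


class FunctionRegistry:
    """Illustrative sketch of LLM function calling: handlers are
    registered by name and invoked when the model requests them."""

    def __init__(self):
        self._functions = {}

    def register_function(self, name, handler):
        self._functions[name] = handler

    async def call(self, name, arguments: dict):
        # Dispatch a model-requested function call to its handler.
        return await self._functions[name](**arguments)


async def get_weather(location: str) -> str:
    # Stand-in for a real weather lookup.
    return f"Sunny in {location}"


registry = FunctionRegistry()
registry.register_function("get_weather", get_weather)
result = asyncio.run(registry.call("get_weather", {"location": "Los Angeles"}))
```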
- Added new `LangchainProcessor`.
- Added Cartesia TTS support (https://cartesia.ai/).
- Fixed the SileroVAD frame processor.
- Fixed an issue where `camera_out_enabled` would cause high CPU usage if no image was provided.
- Removed unnecessary audio input tasks.
- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This notifies when the Daily room SIP endpoints are ready. This allows integrating with third-party services like Twilio.
- Exposed Daily transport `on_app_message` event.
- Added Daily transport `on_call_state_updated` event.
- Added Daily transport `start_recording()`, `stop_recording` and `stop_dialout`.
- Added `PipelineParams`. This replaces the `allow_interruptions` argument in `PipelineTask` and will allow adding more parameters in the future.
- Fixed Deepgram Aura TTS base_url and added ErrorFrame reporting.
- The `GoogleLLMService` `api_key` argument is now mandatory.
- Daily transport `dialin-ready` doesn't block anymore and now handles timeouts.
- Fixed AzureLLMService.
- Fixed an issue handling the Daily transport `dialin-ready` event.
- Added Daily transport `start_dialout()` to be able to make phone or SIP calls. See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout
- Added Daily transport support for dial-in use cases.
- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`, `on_dialout_error` and `on_dialout_warning`. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added vision support to the Anthropic service.
- Added `WakeCheckFilter`, which allows you to pass information downstream only if you say a certain phrase/word.
- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now `async`.
- `Filter` has been renamed to `FrameFilter` and it's now under `processors/filters`.
- Fixed the Anthropic service to use the new frame types.
- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator` that would cause frames after a brief pause to not be pushed to the LLM.
- Clear the audio output buffer if we are interrupted.
- Re-added exponential smoothing after volume calculation. This makes sure the volume value being used doesn't fluctuate so much.
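Exponential smoothing replaces each raw volume reading with a weighted blend of the new value and the previous smoothed value, which damps the fluctuations. A minimal sketch of the technique (the smoothing factor 0.3 is illustrative, not the value used internally):

```python
def smooth(value: float, prev: float, factor: float) -> float:
    # Standard exponential moving average: a higher `factor` tracks the
    # new value faster; a lower `factor` smooths more aggressively.
    return prev + factor * (value - prev)

levels = [0.1, 0.9, 0.1, 0.9]  # a noisy volume signal
smoothed = []
current = 0.0
for v in levels:
    current = smooth(v, current, 0.3)
    smoothed.append(current)
```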
- In order to improve interruptions we now compute a loudness level using pyloudnorm. The audio coming from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm applied to the signal; however, we don't do that on our local PyAudio signals. This means that currently incoming audio from PyAudio is kind of broken. We will fix it in future releases.
- Fixed an issue where `StartInterruptionFrame` would cause `LLMUserResponseAggregator` to push the accumulated text, causing the LLM to respond in the wrong task. The `StartInterruptionFrame` should not trigger any new LLM response because that would be spoken in a different task.
- Fixed an issue where tasks and threads could be paused because the executor didn't have more tasks available. This was causing issues when cancelling and recreating tasks during interruptions.
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal messages are now exposed through the `messages` property.
- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the full response but short sentences instead. If there's an interruption, we only accumulate what the bot has spoken until now in a long response as well.
- Fixed an issue in `DailyOutputTransport` where transport messages were not being sent.
- Added `google.generativeai` model support, including vision. This new `google` service defaults to using `gemini-1.5-flash-latest`. Example in `examples/foundational/12a-describe-video-gemini-flash.py`.
- Added vision support to the `openai` service. Example in `examples/foundational/12a-describe-video-gemini-flash.py`.
- Added initial interruptions support. The assistant contexts (or aggregators) should now be placed after the output transport. This way, only the completed spoken context is added to the assistant context.
- Added `VADParams` so you can control voice confidence level and others.
- `VADAnalyzer` now uses an exponentially smoothed volume to improve speech detection. This is useful when voice confidence is high (because there's someone talking near you) but volume is low.
- Fixed an issue where `TTSService` was not pushing `TextFrame`s downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing `StopTaskFrame` to actually not exit the `PipelineTask`.
- `DailyTransport`: don't publish camera and audio tracks if not enabled.
- Fixed an issue in `BaseInputTransport` that was causing frames pushed downstream to not be pushed in the right order.
- Quick hot fix for receiving `DailyTransportMessage`.
- Added `DailyTransport` event `on_participant_left`.
- Added support for receiving `DailyTransportMessage`.
- Images are now resized to the size of the output camera. This was causing images to not be displayed.
- Fixed an issue in `DailyTransport` that would not allow the input processor to shutdown if no participant ever joined the room.
- Fixed base transports start and stop. In some situations processors would halt or not shutdown properly.
- `MoondreamService` argument `model_id` is now `model`.
- `VADAnalyzer` arguments have been renamed for more clarity.
- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that could cause some threads to not start properly.
- Fixed `STTService`. Added `max_silence_secs` and `max_buffer_secs` to better handle what's being passed to the STT service. Also added exponential smoothing to the RMS.
- Fixed `WhisperSTTService`. Added `no_speech_prob` to avoid garbage output text.
- Added `DailyTranscriptionSettings` to be able to specify transcription settings much more easily (e.g. language).
- Updated `simple-chatbot` with Spanish.
- Added missing dependencies in some of the examples.
- Allow stopping pipeline tasks with the new `StopTaskFrame`.
- TTS, STT and image generation services now use `AsyncGenerator`.
- `DailyTransport`: allow registering for participant transcriptions even if the input transport is not initialized yet.
- Updated `storytelling-chatbot`.
- Added Intel GPU support to `MoondreamService`.
- Added support for sending transport messages (e.g. to communicate with an app at the other end of the transport).
- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.
- Fixed Azure services (TTS and image generation).
- Updated `simple-chatbot`, `moondream-chatbot` and `translation-chatbot` examples.
Many things have changed in this version. Many of the main ideas such as frames, processors, services and transports are still there, but some things have changed a bit.

- Frames describe the basic units for processing. For example, text, image or audio frames. Or control frames to indicate a user has started or stopped speaking.
- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an `ImageRawFrame`) and push new frames downstream or upstream to their linked peers.
- `FrameProcessor`s can be linked together. The easiest way is to use the `Pipeline`, which is a container for processors. Linking processors allows frames to travel upstream or downstream easily.
- Transports are a way to send or receive frames. There can be local transports (e.g. local audio or native apps), network transports (e.g. websocket) or service transports (e.g. https://daily.co).
- Pipelines are just a processor container for other processors.
- A `PipelineTask` knows how to run a pipeline.
- A `PipelineRunner` can run one or more tasks and it is also used, for example, to capture Ctrl-C from the user.
- Added `FireworksLLMService`.
- Added `InterimTranscriptionFrame` and enabled interim results in `DailyTransport` transcriptions.
- `FalImageGenService` now uses the new `fal_client` package.
- `FalImageGenService`: use `asyncio.to_thread` to not block the main loop when generating images.
- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed and received after `UserStoppedSpeakingFrame`).
- Added `use_cpu` argument to `MoondreamService`.
- Added `FalImageGenService.InputParams`.
- Added `URLImageFrame` and `UserImageFrame`.
- Added `UserImageRequestFrame` and allow requesting an image from a participant.
- Added base `VisionService` and `MoondreamService`.
- Don't pass `image_size` to `ImageGenService`; images should have their own size.
- `ImageFrame` now receives a tuple `(width, height)` to specify the size.
- `on_first_other_participant_joined` now gets a participant argument.
- Check if camera, speaker and microphone are enabled before writing to them.
- `DailyTransport` only subscribes to the desired participant video track.
- Use `camera_bitrate` and `camera_framerate`.
- Increased `camera_framerate` to 30 by default.
- Fixed `LocalTransport.read_audio_frames`.
- Added project optional dependencies `[silero,openai,...]`.
- Moved transports to their own directory.
- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.
- Don't write to the microphone/speaker if not enabled.
- Added live translation example.
- Fixed foundational examples.
- Added `storybot` and `chatbot` examples.

Initial public release.