Releases: pipecat-ai/pipecat
v0.0.106
Added
- Added optional `service` field to `ServiceUpdateSettingsFrame` (and its subclasses `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`) to target a specific service instance. When `service` is set, only the matching service applies the settings; other services forward the frame unchanged. This enables updating a single service when multiple services of the same type exist in the pipeline. (PR #4004)
- Added `sip_provider` and `room_geo` parameters to `configure()` in the Daily runner. These convenience parameters let callers specify a SIP provider name and geographic region directly, without manually constructing `DailyRoomProperties` and `DailyRoomSipParams`. (PR #4005)
- Added `PerplexityLLMAdapter`, which automatically transforms conversation messages to satisfy Perplexity's stricter API constraints (strict role alternation, no non-initial system messages, last message must be user/tool). Previously, certain conversation histories could cause Perplexity API errors that didn't occur with OpenAI (`PerplexityLLMService` subclasses `OpenAILLMService` since Perplexity uses an OpenAI-compatible API). (PR #4009)
- Added DTMF input event support to the Daily transport. Incoming DTMF tones are now received via Daily's `on_dtmf_event` callback and pushed into the pipeline as `InputDTMFFrame`, enabling bots to react to keypad presses from phone callers. (PR #4047)
- Added `WakePhraseUserTurnStartStrategy` for triggering user turns based on wake phrases, with support for `single_activation` mode. Deprecates `WakeCheckFilter`. (PR #4064)
- Added `default_user_turn_start_strategies()` and `default_user_turn_stop_strategies()` helper functions for composing custom strategy lists. (PR #4064)
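The targeting behavior in the `ServiceUpdateSettingsFrame` entry above can be sketched in plain Python. The classes below are illustrative stand-ins, not pipecat's actual implementations; they only show the "apply if targeted at me, otherwise forward unchanged" rule:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateSettingsFrame:
    # Stand-in for ServiceUpdateSettingsFrame: `service` optionally
    # targets one specific service instance; None means "all".
    settings: dict
    service: Optional[object] = None


class MockTTSService:
    def __init__(self, name: str):
        self.name = name
        self.settings = {}

    def process_frame(self, frame: UpdateSettingsFrame):
        # Apply only if the frame is untargeted or targeted at this
        # instance; otherwise the frame passes through unchanged.
        if frame.service is None or frame.service is self:
            self.settings.update(frame.settings)


tts_a = MockTTSService("a")
tts_b = MockTTSService("b")

frame = UpdateSettingsFrame(settings={"voice": "alloy"}, service=tts_a)
for svc in (tts_a, tts_b):
    svc.process_frame(frame)

print(tts_a.settings)  # only the targeted service changed
print(tts_b.settings)
```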
Changed
- Changed tool result JSON serialization to use `ensure_ascii=False`, preserving UTF-8 characters instead of escaping them. This reduces context size and token usage for non-English languages. (PR #3457)
- `OpenAIRealtimeSTTService`'s `noise_reduction` parameter is now part of `OpenAIRealtimeSTTSettings`, making it runtime-updatable via `STTUpdateSettingsFrame`. The direct `noise_reduction` init argument is deprecated as of 0.0.106. (PR #3991)
- Updated `sarvamai` dependency from `0.1.26a2` (alpha) to `0.1.26` (stable release). (PR #3997)
- `SimliVideoService` now extends `AIService` instead of `FrameProcessor`, aligning it with the HeyGen and Tavus video services. It supports `SimliVideoService.Settings(...)` for configuration and uses `start()`/`stop()`/`cancel()` lifecycle methods. Existing constructor usage (`api_key`, `face_id`, etc.) remains unchanged. (PR #4001)
- Updated `pipecat-ai-small-webrtc-prebuilt` to `2.4.0`. (PR #4023)
- Nova Sonic assistant text transcripts are now delivered in real time using speculative text events instead of delayed final text events. Previously, assistant text only arrived after all audio had finished playing, causing laggy transcripts in client UIs. Speculative text arrives before each audio chunk, providing text synchronized with what the bot is saying. This also simplifies the internal text handling by removing the interruption re-push hack and the assistant text buffer. (PR #4042)
- Updated `daily-python` dependency to 0.25.0. (PR #4047)
- Added `enable_dialout` parameter to `configure()` in `pipecat.runner.daily` to support dial-out rooms. Also narrowed misleading `Optional` type hints and deduplicated the token expiry calculation. (PR #4048)
- Extended `ProcessFrameResult` to stop strategies, allowing a stop strategy to short-circuit evaluation of subsequent strategies by returning `STOP`. (PR #4064)
- `GradiumSTTService` now takes both an `encoding` and a `sample_rate` constructor argument, which are assembled in the class to form the `input_format`. PCM accepts 8000, 16000, and 24000 Hz sample rates. (PR #4066)
- Improved `GradiumSTTService` transcription accuracy by reworking how text fragments are accumulated and finalized. Previously, trailing words could be dropped when the server's `flushed` response arrived before all text tokens were delivered. The service now uses a short aggregation delay after flush to capture trailing tokens, producing complete utterances. (PR #4066)
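The `ensure_ascii=False` change above is standard `json` module behavior; a quick comparison shows why it shrinks tool results for non-English text:

```python
import json

result = {"antwort": "Straße, München, Köln"}

escaped = json.dumps(result)                      # default: ensure_ascii=True
preserved = json.dumps(result, ensure_ascii=False)

print(escaped)    # non-ASCII characters become \uXXXX escapes (6 bytes each)
print(preserved)  # UTF-8 characters are kept as-is
print(len(escaped), len(preserved))
```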
Deprecated
- `SimliVideoService.InputParams` is deprecated. Use the direct constructor parameters `max_session_length`, `max_idle_time`, and `enable_logging` instead. (PR #4001)
- Deprecated `LocalSmartTurnAnalyzerV2` and `LocalCoreMLSmartTurnAnalyzer`. Use `LocalSmartTurnAnalyzerV3` instead. Instantiating these analyzers will now emit a `DeprecationWarning`. (PR #4012)
- Deprecated `WakeCheckFilter` in favor of `WakePhraseUserTurnStartStrategy`. (PR #4064)
Fixed
- Fixed an issue where the default model for `OpenAILLMService` and `AzureLLMService` was mistakenly reverted to `gpt-4o`. The defaults are now restored to `gpt-4.1`. (PR #4000)
- Fixed a race condition where `EndTaskFrame` could cause the pipeline to shut down before in-flight frames (e.g. LLM function call responses) finished processing. `EndTaskFrame` and `StopTaskFrame` now flow through the pipeline as `ControlFrame`s, ensuring all pending work is flushed before shutdown begins. `CancelTaskFrame` and `InterruptionTaskFrame` remain immediate (`SystemFrame`). (PR #4006)
- Fixed `ParallelPipeline` dropping or misordering frames during lifecycle synchronization. Buffered frames are now flushed in the correct order relative to synchronization frames (`StartFrame` goes first, `EndFrame`/`CancelFrame` go after), and frames added to the buffer during flush are also drained. (PR #4007)
- Fixed `TTSService` potentially canceling in-flight audio during shutdown. The stop sequence now waits for all queued audio contexts to finish processing before canceling the stop frame task. (PR #4007)
- Fixed `Language` enum values (e.g. `Language.ES`) not being converted to service-specific codes when passed via `settings=Service.Settings(language=Language.ES)` at init time. This caused API errors (e.g. 400 from Rime) because the raw enum was sent instead of the expected language code (e.g. `"spa"`). Runtime updates via `UpdateSettingsFrame` were unaffected. The fix centralizes conversion in the base `TTSService` and `STTService` classes so all services handle this consistently. (PR #4024)
- Fixed `DeepgramSTTService` ignoring the `base_url` scheme when using `ws://` or `http://`. Previously these were silently overwritten with `wss://`/`https://`, breaking air-gapped or private deployments that don't use TLS. All scheme choices (`wss://`, `https://`, `ws://`, `http://`, or a bare hostname) are now respected. (PR #4026)
- Fixed `LLMSwitcher.register_function()` and `register_direct_function()` not accepting or forwarding the `timeout_secs` parameter. (PR #4037)
- Fixed empty user transcriptions in Nova Sonic causing spurious interruptions. Previously, an empty transcription could trigger an interruption of the assistant's response even though the user hadn't actually spoken. (PR #4042)
- Fixed `SonioxSTTService` and `OpenAIRealtimeSTTService` crashing when language parameters contain plain strings instead of `Language` enum values. (PR #4046)
- Fixed premature user turn stops caused by late transcriptions arriving between turns. A stale transcript from the previous turn could persist into the next turn and trigger a stop before the current turn's real transcript arrived. Stop strategies are now reset at both turn start and turn stop to prevent state from leaking across turn boundaries. (PR #4057)
- Fixed raw language strings like `"de-DE"` silently failing when passed to TTS/STT services (e.g. ElevenLabs producing no audio). Raw strings now go through the same `Language` enum resolution as enum values, so regional codes like `"de-DE"` are properly converted to service-expected formats like `"de"`. Unrecognized strings log a warning instead of failing silently. (PR #4058)
- Fixed Deepgram STT list-type settings (`keyterm`, `keywords`, `search`, `redact`, `replace`) being stringified instead of passed as lists to the SDK, which caused them to be sent as literal strings (e.g. `"['pipecat']"`) in the WebSocket query params. (PR #4063)
- ...
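The `base_url` scheme fix for Deepgram STT boils down to respecting an explicit scheme and only defaulting to TLS for bare hostnames. A minimal sketch with a hypothetical helper (not pipecat's actual code):

```python
from urllib.parse import urlparse


def resolve_ws_url(base_url: str, default_scheme: str = "wss") -> str:
    # Respect any explicit scheme the caller chose; only bare
    # hostnames fall back to the TLS default. The buggy behavior
    # was rewriting ws:// and http:// to wss:///https:// as well.
    parsed = urlparse(base_url)
    if parsed.scheme in ("ws", "wss", "http", "https"):
        return base_url
    return f"{default_scheme}://{base_url}"


print(resolve_ws_url("ws://stt.internal:8080"))  # kept as-is for non-TLS deployments
print(resolve_ws_url("api.deepgram.com"))        # bare hostname defaults to wss://
```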
v0.0.105
Added
- Added concurrent audio context support: `CartesiaTTSService` can now synthesize the next sentence while the previous one is still playing, by setting `pause_frame_processing=False` and routing each sentence through its own audio context queue. (PR #3804)
- Added custom video track support to the Daily transport. Use `video_out_destinations` in `DailyParams` to publish multiple video tracks simultaneously, mirroring the existing `audio_out_destinations` feature. (PR #3831)
- Added `ServiceSwitcherStrategyFailover`, which automatically switches to the next service when the active service reports a non-fatal error. Recovery policies can be implemented via the `on_service_switched` event handler. (PR #3861)
- Added optional `timeout_secs` parameter to `register_function()` and `register_direct_function()` for per-tool function call timeout control, overriding the global `function_call_timeout_secs` default. (PR #3915)
- Added `cloud-audio-only` recording option to the Daily transport's `enable_recording` property. (PR #3916)
- Wired up `system_instruction` in `BaseOpenAILLMService`, `AnthropicLLMService`, and `AWSBedrockLLMService` so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single `LLMContext` across multiple LLM services, where each service provides its own system instruction independently.

  ```python
  llm = OpenAILLMService(
      api_key=os.getenv("OPENAI_API_KEY"),
      system_instruction="You are a helpful assistant.",
  )

  context = LLMContext()

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      context.add_message({"role": "user", "content": "Please introduce yourself."})
      await task.queue_frames([LLMRunFrame()])
  ```

  (PR #3918)
- Added `vad_threshold` parameter to `AssemblyAIConnectionParams` for configuring voice activity detection sensitivity in U3 Pro. Aligning this with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone" where AssemblyAI transcribes speech that VAD hasn't detected yet. (PR #3927)
- Added `push_empty_transcripts` parameter to `BaseWhisperSTTService` and `OpenAISTTService` to allow empty transcripts to be pushed downstream as `TranscriptionFrame` instead of being discarded (the default behavior). This is intended for situations where VAD fires even though the user did not speak; knowing that nothing was transcribed lets the agent resume speaking instead of waiting longer for a transcription. (PR #3930)
- LLM services (`BaseOpenAILLMService`, `AnthropicLLMService`, `AWSBedrockLLMService`) now log a warning when both `system_instruction` and a system message in the context are set. The constructor's `system_instruction` takes precedence. (PR #3932)
- Runtime settings updates (via `STTUpdateSettingsFrame`) now work for the AWS Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and Soniox STT services. Previously, changing settings at runtime only stored the new values without reconnecting. (PR #3946)
- Exposed the `on_summary_applied` event on `LLMAssistantAggregator`, allowing users to listen for context summarization events without accessing private members. (PR #3947)
- Deepgram Flux STT settings (`keyterm`, `eot_threshold`, `eager_eot_threshold`, `eot_timeout_ms`) can now be updated mid-stream via `STTUpdateSettingsFrame` without triggering a reconnect. The new values are sent to Deepgram as a Configure WebSocket message on the existing connection. (PR #3953)
- Added `system_instruction` parameter to `run_inference` across all LLM services, allowing callers to override the system prompt for one-shot inference calls. Used by `_generate_summary` to pass the summarization prompt cleanly. (PR #3968)
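The per-tool `timeout_secs` override from PR #3915 follows a common pattern: a per-registration value falling back to a global default at call time. A minimal sketch of that pattern (the registry and names below are illustrative, not pipecat's internals):

```python
import asyncio

GLOBAL_TIMEOUT_SECS = 30.0  # stand-in for the global function_call_timeout_secs


class ToolRegistry:
    """Illustrative sketch, not pipecat's actual function registry."""

    def __init__(self):
        self._tools = {}  # name -> (handler, per-tool timeout or None)

    def register_function(self, name, handler, timeout_secs=None):
        # timeout_secs=None means "fall back to the global default".
        self._tools[name] = (handler, timeout_secs)

    async def call(self, name, *args):
        handler, timeout = self._tools[name]
        if timeout is None:
            timeout = GLOBAL_TIMEOUT_SECS
        # Enforce the effective timeout around the tool invocation.
        return await asyncio.wait_for(handler(*args), timeout)


async def slow_tool():
    await asyncio.sleep(0.05)
    return "done"


async def main():
    registry = ToolRegistry()
    registry.register_function("slow", slow_tool, timeout_secs=1.0)
    return await registry.call("slow")


print(asyncio.run(main()))  # prints "done"
```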
Changed
- Audio context management (previously in `AudioContextTTSService`) is now built into `TTSService`. All WebSocket providers (Cartesia, ElevenLabs, AsyncAI, Inworld, Rime, Gradium, ResembleAI) now inherit from `WebsocketTTSService` directly. The word-timestamp baseline is set automatically on the first audio chunk of each context instead of requiring each provider to call `start_word_timestamps()` in its receive loop. (PR #3804)
- Daily transport now uses `CustomVideoSource`/`CustomVideoTrack` instead of `VirtualCameraDevice` for the default camera output, mirroring how audio already works with `CustomAudioSource`/`CustomAudioTrack`. (PR #3831)
- ⚠️ Updated `DeepgramSTTService` to use `deepgram-sdk` v6. The `LiveOptions` class was removed from the SDK and is now provided by pipecat directly; import it from `pipecat.services.deepgram.stt` instead of `deepgram`. (PR #3848)
- The `ServiceSwitcherStrategy` base class now provides a `handle_error()` hook for subclasses to implement error-based switching. `ServiceSwitcher` defaults to `ServiceSwitcherStrategyManual`, and `strategy_type` is now optional. (PR #3861)
- Support for Voice Focus 2.0 models:
  - Updated `aic-sdk` to `~=2.1.0` to support Voice Focus 2.0 models.
  - Cleaned up unused `ParameterFixedError` exception handling in `AICFilter` parameter setup.
  (PR #3889)
- `max_context_tokens` and `max_unsummarized_messages` in `LLMAutoContextSummarizationConfig` (and the deprecated `LLMContextSummarizationConfig`) can now be set to `None` independently to disable that summarization threshold. At least one must remain set. (PR #3914)
- ⚠️ Removed `formatted_finals` and `word_finalization_max_wait_time` from `AssemblyAIConnectionParams`, as these were v2 API parameters not supported in v3. Clarified that `format_turns` only applies to Universal-Streaming models; U3 Pro has automatic formatting built in. (PR #3927)
- Changed `DeepgramTTSService` to send a Clear message on interruption instead of disconnecting and reconnecting the WebSocket, allowing the connection to persist throughout the session. (PR #3958)
- Re-added `enhancement_level` support to `AICFilter` with runtime `FilterEnableFrame` control, applying `ProcessorParameter.Bypass` and `ProcessorParameter.EnhancementLevel` together. (PR #3961)
- Updated `daily-python` dependency from `~=0.23.0` to `~=0.24.0`. (PR #3970)
- Updated `FishAudioTTSService` default model from `s1` to `s2-pro`, matching Fish Audio's latest recommended model for improved quality and speed. (PR #3973)
- `AzureSTTService`'s `region` parameter is now optional when `private_endpoint` is provided. A `ValueError` is raised if neither is given, and a warning is logged if both are provided (`private_endpoint` takes priority). (PR #3974)
Deprecated
- Deprecated `AudioContextTTSService` and `AudioContextWordTTSService`. Subclass `WebsocketTTSService` directly instead; audio context management is now part of the base `TTSService`.
- Deprecated `WordTTSService`, `WebsocketWordTTSService`, and `InterruptibleWordTTSService`. Word timestamp logic is now always active in `TTSService` and no longer needs to be opted into via a subclass. (PR #3804)
- Deprecated the `pipecat.services.google.llm_vertex`, `pipecat.services.google.llm_openai`, and `pipecat.services.google.gemini_live.llm_vertex` modules. Use `pipecat.services.google.vertex.llm`, `pipecat.services.google.openai.llm`, and `pipecat.services.google.gemini_live.vertex.llm` instead. The old import paths still work but will emit a `DeprecationWarning`. (PR #3980)
Removed
- ⚠️ Removed `supports_word_timestamps` parameter from `TTSService.__init__()`. Word timestamp logic is now always active. Remove this argument from any custom subclass `super().__init__()` calls. (PR #3804)
Fixed
- Fixed `DeepgramSTTService` keepalive ping timeout disconnections. The `deepgram-sdk` v6 removed automatic keepalive; pipecat now sends explicit `KeepAlive` messages every 5 seconds, within the recommended 3–5 second interval before Deepgram's 10-second inactivity timeout. (PR #3848)
- Fixed `BufferError: Existing exports of data: object cannot be re-sized` in `AICFilter`, caused by holding a `memoryview` on the mutable audio buffer across async yield points. (PR #3889)
- Fixed TTS context not being appended to the assistant message history when using `TTSSpeakFrame` with `append_to_context=True` with some TTS providers. (PR [#3936](https://githu...
v0.0.104
Added
- Added `TextAggregationMetricsData`, a metric measuring the time from the first LLM token to the first complete sentence, representing the latency cost of sentence aggregation in the TTS pipeline. (PR #3696)
- Added support for using strongly-typed objects instead of dicts for updating service settings at runtime. Instead of, say:

  ```python
  await task.queue_frame(
      STTUpdateSettingsFrame(settings={"language": Language.ES})
  )
  ```

  you'd do:

  ```python
  await task.queue_frame(
      STTUpdateSettingsFrame(delta=DeepgramSTTSettings(language=Language.ES))
  )
  ```

  Each service now vends strongly-typed classes like `DeepgramSTTSettings` representing the service's runtime-updatable settings. (PR #3714)
- Added support for specifying private endpoints for Azure Speech-to-Text, enabling use in private networks behind firewalls. (PR #3764)
- Added `LemonSliceTransport` and `LemonSliceApi` to support adding real-time LemonSlice Avatars to any Daily room. (PR #3791)
- Added `output_medium` parameter to `AgentInputParams` and `OneShotInputParams` in the Ultravox service to control the initial output medium (text or voice) at call creation time. (PR #3806)
- Added `TurnMetricsData` as a generic metrics class for turn detection, with end-to-end processing time measurement. `KrispVivaTurn` now emits `TurnMetricsData` with `e2e_processing_time_ms` tracking the interval from the VAD speech-to-silence transition to turn completion. (PR #3809)
- Added `on_audio_context_interrupted()` and `on_audio_context_completed()` callbacks to `AudioContextTTSService`. Subclasses can override these to perform provider-specific cleanup instead of overriding `_handle_interruption()`. (PR #3814)
- Added `on_summary_applied` event to `LLMContextSummarizer` for observability, providing message counts before and after context summarization. (PR #3855)
- Added `summary_message_template` to `LLMContextSummarizationConfig` for customizing how summaries are formatted when injected into context (e.g., wrapping in XML tags). (PR #3855)
- Added `summarization_timeout` to `LLMContextSummarizationConfig` (default 120s) to prevent hung LLM calls from permanently blocking future summarizations. (PR #3855)
- Added optional `llm` field to `LLMContextSummarizationConfig` for routing summarization to a dedicated LLM service (e.g., a cheaper/faster model) instead of the pipeline's primary model. (PR #3855)
- Added AssemblyAI u3-rt-pro model support with a built-in turn detection mode. (PR #3856)
- Added `LLMSummarizeContextFrame` to trigger on-demand context summarization from anywhere in the pipeline (e.g. a function call tool). Accepts an optional `config: LLMContextSummaryConfig` to override summary generation settings per request. (PR #3863)
- Added `LLMContextSummaryConfig` (summary generation params: `target_context_tokens`, `min_messages_after_summary`, `summarization_prompt`) and `LLMAutoContextSummarizationConfig` (auto-trigger thresholds: `max_context_tokens`, `max_unsummarized_messages`, plus a nested `summary_config`). These replace the monolithic `LLMContextSummarizationConfig`. (PR #3863)
- Added support for the `speed_alpha` parameter for the `arcana` model in `RimeTTSService`. (PR #3873)
- Added `ClientConnectedFrame`, a new `SystemFrame` pushed by all transports (Daily, LiveKit, FastAPI WebSocket, WebSocket Server, SmallWebRTC, HeyGen, Tavus) when a client connects. Enables observers to track transport readiness timing. (PR #3881)
- Added `StartupTimingObserver` for measuring how long each processor's `start()` method takes during pipeline startup. Also measures transport readiness (the time from `StartFrame` to the first client connection) via the `on_transport_timing_report` event. (PR #3881)
- Added `BotConnectedFrame` for SFU transports and an `on_transport_timing_report` event to `StartupTimingObserver` with bot and client connection timing. (PR #3881)
- Added optional `direction` parameter to `PipelineTask.queue_frame()` and `PipelineTask.queue_frames()`, allowing frames to be pushed upstream from the end of the pipeline. (PR #3883)
- Added `on_latency_breakdown` event to `UserBotLatencyObserver`, providing per-service TTFB, text aggregation, user turn duration, and function call latency metrics for each user-to-bot response cycle. (PR #3885)
- Added `on_first_bot_speech_latency` event to `UserBotLatencyObserver`, measuring the time from client connection to first bot speech. An `on_latency_breakdown` is also emitted for this first speech event. (PR #3885)
- Added `broadcast_interruption()` to `FrameProcessor`. This method pushes an `InterruptionFrame` both upstream and downstream directly from the calling processor, avoiding the round-trip through the pipeline task that `push_interruption_task_frame_and_wait()` required. (PR #3896)
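The strongly-typed settings entries above describe "delta" semantics: only fields the caller explicitly set are applied. That behavior can be sketched with a sentinel default and a `given_fields()` helper (an illustrative stand-in, not pipecat's actual `STTSettings` implementation):

```python
from dataclasses import dataclass, fields

_UNSET = object()  # sentinel distinguishing "not given" from any real value


@dataclass
class STTSettingsSketch:
    language: object = _UNSET
    model: object = _UNSET

    def given_fields(self) -> dict:
        # Only fields the caller explicitly set are part of the delta.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not _UNSET
        }


current = {"language": "en", "model": "nova-2"}
delta = STTSettingsSketch(language="es")  # model left unset
current.update(delta.given_fields())
print(current)  # only "language" changed; "model" is untouched
```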
Changed
- Added `text_aggregation_mode` parameter to `TTSService` and all TTS subclasses, with a new `TextAggregationMode` enum (`SENTENCE`, `TOKEN`). All text now flows through text aggregators regardless of mode, enabling pattern detection and tag handling in TOKEN mode. (PR #3696)
- ⚠️ Refactored runtime-updatable service settings to use strongly-typed classes (`TTSSettings`, `STTSettings`, `LLMSettings`, and service-specific subclasses) instead of plain dicts. Each service's `_settings` now holds these strongly-typed objects. For service maintainers, see the changes in COMMUNITY_INTEGRATIONS.md. (PR #3714)
- Word timestamp support has been moved from `WordTTSService` into `TTSService` via a new `supports_word_timestamps` parameter. Services that previously extended `WordTTSService`, `AudioContextWordTTSService`, or `WebsocketWordTTSService` now pass `supports_word_timestamps=True` to their parent `__init__` instead. (PR #3786)
- Improved Ultravox TTFB measurement accuracy by using the VAD speech end time instead of `UserStoppedSpeakingFrame` timing. (PR #3806)
- Aligned `UltravoxRealtimeLLMService` frame handling with the OpenAI/Gemini realtime services: added `InterruptionFrame` handling with metrics cleanup, processing metrics at response boundaries, and improved agent transcript handling for both voice and text output modalities. (PR #3806)
- Updated `OpenAIRealtimeLLMService` default model to `gpt-realtime-1.5`. (PR #3807)
- Added `api_key` parameter to `KrispVivaSDKManager`, `KrispVivaTurn`, and `KrispVivaFilter` for Krisp SDK v1.6.1+ licensing. Falls back to the `KRISP_VIVA_API_KEY` environment variable. (PR #3809)
- Bumped `nltk` minimum version from 3.9.1 to 3.9.3 to resolve a security vulnerability. (PR #3811)
- `ServiceSettingsUpdateFrame`s are now `UninterruptibleFrame`s. Generally speaking, you don't want a user interruption to prevent a service setting change from going into effect. Note that you usually don't use `ServiceSettingsUpdateFrame` directly; you use one of its subclasses: `LLMUpdateSettingsFrame`, `TTSUpdateSettingsFrame`, `STTUpdateSettingsFrame`. (PR #3819)
- Updated context summarization to use the `user` role instead of `assistant` for summary messages. (PR #3855)
- Renamed `AssemblyAISTTService` parameter `min_end_of_turn_silence_when_confident` to `min_turn_silence` (the old name is still supported with a deprecation warning). (PR #3856)
- ⚠️ Renamed `LLMAssistantAggregatorParams` fields: `enable_context_summarization` → `enable_auto_context_summarization` and `context_summarization_config` → `auto_context_summarization_config` (now accepts `LLMAutoContextSummarizationConfig`). The old names still work with a `DeprecationWarning` for one release cycle. (PR #3863)
- `ElevenLabsRealtimeSTTService` now sets `TranscriptionFrame.finalized` to `True` when using `CommitStrategy.MANUAL`. (PR #3865)
- Updated the `numba` version pin from `==` to `>=0.61.2`. (PR #3868)
- Updated tracing code to use the `ServiceSettings` dataclass API (`given_fields()`, attribute access) instead of dict-style access (`.items()`, `in`, subscript). (PR [...
v0.0.103
Added
- Added `"timestampTransportStrategy": "ASYNC"` to `InworldAITTSService`. This allows timestamp info to trail audio chunk arrival, resulting in much better first-audio-chunk latency. (PR #3625)
- Added model-specific `InputParams` to `RimeTTSService`: arcana params (`repetition_penalty`, `temperature`, `top_p`) and mistv2 params (`no_text_normalization`, `save_oovs`, `segment`). Model, voice, and param changes now trigger WebSocket reconnection. (PR #3642)
- Added `write_transport_frame()` hook to `BaseOutputTransport`, allowing transport subclasses to handle custom frame types that flow through the audio queue. (PR #3719)
- Added `DailySIPTransferFrame` and `DailySIPReferFrame` to the Daily transport. These frames queue SIP transfer and SIP REFER operations with audio, so the operation executes only after the bot finishes its current utterance. (PR #3719)
- Added keepalive support to `SarvamSTTService` to prevent idle connection timeouts (e.g. when used behind a `ServiceSwitcher`). (PR #3730)
- Added `UserIdleTimeoutUpdateFrame` to enable or disable user idle detection at runtime by updating the timeout dynamically. (PR #3748)
- Added `broadcast_sibling_id` field to the base `Frame` class. This field is automatically set by `broadcast_frame()` and `broadcast_frame_instance()` to the ID of the paired frame pushed in the opposite direction, allowing receivers to identify broadcast pairs. (PR #3774)
- Added `ignored_sources` parameter to `RTVIObserverParams` and `add_ignored_source()`/`remove_ignored_source()` methods to `RTVIObserver` to suppress RTVI messages from specific pipeline processors (e.g. a silent evaluation LLM). (PR #3779)
- Added `DeepgramSageMakerTTSService` for running Deepgram TTS models deployed on AWS SageMaker endpoints via HTTP/2 bidirectional streaming. Supports the Deepgram TTS protocol (Speak, Flush, Clear, Close), interruption handling, and per-turn TTFB metrics. (PR #3785)
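The idle-connection keepalive mentioned for `SarvamSTTService` (and generalized in this release's Changed section via `keepalive_timeout`/`keepalive_interval`) is a simple timer pattern: once the connection has been quiet for the timeout, start sending periodic keepalives. A self-contained sketch of that logic (illustrative only; pipecat's actual implementation differs):

```python
import asyncio
import time


class KeepaliveConnection:
    """Illustrative sketch: after `keepalive_timeout` seconds without
    traffic, send keepalive messages, checking every `keepalive_interval`."""

    def __init__(self, keepalive_timeout=0.1, keepalive_interval=0.02):
        self.keepalive_timeout = keepalive_timeout
        self.keepalive_interval = keepalive_interval
        self._last_activity = time.monotonic()
        self.keepalives_sent = 0

    def mark_activity(self):
        # Real traffic resets the idle timer.
        self._last_activity = time.monotonic()

    async def run(self, duration):
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            idle = time.monotonic() - self._last_activity
            if idle >= self.keepalive_timeout:
                self.keepalives_sent += 1  # a real service would send a ping here
                self._last_activity = time.monotonic()
            await asyncio.sleep(self.keepalive_interval)


conn = KeepaliveConnection()
asyncio.run(conn.run(duration=0.5))
print(conn.keepalives_sent)  # at least one keepalive fired while idle
```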
Changed
- ⚠️ `RimeTTSService` now defaults to `model="arcana"` and the `wss://users-ws.rime.ai/ws3` endpoint. `InputParams` defaults changed from mistv2-specific values to `None`; only explicitly-set params are sent as query params. (PR #3642)
- `AICFilter` now shares read-only AIC models via a singleton `AICModelManager` in `aic_filter.py`:
  - Multiple filters using the same model path or `(model_id, model_download_dir)` share one loaded model, with reference counting and concurrent load deduplication.
  - Model file I/O runs off the event loop so the filter does not block.
  (PR #3684)
- Added `X-User-Agent` and `X-Request-Id` headers to `InworldTTSService` for better traceability. (PR #3706)
- `DailyUpdateRemoteParticipantsFrame` is no longer deprecated and is now queued with audio like other transport frames. (PR #3719)
- Bumped the Pillow dependency upper bound from `<12` to `<13` to allow Pillow 12.x. (PR #3728)
- Moved the STT keepalive mechanism from `WebsocketSTTService` to the `STTService` base class, allowing any STT service (not just websocket-based ones) to use idle-connection keepalive via the `keepalive_timeout` and `keepalive_interval` parameters. (PR #3730)
- Improved audio context management in `AudioContextTTSService` by moving context ID tracking to the base class and adding a `reuse_context_id_within_turn` parameter to control concurrent TTS request handling:
  - Added helper methods: `has_active_audio_context()`, `get_active_audio_context_id()`, `remove_active_audio_context()`, `reset_active_audio_context()`.
  - Simplified the Cartesia, ElevenLabs, Inworld, Rime, AsyncAI, and Gradium TTS implementations by removing duplicate context management code.
  (PR #3732)
- `UserIdleController` is now always created with a default timeout of 0 (disabled). The `user_idle_timeout` parameter changed from `Optional[float] = None` to `float = 0` in `UserTurnProcessor`, `LLMUserAggregatorParams`, and `UserIdleController`. (PR #3748)
- Changed the version specifier from `>=0.2.8` to `~=0.2.8` for the `speechmatics-voice` package to ensure compatibility with future patch versions. (PR #3761)
- Updated `InworldTTSService` and `InworldHttpTTSService` to use the `ASYNC` timestamp transport strategy by default. (PR #3765)
- Added `start_time` and `end_time` parameters to `start_ttfb_metrics()`, `stop_ttfb_metrics()`, `start_processing_metrics()`, and `stop_processing_metrics()` in `FrameProcessor` and `FrameProcessorMetrics`, allowing custom timestamps for metrics measurement. `STTService` now uses these instead of custom TTFB tracking. (PR #3776)
- Updated the default Anthropic model from `claude-sonnet-4-5-20250929` to `claude-sonnet-4-6`. (PR #3792)
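The custom-timestamp metrics change (PR #3776) lets a caller measure an interval anchored at an event it observed earlier, such as the VAD speech-end time, rather than at the moment the metrics method happens to run. A minimal sketch of the pattern (a stand-in class, not pipecat's `FrameProcessorMetrics`):

```python
import time


class TTFBMetricsSketch:
    """Illustrative: metrics helpers that accept explicit timestamps
    instead of always sampling the clock themselves."""

    def __init__(self):
        self._start = None
        self.ttfb = None

    def start_ttfb_metrics(self, start_time=None):
        # Caller may anchor the measurement at an earlier event.
        self._start = start_time if start_time is not None else time.monotonic()

    def stop_ttfb_metrics(self, end_time=None):
        end = end_time if end_time is not None else time.monotonic()
        self.ttfb = end - self._start


metrics = TTFBMetricsSketch()
# e.g. measure from a recorded VAD speech-end timestamp rather than "now":
metrics.start_ttfb_metrics(start_time=100.0)
metrics.stop_ttfb_metrics(end_time=100.25)
print(metrics.ttfb)  # 0.25
```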
Deprecated
- Deprecated the unused `Traceable`, `@traceable`, `@traced`, and `AttachmentStrategy` in `pipecat.utils.tracing.class_decorators`. This module will be removed in a future release. (PR #3733)
Fixed
- Fixed a race condition where `RTVIObserver` could send messages before the `DailyTransport` join completed. Outbound messages are now queued and delivered after the transport is ready. (PR #3615)
- Fixed async generator cleanup in OpenAI LLM streaming to prevent an `AttributeError` with uvloop on Python 3.12+ (MagicStack/uvloop#699). (PR #3698)
- Fixed `SmallWebRTCTransport` input audio resampling to properly handle all sample rates, including 8 kHz audio. (PR #3713)
- Fixed a race condition in `RTVIObserver` where bot output messages could be sent before the bot-started-speaking event. (PR #3718)
- Fixed a Grok Realtime `session.updated` event parsing failure caused by the API returning prefixed voice names (e.g. `"human_Ara"` instead of `"Ara"`). (PR #3720)
- Fixed a context ID reuse issue in `ElevenLabsTTSService`, `InworldTTSService`, `RimeTTSService`, `CartesiaTTSService`, `AsyncAITTSService`, and `PlayHTTTSService`. Services now properly reuse the same context ID across multiple `run_tts()` invocations within a single LLM turn, preventing context tracking issues and incorrect lifecycle signaling. (PR #3729)
- Fixed a word timestamp interleaving issue in `ElevenLabsTTSService` when processing multiple sentences within a single LLM turn. (PR #3729)
- Fixed tracing service decorators executing the wrapped function twice when the function itself raised an exception (e.g., LLM rate limit, TTS timeout). (PR #3735)
- Fixed `LLMUserAggregator` broadcasting mute events before `StartFrame` reaches downstream processors. (PR #3737)
- Fixed `UserIdleController` false idle triggers caused by gaps between user and bot activity frames. The idle timer now starts only after `BotStoppedSpeakingFrame` and is suppressed during active user turns and function calls. (PR #3744)
- Fixed incorrect `sample_rate` assignment in `TavusInputTransport._on_participant_audio_data` (it was using `audio.audio_frames` instead of `audio.sample_rate`). (PR #3768)
- Fixed `RTVIObserver` not processing upstream-only frames. Previously, all upstream frames were filtered out to avoid duplicate messages from broadcasted frames. Now only upstream copies of broadcasted frames are skipped. (PR #3774)
- Fixed mutable default arguments in `LLMContextAggregatorPair.__init__()` that could cause shared state across instances. (PR #3782)
- Fixed `DeepgramSageMakerSTTService` to properly track the finalize lifecycle using `request_finalize()`/`confirm_finalize()` and to use `is_final` (instead of `is_final and speech_final`) for final transcription detection, matching `DeepgramSTTService` behavior. (PR #3784)
- Fixed a race condition in `AudioContextTTSService` where the audio context could time out between consecutive TTS requests within the same turn, causing audio to be discarded. (PR #3787)
- Fixed `push_interruption_task_frame_and_wait()` hanging indefinitely when the `InterruptionFrame` does not reach the pipeline sink within the timeout. Added a `timeout` keyword argument to customize the wait duration. (PR [#3789](https://github.com...
v0.0.102
Added
- Added `ResembleAITTSService` for text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback. (PR #3134)
- Added `UserBotLatencyObserver` for tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded as `turn.user_bot_latency_seconds` attributes on OpenTelemetry turn spans. (PR #3355)
- Added `append_to_context` parameter to `TTSSpeakFrame` for conditional LLM context addition.
  - Allows fine-grained control over whether text should be added to the conversation context
  - Defaults to `True` to maintain backward compatibility
  (PR #3584)
- Added a TTS context tracking system with a `context_id` field to trace audio generation through the pipeline.
  - `TTSAudioRawFrame`, `TTSStartedFrame`, and `TTSStoppedFrame` now include `context_id`
  - `AggregatedTextFrame` and `TTSTextFrame` now include `context_id`
  - Enables tracking which TTS request generated specific audio chunks
  (PR #3584)
- Added support for Inworld TTS WebSocket Auto Mode for improved latency. (PR #3593)
- Added new frames for context summarization: `LLMContextSummaryRequestFrame` and `LLMContextSummaryResultFrame`. (PR #3621)
- Added a context summarization feature to automatically compress conversation history when conversation length limits (by token or message count) are reached, enabling efficient long-running conversations.
  - Configure via `enable_context_summarization=True` in `LLMAssistantAggregatorParams`
  - Customize behavior with `LLMContextSummarizationConfig` (max tokens, thresholds, etc.)
  - Automatically preserves incomplete function call sequences during summarization
  - See new examples: `examples/foundational/54-context-summarization-openai.py` and `examples/foundational/54a-context-summarization-google.py`
  (PR #3621)
- Added RTVI function call lifecycle events (`llm-function-call-started`, `llm-function-call-in-progress`, `llm-function-call-stopped`) with configurable security levels via `RTVIObserverParams.function_call_report_level`. Supports per-function control over what information is exposed (`DISABLED`, `NONE`, `NAME`, or `FULL`). (PR #3630)
- Added `RequestMetadataFrame` and metadata handling for `ServiceSwitcher` to ensure STT services correctly emit `STTMetadataFrame` when switching between services. Only the active service's metadata is propagated downstream, switching services triggers the newly active service to re-emit its metadata, and proper frame ordering is maintained at startup. (PR #3637)
- Added `STTMetadataFrame` to broadcast STT service latency information at pipeline start.
  - STT services broadcast P99 time-to-final-segment (`ttfs_p99_latency`) to downstream processors
  - Turn stop strategies automatically configure their STT timeout from this metadata
  - Developers can override `ttfs_p99_latency` via a constructor argument for custom deployments
  - Added measured P99 values for STT providers
  - See stt-benchmark to measure latency for your configuration
  (PR #3637)
- Added support for the `is_sandbox` parameter in `LiveAvatarNewSessionRequest` to enable sandbox mode for HeyGen LiveAvatar sessions. (PR #3653)
- Added support for the `video_settings` parameter in `LiveAvatarNewSessionRequest` to configure video encoding (H264/VP8) and quality levels. (PR #3653)
- Added `OpenAIRealtimeSTTService` for real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection. (PR #3656)
- Added `bulbul:v3-beta` TTS model support for Sarvam AI with temperature control and 25 new speaker voices. (PR #3671)
- Added `saaras:v3` STT model support for Sarvam AI with a new `mode` parameter (transcribe, translate, verbatim, translit, codemix) and prompt support. (PR #3671)
- Added new OpenAI TTS voice options `marin` and `cedar`. (PR #3682)
- Added `UserMuteStartedFrame` and `UserMuteStoppedFrame` system frames, and corresponding `user-mute-started`/`user-mute-stopped` RTVI messages, so clients can observe when mute strategies activate or deactivate. (PR #3687)
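The context-tracking idea above (one `context_id` stamped on every frame a TTS request produces) can be sketched in plain Python. `TTSAudioChunk` and `run_tts_request` are illustrative stand-ins, not Pipecat APIs:

```python
import uuid
from dataclasses import dataclass


@dataclass
class TTSAudioChunk:
    """Illustrative stand-in for a TTS audio frame carrying a context id."""
    audio: bytes
    context_id: str


def run_tts_request(text: str) -> list:
    # One id per TTS request; every chunk that request emits carries it,
    # so downstream processors can tell which request produced which audio.
    context_id = str(uuid.uuid4())
    return [
        TTSAudioChunk(audio=word.encode("utf-8"), context_id=context_id)
        for word in text.split()
    ]


chunks = run_tts_request("hello there world")
# All chunks from one request share the same context id.
assert len({chunk.context_id for chunk in chunks}) == 1
```

A second request would get a fresh id, which is what makes end-to-end tracing of a specific TTS request possible.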
Changed
- Updated all 30+ TTS service implementations to support context tracking with `context_id`.
  - Services now generate and propagate context IDs through TTS frames
  - Enables end-to-end tracing of TTS requests through the pipeline
  (PR #3584)
- ⚠️ `TTSService.run_tts()` now requires a `context_id` parameter for context tracking. Custom TTS service implementations must update their `run_tts()` signature.
  - Before: `async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]:`
  - After: `async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:`
  (PR #3584)
- Simplified context aggregators to use the `frame.append_to_context` flag instead of tracking internal state.
  - Cleaner logic in `LLMResponseAggregator` and `LLMResponseUniversalAggregator`
  - More consistent behavior across aggregator implementations
  (PR #3584)
- Updated timestamps to be cumulative within an agent turn, using the `flushCompleted` message as an indication of when timestamps from the server are reset to 0. (PR #3593)
- Changed `KokoroTTSService` to use `kokoro-onnx` instead of `kokoro` as the underlying TTS engine. (PR #3612)
- Improved user turn stop timing in `TranscriptionUserTurnStopStrategy` and `TurnAnalyzerUserTurnStopStrategy`.
  - Timeout now starts on `VADUserStoppedSpeakingFrame` for tighter, more predictable timing
  - Added support for finalized transcripts (`TranscriptionFrame.finalized=True`) to trigger earlier
  - Added a fallback timeout for edge cases where transcripts arrive without VAD events
  - Removed `InterimTranscriptionFrame` handling (no longer affects timing)
  (PR #3637)
- Improved the accuracy of `UserBotLatencyObserver` and `UserBotLatencyLogObserver` by measuring from the time when the user actually starts speaking. (PR #3637)
- ⚠️ Renamed the `timeout` parameter to `user_speech_timeout` in `TranscriptionUserTurnStopStrategy`. (PR #3637)
- Updated `VADUserStartedSpeakingFrame` to include `start_secs` and `timestamp`, and `VADUserStoppedSpeakingFrame` to include `stop_secs` and `timestamp`, removing the need to separately handle `SpeechControlParamsFrame` for `VADParams` values. (PR #3637)
- ⚠️ Renamed `TranscriptionUserTurnStopStrategy` to `SpeechTimeoutUserTurnStopStrategy`. The old name is deprecated and will be removed in a future release. (PR #3637)
- `AssemblyAISTTService` now automatically configures optimal settings for manual turn detection when `vad_force_turn_endpoint=True`. This sets `end_of_turn_confidence_threshold=1.0` and `max_turn_silence=2000` by default, which disables model-based turn detection and reduces latency by relying on external VAD for turn endpoints. Warnings are logged if conflicting settings are detected. (PR #3644)
- Upgraded the `pipecat-ai-small-webrtc-prebuilt` package to v2.1.0. (PR #3652)
- Changed the default session mode from "CUSTOM" to "LITE" in the HeyGen LiveAvatar integration, with VP8 as the default video encoding. (PR #3653)
- ⚠️ The default `VADParams` `stop_secs` value is changing from `0.8` seconds to `0.2` seconds. This change both simplifies the developer experience and improves the performance of STT services. With a shorter `stop_secs` value, STT services using a local VAD can finalize sooner, resulting in faster transcription.
  - `SpeechTimeoutUserTurnStopStrategy`: control how long to wait for additional user speech using `user_speech_timeout` (default: 0.6 sec)
  - `TurnAnalyzerUserTurnStopStrategy`: the turn analyzer automatically adjusts the user wait time based on the audio input
  (PR #3659)
- Moved the interruption wait event from per-processor instance state to the `InterruptionFrame` itself. Added `InterruptionFrame.complete()` to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume an `InterruptionFrame` before it reaches the pipeline sink must call `frame.complete()` to avoid stalling `push_interruption_...
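The completion signaling described in the last entry, where the frame itself carries the wait event, can be illustrated with a plain `asyncio` sketch. `InterruptionSignal` is a hypothetical stand-in, not the actual `InterruptionFrame` implementation:

```python
import asyncio


class InterruptionSignal:
    """Illustrative stand-in for a frame that carries its own wait event."""

    def __init__(self):
        self._done = asyncio.Event()

    def complete(self):
        # Called by whichever processor finishes handling the interruption
        # (normally the pipeline sink).
        self._done.set()

    async def wait(self, timeout: float):
        # Raises TimeoutError if complete() is never called in time.
        await asyncio.wait_for(self._done.wait(), timeout)


async def demo() -> bool:
    frame = InterruptionSignal()
    # Simulate the sink completing the frame shortly after it is pushed.
    asyncio.get_running_loop().call_later(0.01, frame.complete)
    await frame.wait(timeout=1.0)  # returns once complete() fires
    return True


assert asyncio.run(demo())
```

If no processor ever calls `complete()`, the waiter times out instead of hanging, which is the failure mode the `timeout` argument guards against.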
v0.0.101
Added
- Additions for `AICFilter` and `AICVADAnalyzer`:
  - Added model downloading support to `AICFilter` with `model_id` and `model_download_dir` parameters
  - Added a `model_path` parameter to `AICFilter` for loading local `.aicmodel` files
  - Added unit tests for `AICFilter` and `AICVADAnalyzer`
  (PR #3408)
- Added handling for the `server_content.interrupted` signal in the Gemini Live service for faster interruption response in the case where there isn't already turn tracking in the pipeline (e.g. local VAD + context aggregators). When there is already turn tracking in the pipeline, the additional interruption does no harm. (PR #3429)
- Added a new `GenesysFrameSerializer` for the Genesys AudioHook WebSocket protocol, enabling bidirectional audio streaming between Pipecat pipelines and Genesys Cloud contact centers. (PR #3500)
- Added `reached_upstream_types` and `reached_downstream_types` read-only properties to `PipelineTask` for inspecting current frame filters. (PR #3510)
- Added `add_reached_upstream_filter()` and `add_reached_downstream_filter()` methods to `PipelineTask` for appending frame types. (PR #3510)
- Added `UserTurnCompletionLLMServiceMixin` for LLM services to detect and filter incomplete user turns. When enabled via `filter_incomplete_user_turns` in `LLMUserAggregatorParams`, the LLM outputs a turn completion marker at the start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete long). Incomplete turns are suppressed, and configurable timeouts automatically re-prompt the user. (PR #3518)
- Added `FrameProcessor.broadcast_frame_instance(frame)` method to broadcast a frame instance by extracting its fields and creating new instances for each direction. (PR #3519)
- `PipelineTask` now automatically adds an `RTVIProcessor` and registers an `RTVIObserver` when `enable_rtvi=True` (the default), simplifying pipeline setup. (PR #3519)
- Added `RTVIProcessor.create_rtvi_observer()` factory method for creating RTVI observers. (PR #3519)
- Added a `video_out_codec` parameter to `TransportParams`, allowing configuration of the preferred video codec (e.g., `"VP8"`, `"H264"`, `"H265"`) for video output in `DailyTransport`. (PR #3520)
- Added a `location` parameter to the Google TTS services (`GoogleHttpTTSService`, `GoogleTTSService`, `GeminiTTSService`) for regional endpoint support. (PR #3523)
- Added a new `PIPECAT_SMART_TURN_LOG_DATA` environment variable, which causes Smart Turn input data to be saved to disk. (PR #3525)
- Added a `result_callback` parameter to `UserImageRequestFrame` to support deferred function call results. (PR #3571)
- Added a `function_call_timeout_secs` parameter to `LLMService` to configure the timeout for deferred function calls (defaults to 10.0 seconds). (PR #3571)
- Added a `vad_analyzer` parameter to `LLMUserAggregatorParams`. VAD analysis is now handled inside the `LLMUserAggregator` rather than in the transport, keeping voice activity detection closer to where it is consumed. The `vad_analyzer` on `BaseInputTransport` is now deprecated.

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          vad_analyzer=SileroVADAnalyzer(),
      ),
  )
  ```

  (PR #3583)
- Added `VADProcessor` for detecting speech in audio streams within a pipeline. Pushes `VADUserStartedSpeakingFrame`, `VADUserStoppedSpeakingFrame`, and `UserSpeakingFrame` downstream based on VAD state changes. (PR #3583)
- Added `VADController` for managing voice activity detection state and emitting speech events independently of transport or pipeline processors. (PR #3583)
- Added a local `PiperTTSService` for offline text-to-speech using Piper voice models. The existing HTTP-based service has been renamed to `PiperHttpTTSService`. (PR #3585)
- `main()` in `pipecat.runner.run` now accepts an optional `argparse.ArgumentParser`, allowing bots to define custom CLI arguments accessible via `runner_args.cli_args`. (PR #3590)
- Added `KokoroTTSService` for local text-to-speech synthesis using the Kokoro-82M model. (PR #3595)
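The custom CLI argument flow in the `pipecat.runner.run` entry above boils down to standard `argparse` usage. This standalone sketch shows the kind of parser a bot might hand to `main()`; the runner internals and argument names are assumptions for illustration:

```python
import argparse

# Build the parser the bot would pass to the runner's main().
parser = argparse.ArgumentParser()
parser.add_argument("--persona", default="assistant")
parser.add_argument("--verbose", action="store_true")

# A runner would typically parse known args and ignore (or keep) the rest,
# then expose the namespace to the bot (roughly what runner_args.cli_args is).
cli_args, _unknown = parser.parse_known_args(["--persona", "pirate"])
assert cli_args.persona == "pirate"
assert cli_args.verbose is False
```

Using `parse_known_args` instead of `parse_args` lets the runner's own flags coexist with bot-defined ones without either parser erroring out.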
Changed
- Updated `AICFilter` and `AICVADAnalyzer` to use `aic-sdk ~= 2.0.1`. (PR #3408)
- Improved the STT TTFB (Time To First Byte) measurement, reporting the delay between when the user stops speaking and when the final transcription is received. Note: unlike traditional TTFB, which measures from a discrete request, STT services receive continuous audio input, so we measure from speech end to final transcript, which captures the latency that matters for voice AI applications. In support of this change, added a `finalized` field to `TranscriptionFrame` to indicate when a transcript is the final result for an utterance. (PR #3495)
- `SarvamSTTService` now defaults `vad_signals` and `high_vad_sensitivity` to `None` (omitted from connection parameters), improving latency by ~300ms compared to the previous defaults. (PR #3495)
- Changed frame filter storage from tuples to sets in `PipelineTask`. (PR #3510)
- Changed the default Inworld TTS model from `inworld-tts-1` to `inworld-tts-1.5-max`. (PR #3531)
- `FrameSerializer` now subclasses from `BaseObject` to enable event support. (PR #3560)
- Added support for TTFS in `SpeechmaticsSTTService` and set the default mode to `EXTERNAL` to support Pipecat-controlled VAD.
  - Changed dependency to `speechmatics-voice[smart]>=0.2.8`
  (PR #3562)
- ⚠️ Changed function call handling to use timeout-based completion instead of immediate callback execution.
  - Function calls that defer their results (e.g., `UserImageRequestFrame`) now use a timeout mechanism
  - The `result_callback` is invoked automatically when the deferred operation completes, or after the timeout
  - This change affects examples using `UserImageRequestFrame`: the `result_callback` should now be passed to the frame instead of being called immediately
  (PR #3571)
- The Pipecat runner now uses `DAILY_ROOM_URL` instead of `DAILY_SAMPLE_ROOM_URL`. (PR #3582)
- Updates to `GradiumSTTService`:
  - Now flushes pending transcriptions when VAD detects the user stopped speaking, improving response latency
  - `GradiumSTTService` now supports `InputParams` for configuring `language` and `delay_in_frames` settings
  (PR #3587)
Deprecated
- ⚠️ Deprecated the `vad_analyzer` parameter on `BaseInputTransport`. Pass `vad_analyzer` to `LLMUserAggregatorParams` instead, or use `VADProcessor` in the pipeline. (PR #3583)
Removed
- Removed deprecated `AICFilter` parameters: `enhancement_level`, `voice_gain`, `noise_gate_enable`. (PR #3408)
Fixed
- Fixed an issue where, if you were using `OpenRouterLLMService` with a Gemini model, it wouldn't handle multiple `"system"` messages as expected (and as we do in `GoogleLLMService`), which is to convert subsequent ones into `"user"` messages. Instead, the latest `"system"` message would overwrite the previous ones. (PR #3406)
- Transports now properly broadcast `InputTransportMessageFrame` frames both upstream and downstream instead of only pushing downstream. (PR #3519)
- Fixed `FrameProcessor.broadcast_frame()` to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances. (PR #3519)
- Fixed OpenAI LLM services to emit an `ErrorFrame` on completion timeout, enabling proper error handling and `LLMSwitcher` failover. (PR #3529)
- Fixed a logging issue where non-ASCII characters (e.g., Japanese, Chinese, etc.) were being unnecessarily escaped to Unicode sequences when a function call occurred. (PR #3536)
- Fixed how audio tracks are synchronized inside the `AudioBufferProcessor` to fix timing issues where silence and audio were misaligned between user and bot buffers. (PR #3541)
- Fixed a race condition in `OpenAIRealtimeBetaLLMService` that could cause an error when truncating the conversation....
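The non-ASCII escaping fixed above (PR #3536) comes down to `json.dumps` defaults; a minimal stdlib illustration:

```python
import json

payload = {"text": "こんにちは"}

# Default: non-ASCII is escaped to \uXXXX sequences (longer and harder to read).
assert json.dumps(payload) == '{"text": "\\u3053\\u3093\\u306b\\u3061\\u306f"}'

# With ensure_ascii=False the characters survive verbatim.
assert json.dumps(payload, ensure_ascii=False) == '{"text": "こんにちは"}'
```

The unescaped form is also shorter, which is why the same flag reduces token usage when serialized results end up in LLM context.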
v0.0.100
Added
- Added a Hathora service to support Hathora-hosted TTS and STT models (non-streaming only). (PR #3169)
- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis. (PR #3349)
- Added the `additional_headers` param to `WebsocketClientParams`, allowing `WebsocketClientTransport` to send custom headers on connect, for cases such as authentication. (PR #3461)
- Added `UserIdleController` for detecting user idle state, integrated into `LLMUserAggregator` and `UserTurnProcessor` via an optional `user_idle_timeout` parameter. Emits an `on_user_turn_idle` event for application-level handling. Deprecated `UserIdleProcessor` in favor of the new compositional approach. (PR #3482)
- Added `on_user_mute_started` and `on_user_mute_stopped` event handlers to `LLMUserAggregator` for tracking user mute state changes. (PR #3490)
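The idle-detection behavior described for `UserIdleController` (fire an event only after a quiet period, resetting whenever activity arrives) can be sketched with plain `asyncio`. `IdleTimer` is an illustrative stand-in, not the Pipecat class:

```python
import asyncio


class IdleTimer:
    """Minimal sketch: invoke `on_idle` if no activity arrives within `timeout`."""

    def __init__(self, timeout: float, on_idle):
        self._timeout = timeout
        self._on_idle = on_idle
        self._task = None

    def activity(self):
        # Any user activity cancels and restarts the countdown.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.ensure_future(self._countdown())

    async def _countdown(self):
        await asyncio.sleep(self._timeout)
        self._on_idle()


async def main() -> bool:
    fired = []
    timer = IdleTimer(0.02, lambda: fired.append(True))
    timer.activity()
    await asyncio.sleep(0.01)
    timer.activity()           # reset before the timeout: still not idle
    assert not fired
    await asyncio.sleep(0.05)  # quiet period longer than the timeout
    return bool(fired)         # the idle callback fired


assert asyncio.run(main())
```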
Changed
- Enhanced interruption handling in `AsyncAITTSService` by supporting multi-context WebSocket sessions for more robust context management. (PR #3287)
- Throttled `UserSpeakingFrame` to broadcast at most every 200ms instead of on every audio chunk, reducing frame processing overhead during user speech. (PR #3483)
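The 200ms cap above is a standard rate-limit pattern; here is a standalone sketch with an injectable clock so the behavior can be tested deterministically (names are illustrative, not Pipecat's):

```python
import time


class Throttle:
    """Allow an emission at most once per `interval` seconds."""

    def __init__(self, interval: float, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last = float("-inf")  # so the first call always emits

    def should_emit(self) -> bool:
        now = self._clock()
        if now - self._last >= self._interval:
            self._last = now
            return True
        return False


# Deterministic check with a fake clock ticking through known timestamps.
ticks = iter([0.0, 0.05, 0.1, 0.2, 0.25, 0.4])
throttle = Throttle(0.2, clock=lambda: next(ticks))
assert [throttle.should_emit() for _ in range(6)] == [
    True, False, False, True, False, True
]
```

Each audio chunk would call `should_emit()`; only roughly one call per interval passes, which is exactly the overhead reduction described.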
Deprecated
- For consistency with other package names, deprecated `pipecat.turns.mute` (introduced in Pipecat 0.0.99) in favor of `pipecat.turns.user_mute`. (PR #3479)
Fixed
- Corrected the TTFB metric calculation in `AsyncAIHttpTTSService`. (PR #3287)
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services:
  - `AWSNovaSonicLLMService`
  - `GeminiLiveLLMService`
  - `OpenAIRealtimeLLMService`
  - `GrokRealtimeLLMService`

  The issue was that these services weren't pushing `LLMTextFrame`s. Now they do. (PR #3446)
- Fixed an issue where `on_user_turn_stop_timeout` could fire while a user is talking when using `ExternalUserTurnStrategies`. (PR #3454)
- Fixed an issue where user turn start strategies were not being reset after a user turn started, causing incorrect strategy behavior. (PR #3455)
- Fixed `MinWordsUserTurnStartStrategy` to not aggregate transcriptions, preventing incorrect turn starts when words are spoken with pauses between them. (PR #3462)
- Fixed an issue where Grok Realtime would error out when running with the SmallWebRTC transport. (PR #3480)
- Fixed a `Mem0MemoryService` issue where passing `async_mode: true` was causing an error. See https://docs.mem0.ai/platform/features/async-mode-default-change. (PR #3484)
- Fixed `AWSNovaSonicLLMService.reset_conversation()`, which would previously error out. Now it successfully reconnects and "rehydrates" from the context object. (PR #3486)
- Fixed `AzureTTSService` transcript formatting issues:
  - Punctuation now appears without extra spaces (e.g., "Hello!" instead of "Hello !")
  - CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces between characters
  (PR #3489)
- Fixed an issue where `UninterruptibleFrame` frames would not be preserved in some cases. (PR #3494)
- Fixed a memory leak in `LiveKitTransport` when `video_in_enabled` is `False`. (PR #3499)
- Fixed an issue in `AIService` where unhandled exceptions in `start()`, `stop()`, or `cancel()` implementations would prevent `process_frame()` from continuing, and therefore `StartFrame`, `EndFrame`, or `CancelFrame` from being pushed downstream, causing the pipeline to not start or stop properly. (PR #3503)
- Moved `NVIDIATTSService` and `NVIDIASTTService` client initialization from the constructor to `start()` for better error handling. (PR #3504)
- Optimized `NVIDIATTSService` to process incoming audio frames immediately. (PR #3509)
- Optimized `NVIDIASTTService` by removing an unnecessary queue and task. (PR #3509)
- Fixed a `CambTTSService` issue where the client was being initialized in the constructor, which wouldn't allow for proper pipeline error handling. (PR #3511)
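The `AzureTTSService` transcript fix above (no space before punctuation, no spaces between CJK characters) can be illustrated with a small self-contained word joiner. This is a sketch of the general technique, not Pipecat's implementation:

```python
import re

# Rough CJK coverage: hiragana/katakana, unified ideographs, hangul syllables.
CJK = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def join_words(words):
    """Join word-level transcript tokens without spurious spaces."""
    out = ""
    for word in words:
        if not word:
            continue
        if not out:
            out = word
        elif word[0] in ",.!?;:":  # punctuation hugs the preceding word
            out += word
        elif CJK.match(out[-1]) and CJK.match(word[0]):  # CJK runs stay tight
            out += word
        else:
            out += " " + word
    return out


assert join_words(["Hello", "!"]) == "Hello!"
assert join_words(["こん", "にち", "は"]) == "こんにちは"
assert join_words(["Hello", "world", "."]) == "Hello world."
```

A production version would also need full-width CJK punctuation and mixed-script edge cases, which are omitted here for brevity.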
v0.0.99
Added
- Introducing user turn strategies. User turn strategies indicate when the user turn starts or stops. In conversational agents, these are often referred to as start/stop speaking or turn-taking plans or policies.

  User turn start strategies indicate when the user starts speaking (e.g. using VAD events or when a user says one or more words). User turn stop strategies indicate when the user stops speaking (e.g. using an end-of-turn detection model or by observing incoming transcriptions). A list of strategies can be specified for both start and stop; strategies are evaluated in order until one evaluates to true.

  Available user turn start strategies:
  - `VADUserTurnStartStrategy`
  - `TranscriptionUserTurnStartStrategy`
  - `MinWordsUserTurnStartStrategy`
  - `ExternalUserTurnStartStrategy`

  Available user turn stop strategies:
  - `TranscriptionUserTurnStopStrategy`
  - `TurnAnalyzerUserTurnStopStrategy`
  - `ExternalUserTurnStopStrategy`

  The default strategies are:
  - start: `[VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]`
  - stop: `[TranscriptionUserTurnStopStrategy]`

  Turn strategies are configured when setting up `LLMContextAggregatorPair`. For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              stop=[
                  TurnAnalyzerUserTurnStopStrategy(
                      turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                  )
              ],
          )
      ),
  )
  ```

  In order to use the user turn strategies you must update to the new universal `LLMContext` and `LLMContextAggregatorPair`. (PR #3045)
- Added `RNNoiseFilter` for real-time noise suppression using the RNNoise neural network via the pyrnnoise library. (PR #3205)
- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time voice conversations:
  - Support for real-time audio streaming with a WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: web_search, x_search, file_search
  - Custom function calling with the standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)
  (PR #3267)
- Added an approximation of TTFB for Ultravox. (PR #3268)
- Added a new `AudioContextTTSService` to the TTS service base classes. `AudioContextWordTTSService` now inherits from `AudioContextTTSService` and `WebsocketWordTTSService`. (PR #3289)
- `LLMUserAggregator` now exposes the following events:
  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop and times out
  (PR #3291)
- Introducing user mute strategies. User mute strategies indicate when user input should be muted based on the current system state. In conversational agents, user mute strategies are used to prevent user input from interrupting bot speech, tool execution, or other critical system operations.

  A list of strategies can be specified; all strategies are evaluated for every frame so that each strategy can maintain its internal state. A user frame is muted if any of the configured strategies indicates it should be muted.

  Available user mute strategies:
  - `FirstSpeechUserMuteStrategy`
  - `MuteUntilFirstBotCompleteUserMuteStrategy`
  - `AlwaysUserMuteStrategy`
  - `FunctionCallUserMuteStrategy`

  User mute strategies replace the legacy `STTMuteFilter` and provide a more flexible and composable approach to muting user input. User mute strategies are configured when setting up the `LLMContextAggregatorPair`. For example:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_mute_strategies=[
              FirstSpeechUserMuteStrategy(),
          ]
      ),
  )
  ```

  In order to use user mute strategies you should update to the new universal `LLMContext` and `LLMContextAggregatorPair`. (PR #3292)
- Added a `use_ssl` parameter to `NvidiaSTTService`, `NvidiaSegmentedSTTService` and `NvidiaTTSService`. (PR #3300)
- Added an `enable_interruptions` constructor argument to all user turn strategies. This tells the `LLMUserAggregator` whether or not to push an `InterruptionFrame`. (PR #3316)
- Added a `split_sentences` parameter to `SpeechmaticsSTTService` to control sentence splitting behavior for finals on sentence boundaries. (PR #3328)
- Added word-level timestamp support to `AzureTTSService` for accurate text-to-audio synchronization. (PR #3334)
- Added a `pronunciation_dict_id` parameter to `CartesiaTTSService.InputParams` and `CartesiaHttpTTSService.InputParams` to support Cartesia's pronunciation dictionary feature for custom pronunciations. (PR #3346)
- Added support for using the HeyGen LiveAvatar API with the `HeyGenTransport` (see https://www.liveavatar.com/). (PR #3357)
- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
  - New `start_video_paused` parameter to control the initial video input state
  - New `video_frame_detail` parameter to set image processing quality ("auto", "low", or "high"); this corresponds to OpenAI Realtime's `image_detail` parameter
  - `set_video_input_paused()` method to pause/resume video input at runtime
  - `set_video_frame_detail()` method to adjust video frame quality dynamically
  - Automatic rate limiting (1 frame per second) to prevent API overload
  (PR #3360)
- Added `UserTurnProcessor`, a frame processor built on `UserTurnController` that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames and interruptions based on the controller's user turn strategies. (PR #3372)
- Added `UserTurnController` to manage user turns. It emits `on_user_turn_started`, `on_user_turn_stopped`, and `on_user_turn_stop_timeout` events, and can be integrated into processors to detect and handle user turns. `LLMUserAggregator` and `UserTurnProcessor` are implemented using this controller. (PR #3372)
- Added a `should_interrupt` property to `DeepgramFluxSTTService`, `DeepgramSTTService`, and `SpeechmaticsSTTService` to configure whether the bot should be interrupted when the external service detects user speech. (PR #3374)
- `LLMAssistantAggregator` now exposes the following events:
  - `on_assistant_turn_started`: triggered when the assistant turn starts
  - `on_assistant_turn_stopped`: triggered when the assistant turn ends
  - `on_assistant_thought`: triggered when there's an assistant thought available
  (PR #3385)
- Added the `KrispVivaTurn` analyzer for end-of-turn detection using the Krisp VIVA SDK (requires `krisp_audio`). (PR #3391)
- Added support for setting up a pipeline task from external files. You can now register custom pipeline task setup files by setting the `PIPECAT_SETUP_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a function with the following signature:

  ```python
  async def setup_pipeline_task(task: PipelineTask): ...
  ```

  (PR #3397)
- Added a keepalive task for `InworldTTSService` to keep the service connected in the event of no generations for longer periods of time. (PR #3403)
- Added `enable_vad` to `Params` for use in `GladiaSTTService`. When enabled, `GladiaSTTService` acts as the turn controller, emitting `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`, and optionally `InterruptionFrame`. (PR #3404)
- Added a `should_interrupt` property to `GladiaSTTService` to configure whether the bot should be interrupted when the external service detects user speech. (PR #3404)
- Added `VonageFrameSerializer` for the Vonage Video API Audio Connector WebSocket protocol. (PR #3410)
- Added an `append_trailing_space` parameter to `TTSService` to automatically append a trailing space to text before sending it to TTS, helping prevent some services from vocalizing trailing punctuation. (PR #3424)
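Two evaluation rules recur in the strategy entries above: turn start/stop strategies are checked in order until one fires, while mute strategies must all see every frame (so each can keep internal state) and the frame is muted if any of them says so. A plain-Python sketch of the two rules, with strategy callables as illustrative stand-ins:

```python
def turn_triggered(strategies, frame) -> bool:
    # Start/stop rule: evaluate in order, stop at the first strategy
    # that returns True (any() short-circuits exactly this way).
    return any(strategy(frame) for strategy in strategies)


def is_muted(strategies, frame) -> bool:
    # Mute rule: every strategy sees the frame (no short-circuit,
    # so each can update internal state), then mute if ANY said so.
    results = [strategy(frame) for strategy in strategies]
    return any(results)


# Illustrative strategies over a dict-shaped "frame".
vad_says_speaking = lambda f: f.get("vad", False)
min_two_words = lambda f: len(f.get("text", "").split()) >= 2

assert turn_triggered([vad_says_speaking, min_two_words], {"text": "hello there"})
assert not turn_triggered([vad_says_speaking, min_two_words], {"text": "hi"})

always = lambda f: True
never = lambda f: False
assert is_muted([never, always], {})
assert not is_muted([never, never], {})
```

The difference matters: short-circuiting a stateful mute strategy would starve it of frames and corrupt its state, which is why the mute path evaluates everything first.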
Changed
- Updated `ElevenLabsRealtimeSTTService` to accept the `include_language_detection` parameter to detect language. stt = ElevenLabsRealtimeSTTService( api_key=os.getenv("...
v0.0.98
Added
- Added `RimeNonJsonTTSService`, which supports non-JSON streaming mode. This new class supports websocket streaming for the Arcana model. (PR #3085)
- Added additional functionality related to "thinking" for Google and Anthropic LLMs.
  - New typed parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries):
    - `AnthropicLLMService.ThinkingConfig`
    - `GoogleLLMService.ThinkingConfig`
  - New frames for representing thoughts output by LLMs:
    - `LLMThoughtStartFrame`
    - `LLMThoughtTextFrame`
    - `LLMThoughtEndFrame`
  - A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages. See:
    - `LLMThoughtEndFrame.signature`
    - `LLMAssistantAggregator` handling of the above field
    - `AnthropicLLMAdapter` handling of `"thought"` context messages
  - Google-specific logic for inserting thought signatures into the context, to help maintain thinking continuity in a chain of LLM calls. See:
    - `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add LLM-specific `"thought_signature"` messages to context
    - `GeminiLLMAdapter` handling of `"thought_signature"` messages
  - An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances. See:
    - `TranscriptProcessor(process_thoughts=True)` (defaults to `False`)
    - `ThoughtTranscriptionMessage`, which is now also emitted with the `"on_transcript_update"` event
  (PR #3175)
- Data and control frames can now be marked as non-interruptible by using the `UninterruptibleFrame` mixin. Frames marked as `UninterruptibleFrame` will not be interrupted during processing, and any queued frames of this type will be retained in the internal queues. This is useful when you need ordered frames (data or control) that should not be discarded or cancelled due to interruptions. (PR #3189)
- Added an `on_conversation_detected` event to `VoicemailDetector`. (PR #3207)
- Added an `x-goog-api-client` header with Pipecat's version to all Google services' requests. (PR #3208)
- Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/). (PR #3210)
- Added to `AWSNovaSonicLLMService` functionality related to the new (and now default) Nova 2 Sonic model (`"amazon.nova-2-sonic-v1:0"`):
  - Added the `endpointing_sensitivity` parameter to control how quickly the model decides the user has stopped speaking
  - Made the assistant-response-trigger hack a no-op; it's only needed for the older Nova Sonic model
  (PR #3212)
- Ultravox Realtime is now a supported speech-to-speech service.
  - Added `UltravoxRealtimeLLMService` for the integration
  - Added the `49-ultravox-realtime.py` example (with tool calling)
  (PR #3227)
- Added Daily PSTN dial-in support to the development runner with the `--dialin` flag. This includes:
  - A `/daily-dialin-webhook` endpoint that handles incoming Daily PSTN webhooks
  - Automatic Daily room creation with SIP configuration
  - `DialinSettings` and `DailyDialinRequest` types in `pipecat.runner.types` for type-safe dial-in data
  - The runner now mimics Pipecat Cloud's dial-in webhook handling for local development
  (PR #3235)
- Added the Gladia session id to logs for `GladiaSTTService`. (PR #3236)
- Added `InworldHttpTTSService`, which uses Inworld's HTTP-based TTS service in either streaming or non-streaming mode. Note: this class was previously named `InworldTTSService`. (PR #3239)
- Added a `language_hints_strict` parameter to `SonioxSTTService` that strictly enforces language hints. This ensures that transcription occurs in the specified language. (PR #3245)
- Added Pipecat library version info to the `about` field in the `bot-ready` RTVI message. (PR #3248)
- Added `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`. These are used by vision services, similar to LLM services. (PR #3252)
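The `UninterruptibleFrame` mixin described above is essentially a marker type checked when queues are flushed on interruption; a minimal sketch with hypothetical frame classes (only the marker idea mirrors Pipecat):

```python
class Frame:
    """Base frame (illustrative)."""


class Uninterruptible:
    """Marker mixin: frames of this type survive interruption flushes."""


class TextFrame(Frame):
    """An ordinary, interruptible data frame."""


class ToolResultFrame(Frame, Uninterruptible):
    """A frame that must not be discarded on interruption."""


def flush_on_interruption(queue):
    # Keep only frames marked uninterruptible; drop everything else,
    # preserving the original ordering of the survivors.
    return [f for f in queue if isinstance(f, Uninterruptible)]


queue = [TextFrame(), ToolResultFrame(), TextFrame()]
kept = flush_on_interruption(queue)
assert len(kept) == 1 and isinstance(kept[0], ToolResultFrame)
```

Because the check is an `isinstance` on a mixin, any frame class can opt in simply by inheriting from the marker, with no changes to the flushing code.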
Changed
- `FunctionCallInProgressFrame` and `FunctionCallResultFrame` have changed from system frames to a control frame and a data frame, respectively, and are now both marked as `UninterruptibleFrame`. (PR #3189)
- `UserBotLatencyLogObserver` now uses `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` to determine latency from user stopped speaking to bot started speaking. (PR #3206)
- Updated `HeyGenVideoService` and `HeyGenTransport` to support both HeyGen APIs (Interactive Avatar and Live Avatar). Using them is as simple as specifying the `service_type` when creating the `HeyGenVideoService` and the `HeyGenTransport`:

  ```python
  heyGen = HeyGenVideoService(
      api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
      service_type=ServiceType.LIVE_AVATAR,
      session=session,
  )
  ```

  (PR #3210)
- Made `"amazon.nova-2-sonic-v1:0"` the new default model for `AWSNovaSonicLLMService`. (PR #3212)
- Updated the `run_inference` methods in the LLM service classes (`AnthropicLLMService`, `AWSBedrockLLMService`, `GoogleLLMService`, and `OpenAILLMService` and its base classes) to use the provided LLM configuration parameters. (PR #3214)
- Updated default models for:
  - `GeminiLiveLLMService` to `gemini-2.5-flash-native-audio-preview-12-2025`
  - `GeminiLiveVertexLLMService` to `gemini-live-2.5-flash-native-audio`
  (PR #3228)
- Changed the `reason` field in `EndFrame`, `CancelFrame`, `EndTaskFrame`, and `CancelTaskFrame` from `str` to `Any` to indicate that it can hold values other than strings. (PR #3231)
- Updated websocket STT services to use the `WebsocketSTTService` base class, which manages the websocket connection and handles reconnects. Updated services:
  - `AssemblyAISTTService`
  - `AWSTranscribeSTTService`
  - `GladiaSTTService`
  - `SonioxSTTService`
  (PR #3236)
- Changed Inworld's TTS service implementations:
  - Previously, the HTTP implementation was named `InworldTTSService`. That has been moved to `InworldHttpTTSService`. This service now supports word-timestamp alignment data in both streaming and non-streaming modes.
  - Updated the `InworldTTSService` class to use Inworld's Websocket API. This class now has support for word-timestamp alignment data and tracks contexts for each user turn.
  (PR #3239)
- ⚠️ Breaking change: `WordTTSService.start_word_timestamps()` and `WordTTSService.reset_word_timestamps()` are now async. (PR #3240)
- Updated the current RTVI version to 1.1.0 to reflect recent additions and deprecations.
  - New RTVI messages: `send-text` and `bot-output`
  - Deprecated messages: `append-to-context` and `bot-transcription`
  (PR #3248)
- `MoondreamService` now pushes `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`. (PR #3252)
Deprecated
- `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer` are deprecated and will be removed in a future version. Use `LocalSmartTurnAnalyzerV3` instead. (PR #3219)
Removed
- Removed the deprecated VLLM-based open source Ultravox STT service.
(PR #3227)
Fixed
- Fixed a bug in `AWSNovaSonicLLMService` where cancelled tool calls in the context were mishandled, resulting in errors.
  (PR #3212)
- Better support conversation history with Gemini 2.5 Flash Image (model `"gemini-2.5-flash-image"`). Prior to this fix, the model had no memory of previous images it had generated, so it wasn't able to iterate on them.
  (PR #3224)
- Support conversations with Gemini 3 Pro Image (model `"gemini-3-pro-image-preview"`). Prior to this fix, after the model generated an image the conversation could not progress.
  (PR #3224)
- Fixed an issue where `ElevenLabsHttpTTSService` was not updating voice settings when receiving a `TTSUpdateSettingsFrame`.
  (PR #3226)
- Fixed the return type of the `SmallWebRTCRequestHandler.handle_web_request()` function.
  (PR #3230)
- Fixed a bug in LLM context audio content handling
...
v0.0.97
Added
- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for speech-to-text and text-to-speech functionality using Gradium's API.
- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:
  - Added new languages: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`, `tr`, `hi`, `zh`.
  - Updated the default model to `asyncflow_multilingual_v1.0` for improved accuracy and broader language coverage.
- Added optional tool and tool output filters for MCP services.
Changed
- Updated Deepgram logging to include Deepgram request IDs for improved debugging.
- Text aggregation improvements:
  - Breaking change: `BaseTextAggregator.aggregate()` now returns `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This enables the aggregator to return multiple results based on the provided text.
  - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing the base class's sentence detection logic.
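The practical impact of the async-iterator change is on call sites: instead of awaiting a single optional result, callers iterate. A sketch with a hypothetical aggregator (yielding plain strings rather than pipecat's `Aggregation` type):

```python
import asyncio
from typing import AsyncIterator

# Hypothetical aggregator illustrating the new contract: aggregate() may
# now yield several completed aggregations per chunk of streamed text.
class SentenceAggregator:
    def __init__(self):
        self._buffer = ""

    async def aggregate(self, text: str) -> AsyncIterator[str]:
        self._buffer += text
        while "." in self._buffer:
            sentence, self._buffer = self._buffer.split(".", 1)
            yield sentence.strip() + "."

async def main():
    agg = SentenceAggregator()
    results = []
    for chunk in ["One. Two", ". Three."]:
        # Before: result = await agg.aggregate(chunk); after: iterate.
        async for aggregation in agg.aggregate(chunk):
            results.append(aggregation)
    return results

collected = asyncio.run(main())
```

Custom `BaseTextAggregator` subclasses must be rewritten as async generators to match the new signature.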
- Improved interruption handling to prevent bots from repeating themselves. LLM responses containing multiple sentences (e.g., from `GoogleLLMService`) are now split into individual sentences before being sent to TTS. This ensures interruptions occur at sentence boundaries, preventing the bot from repeating content after being interrupted during long responses.
- Updated `AICFilter` to use Quail STT as the default model (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine interaction (e.g., voice agents, speech-to-text) and operates at a native sample rate of 16 kHz with fixed enhancement parameters.
- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is called with an exception, the file name and line number where the exception occurred are now logged.
- Updated Smart Turn model weights to v3.1.
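The file-and-line logging described above can be reproduced with the standard library's `traceback` module; this sketch shows the general technique, not pipecat's exact log format:

```python
import traceback

# Recover where a caught exception was raised, for logging purposes.
def exception_location(exc: BaseException) -> str:
    frames = traceback.extract_tb(exc.__traceback__)
    if not frames:
        return f"{type(exc).__name__}: {exc}"
    innermost = frames[-1]  # the frame where the exception was raised
    return f"{type(exc).__name__}: {exc} at {innermost.filename}:{innermost.lineno}"

try:
    raise ValueError("bad frame")
except ValueError as err:
    print(exception_location(err))
```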
- Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered.
- Updated `CartesiaSTTService` to return the full transcription `result` in the `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to word timestamp data.
- Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`) to all requests made by `HumeTTSService` to the Hume API for better usage tracking and analytics.
- Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to properly close the HTTP client and prevent resource leaks.
Deprecated
- NVIDIA service name changes (all functionality is unchanged):
  - `NimLLMService` is now deprecated; use `NvidiaLLMService` instead.
  - `RivaSTTService` is now deprecated; use `NvidiaSTTService` instead.
  - `RivaTTSService` is now deprecated; use `NvidiaTTSService` instead.
  - Use `uv pip install pipecat-ai[nvidia]` instead of `uv pip install pipecat-ai[riva]`.
- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer has any effect. Noise gating is now handled automatically by the AIC VAD system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.
- Package `pipecat.sync` is deprecated; use `pipecat.utils.sync` instead.
Fixed
- Fixed a bug in `PatternPairAggregator` where pattern handlers could be called multiple times for `KEEP` or `AGGREGATE` patterns.
- Fixed sentence aggregation to correctly handle ambiguous punctuation in streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
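The core of such a fix is treating a period as sentence-final only when it is not part of a decimal number or a known abbreviation. A self-contained sketch of that idea (not pipecat's actual implementation):

```python
import re

# Abbreviations whose trailing period should not end a sentence.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "etc"}

def split_sentences(text: str) -> list[str]:
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]", text):
        end = match.end()
        if match.group() == ".":
            # "$29.95": a digit immediately after the period -> not a boundary.
            if end < len(text) and text[end].isdigit():
                continue
            # "Mr. Smith": the word before the period is an abbreviation.
            words = re.findall(r"[A-Za-z]+", text[start:end - 1])
            if words and words[-1].lower() in ABBREVIATIONS:
                continue
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("That costs $29.95. Mr. Smith agreed."))
```

In a streaming setting the same checks run against the aggregation buffer, deferring the split until a real boundary arrives.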
- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always set to `us-east-1` when providing an `AWS_REGION` env var.
- Fixed an issue in `SarvamTTSService` where the last sentence was not being spoken. Now, audio is flushed when the TTS service receives the `LLMFullResponseEndFrame` or `EndFrame`.
- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was incorrectly pushed after a function call. This caused an issue with the voice-ui-kit's conversational panel rendering of the LLM output after a function call.
- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM services.
- Fixed an issue that caused `WebsocketService` instances to attempt reconnection during shutdown.
- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were only reported on the first TTS generation per turn.