Releases: pipecat-ai/pipecat
v0.0.98
Added
- Added `RimeNonJsonTTSService`, which supports non-JSON streaming mode. This new class supports websocket streaming for the Arcana model. (PR #3085)
- Added additional functionality related to "thinking" for Google and Anthropic LLMs. (PR #3175)
  - New typed parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries): `AnthropicLLMService.ThinkingConfig`, `GoogleLLMService.ThinkingConfig`
  - New frames for representing thoughts output by LLMs: `LLMThoughtStartFrame`, `LLMThoughtTextFrame`, `LLMThoughtEndFrame`
  - A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages. See: `LLMThoughtEndFrame.signature`, `LLMAssistantAggregator` handling of the above field, and `AnthropicLLMAdapter` handling of `"thought"` context messages.
  - Google-specific logic for inserting thought signatures into the context, to help maintain thinking continuity in a chain of LLM calls. See: `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add LLM-specific `"thought_signature"` messages to context, and `GeminiLLMAdapter` handling of `"thought_signature"` messages.
  - An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances. See: `TranscriptProcessor(process_thoughts=True)` (defaults to `False`) and `ThoughtTranscriptionMessage`, which is now also emitted with the `"on_transcript_update"` event.
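As a rough illustration of how the thought frames above flow, here is a minimal pure-Python sketch of collecting thought text between start and end frames. The classes below are simplified stand-ins, not pipecat's real frames (which carry more fields).

```python
# Simplified stand-in frame classes; the real pipecat frames differ.
class LLMThoughtStartFrame: ...

class LLMThoughtTextFrame:
    def __init__(self, text):
        self.text = text

class LLMThoughtEndFrame:
    def __init__(self, signature=None):
        self.signature = signature  # Anthropic attaches a thought signature here

def collect_thoughts(frames):
    """Accumulate the text of each thought delimited by start/end frames."""
    thoughts, current = [], None
    for f in frames:
        if isinstance(f, LLMThoughtStartFrame):
            current = []
        elif isinstance(f, LLMThoughtTextFrame) and current is not None:
            current.append(f.text)
        elif isinstance(f, LLMThoughtEndFrame) and current is not None:
            thoughts.append("".join(current))
            current = None
    return thoughts

frames = [
    LLMThoughtStartFrame(),
    LLMThoughtTextFrame("Let me "),
    LLMThoughtTextFrame("reason."),
    LLMThoughtEndFrame(signature="sig123"),
]
collected = collect_thoughts(frames)
```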
- Data and control frames can now be marked as non-interruptible by using the `UninterruptibleFrame` mixin. Frames marked as `UninterruptibleFrame` will not be interrupted during processing, and any queued frames of this type will be retained in the internal queues. This is useful when you need ordered frames (data or control) that should not be discarded or cancelled due to interruptions. (PR #3189)
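The marker-mixin idea behind `UninterruptibleFrame` can be sketched in plain Python: an interruption flushes queued frames, but frames carrying the marker survive in order. The classes and function below are illustrative stand-ins, not pipecat's actual implementation.

```python
# Illustrative stand-in classes; not pipecat's real frame hierarchy.
class Frame: ...
class UninterruptibleFrame: ...  # marker mixin: no behavior, only identity

class TTSAudioFrame(Frame): ...
class FunctionCallResultFrame(UninterruptibleFrame, Frame): ...

def flush_on_interruption(queued):
    """Drop interruptible frames; keep uninterruptible ones in order."""
    return [f for f in queued if isinstance(f, UninterruptibleFrame)]

queued = [TTSAudioFrame(), FunctionCallResultFrame(), TTSAudioFrame()]
kept = flush_on_interruption(queued)
```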
- Added `on_conversation_detected` event to `VoicemailDetector`. (PR #3207)
- Added `x-goog-api-client` header with Pipecat's version to all Google services' requests. (PR #3208)
- Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/). (PR #3210)
- Added to `AWSNovaSonicLLMService` functionality related to the new (and now default) Nova 2 Sonic model (`"amazon.nova-2-sonic-v1:0"`): (PR #3212)
  - Added the `endpointing_sensitivity` parameter to control how quickly the model decides the user has stopped speaking.
  - Made the assistant-response-trigger hack a no-op. It's only needed for the older Nova Sonic model.
- Ultravox Realtime is now a supported speech-to-speech service. (PR #3227)
  - Added `UltravoxRealtimeLLMService` for the integration.
  - Added `49-ultravox-realtime.py` example (with tool calling).
- Added Daily PSTN dial-in support to the development runner with the `--dialin` flag. This includes: (PR #3235)
  - A `/daily-dialin-webhook` endpoint that handles incoming Daily PSTN webhooks
  - Automatic Daily room creation with SIP configuration
  - `DialinSettings` and `DailyDialinRequest` types in `pipecat.runner.types` for type-safe dial-in data
  - The runner now mimics Pipecat Cloud's dial-in webhook handling for local development
- Add Gladia session id to logs for `GladiaSTTService`. (PR #3236)
- Added `InworldHttpTTSService`, which uses Inworld's HTTP-based TTS service in either streaming or non-streaming mode. Note: This class was previously named `InworldTTSService`. (PR #3239)
- Added `language_hints_strict` parameter to `SonioxSTTService` to strictly enforce language hints. This ensures that transcription occurs in the specified language. (PR #3245)
- Added Pipecat library version info to the `about` field in the `bot-ready` RTVI message. (PR #3248)
- Added `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`. These are used by vision services similar to LLM services. (PR #3252)
Changed
- `FunctionCallInProgressFrame` and `FunctionCallResultFrame` have changed from system frames to a control frame and a data frame, respectively, and are now both marked as `UninterruptibleFrame`. (PR #3189)
- `UserBotLatencyLogObserver` now uses `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` to determine latency from user stopped speaking to bot started speaking. (PR #3206)
- Updated `HeyGenVideoService` and `HeyGenTransport` to support both HeyGen APIs (Interactive Avatar and Live Avatar). Using them is as simple as specifying the `service_type` when creating the `HeyGenVideoService` and the `HeyGenTransport`: (PR #3210)

  ```python
  heyGen = HeyGenVideoService(
      api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
      service_type=ServiceType.LIVE_AVATAR,
      session=session,
  )
  ```
- Made `"amazon.nova-2-sonic-v1:0"` the new default model for `AWSNovaSonicLLMService`. (PR #3212)
- Updated the `run_inference` methods in the LLM service classes (`AnthropicLLMService`, `AWSBedrockLLMService`, `GoogleLLMService`, and `OpenAILLMService` and its base classes) to use the provided LLM configuration parameters. (PR #3214)
- Updated default models for: (PR #3228)
  - `GeminiLiveLLMService` to `gemini-2.5-flash-native-audio-preview-12-2025`.
  - `GeminiLiveVertexLLMService` to `gemini-live-2.5-flash-native-audio`.
- Changed the `reason` field in `EndFrame`, `CancelFrame`, `EndTaskFrame`, and `CancelTaskFrame` from `str` to `Any` to indicate that it can hold values other than strings. (PR #3231)
- Updated websocket STT services to use the `WebsocketSTTService` base class. This base class manages the websocket connection and handles reconnects. Updated services: `AssemblyAISTTService`, `AWSTranscribeSTTService`, `GladiaSTTService`, `SonioxSTTService`. (PR #3236)
- Changed Inworld's TTS service implementations: (PR #3239)
  - Previously, the HTTP implementation was named `InworldTTSService`. That has been moved to `InworldHttpTTSService`. This service now supports word-timestamp alignment data in both streaming and non-streaming modes.
  - Updated the `InworldTTSService` class to use Inworld's Websocket API. This class now has support for word-timestamp alignment data and tracks contexts for each user turn.
- ⚠️ Breaking change: `WordTTSService.start_word_timestamps()` and `WordTTSService.reset_word_timestamps()` are now async. (PR #3240)
- Updated the current RTVI version to 1.1.0 to reflect recent additions and deprecations. (PR #3248)
  - New RTVI Messages: `send-text` and `bot-output`
  - Deprecated Messages: `append-to-context` and `bot-transcription`
- `MoondreamService` now pushes `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`. (PR #3252)
Deprecated
- `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer` are deprecated and will be removed in a future version. Use `LocalSmartTurnAnalyzerV3` instead. (PR #3219)
Removed
- Removed the deprecated VLLM-based open source Ultravox STT service. (PR #3227)
Fixed
- Fixed a bug in `AWSNovaSonicLLMService` where we would mishandle cancelled tool calls in the context, resulting in errors. (PR #3212)
- Better support conversation history with Gemini 2.5 Flash Image (model "gemini-2.5-flash-image"). Prior to this fix, the model had no memory of previous images it had generated, so it wouldn't be able to iterate on them. (PR #3224)
- Support conversations with Gemini 3 Pro Image (model "gemini-3-pro-image-preview"). Prior to this fix, after the model generated an image the conversation would not be able to progress. (PR #3224)
- Fixed an issue where `ElevenLabsHttpTTSService` was not updating voice settings when receiving a `TTSUpdateSettingsFrame`. (PR #3226)
- Fixed the return type for the `SmallWebRTCRequestHandler.handle_web_request()` function. (PR #3230)
- Fix a bug in LLM context audio content handling
...
v0.0.97
Added
- Added new Gradium services, `GradiumSTTService` and `GradiumTTSService`, for speech-to-text and text-to-speech functionality using Gradium's API.
- Additions for `AsyncAITTSService` and `AsyncAIHttpTTSService`:
  - Added new languages: `pt`, `nl`, `ar`, `ru`, `ro`, `ja`, `he`, `hy`, `tr`, `hi`, `zh`.
  - Updated the default model to `asyncflow_multilingual_v1.0` for improved accuracy and broader language coverage.
- Added optional tool and tool output filters for MCP services.
Changed
- Updated Deepgram logging to include Deepgram request IDs for improved debugging.
- Text Aggregation Improvements:
  - ⚠️ Breaking Change: `BaseTextAggregator.aggregate()` now returns `AsyncIterator[Aggregation]` instead of `Optional[Aggregation]`. This enables the aggregator to return multiple results based on the provided text.
  - Refactored text aggregators to use inheritance: `SkipTagsAggregator` and `PatternPairAggregator` now inherit from `SimpleTextAggregator`, reusing the base class's sentence detection logic.
- Improved interruption handling to prevent bots from repeating themselves. LLM services that return multiple sentences in a single response (e.g., `GoogleLLMService`) are now split into individual sentences before being sent to TTS. This ensures interruptions occur at sentence boundaries, preventing the bot from repeating content after being interrupted during long responses.
- Updated `AICFilter` to use Quail STT as the default model (`AICModelType.QUAIL_STT`). Quail STT is optimized for human-to-machine interaction (e.g., voice agents, speech-to-text) and operates at a native sample rate of 16 kHz with fixed enhancement parameters.
- If an unexpected exception is caught, or if `FrameProcessor.push_error()` is called with an exception, the file name and line number where the exception occurred are now logged.
- Updated Smart Turn model weights to v3.1.
- Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered.
- Updated `CartesiaSTTService` to return the full transcription `result` in the `TranscriptionFrame` and `InterimTranscriptionFrame`. This provides access to word timestamp data.
- Added tracking headers (`X-Hume-Client-Name` and `X-Hume-Client-Version`) to all requests made by `HumeTTSService` to the Hume API for better usage tracking and analytics.
  - Added `stop()` and `cancel()` cleanup methods to `HumeTTSService` to properly close the HTTP client and prevent resource leaks.
Deprecated
- NVIDIA Services name changes (all functionality is unchanged):
  - `NimLLMService` is now deprecated, use `NvidiaLLMService` instead.
  - `RivaSTTService` is now deprecated, use `NvidiaSTTService` instead.
  - `RivaTTSService` is now deprecated, use `NvidiaTTSService` instead.
  - Use `uv pip install pipecat-ai[nvidia]` instead of `uv pip install pipecat-ai[riva]`.
- The `noise_gate_enable` parameter in `AICFilter` is deprecated and no longer has any effect. Noise gating is now handled automatically by the AIC VAD system. Use `AICFilter.create_vad_analyzer()` for VAD functionality instead.
- Package `pipecat.sync` is deprecated, use `pipecat.utils.sync` instead.
Fixed
- Fixed bug in `PatternPairAggregator` where pattern handlers could be called multiple times for `KEEP` or `AGGREGATE` patterns.
- Fixed sentence aggregation to correctly handle ambiguous punctuation in streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
- Fixed an issue in `AWSTranscribeSTTService` where the `region` arg was always set to `us-east-1` when providing an `AWS_REGION` env var.
- Fixed an issue in `SarvamTTSService` where the last sentence was not being spoken. Now, audio is flushed when the TTS service receives the `LLMFullResponseEndFrame` or `EndFrame`.
- Fixed an issue in `DeepgramTTSService` where a `TTSStoppedFrame` was incorrectly pushed after a function call. This caused an issue with the voice-ui-kit's conversational panel rendering of the LLM output after a function call.
- Fixed an issue where `LLMTextFrame.skip_tts` was being overwritten by LLM services.
- Fixed an issue that caused `WebsocketService` instances to attempt reconnection during shutdown.
- Fixed an issue in `ElevenLabsTTSService` where character usage metrics were only reported on the first TTS generation per turn.
v0.0.96
🦃 Happy Thanksgiving! 🦃
Added
- Added `AWSBedrockAgentCoreProcessor` to support invoking an AgentCore-hosted agent in a Pipecat pipeline.
- Enhanced error handling across the framework:
  - Added `on_error` callback to `FrameProcessor` for centralized error handling.
  - Renamed `push_error(error: ErrorFrame)` to `push_error_frame(error: ErrorFrame)` for clarity.
  - Added new `push_error` method for simplified error reporting:

    ```python
    async def push_error(error_msg: str, exception: Optional[Exception] = None, fatal: bool = False)
    ```

  - Standardized error logging by replacing `logger.exception` calls with `logger.error` throughout the codebase.
- Added `cache_read_input_tokens`, `cache_creation_input_tokens` and `reasoning_tokens` to OTel spans for LLM calls.
- Added `LiveKitRESTHelper` utility class for managing LiveKit rooms via REST API.
- Added `DeepgramSageMakerSTTService`, which connects to a SageMaker-hosted Deepgram STT model. Added `07c-interruptible-deepgram-sagemaker.py` foundational example.
- Added `SageMakerBidiClient` to connect to SageMaker-hosted BiDi-compatible services.
- Added support for `include_timestamps` and `enable_logging` in `ElevenLabsRealtimeSTTService`. When `include_timestamps` is enabled, timestamp data is included in the `TranscriptionFrame`'s `result` parameter.
- Added optional speaking rate control to `InworldTTSService`.
- Introduced a new `AggregatedTextFrame` type to support passing text along with an `aggregated_by` field to describe the type of text included. `TTSTextFrame`s now inherit from `AggregatedTextFrame`. With this inheritance, an observer can watch for `AggregatedTextFrame`s to accumulate the perceived output and determine whether the text was spoken based on whether that frame is also a `TTSTextFrame`. With this frame, the LLM token stream can be transformed into custom composable chunks, allowing for aggregation outside the TTS service. This makes it possible to listen for or handle those aggregations, and sets the stage for composing a best effort of the perceived LLM output in a more digestible form, whether or not it is processed by a TTS or even if no TTS exists.
- Introduced `LLMTextProcessor`: a new processor meant to allow customization of how `LLMTextFrame`s should be aggregated and considered. Its purpose is to turn `LLMTextFrame`s into `AggregatedTextFrame`s. By default, a `TTSService` will still aggregate `LLMTextFrame`s by sentence for the service to consume. However, if you wish to override how the LLM text is aggregated, you should no longer override the TTS's internal `text_aggregator`; instead, insert this processor between your LLM and TTS in the pipeline.
- New `bot-output` RTVI message to represent what the bot actually "says".
  - The `RTVIObserver` now emits `bot-output` messages based off the new `AggregatedTextFrame`s (`bot-tts-text` and `bot-llm-text` are still supported and generated, but `bot-transcript` is now deprecated in lieu of this new, more thorough, message).
  - The new `RTVIBotOutputMessage` includes the fields:
    - `spoken`: A boolean indicating whether the text was spoken by TTS
    - `aggregated_by`: A string representing how the text was aggregated ("sentence", "word", "my custom aggregation")
  - Introduced new fields to `RTVIObserver` to support the new `bot-output` messaging:
    - `bot_output_enabled`: Defaults to `True`. Set to `False` to disable `bot-output` messages.
    - `skip_aggregator_types`: Defaults to `None`. Set to a list of strings that match aggregation types that should not be included in `bot-output` messages. (Ex. `credit_card`)
  - Introduced new methods, `add_text_transformer()` and `remove_text_transformer()`, to `RTVIObserver` to support providing (and subsequently removing) callbacks for various types of aggregations (or all aggregations with `*`) that can modify the text before being sent as a `bot-output` or `tts-text` message. (Think obscuring the credit card or inserting extra detail the client might want that the context doesn't need.)
- In `MiniMaxHttpTTSService`:
  - Added support for speech-2.6-hd and speech-2.6-turbo models
  - Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, and Tamil
  - Added new emotions: calm and fluent
- Added `enable_logging` to `SimliVideoService` input parameters. It's disabled by default.
Changed
- Updated `FishAudioTTSService` default model to `s1`.
- Updated `DeepgramTTSService` to use Deepgram's TTS websocket API. ⚠️ This is a potential breaking change, which only affects you if you're self-hosting `DeepgramTTSService`. The new service uses Websockets and improves TTFB latency.
- Updated `daily-python` to 0.22.0.
- `BaseTextAggregator` changes: Modified the `BaseTextAggregator` type so that when text gets aggregated, metadata can be associated with it. Currently, that just means a `type`, so that the aggregation can be classified or described. Changes made to support this:
  - ⚠️ IMPORTANT: Aggregators are now expected to strip leading/trailing white space characters before returning their aggregation from `aggregate()` or `.text`. This way all aggregators have a consistent contract, allowing downstream users to know how to stitch aggregations back together.
  - Introduced a new `Aggregation` dataclass to represent both the aggregated `text` and a string identifying the `type` of aggregation (ex. "sentence", "word", "my custom aggregation").
  - ⚠️ Breaking change: `BaseTextAggregator.text` now returns an `Aggregation` (instead of `str`).

    Before: `aggregated_text = myAggregator.text`

    Now: `aggregated_text = myAggregator.text.text`

  - ⚠️ Breaking change: `BaseTextAggregator.aggregate()` now returns `Optional[Aggregation]` (instead of `Optional[str]`).

    Before:

    ```python
    aggregation = myAggregator.aggregate(text)
    print(f"successfully aggregated text: {aggregation}")
    ```

    Now:

    ```python
    aggregation = myAggregator.aggregate(text)
    if aggregation:
        print(f"successfully aggregated text: {aggregation.text}")
    ```

  - `SimpleTextAggregator`, `SkipTagsAggregator`, `PatternPairAggregator` updated to produce/consume `Aggregation` objects.
  - All uses of the above Aggregators have been updated accordingly.
- Augmented the `PatternPairAggregator` so that matched patterns can be treated as their own aggregation, taking advantage of the new `Aggregation` type. To that end:
  - Introduced a new, preferred version of `add_pattern` to support a new option for treating a match as a separate aggregation returned from `aggregate()`. This replaces the now-deprecated `add_pattern_pair` method, and you provide a `MatchAction` in lieu of the `remove_match` field.
  - `MatchAction` enum: `REMOVE`, `KEEP`, `AGGREGATE`, allowing customization of how a match should be handled.
    - `REMOVE`: The text along with its delimiters will be removed from the streaming text. Sentence aggregation will continue on as if this text did not exist.
    - `KEEP`: The delimiters will be removed, but the content between them will be kept. Sentence aggregation will continue on with the internal text included.
    - `AGGREGATE`: The delimiters will be removed and the content between will be treated as a separate aggregation. Any text before the start of the pattern will be returned early, whether or not a complete sentence was found. Then the pattern will be returned. Then the aggregation will continue on sentence matching after the closing delimiter is found. The content between the delimiters is not aggregated by sentence; it is aggregated as one single block of text.
  - `PatternMatch` now extends `Aggregation` and provides richer info to handlers.
  - ⚠️ Breaking change: The `PatternMatch` type returned to handlers registered via `on_pattern_match` has been updated to subclass from the new `Aggregation` type, which means that `content` has been replaced with `text` and `pattern_id` has been replaced with `type`:

    ```python
    async def on_match_tag(match: PatternMatch):
        pattern = match.type  # instead of match.pattern_id
        text = match.text  # instead of match.content
    ```
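The `REMOVE` and `KEEP` semantics above can be sketched with plain string handling (the streaming and `AGGREGATE` cases are more involved, so they're omitted here). This is not pipecat's implementation; the function and tag names are illustrative only.

```python
# Plain-Python sketch of MatchAction.REMOVE vs MatchAction.KEEP semantics.
import re

def apply_match_action(text, start, end, action):
    """Strip or unwrap a delimited pattern, mimicking REMOVE/KEEP."""
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.S)
    if action == "REMOVE":
        return pattern.sub("", text)     # drop delimiters and their content
    if action == "KEEP":
        return pattern.sub(r"\1", text)  # drop delimiters, keep the content
    raise ValueError(f"unsupported action: {action}")

s = "Hello <spell>P-I-P-E-C-A-T</spell> world."
removed = apply_match_action(s, "<spell>", "</spell>", "REMOVE")
kept = apply_match_action(s, "<spell>", "</spell>", "KEEP")
```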
- `TextFrame` now includes the field `append_to_context` to support setting whether or not the encompassing text should be added to the LLM context (by the LLM assistant aggregator). It defaults to `True`.
- `TTSService` base class updates:
  - `TTSService`s now accept a new `skip_aggregator_types` to avoid speaking certain aggregation types (now determined/returned by the aggregator)
  - Introduced the ability to do a just-in-time transform of text before it gets sent to the TTS service via callbacks you can set up via a new init field, `text_transforms`, or a new method, `add_text_transformer()`. This makes it possible to do things like introduce TTS-specific tags for spelling or emotion or change the pronunciation of something on the fly. `remove_text_transformer` has also been added to support removing a registered transform callback.
  - TTS services push `AggregatedTextFrame` in addition to `TTSTextFrame`s when either an aggregation occurs that should not be spoken or when the TTS service supports word-by-word timestamping. In the latter case, the `TTSService` preliminarily generates an `AggregatedTextFrame`, aggregated by sentence, to generate the full sentence content as early as possible.
- Updated `CartesiaTTSService`:
  - Modified use of custom default `text_aggregator` to avoid deprecation warnings and push users towards use of transformers or the `LLMTextProcessor`
  - Added convenienc...
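The just-in-time text-transform idea described in the `TTSService` updates above, callbacks that rewrite text right before it reaches TTS, can be sketched in plain Python. The callback chain below is an illustration (the masking function and names are hypothetical), not pipecat's actual transform machinery.

```python
# Plain-Python sketch of chaining text-transform callbacks before TTS.
import re

def mask_credit_card(text):
    """Hypothetical transform: obscure 16-digit card numbers before speaking."""
    return re.sub(r"\b\d{4}(?:[ -]?\d{4}){3}\b", "**** **** **** ****", text)

# In pipecat these would be registered via text_transforms / add_text_transformer();
# here we just keep a plain list of callbacks.
text_transforms = [mask_credit_card]

def apply_transforms(text):
    for transform in text_transforms:
        text = transform(text)
    return text

safe = apply_transforms("My card is 4111 1111 1111 1111, thanks.")
```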
v0.0.95
Added
- Added ai-coustics integrated VAD (`AICVADAnalyzer`) with `AICFilter` factory and example wiring; leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity.
- Added a watchdog to `DeepgramFluxSTTService` to prevent dangling tasks in case the user was speaking and we stop receiving audio.
- Introduced a minimum confidence parameter in `DeepgramFluxSTTService` to avoid generating transcriptions below a defined threshold.
- Added `ElevenLabsRealtimeSTTService`, which implements the Realtime STT service from ElevenLabs.
- Added word-level timestamps support to Hume TTS service.
Changed
- ⚠️ Breaking change: `LLMContext.create_image_message()`, `LLMContext.create_audio_message()`, `LLMContext.add_image_frame_message()` and `LLMContext.add_audio_frames_message()` are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images.
- `ConsumerProcessor` now queues frames from the producer internally instead of pushing them directly. This allows us to subclass consumer processors and manipulate frames before they are pushed.
- `BaseTextFilter` now only requires subclasses to implement the `filter()` method.
- Extracted the logic for retrying connections, and created a new `send_with_retry` method inside `WebSocketService`.
- Refactored `DeepgramFluxSTTService` to automatically reconnect if sending a message fails.
- Updated all STT and TTS services to use a consistent error handling pattern with the `push_error()` method for better pipeline error event integration.
- Added support for `maybe_capture_participant_camera()` and `maybe_capture_participant_screen()` for `SmallWebRTCTransport` in the runner utils.
- Added Hindi support for Rime TTS services.
- Updated `GeminiTTSService` to use the Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. Now uses `credentials`/`credentials_path` for authentication. The `api_key` parameter is deprecated. Also, added support for a `prompt` parameter for style instructions and expressive markup tags. Significantly improved latency with streaming synthesis.
- Updated language mappings for the Google and Gemini TTS services to match official documentation.
Deprecated
- The `api_key` parameter in `GeminiTTSService` is deprecated. Use `credentials` or `credentials_path` instead for Google Cloud authentication.
Fixed
- Fixed a `SimliVideoService` connection issue.
- Fixed an issue in the `Runner` where, when using `SmallWebRTCTransport`, the `request_data` was not being passed to the `SmallWebRTCRunnerArguments` body.
- Fixed a subtle issue of assistant context messages ending up with double spaces between words or sentences.
- Fixed an issue where `NeuphonicTTSService` wasn't pushing `TTSTextFrame`s, meaning assistant messages weren't being written to context.
- Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal `LLMContext`.
- Fixed an issue where `DeepgramFluxSTTService` failed to connect if passing a `keyterm` or `tag` containing a space.
- Prevented `HeyGenVideoService` from automatically disconnecting after 5 minutes.
v0.0.94
Deprecated
- The `KrispFilter` is deprecated and will be removed in a future version. Use the `KrispVivaFilter` instead.
Removed
- `LivekitFrameSerializer` has been removed. Use `LiveKitTransport` instead.
Fixed
- Fixed a bug related to `LLMAssistantAggregator` where spaces were sometimes missing from assistant messages in context.
v0.0.93
Added
- Added support for Sarvam Speech-to-Text service (`SarvamSTTService`) with streaming WebSocket support for `saarika` (STT) and `saaras` (STT-translate) models.
- Added support for passing in a `ToolsSchema` in lieu of a list of provider-specific dicts when initializing `OpenAIRealtimeLLMService` or when updating it using `LLMUpdateSettingsFrame`.
- Added `TransportParams.audio_out_silence_secs`, which specifies how many seconds of silence to output when an `EndFrame` reaches the output transport. This can help ensure that all audio data is fully delivered to clients.
- Added new `FrameProcessor.broadcast_frame()` method. This will push two instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```

- Added `MetricsLogObserver` for logging performance metrics from `MetricsFrame` instances. Supports filtering via `include_metrics` parameter to control which metrics types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics).
- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added support for loading external observers. You can now register custom pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`). Each file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]: ...
  ```

- Added support for new sonic-3 languages in `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why the pipeline is being ended.
- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to indicate why the pipeline is being canceled. This can also be specified when you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.
- Added `include_prob_metrics` parameter to Whisper STT services to enable access to probability metrics from transcription results.
- Added utility functions `extract_whisper_probability()`, `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to extract probability metrics from `TranscriptionFrame` objects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services respectively.
- Added `LLMSwitcher.register_direct_function()`. It works much like `LLMSwitcher.register_function()` in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (a `FunctionSchema`-less function).
- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()` as a two-step alternative to `MCPClient.register_tools()`, to allow users to pass MCP tools to, say, `GeminiLiveLLMService` (as well as other speech-to-speech services) in the constructor.
- Added support for passing in an `LLMSwitcher` to `MCPClient.register_tools()` (as well as the new `MCPClient.register_tools_schema()`).
- Added `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`. This is set to `1` by default for more predictable performance on low-CPU systems.
Changed
- Improved `concatenate_aggregated_text()` to handle one-word outputs from OpenAI Realtime and Gemini Live. Text fragments are now correctly concatenated without spaces when these patterns are detected.
- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: The `STTMuteFilter` should be placed after the STT service itself.
- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted` exceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked). The service automatically reconnects when this occurs.
- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.
- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.
- Updated the `GoogleVertexLLMService` to use the `GoogleLLMService` as a base class instead of the `OpenAILLMService`.
- Updated STT and TTS services to pass through unverified language codes with a warning instead of returning None. This allows developers to use newly supported languages before Pipecat's service classes are updated, while still providing guidance on verified languages.
Removed
- Removed `needs_mcp_alternate_schema()` from `LLMService`. The mechanism that relied on it went away.
Fixed
- Restore backwards compatibility for vision/image features (broken in 0.0.92) when using non-universal context and assistant aggregators.
- Fixed `DeepgramSTTService._disconnect()` to properly await the `is_connected()` method call, which is an async coroutine in the Deepgram SDK.
- Fixed an issue where the `SmallWebRTCRequest` dataclass in the runner would scrub arbitrary request data from the client due to camelCase typing. This fixes data passthrough for JS clients where `APIRequest` is used.
- Fixed a bug in `GeminiLiveLLMService` where in some circumstances it wouldn't respond after a tool call.
- Fixed `GeminiLiveLLMService` session resumption after a connection timeout.
- `GeminiLiveLLMService` now properly supports context-provided system instruction and tools.
- Fixed `GoogleLLMService` token counting to avoid double-counting tokens when Gemini sends usage metadata across multiple streaming chunks.
v0.0.92
🎃 The Haunted Edition 👻
Added
- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction in latency when compared to the `DeepgramTTSService`.
- Add support for `speaking_rate` input parameter in `GoogleHttpTTSService`.
- Added `enable_speaker_diarization` and `enable_language_identification` to `SonioxSTTService`.
- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated examples 07a* to use the new TTS service.
- Added support for including images or audio in LLM context messages using `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()` (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For example, when creating `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```

- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`, `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.
- Added `generation_config` parameter support to `CartesiaTTSService` and `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5), and `emotion` (60+ options) parameters for fine-grained speech generation control.
- Expanded support for universal `LLMContext` to `OpenAIRealtimeLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`, say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
      [
          transport.input(),
          context_aggregator.user(),
          # BEFORE:
          # llm,
          # transcript.user(),
          # AFTER:
          transcript.user(),
          llm,
          transport.output(),
          transcript.assistant(),
          context_aggregator.assistant(),
      ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:
  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: OpenAIRealtimeLLMContext  # or context: OpenAILLMContext

  ## AFTER:
  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and `RealtimeFunctionCallResultFrame` have been deprecated, since they're no longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen for `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService` when it's configured with `output_modalities=['audio']`. If you need to process its output, listen for `TTSTextFrame`s instead.
- Expanded support for the universal `LLMContext` to `GeminiLiveLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:
  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: GeminiLiveLLMContext  # or
  context: OpenAILLMContext

  ## AFTER:
  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService` when it's configured with `modalities=GeminiModalities.AUDIO`. If you need to process its output, listen for `TTSTextFrame`s instead.
Changed
- The development runner's `/start` endpoint now supports passing `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body when `createDailyRoom` is true. Properties are validated against the `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively and passed to Daily's room and token creation APIs.
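For example, a `/start` request body using these fields might look like the following. The nested Daily property names (`start_video_off`, `is_owner`) are illustrative examples, not an exhaustive list; consult Daily's room and meeting-token APIs for the full set:

```python
import json

# Illustrative /start request body when createDailyRoom is true.
# The nested Daily property names are examples only.
body = {
    "createDailyRoom": True,
    "dailyRoomProperties": {"start_video_off": True},
    "dailyMeetingTokenProperties": {"is_owner": False},
}

print(json.dumps(body, indent=2))
```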
- `UserImageRawFrame` has new fields `append_to_context` and `text`. The `append_to_context` field indicates whether this image and text should be added to the LLM context (by the LLM assistant aggregator). The `text` field, if set, might also guide the LLM or the vision service on how to analyze the image.
- `UserImageRequestFrame` has new fields `append_to_context` and `text`. Both fields will be used to set the same fields on the captured `UserImageRawFrame`.
- `UserImageRequestFrame` no longer requires a function call name and ID.
- Updated `MoondreamService` to process `UserImageRawFrame`.
- `VisionService` expects a `UserImageRawFrame` in order to analyze images.
- `DailyTransport` triggers the `on_error` event if transcription can't be started or stopped.
- `DailyTransport` updates:
  - `start_dialout()` now returns two values: `session_id` and `error`.
  - `start_recording()` now returns two values: `stream_id` and `error`.
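The two-value return lets callers check for errors without a try/except. A minimal sketch of the calling pattern, using a stub coroutine in place of a real `DailyTransport` (the stub's settings key and return values are assumptions for illustration):

```python
import asyncio

# Stub standing in for DailyTransport.start_dialout(), which now returns
# (session_id, error) instead of a single value. Illustrative only.
async def start_dialout_stub(settings: dict):
    if not settings.get("phoneNumber"):
        return None, "missing phone number"
    return "session-123", None

async def main():
    session_id, error = await start_dialout_stub({})
    if error:
        print(f"dial-out failed: {error}")
    else:
        print(f"dial-out started: {session_id}")

asyncio.run(main())
```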
- Updated `daily-python` to 0.21.0.
- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly, with an optional `params` for `max_session_length` and `max_idle_time` configuration, aligning with other Pipecat service patterns.
- Updated the default model to `sonic-3` for `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether or not `SystemFrame`s are filtered.
- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues when running `AWSNovaSonicLLMService`.
Deprecated
- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is ignored when used with the newer `LLMAssistantAggregator`, which now handles word spacing automatically.
- `LLMService.request_image_frame()` is deprecated; push a `UserImageRequestFrame` instead.
- `UserResponseAggregator` is deprecated and will be removed in a future version.
- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.openai.realtime.context` and `pipecat.services.openai.realtime.frames` are deprecated, as they're no longer used by `OpenAIRealtimeLLMService`. See the "Added" section for details.
- The `SimliVideoService` `simli_config` parameter is deprecated. Use the `api_key` and `face_id` parameters instead.
Removed
- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from `SonioxSTTService`.
- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer used.
Fixed
- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that were being generated but not reaching the end of the pipeline. Since the exact point at which frames are discarded is unknown, we now monitor pipeline frames using an observer. If the observer detects that frames are being generated, it prevents the pipeline from being considered idle.
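The idea behind the fix can be sketched as a small activity monitor: an observer timestamps every frame it sees anywhere in the pipeline, and the idle check compares that timestamp against a timeout instead of only watching frames that reach the end. This is a simplified illustration, not Pipecat's actual implementation:

```python
import time

# Simplified illustration of an observer-based idle check;
# not the actual PipelineTask implementation.
class ActivityObserver:
    def __init__(self):
        self.last_frame_time = time.monotonic()

    def on_frame(self, frame):
        # Any observed frame counts as pipeline activity, even if the
        # frame is later discarded before reaching the pipeline's end.
        self.last_frame_time = time.monotonic()

    def is_idle(self, timeout_secs: float) -> bool:
        return (time.monotonic() - self.last_frame_time) > timeout_secs

observer = ActivityObserver()
observer.on_frame("AudioRawFrame")  # frame generated mid-pipeline
print(observer.is_idle(timeout_secs=5.0))
```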
- Fixed an issue in `HumeTTSService` that was only using Octave 2, which does not support the `description` field. Now, if a description is provided, the service switches to Octave 1.
- Fixed an issue where `DailyTransport` would time out prematurely on join and on leave.
- Fixed an issue in the runner where starting a `DailyTransport` room via `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.
- Fixed an issue in `ServiceSwitcher` where the `STTService`s would result in all STT services producing `TranscriptionFrame`s.
Other
- Updated all vision 12-series foundational examples to load images from a file.
- Added 14-series video examples for different services. These new examples request an image from the user camera through a function call.
v0.0.91
Added
- It is now possible to start a bot from the `/start` endpoint when using the runner with Daily's transport. This follows the Pipecat Cloud format, with `createDailyRoom` and `body` fields in the POST request body.
- Added an ellipsis character (`…`) to the end-of-sentence detection in the string utils.
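A much-simplified version of end-of-sentence detection including the ellipsis character might look like this; Pipecat's actual string utils are more involved (handling abbreviations, numbers, and so on):

```python
# Simplified end-of-sentence check including the ellipsis character.
# Pipecat's real string utils handle many more cases.
SENTENCE_ENDINGS = (".", "!", "?", "…")

def ends_sentence(text: str) -> bool:
    return text.rstrip().endswith(SENTENCE_ENDINGS)

print(ends_sentence("Wait for it…"))  # True: ellipsis now counts
print(ends_sentence("And then"))      # False: no terminal punctuation
```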
- Expanded support for the universal `LLMContext` to `AWSNovaSonicLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `AWSNovaSonicLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime.)

  Worth noting: whether or not you use the new context-setup pattern with `AWSNovaSonicLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:
  # Context aggregator type
  context_aggregator: AWSNovaSonicContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: AWSNovaSonicLLMContext  # or
  context: OpenAILLMContext

  ## AFTER:
  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```
- Added support for the `bulbul:v3` model in `SarvamTTSService` and `SarvamHttpTTSService`.
- Added the `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.
- Added the `speech_model` parameter to `AssemblyAIConnectionParams` to access the multilingual model.
- Added support for trickle ICE to the `SmallWebRTCTransport`.
- Added support for updating `OpenAITTSService` settings (`instructions` and `speed`) at runtime via `TTSUpdateSettingsFrame`.
- Added a `--whatsapp` flag to the runner to better surface WhatsApp transport logs.
- Added `on_connected` and `on_disconnected` events to TTS and STT websocket-based services.
- Added an `aggregate_sentences` arg to `ElevenLabsHttpTTSService`, with a default value of `True`.
- Added a `room_properties` arg to the Daily runner's `configure()` method, allowing `DailyRoomProperties` to be provided.
- The runner's `--folder` argument now supports downloading files from subdirectories.
Changed
- `RunnerArguments` now includes the `body` field, so there's no need to add it to subclasses. Also, all `RunnerArguments` fields are now keyword-only.
- `CartesiaSTTService` now inherits from `WebsocketSTTService`.
- Package upgrades:
  - `daily-python` upgraded to 0.20.0.
  - `openai` upgraded to support up to 2.x.x.
  - `openpipe` upgraded to support up to 5.x.x.
- `SpeechmaticsSTTService` updated dependencies to `speechmatics-rt>=0.5.0`.
Deprecated
- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.aws.nova_sonic.context` have been deprecated due to changes to support `LLMContext`. See the "Changed" section for details.
Fixed
- Fixed an issue where the `RTVIProcessor` was sending duplicate `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.
- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p` were always sent together, causing conflicts with models like Claude Sonnet 4.5 that don't allow both parameters simultaneously. The service now only includes inference parameters that are explicitly set, and `InputParams` defaults have been changed to `None` to rely on AWS Bedrock's built-in model defaults.
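The fix follows a common pattern: build the inference-parameter dict from only the explicitly set values, leaving everything else to the provider's defaults. A sketch of that pattern (the helper itself is illustrative, not Pipecat's code; the key names follow Bedrock's camelCase inference config):

```python
from typing import Optional

# Illustrative: include only explicitly set inference parameters, so
# mutually exclusive ones (e.g. temperature vs. top_p on some models)
# are never both sent by default.
def build_inference_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    params = {"temperature": temperature, "topP": top_p, "maxTokens": max_tokens}
    return {k: v for k, v in params.items() if v is not None}

print(build_inference_params(temperature=0.7))  # only temperature is sent
```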
- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due to a mismatch in the `_handle_transcription` method's signature.
- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is now handled properly in `PipelineTask`, making it possible to cleanly cancel an asyncio task that is executing a `PipelineRunner`. Also, `PipelineTask.cancel()` no longer blocks waiting for the `CancelFrame` to reach the end of the pipeline (returning to the behavior in < 0.0.83).
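Handling `asyncio.CancelledError` properly generally means catching it, running cleanup, and re-raising so the task is actually marked as cancelled. A minimal, self-contained illustration of that pattern (not Pipecat's code; `run_pipeline` is a stand-in for executing a `PipelineRunner`):

```python
import asyncio

events = []

# Minimal illustration of clean asyncio cancellation: catch
# CancelledError, run cleanup, then re-raise so cancellation propagates.
async def run_pipeline():
    try:
        await asyncio.sleep(3600)  # stand-in for executing a PipelineRunner
    except asyncio.CancelledError:
        events.append("cleaning up")
        raise  # re-raise so the task is marked as cancelled

async def main():
    task = asyncio.create_task(run_pipeline())
    await asyncio.sleep(0)  # let the task start running
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("task cancelled cleanly")

asyncio.run(main())
print(events)
```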
- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where the Flash models would split words, resulting in a space being inserted between words.
- Fixed an issue where audio filters' `stop()` would not be called when using `CancelFrame`.
- Fixed an issue in `ElevenLabsHttpTTSService` where `apply_text_normalization` was incorrectly set as a query parameter. It's now added as a request parameter.
- Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate incorrectly aligned 16-bit audio frames, potentially leading to internal errors or static audio.
- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items needed to have `sounds_like` set for the session to start.
Other
- Added foundational example `47-sentry-metrics.py`, demonstrating how to use the `SentryMetrics` processor.
- Added foundational example `14x-function-calling-openpipe.py`.
v0.0.90
Added
- Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.
- Added a `--folder` argument to the runner, allowing files saved in that folder to be downloaded from `http://HOST:PORT/file/FILE`.
- Added `GeminiLiveVertexLLMService`, for accessing Gemini Live via Google Vertex AI.
- Added some new configuration options to `GeminiLiveLLMService`:
  - `thinking`
  - `enable_affective_dialog`
  - `proactivity`

  Note that these new configuration options require using a newer model than the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last two require specifying `http_options=HttpOptions(api_version="v1alpha")`.
- Added an `on_pipeline_error` event to `PipelineTask`. This event gets fired when an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).

  ```python
  @task.event_handler("on_pipeline_error")
  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
      ...
  ```

- Added a `service_tier` `InputParam` to the `BaseOpenAILLMService`. This parameter can influence the latency of the response. For example, `"priority"` will result in faster completions, in exchange for a higher price.
Changed
- Updated `GeminiLiveLLMService` to use the `google-genai` library rather than using WebSockets directly.
Deprecated
- `LivekitFrameSerializer` is now deprecated. Use `LiveKitTransport` instead.
- `pipecat.service.openai_realtime` is now deprecated; use `pipecat.services.openai.realtime` instead, or `pipecat.services.azure.realtime` for Azure Realtime.
- `pipecat.service.aws_nova_sonic` is now deprecated; use `pipecat.services.aws.nova_sonic` instead.
- `GeminiMultimodalLiveLLMService` is now deprecated; use `GeminiLiveLLMService`.
Fixed
- Fixed a `GoogleVertexLLMService` issue that would generate an error if no token information was returned.
- `GeminiLiveLLMService` will now end gracefully (i.e., after the bot has finished) upon receiving an `EndFrame`.
- `GeminiLiveLLMService` will try to seamlessly reconnect when it loses its connection.
v0.0.89
Fixed
- Reverted a change introduced in 0.0.88 that was causing pipelines to freeze when using interruption strategies and processors that block interruption frames (e.g. `STTMuteFilter`).