All notable changes to pipecat will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed an issue with `SileroVADAnalyzer` that would cause memory to keep growing indefinitely.
- Added `DailyTransport.participants()` and `DailyTransport.participant_counts()`.
- Added `OpenAITTSService`.
- Allow passing `output_format` and `model_id` to `CartesiaTTSService` to change the audio sample format and the model to use.
- Added `DailyRESTHelper`, which helps you create Daily rooms and tokens easily.
- `PipelineTask` now has a `has_finished()` method to indicate whether the task has completed. If a task is never run, `has_finished()` returns `False`.
- `PipelineRunner` now supports SIGTERM. If received, the runner will be canceled.
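The SIGTERM behavior above can be sketched with a minimal runner that cancels itself when the signal arrives. This is an illustrative stand-in, not pipecat's actual `PipelineRunner` implementation:

```python
import signal

# Minimal sketch: a runner that marks itself as canceled on SIGTERM,
# mirroring the behavior described above. The class and attribute names
# are illustrative, not pipecat's real API.
class Runner:
    def __init__(self):
        self.canceled = False
        # Install a handler so an external `kill <pid>` cancels the runner.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.canceled = True

runner = Runner()
signal.raise_signal(signal.SIGTERM)  # simulate an external `kill`
print(runner.canceled)
```

In an asyncio program you would typically use `loop.add_signal_handler` instead, so the cancellation runs safely inside the event loop.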
- Fixed an issue where `BaseInputTransport` and `BaseOutputTransport` were stopping push tasks before pushing `EndFrame` frames, which could cause bots to get stuck.
- Fixed an error closing local audio transports.
- Fixed an issue with Deepgram TTS that was introduced in the previous release.
- Fixed `AnthropicLLMService` interruptions. If an interruption occurred, a `user` message could be appended after the previous `user` message. Anthropic does not allow that because it requires alternating `user` and `assistant` messages.
- `BaseInputTransport` no longer pulls audio frames from subclasses. Instead, subclasses now push audio frames into a queue in the base class. Also, `DailyInputTransport` now pushes audio frames every 20ms instead of 10ms.
- Removed the redundant camera input thread from `DailyInputTransport`. This should slightly improve performance when processing participant videos.
- Load the Cartesia voice on startup.
- Added `WebsocketServerTransport`. This creates a websocket server and reads messages coming from a client. The messages are serialized/deserialized with protobufs. See `examples/websocket-server` for a detailed example.
- Added function calling (`LLMService.register_function()`). This allows the LLM to call functions you have registered when needed. For example, if you register a function to get the weather in Los Angeles and ask the LLM about the weather in Los Angeles, the LLM will call your function. See https://platform.openai.com/docs/guides/function-calling
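The registration-and-dispatch idea behind function calling can be sketched with a plain registry. The class and method signatures here are illustrative, not pipecat's actual `LLMService` API:

```python
# Minimal sketch of function calling: a registry mapping function names
# to callables. An LLM tool call would supply the name and keyword
# arguments; the service looks the function up and invokes it.
class FunctionRegistry:
    def __init__(self):
        self._functions = {}

    def register_function(self, name, func):
        self._functions[name] = func

    def call(self, name, **kwargs):
        # In a real service, `name` and `kwargs` come from the LLM's
        # tool-call response rather than from user code.
        return self._functions[name](**kwargs)

registry = FunctionRegistry()
registry.register_function("get_weather", lambda location: f"Sunny in {location}")
print(registry.call("get_weather", location="Los Angeles"))
```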
- Added new `LangchainProcessor`.
- Added Cartesia TTS support (https://cartesia.ai/).
- Fixed the SileroVAD frame processor.
- Fixed an issue where `camera_out_enabled` would cause high CPU usage if no image was provided.
- Removed unnecessary audio input tasks.
- Exposed `on_dialin_ready` for Daily transport SIP endpoint handling. This notifies when the Daily room SIP endpoints are ready, which allows integrating with third-party services like Twilio.
- Exposed the Daily transport `on_app_message` event.
- Added the Daily transport `on_call_state_updated` event.
- Added the Daily transport `start_recording()`, `stop_recording()` and `stop_dialout()` methods.
- Added `PipelineParams`. This replaces the `allow_interruptions` argument in `PipelineTask` and will allow adding more parameters in the future.
- Fixed the Deepgram Aura TTS `base_url` and added `ErrorFrame` reporting.
- The `GoogleLLMService` `api_key` argument is now mandatory.
- Daily transport `dialin-ready` doesn't block anymore and it now handles timeouts.
- Fixed `AzureLLMService`.
- Fixed an issue handling the Daily transport `dialin-ready` event.
- Added Daily transport `start_dialout()` to be able to make phone or SIP calls. See https://reference-python.daily.co/api_reference.html#daily.CallClient.start_dialout
- Added Daily transport support for dial-in use cases.
- Added Daily transport events: `on_dialout_connected`, `on_dialout_stopped`, `on_dialout_error` and `on_dialout_warning`. See https://reference-python.daily.co/api_reference.html#daily.EventHandler
- Added vision support to the Anthropic service.
- Added `WakeCheckFilter`, which allows you to pass information downstream only if you say a certain phrase/word.
- `Filter` has been renamed to `FrameFilter` and it's now under `processors/filters`.
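The gating behavior `WakeCheckFilter` provides can be sketched as follows. This is a simplified stand-in (pipecat's real filter operates on frames and handles things like wake timeouts); the names here are illustrative:

```python
# Sketch of a wake-word gate: text is held back until a wake phrase is
# heard; after that, everything passes downstream.
class WakeGate:
    def __init__(self, wake_phrase: str):
        self._wake_phrase = wake_phrase.lower()
        self._awake = False

    def filter(self, text: str):
        if self._wake_phrase in text.lower():
            self._awake = True
        # Return None (drop) while asleep, pass text through once awake.
        return text if self._awake else None

gate = WakeGate("hey robot")
print(gate.filter("what time is it"))   # dropped: wake phrase not heard yet
print(gate.filter("Hey robot, hello"))  # wake phrase heard, passes through
```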
- Fixed the Anthropic service to use the new frame types.
- Fixed an issue in `LLMUserResponseAggregator` and `UserResponseAggregator` that would cause frames after a brief pause not to be pushed to the LLM.
- Clear the audio output buffer if we are interrupted.
- Re-added exponential smoothing after volume calculation. This makes sure the volume value being used doesn't fluctuate so much.
- In order to improve interruptions we now compute a loudness level using pyloudnorm. The audio coming from WebRTC transports (e.g. Daily) has an Automatic Gain Control (AGC) algorithm applied to the signal; however, we don't do that on our local PyAudio signals. This means that incoming audio from PyAudio is currently somewhat broken. We will fix it in future releases.
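The exponential smoothing mentioned above can be sketched in a few lines: each new volume/loudness reading is blended with the previous smoothed value so the level doesn't jump around. The smoothing factor 0.3 is illustrative, not pipecat's actual constant:

```python
# Exponential smoothing of a volume series: out[i] = a*v[i] + (1-a)*out[i-1].
# A smaller factor `a` gives a smoother (but slower-reacting) level.
def smooth(values, factor=0.3):
    smoothed = []
    prev = 0.0
    for v in values:
        prev = factor * v + (1 - factor) * prev
        smoothed.append(prev)
    return smoothed

# A rapidly alternating signal stays well below its raw peaks once smoothed.
levels = [0.0, 1.0, 0.0, 1.0, 0.0]
print(smooth(levels))
```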
- Fixed an issue where `StartInterruptionFrame` would cause `LLMUserResponseAggregator` to push the accumulated text, causing the LLM to respond in the wrong task. A `StartInterruptionFrame` should not trigger any new LLM response because that would be spoken in a different task.
- Fixed an issue where tasks and threads could be paused because the executor didn't have more tasks available. This was causing issues when cancelling and recreating tasks during interruptions.
- `LLMUserResponseAggregator` and `LLMAssistantResponseAggregator` internal messages are now exposed through the `messages` property.
- Fixed an issue where `LLMAssistantResponseAggregator` was not accumulating the full response but short sentences instead. If there's an interruption, we now accumulate only what the bot has spoken so far, also as one long response.
- Fixed an issue in `DailyOutputTransport` where transport messages were not being sent.
- Added `google.generativeai` model support, including vision. This new `google` service defaults to using `gemini-1.5-flash-latest`. Example in `examples/foundational/12a-describe-video-gemini-flash.py`.
- Added vision support to the `openai` service. Example in `examples/foundational/12a-describe-video-gemini-flash.py`.
- Added initial interruptions support. The assistant contexts (or aggregators) should now be placed after the output transport. This way, only the completed spoken context is added to the assistant context.
- Added `VADParams` so you can control the voice confidence level and other parameters.
- `VADAnalyzer` now uses an exponentially smoothed volume to improve speech detection. This is useful when voice confidence is high (because there's someone talking near you) but volume is low.
- Fixed an issue where `TTSService` was not pushing `TextFrame`s downstream.
- Fixed issues with Ctrl-C program termination.
- Fixed an issue that was causing `StopTaskFrame` to not actually exit the `PipelineTask`.
- `DailyTransport`: don't publish camera and audio tracks if not enabled.
- Fixed an issue in `BaseInputTransport` that was causing frames pushed downstream to not be pushed in the right order.
- Quick hotfix for receiving `DailyTransportMessage`.
- Added `DailyTransport` event `on_participant_left`.
- Added support for receiving `DailyTransportMessage`.
- Images are now resized to the size of the output camera. Previously, mismatched sizes could cause images not to be displayed.
- Fixed an issue in `DailyTransport` that would not allow the input processor to shut down if no participant ever joined the room.
- Fixed base transports start and stop. In some situations processors would halt or not shut down properly.
- The `MoondreamService` argument `model_id` is now `model`.
- `VADAnalyzer` arguments have been renamed for more clarity.
- Fixed an issue with `DailyInputTransport` and `DailyOutputTransport` that could cause some threads to not start properly.
- Fixed `STTService`. Added `max_silence_secs` and `max_buffer_secs` to better handle what's being passed to the STT service. Also added exponential smoothing to the RMS.
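The RMS level that gets smoothed above can be computed from a 16-bit PCM buffer as follows. This is an illustrative sketch of the calculation, not the exact pipecat code:

```python
import math
import struct

# Root-mean-square of a little-endian 16-bit PCM buffer, normalized to
# [0, 1] by the int16 full-scale value (32768).
def rms(audio_bytes: bytes) -> float:
    samples = struct.unpack(f"<{len(audio_bytes) // 2}h", audio_bytes)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0

silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 16384, -16384, 16384, -16384)
print(rms(silence), rms(loud))
```

This raw RMS would then be fed through the exponential smoothing so brief spikes don't flip the speech/silence decision.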
- Fixed `WhisperSTTService`. Added `no_speech_prob` to avoid garbage output text.
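The `no_speech_prob` check amounts to dropping Whisper segments that are probably not speech. A minimal sketch (the 0.4 threshold and the tuple shape are illustrative):

```python
# Keep only segments whose no-speech probability is below a threshold;
# segments above it are likely noise and would produce garbage text.
def keep_speech(segments, no_speech_prob=0.4):
    return [text for text, p in segments if p < no_speech_prob]

segments = [("hello there", 0.05), ("[wind noise]", 0.92)]
print(keep_speech(segments))
```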
- Added `DailyTranscriptionSettings` to make specifying transcription settings (e.g. language) much easier.
- Updated `simple-chatbot` with Spanish support.
- Added missing dependencies in some of the examples.
- Allow stopping pipeline tasks with the new `StopTaskFrame`.
- TTS, STT and image generation services now use `AsyncGenerator`.
- `DailyTransport`: allow registering for participant transcriptions even if the input transport is not initialized yet.
- Updated `storytelling-chatbot`.
- Added Intel GPU support to `MoondreamService`.
- Added support for sending transport messages (e.g. to communicate with an app at the other end of the transport).
- Added `FrameProcessor.push_error()` to easily send an `ErrorFrame` upstream.
- Fixed Azure services (TTS and image generation).
- Updated the `simple-chatbot`, `moondream-chatbot` and `translation-chatbot` examples.
Many things have changed in this version. Many of the main ideas such as frames, processors, services and transports are still there but some things have changed a bit.
- Frames describe the basic units for processing, for example text, image or audio frames, or control frames to indicate a user has started or stopped speaking.
- `FrameProcessor`s process frames (e.g. they convert a `TextFrame` to an `ImageRawFrame`) and push new frames downstream or upstream to their linked peers.
- `FrameProcessor`s can be linked together. The easiest way is to use the `Pipeline`, which is a container for processors. Linking processors allows frames to travel upstream or downstream easily.
- Transports are a way to send or receive frames. There can be local transports (e.g. local audio or native apps), network transports (e.g. websocket) or service transports (e.g. https://daily.co).
- Pipelines are just processor containers for other processors.
- A `PipelineTask` knows how to run a pipeline.
- A `PipelineRunner` can run one or more tasks and it is also used, for example, to capture Ctrl-C from the user.
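The frame/processor/linking idea above can be sketched with a toy synchronous version. Pipecat's real classes are async and far richer; the names and structure here are only illustrative:

```python
# Toy frame processors: each one processes a frame and pushes the result
# downstream to its linked peer.
class FrameProcessor:
    def __init__(self):
        self._next = None

    def link(self, nxt):
        self._next = nxt
        return nxt  # returning the peer lets links chain fluently

    def process_frame(self, frame):
        self.push_frame(frame)

    def push_frame(self, frame):
        if self._next:
            self._next.process_frame(frame)

class Upper(FrameProcessor):
    # Transforms a text frame, then pushes it downstream.
    def process_frame(self, frame):
        self.push_frame(frame.upper())

class Sink(FrameProcessor):
    # Collects whatever reaches the end of the pipeline.
    def __init__(self):
        super().__init__()
        self.frames = []

    def process_frame(self, frame):
        self.frames.append(frame)

source, upper, sink = FrameProcessor(), Upper(), Sink()
source.link(upper).link(sink)  # source -> upper -> sink
source.process_frame("hello")
print(sink.frames)
```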
- Added `FireworksLLMService`.
- Added `InterimTranscriptionFrame` and enabled interim results in `DailyTransport` transcriptions.
- `FalImageGenService` now uses the new `fal_client` package.
- `FalImageGenService`: use `asyncio.to_thread` to avoid blocking the main loop when generating images.
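The `asyncio.to_thread` pattern above offloads a blocking call to a worker thread so the event loop stays responsive. A minimal sketch with a stand-in for the blocking image-generation call:

```python
import asyncio
import time

# Stand-in for a blocking image-generation request (illustrative only).
def generate_image(prompt: str) -> bytes:
    time.sleep(0.1)  # simulate slow, blocking work
    return prompt.encode()

async def main():
    # Run the blocking call in a worker thread; other coroutines keep
    # running while it executes.
    return await asyncio.to_thread(generate_image, "a red door")

print(asyncio.run(main()))
```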
- Allow `TranscriptionFrame` after an end frame (transcriptions can be delayed and received after `UserStoppedSpeakingFrame`).
- Added the `use_cpu` argument to `MoondreamService`.
- Added `FalImageGenService.InputParams`.
- Added `URLImageFrame` and `UserImageFrame`.
- Added `UserImageRequestFrame` and allow requesting an image from a participant.
- Added base `VisionService` and `MoondreamService`.
- Don't pass `image_size` to `ImageGenService`; images should have their own size.
- `ImageFrame` now receives a tuple `(width, height)` to specify the size.
- `on_first_other_participant_joined` now gets a participant argument.
- Check if camera, speaker and microphone are enabled before writing to them.
- `DailyTransport` only subscribes to the desired participant video track.
- Use `camera_bitrate` and `camera_framerate`.
- Increased `camera_framerate` to 30 by default.
- Fixed `LocalTransport.read_audio_frames`.
- Added project optional dependencies `[silero,openai,...]`.
- Moved transports to their own directory.
- Use `OPENAI_API_KEY` instead of `OPENAI_CHATGPT_API_KEY`.
- Don't write to microphone/speaker if not enabled.
- Added a live translation example.
- Fixed foundational examples.
- Added `storybot` and `chatbot` examples.
Initial public release.