All notable changes to this project will be documented in this file. This change log follows the conventions of keepachangelog.com.
- Support for the Google Gemini LLM. See the gemini example flow
- Support for text-based agents. See the text chat example
- Support for different sample rates for the audio splitter processor. See the twilio-websocket example for a demonstration
- Support for the latest OpenAI models (GPT-5, GPT-4.1, the o series, and more). See openai.clj for the schema definition
- Twilio transport in: Added extra parameter `transport/send-twilio-serializer?` to prevent it from including a `:transport/serializer` in the `::frame/system-config-change` it issues
- Realtime out transport: Extra parameter for `transport/serializer` as an initial argument to the processor
- Sentence assembler: Added pipeline interruption support. The processor drops partially accumulated sentences and goes into interrupted mode when a `::frame/control-interrupt-start` is received. All new LLM chunks received while in the interrupted state are dropped, and the processor resumes assembling sentences on receiving a `::frame/control-interrupt-stop`
- ElevenLabs TTS: Added pipeline interruption support. The processor drops partially generated audio and goes into interrupted mode when a `::frame/control-interrupt-start` is received. All new speak frames received while in the interrupted state are dropped, and the processor resumes on receiving a `::frame/control-interrupt-stop`
- LLM processors (OpenAI and Google): Added pipeline interruption support. The processor cancels the in-flight inference request when receiving a `::frame/control-interrupt-start`
- Transport out (both realtime and speakers out): Added pipeline interruption support. The processor drains the playback queue when a `::frame/control-interrupt-start` is received and goes into an interrupted state in which new `::frame/audio-out-raw` frames are dropped until a `::frame/control-interrupt-stop` is received
- Transport in (twilio, async and microphone): Support for predefined VAD processors. Currently the only supported one is `:vad.analyser/silero`, but hopefully more will come in the future. Example:

```clojure
{:transport-in {:proc transport-in/microphone-transport-in
                :args {:vad/analyser :vad.analyser/silero}}}
;; the silero VAD instantiation and cleanup is handled by simulflow
```

- Transport in (twilio, async and microphone): Moved params parsing to use malli schemas; see the schemas defined here.
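The interruption entries above all follow the same pattern: enter an interrupted state on `::frame/control-interrupt-start`, drop pending work while interrupted, and resume on `::frame/control-interrupt-stop`. A minimal sketch of that state machine, written as a core.async.flow-style transform (the `:frame/type` key and frame shapes here are illustrative assumptions, not simulflow's actual API):

```clojure
;; Sketch only: assumes frames are maps carrying a :frame/type key and that the
;; transform follows core.async.flow's [state' {out-port msgs}] return convention.
(defn interruptible-transform
  [state _in-port frame]
  (case (:frame/type frame)
    :control-interrupt-start [(assoc state :interrupted? true :acc []) {}] ; drop partial work
    :control-interrupt-stop  [(assoc state :interrupted? false) {}]        ; resume processing
    ;; any other frame: drop it while interrupted, otherwise pass it through
    (if (:interrupted? state)
      [state {}]
      [state {:out [frame]}])))
```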
- Examples: Added an example of a local microphone AI agent with interruption capability. See here
- Transport in (twilio, async and microphone): Added muting support. When transports receive `mute-input-start`, the input isn't processed further, and processing resumes when a `mute-input-stop` frame is received. Useful for guided conversations where you don't want the user to speak over the bot, during initial greetings or during function calls.
- User context aggregator: Now emits `frame/llm-tool-call-request` when the LLM requests a tool call. This frame can be used to make the agent say something while the tool handler is called, or to trigger a mute filter while executing the tool call.
- User context aggregator: Now emits `frame/llm-tool-call-result` frames when the result of a tool call is ready. This frame can be used to make the agent say something, or to trigger a mute filter while executing the tool call.
- Mute filter processor: Added a mute processor that mutes user input based on specific strategies
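As a sketch of how the tool-call frames above might be consumed, a downstream processor could emit a filler utterance whenever a tool call starts (the `:frame/type` key and the speak-frame shape are assumptions for this example, not simulflow's real constructors):

```clojure
;; Illustrative sketch: say something while the tool handler runs.
(defn tool-call-feedback-transform
  [state _in-port frame]
  (if (= :llm-tool-call-request (:frame/type frame))
    [state {:out [{:frame/type :speak-frame
                   :frame/data "One moment while I look that up..."}]}]
    [state {:out [frame]}]))
```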
- Moved most of the LLM logic from the openai processor to a utils folder so it can be used by multiple processors like gemini
- Scenario Manager: Fixed a bug where normal utility tool functions were treated as transition functions that returned a nil transition node
- Activity Monitor: Fixed a bug where the ping count was not reset when the user said something
- Silero VAD: Fixed reflection warnings when instantiating the Silero VAD
- Voice Activity Detection protocols that the transport-in processors can use. See the protocol here
- Silero Voice Activity Detection model running through ONNX Runtime
- Parameter `:vad/analyser` to all transport-in processor params to pass a VAD analyser like Silero to be run on each new audio chunk. This is useful for logic that handles AI interruptions and improves turn taking
- Added malli schema for `audio-output-raw` frames
- Added `simulflow.frame/send` helper function that outputs frames based on their appropriate direction - used now in most processors
- Twilio Integration: Twilio serializer with `make-twilio-serializer` for WebSocket communication
- Audio Resampler Processor: New processor for real-time sample rate conversion
- System Frame Router: New `system-frame-router` processor for handling system message routing in complex flows
- Interruption Support: New frame types for bot interruption handling: `control-interrupt-start`, `control-interrupt-stop`, `bot-interrupt`
- Made 16kHz signed PCM mono audio the default audio that runs through the pipeline. All `audio-input-raw` frames that come through the pipeline are expected to be in this format
- POTENTIAL BREAKING: `frame/audio-output-raw` frames are now expected to contain the sample rate of the audio. The sample rate is used for chunking and final output conversion. If you have custom text-to-speech generators, you need to update them to use it
- Changed examples to use the Silero VAD model for VAD instead of relying on Deepgram
- BREAKING: Processors that output `system-frames` (see the `frame/system-frames` set for a list of system frames) will output them exclusively on the `:sys-out` channel and normal frames on the `:out` channel
- Audio Processing: Enhanced audio conversion with improved µ-law/PCM support and step-by-step conversion planning
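The `:sys-out`/`:out` channel split described above can be sketched as a simple routing step over a processor's emitted frames (a hedged illustration; `system-frame?` stands in for a predicate built from simulflow's `frame/system-frames` set):

```clojure
;; Route system frames to :sys-out and all other frames to :out.
;; system-frame? is assumed to be a predicate over frames, e.g. derived
;; from the frame/system-frames set mentioned in the changelog.
(defn split-outputs
  [frames system-frame?]
  {:sys-out (filterv system-frame? frames)
   :out     (filterv (complement system-frame?) frames)})
```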
- Audio Conversion: Fixed a critical bug in PCM 16kHz to Twilio µ-law 8kHz conversion where downsampling must occur before the encoding conversion
- Transport: Removed a duplicate system frame passthrough that was causing duplicate frames
- Command system to express IO commands from the transform function for easier testing - still alpha
- Realtime out transport: Configurable buffering duration between audio chunks: `audio.out/sending-interval` (defaults to half of the chunk duration)
- Realtime out transport: Configurable silence detection threshold: `activity-detection/silence-threshold-ms` (defaults to 4 x the chunk duration)
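Assuming a 20 ms audio chunk, the two defaults above would correspond to a config fragment like this (the processor name and the placement of these keys inside `:args` are assumptions for illustration):

```clojure
{:transport-out
 {:proc transport/realtime-transport-out          ;; hypothetical processor name
  :args {:audio.out/sending-interval 10           ;; ms, half of a 20 ms chunk
         :activity-detection/silence-threshold-ms 80}}} ;; 4 x a 20 ms chunk
```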
- Schema Description: Fixed the `schema/->describe-parameters` function to properly handle `:or` clauses; it now displays human-readable descriptions like "Type: (set of string or vector of string)" instead of the unhelpful "Type: :or"
- Schema Description: Added recursive schema type description with the `describe-schema-type` helper function that properly handles complex nested types including `:or`, `:set`, `:vector`, and basic types
- Activity Monitor: Fixed a bug where the transition function wasn't returning the correct state
- Realtime out transport: Fixed buffering for realtime transport out
- Explicit timestamp support for frames with `java.util.Date` (`#inst` reader macro) and millisecond integers
- Frame creation functions now support an optional `opts` parameter for timestamp control
- Utility functions `normalize-timestamp` and `timestamp->date` for timestamp conversion
- Microphone Transport: Added pure functions `process-mic-buffer` and `mic-resource-config` for better testability and REPL-driven development
- Transport Testing: Comprehensive test suite for the transport layer covering microphone transport, audio splitter, and realtime speakers
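The timestamp entries above mean a frame can be created with either a `#inst` date or a millisecond integer. A hedged sketch (the constructor name, arity, and `:timestamp` opts key are assumptions based on the description, not confirmed API):

```clojure
;; Both forms are assumed to produce equivalent frames; per the changelog,
;; normalize-timestamp / timestamp->date handle the conversion internally.
(frame/user-speech-start {} {:timestamp #inst "2025-04-13T10:00:00.000Z"})
(frame/user-speech-start {} {:timestamp 1744538400000})
```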
- Realtime Speakers Out: Added comprehensive test coverage including realistic LLM → audio splitter → speakers pipeline integration test simulating end-to-end audio processing flow
- Realtime Speakers Out: Added extensive unit tests for describe, init, transition, transform, timer handling, system config, serializer integration, and timing accuracy validation
- BREAKING: Frame types now use the proper `simulflow.frame` namespace (e.g., `:simulflow.frame/user-speech-start`)
- Fixed schema typos in the `user-speech-stop` and `bot-speech-stop` frame definitions
- Microphone Transport: Refactored `microphone-transport-in` to use the multi-arity function pattern (`mic-transport-in-fn`) for better flow integration
- Audio Splitter: Refactored `audio-splitter` to use the multi-arity function pattern (`audio-splitter-fn`) for better flow integration and consistency with other transport processors
- Better developer experience with static analysis support for frame functions
- Enhanced frame validation and error messages
- More idiomatic Clojure code with proper namespaced keywords
- Activity Monitor: Refactored core logic into a pure `transform` function, making it fully testable and following data-centric functional patterns
- ElevenLabs TTS: Extracted transform logic into a pure `elevenlabs-tts-transform` function, improving testability and separation of concerns from WebSocket lifecycle management
- ElevenLabs TTS: Migrated from classic threads (`flow/futurize`) to virtual threads (`vthread-loop`) for better performance and resource efficiency
- ElevenLabs TTS: Extracted WebSocket configuration logic into a pure `create-websocket-config` function
- Microphone Transport: Enhanced error handling with structured logging and non-blocking channel operations using `offer!` instead of the blocking `>!!`
- Microphone Transport: Migrated to virtual threads (`vthread-loop`) for better concurrency performance and resource utilization
- Microphone Transport: Improved timestamp accuracy by capturing timestamps at audio capture time rather than processing time
- Microphone Transport: Added graceful frame dropping when channel is full to prevent system backpressure in real-time audio scenarios
- Audio Splitter: Extracted pure functions for audio byte array splitting and chunk size calculation, improving testability and following data-centric design principles
- Audio Splitter: Enhanced with comprehensive edge case handling (nil audio, zero chunk size, exact division) and data integrity verification
- Transport Architecture: Extracted pure functions for audio buffer processing, resource configuration, and audio splitting, improving testability and following data-centric design principles
- Realtime Speakers Out: Following the Activity Monitor pattern, moved business logic from `init!` background loops to a pure transform function for better testability and functional programming alignment
- Realtime Speakers Out: Enhanced with timer-based speech detection in transform function, replacing background monitoring loops with explicit timer tick event handling
- Realtime Speakers Out: Improved state management with explicit data structures and pure function transformations for speech start/stop detection and audio timing calculations
- Test Quality: Added realistic integration testing simulating LLM audio generation → audio splitting → realtime speakers with proper timing validation and data integrity verification
- Updated dependencies to the latest versions
- Removed unused dependencies: onnx-runtime and java.data
## 0.1.3-alpha - 2025-04-13
- Change frame format from records to maps with specific meta for easier debugging
- Functionality to describe a processor's parameters with a malli schema only
- Google LLM support. Example usage: gemini.clj
- Scenario Manager for handling complex conversation flows
- Activity Monitor to ping user or end call when no activity is detected for specific period
- Bot speaking events tracking
- Support for tool use. See llm-context-aggregator.clj
- Twilio transport in support. See `twilio-transport-in` in transport.clj
- More tests for context aggregation
- Support for dynamic context change. Use case: we have an initial prompt and tools to use, and we want to change them based on custom parameters that are input through the Twilio websocket. Example: on the Twilio websocket, we can pass custom parameters like script-name, overrides like the user name, etc.
  We can use the config-change frame to do this, and every processor takes what it cares about from it. However, this adds very specific functionality to the twilio-in transport, so what you need to do is add a custom-params->config argument:

```clojure
:transport-in {:proc transport/twilio-transport-in
               :args {:transport/in-ch in
                      :twilio/handle-event (fn [event]
                                             {:out {:llm/context ".."
                                                    :llm/registered-tools [...]}})}}
```

- Underlying pipeline implementation to use `core.async.flow` (currently unreleased)
- `pipeline.clj`: Removed in favor of `core.async.flow`