Archived entries from file /Users/ovistoica/workspace/simulflow/TODO.org

Add support for configuration change.

Use case: We have an initial prompt and a set of tools. We want to change them based on the custom parameters passed through the Twilio websocket. Example: on the Twilio websocket, we can pass custom parameters like script-name, or overrides like the user name.

We can use the config-change frame to do this, with every processor taking what it cares about from it. However, this adds very specific functionality to the twilio-in transport, so what is needed is a custom-params->config argument.

:transport-in {:proc transport/twilio-transport-in
               :args {:transport/in-ch in
                      :twilio/handle-event (fn [event]
                                             {:out {:llm/context ".."
                                                    :llm/registered-tools [...]}})}}
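A minimal sketch of what such a handler could look like, written as a pure function. It assumes the Twilio event has already been parsed into a keywordized map and that custom parameters arrive under [:start :customParameters], as in Twilio's Media Streams start message. The scripts table, the :script-name and :user-name keys, the prompt texts, and the use of a plain string for :llm/context are all hypothetical and not taken from simulflow.

```clojure
(require '[clojure.string :as str])

;; Hypothetical lookup from script name to base prompt (not part of simulflow).
(def scripts
  {"booking" "You are a restaurant booking assistant."
   "support" "You are a customer support assistant."})

(defn custom-params->config
  "Builds config output from a parsed Twilio event.
  Returns nil for events other than \"start\"."
  [event]
  (when (= "start" (:event event))
    (let [{:keys [script-name user-name]} (get-in event [:start :customParameters])
          prompt (get scripts script-name "You are a helpful assistant.")]
      ;; Assumption: :llm/context is shown as a plain string for illustration only.
      {:out {:llm/context
             (cond-> prompt
               (not (str/blank? user-name))
               (str " The caller's name is " user-name "."))}})))

;; Usage: pass the function where the transport expects its event handler.
(custom-params->config
 {:event "start"
  :start {:customParameters {:script-name "booking" :user-name "Ana"}}})
```

Keeping the mapping in a plain function means the transport itself stays generic, and each deployment can decide how custom parameters translate into context and tools.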

add core.async.flow support

Research a way to add clj-kondo schema type hints for frames in the macro

(defframe my-cool-frame
  "This is a cool frame"
  {:type :frame.cool/hello
   :schema [:map
            [:messages LLMContextMessages]
            [:tools LLMTools]]})

Add tool call support

Create buffered output transport that sends chunks of 20ms at a 10ms interval
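A rough sketch of the chunking and pacing involved, under stated assumptions: the audio is a raw byte array, and the example numbers (8000 Hz, 1 byte per sample) are picked only for illustration. All function names here are hypothetical, and the pacing loop is deliberately naive (it sleeps a fixed interval and does not correct for drift).

```clojure
(defn chunk-size-bytes
  "Number of bytes that cover chunk-ms milliseconds of audio."
  [sample-rate bytes-per-sample chunk-ms]
  (quot (* sample-rate bytes-per-sample chunk-ms) 1000))

(defn split-audio
  "Splits a byte array into chunks of at most n bytes.
  The last chunk may be shorter."
  [^bytes audio n]
  {:pre [(pos? n)]}
  (let [len (alength audio)]
    (mapv (fn [start]
            (java.util.Arrays/copyOfRange audio
                                          (int start)
                                          (int (min len (+ start n)))))
          (range 0 len n))))

(defn send-paced!
  "Calls send! on each chunk, sleeping interval-ms after every send."
  [send! chunks interval-ms]
  (doseq [chunk chunks]
    (send! chunk)
    (Thread/sleep (long interval-ms))))

;; Usage: 20 ms chunks sent at a 10 ms interval.
(let [n (chunk-size-bytes 8000 1 20)]
  (send-paced! (fn [chunk] (println "sending" (count chunk) "bytes"))
               (split-audio (byte-array 400) n)
               10))
```

Sending faster than real time keeps the receiver's buffer ahead of playback, while small chunks keep the amount of already-sent audio low when playback has to be interrupted.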

Handle Start/Stop interruption frames in LLM and TTS and other assemblers

Add assembler that takes in interim transcripts based on VAD

Add VAD events from Deepgram

Add schema validation with defaults

Change tool_call declaration to include the handler, to enable changing available tools on the fly

If the function :handler returns a channel, the tool-caller will block until a result is put on the channel, optionally with a timeout
:functions [{:type :function
             :function
             {:name "record_party_size"
              :handler (fn [{:keys [size]}] ...)
              :description "Record the number of people in the party"
              :parameters
              {:type :object
               :properties
               {:size {:type :integer
                       :minimum 1
                       :maximum 12}}
               :required [:size]}
              :transition-to :get-time}}]
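The blocking behaviour described above could be sketched as follows, assuming core.async is on the classpath. The call-tool and channel? helpers, the :tool/timeout sentinel, and the handler body are hypothetical; only the [:function :handler] location comes from the declaration above.

```clojure
(require '[clojure.core.async :as a]
         '[clojure.core.async.impl.protocols :as async-protocols])

(defn channel?
  "True when x is something values can be taken from (a core.async channel)."
  [x]
  (satisfies? async-protocols/ReadPort x))

(defn call-tool
  "Invokes the tool's :handler with args.
  A plain return value is passed through. If the handler returns a channel,
  blocks until a value arrives; with timeout-ms, gives up after that long
  and returns :tool/timeout."
  ([tool args] (call-tool tool args nil))
  ([tool args timeout-ms]
   (let [result ((get-in tool [:function :handler]) args)]
     (cond
       (not (channel? result)) result
       (nil? timeout-ms)       (a/<!! result)
       :else (let [[value port] (a/alts!! [result (a/timeout timeout-ms)])]
               (if (= port result) value :tool/timeout))))))

;; Hypothetical handler that does its work asynchronously in a go block.
(def record-party-size
  {:type :function
   :function {:name "record_party_size"
              :handler (fn [{:keys [size]}]
                         (a/go {:party-size size}))}})

(call-tool record-party-size {:size 4} 1000)
```

Because the handler travels with the declaration, swapping the :functions vector at runtime changes both what the LLM may call and what actually runs.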

After this, we can simply emit a frame/llm-context, and that will update the current context. However, the scenario manager needs to

Fix the end-the-call function

Differences between pipecat and simulflow

  1. (I think) simulflow TTS processors should keep a :pipeline/interrupted? state. When the processor receives a speak-frame, it sends it over the websocket connection to the actual TTS provider, which may send back one or more events that need to be accumulated to construct the full audio equivalent of the text from the speak-frame. We therefore keep the :pipeline/interrupted? flag so that, while it is set, the processor drops any new data received on the websocket.
  2. We need a way to clear the "playback queue". Currently the playback queue is represented by the audio-write-ch channel, (a/chan 1024), defined in src/simulflow/transport/out.clj. There is a drain-channel! function that will work, but we need two channels to communicate with the process running in a vthread (the vthread-loop in src/simulflow/transport/out.clj) that sends audio to out: one for commands to drain audio, and one from which to take audio (the existing one).
  3. Pipecat uses a bidirectional queue system between processors:

Transport in <-> Transcriptor <-> Context Aggregator <-> LLM <-> TTS <-> a