|
3 | 3 |
|
4 | 4 | * TODO Make new scenario node trigger the activity detection timer as well |
5 | 5 |
|
| 6 | +* TODO Create Mute Filter processor that makes transport-in mute the user based on certain conditions |
| 7 | + |
6 | 8 | * DONE Add standard =vad= keywords that are supported by simulflow by default (currently just =silero/vad=) |
7 | 9 | CLOSED: [2025-08-27 Wed 13:23] |
8 | 10 | :LOGBOOK: |
@@ -253,6 +255,29 @@ will receive back the same system frame from the system route simply by the |
253 | 255 | nature of the setup. |
254 | 256 |
|
255 | 257 | ** TODO Make assistant context aggregator support interrupt :mvp: |
| 258 | + |
| 259 | +** Adding to context only what the user has heard before interruption happened. |
| 260 | + |
| 261 | +We'll need to keep a =context-id= or =sentence-id= for each =audio-out-raw=
| 262 | +frame produced by the TTS service. The original =sentence-id= will be kept on each
| 263 | +realtime =audio-out-raw= frame produced by the [[file:src/simulflow/transport.clj::(def audio-splitter][audio splitter]].
| 264 | + |
| 265 | +The TTS processor will output a word-timestamp frame with the same =sentence-id= |
| 266 | +so it can be matched when playback happens. |
| 267 | + |
| 268 | +The [[file:src/simulflow/transport/out.clj::(def realtime-out-processor][transport-out]] processor will receive the realtime =audio-out-raw= frames and |
| 269 | +keep the =sentence-id= in local state until it has been played back: |
| 270 | +1. Whichever arrives first, the =word-timestamp= frame or the
| 271 | + =audio-out-raw= frame, the state will keep a map of ={sentence-id:
| 272 | + {word-timestamps started-playback?}}=
| 273 | +2. When the first audio frame is played back, =started-playback?= is set to true
| 274 | +3. A new command will be added: =:command/output-words=. The handler from the
| 275 | + init processor will receive the =word-timestamp= data, wait until the
| 276 | + computed end time of each word, and then send a =word-played= msg back to
| 277 | + the transform, which will output a =WordPlayedFrame= (or =word-heard=) with
| 278 | + the =sentence-id=, the =word=, and whether it marks the sentence end.
| 279 | +4. The LLMSentenc |
| 280 | + |
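The state handling in steps 1 and 2 above could be sketched as pure functions over the ={sentence-id {word-timestamps started-playback?}}= map. This is only an illustration under assumptions, not simulflow code: the function names and the =:word= / =:end-ms= keys on each timestamp entry are hypothetical, and the end time is assumed to be in milliseconds from the start of sentence playback.

```clojure
;; Hypothetical sketch of the transport-out playback state described above.
;; Assumed state shape:
;;   {sentence-id {:word-timestamps [{:word "hi" :end-ms 300} ...]
;;                 :started-playback? false}}

(defn register-word-timestamps
  "Stores word timestamps for a sentence, creating the entry if the
  word-timestamp frame arrives before any audio-out-raw frame."
  [state sentence-id word-timestamps]
  (-> state
      (update sentence-id #(merge {:started-playback? false} %))
      (assoc-in [sentence-id :word-timestamps] word-timestamps)))

(defn register-audio-frame
  "Ensures an entry exists when an audio-out-raw frame arrives first.
  Existing values win over the defaults, so arrival order does not matter."
  [state sentence-id]
  (update state sentence-id
          #(merge {:word-timestamps nil :started-playback? false} %)))

(defn mark-playback-started
  "Flips started-playback? once the first audio frame has been played."
  [state sentence-id]
  (assoc-in state [sentence-id :started-playback?] true))

(defn heard-words
  "Returns the words whose computed end time has elapsed, or an empty
  vector if playback of the sentence has not started yet."
  [state sentence-id elapsed-ms]
  (let [{:keys [word-timestamps started-playback?]} (get state sentence-id)]
    (if started-playback?
      (->> word-timestamps
           (filter #(<= (:end-ms %) elapsed-ms))
           (mapv :word))
      [])))
```

Keeping these as pure functions means the ordering question from step 1 (which frame arrives first) can be checked without a running pipeline; an interruption handler could then call something like =heard-words= to decide what to add to the context.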
256 | 281 | * TODO Add support for first message greeting in the pipeline :mvp: |
257 | 282 | * TODO Add support for [[https://github.com/fixie-ai/ultravox][ultravox]] |
258 | 283 |
|
|