Commit 7a6f870

Update TODOs
1 parent 77cb207 commit 7a6f870

2 files changed: +27 -1 lines changed

TODO.org

Lines changed: 25 additions & 0 deletions
@@ -3,6 +3,8 @@
 
 * TODO Make new scenario node trigger the activity detection timer as well
 
+* TODO Create Mute Filter processor that makes transport in mute the user based on certain conditions
+
 * DONE Add standard =vad= keywords that are supported by simulflow by default (currently just =silero/vad=)
 CLOSED: [2025-08-27 Wed 13:23]
 :LOGBOOK:
@@ -253,6 +255,29 @@ will receive back the same system frame from the system route simply by the
 nature of the setup.
 
 ** TODO Make assistant context aggregator support interrupt :mvp:
+
+** Adding to context only what the user has heard before the interruption happened.
+
+We'll need to keep a =context-id= or =sentence-id= for each resulting
+=audio-out-raw= frame from the TTS service. The original =sentence-id= will be kept on
+each =audio-out-raw= resulting from [[file:src/simulflow/transport.clj::(def audio-splitter][audio splitter]] to provide realtime.
+
+The TTS processor will output a =word-timestamp= frame with the same =sentence-id=
+so it can be matched when playback happens.
+
+The [[file:src/simulflow/transport/out.clj::(def realtime-out-processor][transport-out]] processor will receive the realtime =audio-out-raw= frames and
+keep the =sentence-id= in local state until it has been played back:
+1. Depending on which one comes in first, the =word-timestamp= frame or the
+   =audio-out-raw=, the state will keep a map of ={sentence-id:
+   {word-timestamps started-playback?}}=
+2. When the first audio frame is played back, =started-playback?= is turned to true
+3. A new command will be added: =:command/output-words=. The handler from the
+   init processor will receive the =word-timestamp= data and wait based on the
+   computed end time of each word to send back to the transform a =word-played=
+   msg which will output a =WordPlayedFrame= or a =word-heard= that has the
+   =sentence-id=, =word= and whether it marks the sentence end.
+4. The LLMSentenc
+
 * TODO Add support for first message greeting in the pipeline :mvp:
 * TODO Add support for [[https://github.com/fixie-ai/ultravox][ultravox]]

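A note on the design described in the TODO.org hunk above: the per-sentence state and the "only what the user has heard" rule can be sketched as follows. This is a minimal illustration, not simulflow code. It is written in Python rather than the project's Clojure, the names =PlaybackTracker=, =SentenceState= and =words_heard= are hypothetical, and it assumes word end times are given in seconds relative to the start of the sentence's audio.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SentenceState:
    # (word, end_time_seconds) pairs; end times are assumed to be relative
    # to the start of this sentence's audio.
    word_timestamps: List[Tuple[str, float]] = field(default_factory=list)
    started_playback: bool = False
    playback_started_at: Optional[float] = None


class PlaybackTracker:
    """Hypothetical stand-in for the local state kept by transport-out."""

    def __init__(self) -> None:
        # sentence_id -> SentenceState, mirroring
        # {sentence-id: {word-timestamps started-playback?}}
        self.sentences: Dict[str, SentenceState] = {}

    def _state(self, sentence_id: str) -> SentenceState:
        # Either frame type may arrive first, so create the entry on demand.
        return self.sentences.setdefault(sentence_id, SentenceState())

    def on_word_timestamps(self, sentence_id: str,
                           words: List[Tuple[str, float]]) -> None:
        self._state(sentence_id).word_timestamps = list(words)

    def on_audio_played(self, sentence_id: str, now: float) -> None:
        state = self._state(sentence_id)
        # Only the first played frame marks the start of playback.
        if not state.started_playback:
            state.started_playback = True
            state.playback_started_at = now

    def words_heard(self, sentence_id: str, now: float) -> List[str]:
        """Words whose computed end time has already passed at `now`."""
        state = self.sentences.get(sentence_id)
        if state is None or not state.started_playback:
            return []
        elapsed = now - state.playback_started_at
        return [word for word, end in state.word_timestamps if end <= elapsed]


tracker = PlaybackTracker()
tracker.on_audio_played("s1", now=10.0)  # audio frame arrives first
tracker.on_word_timestamps("s1", [("Hello", 0.4), ("there", 0.9), ("friend", 1.5)])
heard = tracker.words_heard("s1", now=11.0)  # interruption 1.0 s into playback
print(heard)
```

On interruption, an aggregator could append only the =heard= words to the assistant context. The real design emits a =word-played= message per word on a timer instead of computing the list on demand; the on-demand variant is used here only to keep the sketch short.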
doc/implementation/drafts.org

Lines changed: 2 additions & 1 deletion
@@ -46,7 +46,8 @@ refactored.
 
 * Audio in transport
 
-** TODO Provide a transport in processor that just takes an in channel and receives in frames on it (might be there already)
+** DONE Provide a transport in processor that just takes an in channel and receives in frames on it (might be there already)
+CLOSED: [2025-09-03 Wed 09:46]
 
 * Interruptions - Make the pipeline interruptible either through VAD or Smart Turn detection
