Add control for segment clearing #29

Open
ChenghaoMou wants to merge 2 commits into pipecat-ai:main from ChenghaoMou:fix/segment-clear

Conversation

@ChenghaoMou

This change ensures the segment is cleared only after a positive turn detection. It should address the issue mentioned in #25 (comment).
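
As a rough illustration of the intended behavior (not the actual diff), the idea can be sketched as follows. The function name `update_segment`, the `predict_complete` callable, and the 0.5 threshold are all hypothetical and chosen only for this example.

```python
# Hypothetical sketch: names and the 0.5 threshold are assumptions,
# not taken from the repository.
def update_segment(segment, new_audio, predict_complete, threshold=0.5):
    """Append new audio, run turn detection, and clear only on a positive result."""
    segment = segment + new_audio
    probability = predict_complete(segment)
    if probability >= threshold:
        # Positive turn detection: start the next turn with an empty segment.
        return [], True
    # Incomplete: keep the audio so the next inference sees it too.
    return segment, False
```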

@marcus-daily
Contributor

Thanks for this! I've tested it, and I still don't think the previous incomplete segments are being included in the complete inference. In the following example, the "complete" segment is shorter than the "incomplete" segment, which means the "incomplete" segment must have been discarded.

Processing segment (2.37s)...
--------
Prediction: Incomplete
Probability of complete: 0.0126
Inference time: 27.76 ms
Listening for speech...


Processing segment (1.89s)...
--------
Prediction: Complete
Probability of complete: 0.8139
Inference time: 57.68 ms
Listening for speech...

@ChenghaoMou
Author

@marcus-daily You are right. The speech active flag also affects how the buffer accumulates. I will update this PR soon.

@ChenghaoMou
Author

@marcus-daily I have added a turn flag in addition to the VAD/active flag. Here are some of my local test logs:

>> VAD span started
>> VAD span ended
Processing segment (4.38s)...
--------
Prediction: Incomplete
Probability of complete: 0.0188
Inference time: 31.13 ms
Listening for speech...
>> VAD span started
>> VAD span ended
Processing segment (6.37s)...
--------
Prediction: Incomplete
Probability of complete: 0.0122
Inference time: 41.99 ms
Listening for speech...
>> VAD span started
>> VAD span ended
Processing segment (8.22s)...
--------
Prediction: Complete
Probability of complete: 0.8749
Inference time: 30.35 ms
[WARN] Turn ended
Listening for speech...
>> VAD span started
>> VAD span ended
Processing segment (2.02s)...
--------
Prediction: Complete
Probability of complete: 0.9360
Inference time: 36.49 ms
[WARN] Turn ended
Listening for speech...

I have left some print statements in for now so you can test it as well. I will clean them up before merging.
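
For readers following along, here is a minimal sketch of how two separate flags could interact, assuming the VAD flag gates which audio is buffered and the turn flag decides when the buffer is reset. The class `TurnBuffer`, its method names, and the 0.5 threshold are hypothetical and are not the code in this PR; a real implementation may also keep the silence between VAD spans.

```python
# Hypothetical sketch: class, method names, and threshold are assumptions.
class TurnBuffer:
    def __init__(self, predict_complete, threshold=0.5):
        self.predict_complete = predict_complete
        self.threshold = threshold
        self.samples = []
        self.speech_active = False  # VAD flag: inside a speech span
        self.turn_open = False      # turn flag: turn not yet judged complete

    def on_vad_start(self):
        self.speech_active = True
        if not self.turn_open:
            # Only a new turn starts from an empty buffer.
            self.samples = []
            self.turn_open = True

    def on_audio(self, chunk):
        if self.speech_active:
            self.samples.extend(chunk)

    def on_vad_end(self):
        self.speech_active = False
        complete = self.predict_complete(self.samples) >= self.threshold
        if complete:
            # Turn ended: clear the buffer and close the turn.
            self.samples = []
            self.turn_open = False
        return complete
```

With this arrangement, consecutive "Incomplete" predictions see a growing segment, which is consistent with the increasing segment durations in the logs above.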

2 participants