Several issues introduced in version 20230306 on audio with silences (repeated text, segment "id" not unique/increasing) #1058
This issue looks more general than #1046 and it might be related to #730 (however this was after v20230306). Take this audio: bonjour_vous_allez_bien.mp3. For this command:
below are the differences between outputs of previous version 20230124 (left), new version 20230306 (middle) and 20230306 with
This became obvious when I used greedy decoding, but a similar thing can be observed with fewer options (with beam search).
On this command, the differences between outputs of previous version 20230124 (left), new version 20230306 (middle) and 20230306 with
I also noticed that the following command takes particularly long to run:
It takes 1 minute with 4 CPUs, whereas it takes 6 sec without
I've just tested 20230307 and the two problems are still here:
Hi! Thanks for reporting this. The non-unique
About the discrepancy between ["text"] and segment["text"], I think it has something to do with these lines: Lines 345 to 356 in aac47c9
EDIT: the culprit was actually this line: Line 290 in aac47c9
Will push a fix soon.
I think the
This was an issue where the new transcribe() was mishandling the all_tokens variable, which made the prompts more prone to repetitions and also caused the discrepancy between the top-level "text" field and the segment-level "text" fields in the JSON response.

I have a fix merged in #1060, so hopefully this resolves your issue! Please let me know if it continues.
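For anyone who wants to confirm on their own files whether the two symptoms from this thread are gone, here is a rough sanity check. It is only a sketch: it assumes the result is a dict with a top-level "text" and a "segments" list whose entries carry "id" and "text", as in the JSON output discussed above. The helper name check_result is made up for this example, and comparing the texts after collapsing whitespace is my own choice, not something the library guarantees.

```python
from typing import Any, Dict, List


def check_result(result: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a transcription result dict.

    Hypothetical helper: assumes "text" at the top level and
    "segments" entries with "id" and "text" keys.
    """
    problems: List[str] = []
    segments = result.get("segments", [])

    # Segment ids should be strictly increasing, which also implies unique.
    ids = [seg["id"] for seg in segments]
    for prev, cur in zip(ids, ids[1:]):
        if cur <= prev:
            problems.append(f"segment id {cur} follows {prev}")

    # Compare the top-level text with the concatenated segment texts,
    # collapsing whitespace on both sides (an assumption of this sketch).
    joined = "".join(seg["text"] for seg in segments)
    if " ".join(joined.split()) != " ".join(result.get("text", "").split()):
        problems.append("top-level text differs from concatenated segment texts")

    return problems


# Hand-written sample data, not real model output.
good = {
    "text": " Bonjour, vous allez bien ?",
    "segments": [
        {"id": 0, "text": " Bonjour,"},
        {"id": 1, "text": " vous allez bien ?"},
    ],
}
bad = {
    "text": " Bonjour,",
    "segments": [
        {"id": 0, "text": " Bonjour,"},
        {"id": 0, "text": " Bonjour,"},
    ],
}

print(check_result(good))
print(check_result(bad))
```

To run it on a real file, load the JSON written by the command line tool with json.load() and pass the resulting dict to check_result(); an empty list means neither symptom was detected.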