Nova-3 STT – Incorrect Word Durations & Duplicate/Overlapping Segments #1472
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com. Being verified through this process will allow our team to help you in a much more streamlined fashion.
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
Hi @ofirBigvu, I'm sorry that these cases have been unexpected and difficult as you build with Deepgram.
In rare cases, our timestamp algorithm falls back to estimating timestamps with more basic algorithms. This is rare enough that, while it can affect individual calls, it is not a widespread occurrence, and each application can accept or handle it as it finds best.
What percentage of your test calls have you observed this on? What is the critical blocker on your application side when this does occur? If you share full JSON transcripts (or larger excerpts), we can look further to better explain these cases.
Since you have the test audio files, do you find that these issues are deterministic and can always be replicated with a particular audio file? Or are they nondeterministic, occurring only sometimes on the same audio?
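For anyone trying to answer the "what percentage" question, here is a rough sketch of how the share of affected calls could be measured. It assumes each transcript has already been reduced to a list of word objects with `start` and `end` fields; the function name `affected_share` and the 3-second cutoff are hypothetical choices for illustration, not anything Deepgram defines.

```python
def affected_share(transcripts, max_duration=3.0):
    """Return the fraction of transcripts with at least one overlong word.

    `transcripts` is a list of word lists; each word is a dict with
    "start" and "end" in seconds. `max_duration` is an assumed cutoff.
    """
    if not transcripts:
        return 0.0
    affected = sum(
        1
        for words in transcripts
        if any(w["end"] - w["start"] > max_duration for w in words)
    )
    return affected / len(transcripts)
```

Running the same audio through the API several times and comparing the results of a check like this is one way to see whether the behavior is deterministic.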
🐛 Nova-3 STT – Incorrect Word Durations & Duplicate/Overlapping Segments
Hello Deepgram Team,
My name is Ofir, and I’m a backend developer integrating the Nova-3 STT model into our production pipeline. While testing the new model, I encountered two critical issues that affect timestamp reliability and transcript integrity.
Issue 1: Words Produced With Unrealistically Long Durations
In several responses from Nova-3, some words span many seconds — far beyond any plausible spoken duration.
Example A
"word": "i",
"start": 687.42,
"end": 706.94,
"confidence": 0.22619629,
"punctuated_word": "I"
Duration: 19.52 seconds
Request ID: a0820f89-8a91-4371-998e-14d59814cd02
Example B
"word": "so",
"start": 96.945,
"end": 115.744995,
"confidence": 0.45412618,
"punctuated_word": "So"
Duration: 18.80 seconds
Request ID: fcca0528-f0e7-4fa6-9d86-9b0366474ccd
Impact:
These durations break word-level alignment and make captioning and transcript syncing unreliable.
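As a stopgap on the application side, words like the two above can be flagged before they reach the captioning step. This is only a minimal sketch: it assumes the word objects have the `word`, `start`, and `end` fields shown in the excerpts, and the helper name `find_long_words` and the 3-second threshold are my own assumptions, to be tuned per use case.

```python
def find_long_words(words, max_duration=3.0):
    """Return (word, duration) pairs for words longer than max_duration seconds."""
    flagged = []
    for w in words:
        duration = w["end"] - w["start"]
        if duration > max_duration:
            # Round only for display; the comparison uses the raw value.
            flagged.append((w["word"], round(duration, 2)))
    return flagged


# The two words reported in Example A and Example B.
words = [
    {"word": "i", "start": 687.42, "end": 706.94},
    {"word": "so", "start": 96.945, "end": 115.744995},
]
print(find_long_words(words))
```

What to do with a flagged word (clamp its end time, drop it, or re-request the audio) depends on the application; the sketch only detects.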
Issue 2: Duplicate Word Sequences With Overlapping Timestamps
Some transcripts contain duplicated segments where an entire word sequence appears twice with slightly shifted timestamps.
Request ID: 7cb9c1f2-0294-4e64-8dd3-0e2eb76902a5
First occurrence:
Duplicate overlapping occurrence:
Impact:
This results in duplicated text and inconsistent timing, affecting transcript accuracy and alignment pipelines.
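A rough way to spot this pattern in a response is to look for a word that repeats an earlier word with nearly the same start time. The sketch below assumes word objects with `word` and `start` fields; the helper name `find_shifted_duplicates`, the 0.5-second tolerance, and the sample data are all made up for illustration and are not taken from the request above.

```python
def find_shifted_duplicates(words, tolerance=0.5):
    """Return (i, j) index pairs where words[j] repeats words[i]
    and their start times differ by at most `tolerance` seconds."""
    pairs = []
    for j, later in enumerate(words):
        for i in range(j):
            earlier = words[i]
            same_text = earlier["word"] == later["word"]
            close_in_time = abs(later["start"] - earlier["start"]) <= tolerance
            if same_text and close_in_time:
                pairs.append((i, j))
    return pairs


# Synthetic data: "hello world" emitted twice, shifted by 0.1 s.
sample = [
    {"word": "hello", "start": 1.0, "end": 1.4},
    {"word": "world", "start": 1.5, "end": 1.9},
    {"word": "hello", "start": 1.1, "end": 1.5},
    {"word": "world", "start": 1.6, "end": 2.0},
]
print(find_shifted_duplicates(sample))
```

The pairwise comparison is quadratic in the number of words, which is fine for a quick check but would need a windowed search for long transcripts. It can also misfire on fast genuine repetitions such as "no no", so the tolerance needs care.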
Summary
Request
Could the Deepgram team investigate whether this is a model-level timing bug or an unintended behavior of the Nova-3 architecture?
I can provide full JSON responses or audio files if needed.
Thank you!
Ofir