Incorrect Transcription in Flux While Nova-3 Produces Correct Output #1497
-
|
We are observing a transcription inconsistency between Flux and Nova-3 for the same audio input.

Ground truth:

Flux transcription (Request ID: 6cd3fb23-079a-4ee4-8df1-a070642b3fc9):

Nova-3 transcription (Request ID: d663e8e5-513a-4bae-a3eb-2ef5ca0f3442):

In this case, Flux produces incorrect word segmentation (e.g., “ad pain” instead of “admin”), which changes the intended meaning. Nova-3, by contrast, matches the ground truth exactly for the same audio.

Our use case involves turn-taking and rapid, interactive speech, which is why we prefer Flux during live voice calls and rely on Nova-3 to transcribe the session recordings afterwards. However, this inconsistency between the models creates a reliability issue in our workflow.

We’d appreciate your guidance on what might be causing this inconsistency and how we can improve Flux’s transcription accuracy. Thanks in advance. |
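To pin down where two transcripts of the same audio diverge, a word-level diff can be useful. The following is a minimal sketch using only Python's standard library; the function name `word_diff` and the sample strings are hypothetical and stand in for the real transcripts, which are not reproduced here.

```python
from difflib import SequenceMatcher


def word_diff(reference: str, hypothesis: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the word-level edits that turn `reference` into `hypothesis`.

    Each entry is (operation, reference_words, hypothesis_words), where the
    operation is one of "replace", "delete", or "insert".
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    # autojunk=False keeps difflib from treating frequent words as noise.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    return [
        (tag, ref_words[i1:i2], hyp_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical strings for illustration only (not the actual transcripts).
reference = "please open the admin panel"
flux_like = "please open the ad pain panel"

for tag, ref_span, hyp_span in word_diff(reference, flux_like):
    print(tag, ref_span, "->", hyp_span)
```

Running a diff like this over several calls would show whether the errors cluster around word segmentation, as in the “admin” example, or are spread more evenly.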
Replies: 5 comments
-
|
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently. |
-
|
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion. |
-
|
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
|
-
|
Our voice service is built in Python, but we don’t use Deepgram’s SDK; instead, we connect via WebSocket and send HTTP requests directly. |
-
The main point is that comparing Nova-3 batch with Flux streaming isn't an apples-to-apples comparison. Nova-3 in batch mode has the full utterance as context, while Flux in streaming mode has to make low-latency, incremental decisions. That difference can lead to transcription discrepancies, and because the models are optimized for different latency vs. accuracy tradeoffs, we don’t expect full parity between them.
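Since full parity is not expected, one option is to measure the gap rather than look for exact matches. The sketch below computes word error rate (WER) with a standard word-level edit distance, treating the batch transcript as the reference. The function name `word_error_rate` and the sample strings are hypothetical, and using the Nova-3 output as the reference is an assumption that only holds when it matches the ground truth.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings for illustration only (not the actual transcripts).
batch_reference = "please open the admin panel"
streaming_output = "please open the ad pain panel"
print(f"WER: {word_error_rate(batch_reference, streaming_output):.2f}")
```

Tracking this number across many sessions gives a baseline for how far the streaming transcripts usually sit from the batch ones, which makes unusual calls easier to spot.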