Transcription Errors When Using Nova-3 #1499
-
|
Hi,

We have a voice agent service built in Python that uses Deepgram’s WebSocket API for real-time speech transcription. During live sessions, the user’s speech is transcribed via Flux. After the session ends, we send an HTTP request with the session’s recording to get a transcription of the whole session, using the Nova-3 model.

Note that all our recordings have 2 channels: the user and the AI agent. The agent channel contains a synthetic voice generated by a Text-to-Speech model, and the user channel is organic.

Our question concerns the second step, where we transcribe the whole recording. In the sample whose request ID is given below, at around 6:50, the user says “Please update the logo”, but Nova-3 transcribes it as “Please update लो.”

Request ID: 25186fbb-4406-430b-bc4b-49a99c73b368

We would appreciate your help and insights into why this might be happening and how we can improve such cases. Thanks in advance. |
Replies: 4 comments 1 reply
-
|
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently. |
-
|
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion. |
-
|
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
|
-
|
Hi @afurkankjf, looking into your request ID, I see that the detected language for the user's channel was Hindi. You say that the user spoke "Please update the logo".

Do you expect users to be speaking a specific language when interacting with your application? If so, setting that language explicitly will help a lot in ensuring the correct transcript. Otherwise, incorrect transcriptions may occur when a phrase in English sounds like a phrase in a different language.

Additionally, I see that the transcript's paragraph says, "Update logo. Please update लो.". In your application, do you know why the first "update logo" didn't trigger a tool call? Did Deepgram not send you back an …

One other thing to check is the confidence scores of the transcript. With a low enough confidence score, probably < 0.65 or so, you can have your bot ask for clarification. That may prove useful in these sorts of scenarios. |
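The two suggestions above (pin the language, then check confidence) could look roughly like the following in Python. This is only a sketch under assumptions: it assumes the pre-recorded `/v1/listen` endpoint accepts `model`, `language`, and `multichannel` as query parameters, and that the response nests per-word confidence under `results.channels[].alternatives[].words[]`. The helper names `build_listen_url` and `low_confidence_words`, the channel index, and the sample response are hypothetical and not taken from the request discussed in this thread.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(language: str = "en") -> str:
    """Build a pre-recorded request URL that pins the language explicitly."""
    params = {
        "model": "nova-3",
        "language": language,    # explicit language instead of auto-detection
        "multichannel": "true",  # user and agent are on separate channels
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def low_confidence_words(response: dict, channel: int, threshold: float = 0.65) -> list:
    """Return the words on one channel whose confidence is below the threshold."""
    alternative = response["results"]["channels"][channel]["alternatives"][0]
    return [
        word["word"]
        for word in alternative.get("words", [])
        if word["confidence"] < threshold
    ]


# Hand-written response fragment for illustration only (assumed shape, made-up scores).
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "please update logo",
                        "confidence": 0.72,
                        "words": [
                            {"word": "please", "confidence": 0.97},
                            {"word": "update", "confidence": 0.95},
                            {"word": "logo", "confidence": 0.41},
                        ],
                    }
                ]
            }
        ]
    }
}

url = build_listen_url("en")
doubtful = low_confidence_words(sample_response, channel=0)
```

The URL would be used in a POST request with the recording as the body and an `Authorization: Token <api key>` header. If `doubtful` is non-empty for the user's channel, the application can flag that segment for review or, in a live session, have the bot ask for clarification. The 0.65 default is the rough threshold mentioned in the reply, not a tuned value.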