Use System.Speech.Synthesis for implementing an IVR #678

Zax · 2022-01-28T14:06:33Z

Hello,
First of all, thanks for the work you have done; it is very valuable to me. I would like to ask for your help.
I am trying to implement an IVR with your library.
I want to use the Windows SpeechSynthesizer with the built-in Windows voices.
I can place the call correctly and have my text read aloud,
but the voice on the phone is distorted and rather unpleasant.
If I instead record the same message to a file or play it locally, it sounds fine.
I think it's a sampling problem or something similar, but I'm not an expert at all.

This is my code:

using SIPSorcery.Media;
using SIPSorcery.SIP.App;
using SIPSorcery.SIP;
using System.Threading;
using SIPSorceryMedia.Windows;
using System.Threading.Tasks;
using SIPSorceryMedia.Abstractions;
using System.Speech.Synthesis;
using System.IO;
using System.Linq;

namespace IVR.Test
{
  internal static class Program
  {
    private static readonly string DESTINATION = "*******";
    private static readonly string Username = "*******";
    private static readonly string Password = "*******";
    private static readonly SIPEndPoint OUTBOUND_PROXY = null;

    static async Task Main(string[] args)
    {
      var numero = "******";
      var tts = "Ciao Paolo, come stai?";
      using var synth = new SpeechSynthesizer();
      InstalledVoice voce = synth.GetInstalledVoices().FirstOrDefault(v => v.Enabled && v.VoiceInfo.Culture.Name == "it-IT");
      // Keep the default voice when no enabled it-IT voice is installed.
      if (voce != null)
        synth.SelectVoice(voce.VoiceInfo.Name);
      synth.SetOutputToDefaultAudioDevice();
      // Local test:
      // synth.Speak(tts);
      CancellationTokenSource exitCts = new CancellationTokenSource();
      var sipTransport = new SIPTransport();
      sipTransport.EnableTraceLogs();
      var userAgent = new SIPUserAgent(sipTransport, OUTBOUND_PROXY);
      userAgent.ClientCallFailed += (uac, error, sipResponse) => exitCts.Cancel();
      userAgent.OnCallHungup += (_) => exitCts.Cancel();

      var windowsAudio = new WindowsAudioEndPoint(new AudioEncoder());
      var voipMediaSession = new VoIPMediaSession(windowsAudio.ToMediaEndPoints());
      voipMediaSession.AcceptRtpFromAny = true;
      voipMediaSession.AudioExtrasSource.AudioSamplePeriodMilliseconds = 20;

      var callTask = userAgent.Call(numero + DESTINATION, Username, Password, voipMediaSession);
      bool callResult = await callTask;
      if (callResult)
      {
        await windowsAudio.PauseAudio();
        try
        {
          await Task.Delay(1000, exitCts.Token);
          using (var streamAudio = new MemoryStream())
          {
            synth.SetOutputToWaveStream(streamAudio);
            synth.Speak(tts);
            streamAudio.Position = 0;
            await voipMediaSession.AudioExtrasSource.SendAudioFromStream(streamAudio, AudioSamplingRatesEnum.Rate16KHz);
            synth.SetOutputToNull();
          }
          await voipMediaSession.AudioExtrasSource.PauseAudio();
          await Task.Delay(200, exitCts.Token);
          userAgent.Hangup();
        }
        catch (TaskCanceledException)
        { }
      }
      if (userAgent?.IsHangingUp == true)
        await Task.Delay(1000);
      sipTransport.Shutdown();
    }
  }
}

sipsorcery · 2022-01-29T08:59:12Z

Maintainer

Is the synthesised voice understandable at all? If so, then it's probably a sampling rate problem. If it were an incorrect audio format or codec, it would just sound like noise.

From what I can gather, System.Speech.Synthesis and its newer .NET Core version, Microsoft.CognitiveServices.Speech, call out to the Azure speech service to do the synthesis. If that is the case, this example may be useful.
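
One way to test the sampling-rate hypothesis is to read the rate from the header of the synthesised audio and compare it with the rate passed to SendAudioFromStream. The sketch below is a minimal RIFF/WAVE header reader. The WavHeader class and its Read method are hypothetical helpers, not part of System.Speech or sipsorcery, and the sketch assumes a seekable stream (such as a MemoryStream) that starts with a standard WAV header.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: not part of System.Speech or sipsorcery.
public static class WavHeader
{
    // Walks the RIFF chunks until the "fmt " chunk and returns its key fields.
    // Assumes a seekable stream positioned at the start of the WAV data.
    public static (int SampleRate, int Channels, int BitsPerSample) Read(Stream wav)
    {
        using var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true);

        if (ReadTag(reader) != "RIFF")
            throw new InvalidDataException("Not a RIFF stream.");
        reader.ReadInt32(); // overall RIFF size, not needed here
        if (ReadTag(reader) != "WAVE")
            throw new InvalidDataException("Not a WAVE stream.");

        while (wav.Position + 8 <= wav.Length)
        {
            string chunkId = ReadTag(reader);
            int chunkSize = reader.ReadInt32();

            if (chunkId == "fmt ")
            {
                reader.ReadInt16();                     // format tag (1 = PCM)
                int channels = reader.ReadInt16();
                int sampleRate = reader.ReadInt32();
                reader.ReadInt32();                     // average bytes per second
                reader.ReadInt16();                     // block align
                int bitsPerSample = reader.ReadInt16();
                return (sampleRate, channels, bitsPerSample);
            }

            // Skip other chunks; RIFF chunks are padded to an even length.
            wav.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
        }

        throw new InvalidDataException("No fmt chunk found.");
    }

    private static string ReadTag(BinaryReader reader) =>
        Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

In the code from the first post, this could be called on streamAudio after synth.Speak(tts), resetting streamAudio.Position to 0 before and after the call. If the reported rate differs from the one given to SendAudioFromStream, the audio will play back at the wrong speed and pitch.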

Zax · 2022-01-31T12:09:51Z

Author

Hi, thanks for the reply.

I also believe it is a sampling rate problem. The words are distinguishable, even if distorted. To illustrate the problem, I saved the same sentence (Hello World), read by the same voice, both locally (SetOutputToWaveFile) and by capturing it from the phone. I am attaching the two mp3 files.
Unfortunately, I cannot use Azure as a TTS service because the program must work without external resources, but I will definitely run a test to see whether I get the same problem.

Thanks again

HelloWorld-test.mp3.zip
HelloWorld-caputered.mp3.zip

Zax · 2022-01-31T16:13:10Z

Author

Hi, I found that by default the synth.SetOutputToWaveStream() method produces a 22 kHz stream, but you can change the output format this way:

synth.SetOutputToAudioStream(streamAudio, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));

...and indeed the voice improves. In your opinion, what are the best parameters for SpeechAudioFormatInfo, given the following call:

await voipMediaSession.AudioExtrasSource.SendAudioFromStream(streamAudio, AudioSamplingRatesEnum.Rate16KHz);

Thanks a lot
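
For reference, the averageBytesPerSecond and blockAlign arguments of that SpeechAudioFormatInfo constructor follow from the sample rate, bit depth and channel count by the usual PCM arithmetic. A minimal sketch, using a hypothetical PcmFormatMath helper that is not part of System.Speech:

```csharp
using System;

// Hypothetical helper: illustrates standard PCM arithmetic only.
public static class PcmFormatMath
{
    // Bytes in one sample frame (one sample for each channel).
    public static int BlockAlign(int bitsPerSample, int channels) =>
        channels * bitsPerSample / 8;

    // Bytes of audio data produced per second of sound.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels) =>
        samplesPerSecond * BlockAlign(bitsPerSample, channels);
}
```

For 16-bit mono audio this gives a block align of 2, with 32000 bytes per second at 16 kHz and 16000 bytes per second at 8 kHz.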

2 replies

sipsorcery Jan 31, 2022
Maintainer

For SIP and WebRTC, the PCM format always uses 8 kHz, so that's the ideal sampling rate.

When you use 16 kHz, as in your example, the sipsorcery library does a crude downsampling by skipping every 2nd sample. The speech synthesiser will produce a better 8 kHz sample than that.
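
To picture what skipping every second sample means, here is an illustrative sketch. The Downsample class is hypothetical and is not the library's actual code. Nothing is filtered before samples are dropped, so any content above 4 kHz in the 16 kHz signal folds back into the result as aliasing, which is why asking the synthesiser for 8 kHz directly tends to sound cleaner.

```csharp
using System;

// Hypothetical illustration: not taken from the sipsorcery source.
public static class Downsample
{
    // Halves the sample rate by keeping samples 0, 2, 4, ... and dropping the rest.
    // No low-pass filter is applied, so high frequencies alias into the output.
    public static short[] SkipEverySecond(short[] pcm16k)
    {
        var result = new short[(pcm16k.Length + 1) / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = pcm16k[i * 2];
        }
        return result;
    }
}
```

Following the suggestion above, the format from the earlier post would become new SpeechAudioFormatInfo(EncodingFormat.Pcm, 8000, 16, 1, 16000, 2, null), with AudioSamplingRatesEnum.Rate8KHz passed to SendAudioFromStream so that no downsampling is needed.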

Zax Jan 31, 2022
Author

Thank you for your suggestion!
