
Commit 7c614d7

Merge pull request #229013 from eric-urban/eur/speech-openai
Speech to speech chat with OpenAI
2 parents 2d53a84 + d2a7787

8 files changed: +316 −1 lines changed
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
---
author: eric-urban
ms.service: cognitive-services
ms.subservice: speech-service
ms.date: 02/28/2023
ms.topic: include
ms.author: eur
---

> [!div class="checklist"]
> * Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services)
> * [Create a Microsoft Azure OpenAI Service resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesOpenAI) in the Azure portal.
> * Deploy a [model](/azure/cognitive-services/openai/concepts/models) in your Azure OpenAI resource. For more information about model deployment, see the Azure OpenAI [resource deployment guide](/azure/cognitive-services/openai/how-to/create-resource).
> * Get the Azure OpenAI resource key and endpoint. After your Azure OpenAI resource is deployed, select **Go to resource** to view and manage keys. For more information about Cognitive Services resources, see [Get the keys for your resource](~/articles/cognitive-services/cognitive-services-apis-create-account.md#get-the-keys-for-your-resource).
> * [Create a Speech resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices) in the Azure portal.
> * Get the Speech resource key and region. After your Speech resource is deployed, select **Go to resource** to view and manage keys. For more information about Cognitive Services resources, see [Get the keys for your resource](~/articles/cognitive-services/cognitive-services-apis-create-account.md#get-the-keys-for-your-resource).
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
author: eric-urban
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: include
ms.date: 01/25/2022
ms.author: eur
---

You can use the [Azure portal](~/articles/cognitive-services/cognitive-services-apis-create-account.md#clean-up-resources) or [Azure Command Line Interface (CLI)](~/articles/cognitive-services/cognitive-services-apis-create-account-cli.md#clean-up-resources) to remove the Azure OpenAI and Speech resources you created.
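For the CLI route, deleting the two resources can be sketched as follows. The resource and group names are placeholders; substitute whatever names you chose when you created the resources:

```azurecli
# Hypothetical names: replace with your own resource names and resource group.
az cognitiveservices account delete --name my-openai-resource --resource-group my-resource-group
az cognitiveservices account delete --name my-speech-resource --resource-group my-resource-group
```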

articles/cognitive-services/Speech-Service/includes/common/environment-variables-clu.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ author: eric-urban
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: include
-ms.date: 09/14/2022
+ms.date: 02/28/2023
 ms.author: eur
 ---
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
---
author: eric-urban
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: include
ms.date: 02/28/2023
ms.author: eur
---

Your application must be authenticated to access Cognitive Services resources. For production, use a secure way of storing and accessing your credentials. For example, after you [get a key](~/articles/cognitive-services/cognitive-services-apis-create-account.md#get-the-keys-for-your-resource) for your <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a>, write it to a new environment variable on the local machine running the application.

> [!TIP]
> Don't include the key directly in your code, and never post it publicly. See the Cognitive Services [security](../../../security-features.md) article for more authentication options like [Azure Key Vault](../../../use-key-vault.md).

To set the environment variables, open a console window, and follow the instructions for your operating system and development environment.
- To set the `OPEN_AI_KEY` environment variable, replace `your-openai-key` with one of the keys for your resource.
- To set the `OPEN_AI_ENDPOINT` environment variable, replace `your-openai-endpoint` with the endpoint for your resource.
- To set the `SPEECH_KEY` environment variable, replace `your-speech-key` with one of the keys for your resource.
- To set the `SPEECH_REGION` environment variable, replace `your-speech-region` with one of the regions for your resource.

#### [Windows](#tab/windows)

```console
setx OPEN_AI_KEY your-openai-key
setx OPEN_AI_ENDPOINT your-openai-endpoint
setx SPEECH_KEY your-speech-key
setx SPEECH_REGION your-speech-region
```

> [!NOTE]
> If you only need to access the environment variables in the current running console, you can set them with `set` instead of `setx`.

After you add the environment variables, you might need to restart any running programs that need to read them, including the console window. For example, if you're using Visual Studio as your editor, restart Visual Studio before running the example.

#### [Linux](#tab/linux)

```bash
export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region
```

After you add the environment variables, run `source ~/.bashrc` from your console window to make the changes effective.

#### [macOS](#tab/macos)

##### Bash

Edit your .bash_profile, and add the environment variables:

```bash
export OPEN_AI_KEY=your-openai-key
export OPEN_AI_ENDPOINT=your-openai-endpoint
export SPEECH_KEY=your-speech-key
export SPEECH_REGION=your-speech-region
```

After you add the environment variables, run `source ~/.bash_profile` from your console window to make the changes effective.

##### Xcode

For iOS and macOS development, you set the environment variables in Xcode. For example, follow these steps to set the environment variable in Xcode 13.4.1.

1. Select **Product** > **Scheme** > **Edit scheme**.
1. Select **Arguments** on the **Run** (Debug Run) page.
1. Under **Environment Variables**, select the plus (+) sign to add a new environment variable.
1. Enter `SPEECH_KEY` for the **Name** and enter your Speech resource key for the **Value**.

Repeat the steps to set the other required environment variables.

For more configuration options, see the [Xcode documentation](https://help.apple.com/xcode/#/dev745c5c974).

***
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
---
author: eric-urban
ms.service: cognitive-services
ms.topic: include
ms.date: 02/28/2023
ms.author: eur
---

> [!IMPORTANT]
> To complete the steps in this guide, access must be granted to Microsoft Azure OpenAI Service in the desired Azure subscription. Currently, access to this service is granted only by application. You can apply for access to Azure OpenAI by completing the form at [https://aka.ms/oai/access](https://aka.ms/oai/access).

In this how-to guide, you can use [Speech](../../../overview.md) to converse with [Azure OpenAI](/azure/cognitive-services/openai/overview). The text recognized by the Speech service is sent to Azure OpenAI. The text response from Azure OpenAI is then synthesized by the Speech service.

Speak into the microphone to start a conversation with Azure OpenAI.
- Azure Cognitive Services Speech recognizes your speech and converts it into text (speech-to-text).
- Your request as text is sent to Azure OpenAI.
- Azure Cognitive Services Speech synthesizes (text-to-speech) the response from Azure OpenAI to the default speaker.

Although the experience of this example is a back-and-forth exchange, Azure OpenAI doesn't remember the context of your conversation.
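The sample sends each recognized utterance on its own. If you want the model to see earlier turns, one common workaround (shown here as an illustrative sketch, not part of this guide's sample) is to build each prompt from a running transcript of the exchange:

```python
# Illustrative only: approximate conversation memory by prepending earlier turns.
history = []  # list of (user_text, ai_text) tuples from earlier turns

def build_prompt(user_text):
    """Concatenate prior exchanges so the model can 'see' them in the prompt."""
    lines = []
    for question, answer in history:
        lines.append("User: " + question)
        lines.append("Assistant: " + answer)
    lines.append("User: " + user_text)
    lines.append("Assistant:")
    return "\n".join(lines)

# After each completed turn, append it to the history.
history.append(("What is the largest continent?", "Asia"))
print(build_prompt("And the smallest?"))
```

Because the transcript grows with every turn, a real implementation would also trim or summarize old turns to stay within the model's token limit.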
Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
---
author: eric-urban
ms.service: cognitive-services
ms.topic: include
ms.date: 03/07/2023
ms.author: eur
---

[!INCLUDE [Header](../../common/python.md)]

[!INCLUDE [Introduction](intro.md)]

## Prerequisites

[!INCLUDE [Prerequisites](../../common/azure-prerequisites-openai.md)]

## Set up the environment

The Speech SDK for Python is available as a [Python Package Index (PyPI) module](https://pypi.org/project/azure-cognitiveservices-speech/). The Speech SDK for Python is compatible with Windows, Linux, and macOS.
- On Windows, you must install the [Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022](/cpp/windows/latest-supported-vc-redist?view=msvc-170&preserve-view=true) for your platform. Installing this package for the first time might require a restart.
- On Linux, you must use the x64 target architecture.

Install a version of [Python from 3.7 to 3.10](https://www.python.org/downloads/). First, check the [SDK installation guide](../../../quickstarts/setup-platform.md?pivots=programming-language-python) for any further requirements.

The example also uses the `os`, `requests`, and `json` Python libraries. The `os` and `json` modules are part of the Python standard library; install `requests` with `pip` if it isn't already present.

### Set environment variables

This example requires environment variables named `OPEN_AI_KEY`, `OPEN_AI_ENDPOINT`, `SPEECH_KEY`, and `SPEECH_REGION`.

[!INCLUDE [Environment variables](../../common/environment-variables-openai.md)]
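Before running the sample, you can verify from Python that all four variables are visible to the process. This is an optional sanity check, not part of the quickstart code:

```python
import os

# The four variable names this example expects.
required = ["OPEN_AI_KEY", "OPEN_AI_ENDPOINT", "SPEECH_KEY", "SPEECH_REGION"]

# Collect any that are unset or empty.
missing = [name for name in required if not os.environ.get(name)]

if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Remember that variables set with `setx` or added to a shell profile are only visible to consoles (and editors) started after the variables were set.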
## Recognize speech from a microphone

Follow these steps to create a new console application.

1. Open a command prompt where you want the new project, and create a new file named `openai-speech.py`.
1. Run this command to install the Speech SDK:
    ```console
    pip install azure-cognitiveservices-speech
    ```
1. Run this command to install the OpenAI SDK:
    ```console
    pip install openai
    ```
    > [!NOTE]
    > This library is maintained by OpenAI (not Microsoft Azure). Refer to the [release history](https://github.com/openai/openai-python/releases) or the [version.py commit history](https://github.com/openai/openai-python/commits/main/openai/version.py) to track the latest updates to the library.

1. Copy the following code into `openai-speech.py`:

    ```Python
    import os
    import azure.cognitiveservices.speech as speechsdk
    import openai

    # This example requires environment variables named "OPEN_AI_KEY" and "OPEN_AI_ENDPOINT".
    # Your endpoint should look like the following: https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/
    openai.api_key = os.environ.get('OPEN_AI_KEY')
    openai.api_base = os.environ.get('OPEN_AI_ENDPOINT')
    openai.api_type = 'azure'
    openai.api_version = '2022-12-01'

    # This corresponds to the custom name you chose for your deployment when you deployed a model.
    deployment_id = 'text-davinci-002'

    # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION".
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION'))
    audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    # Should be the locale for the speaker's language.
    speech_config.speech_recognition_language = "en-US"
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # The language of the voice that responds on behalf of Azure OpenAI.
    speech_config.speech_synthesis_voice_name = 'en-US-JennyMultilingualNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config)

    # Prompts Azure OpenAI with a request and synthesizes the response.
    def ask_openai(prompt):
        # Ask Azure OpenAI
        response = openai.Completion.create(engine=deployment_id, prompt=prompt, max_tokens=100)
        text = response['choices'][0]['text'].replace('\n', '').replace(' .', '.').strip()
        print('Azure OpenAI response:' + text)

        # Azure text-to-speech output
        speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

        # Check result
        if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech synthesized to speaker for text [{}]".format(text))
        elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = speech_synthesis_result.cancellation_details
            print("Speech synthesis canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("Error details: {}".format(cancellation_details.error_details))

    # Continuously listens for speech input to recognize and send as text to Azure OpenAI.
    def chat_with_open_ai():
        while True:
            print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.")
            try:
                # Get audio from the microphone and then send it to the speech-to-text service.
                speech_recognition_result = speech_recognizer.recognize_once_async().get()

                # If speech is recognized, send it to Azure OpenAI and listen for the response.
                if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
                    if speech_recognition_result.text == "Stop.":
                        print("Conversation ended.")
                        break
                    print("Recognized speech: {}".format(speech_recognition_result.text))
                    ask_openai(speech_recognition_result.text)
                elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
                    print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
                    break
                elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
                    cancellation_details = speech_recognition_result.cancellation_details
                    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
                    if cancellation_details.reason == speechsdk.CancellationReason.Error:
                        print("Error details: {}".format(cancellation_details.error_details))
                        print("Did you set the speech resource key and region values?")
            except EOFError:
                break

    # Main
    try:
        chat_with_open_ai()
    except Exception as err:
        print("Encountered exception. {}".format(err))
    ```
1. To increase or decrease the number of tokens returned by Azure OpenAI, change the `max_tokens` parameter. For more information about tokens and cost implications, see [Azure OpenAI tokens](/azure/cognitive-services/openai/overview#tokens) and [Azure OpenAI pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).

Run your new console application to start speech recognition from a microphone:

```console
python openai-speech.py
```

> [!IMPORTANT]
> Make sure that you set the `OPEN_AI_KEY`, `OPEN_AI_ENDPOINT`, `SPEECH_KEY`, and `SPEECH_REGION` environment variables as described [previously](#set-environment-variables). If you don't set these variables, the sample fails with an error message.

Speak into your microphone when prompted. The console output includes the prompt for you to begin speaking, then your request as text, and then the response from Azure OpenAI as text. The response from Azure OpenAI should be converted from text to speech and then output to the default speaker.

```console
PS C:\dev\openai\python> python.exe .\openai-speech.py
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of all continents.
Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America
Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses.
Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)
Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)]
Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.
Conversation ended.
PS C:\dev\openai\python>
```
## Remarks

Now that you've completed the quickstart, here are some more considerations:

- To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md).
- To change the voice that you hear, replace `en-US-JennyMultilingualNeural` with another [supported voice](~/articles/cognitive-services/speech-service/supported-languages.md#prebuilt-neural-voices). If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio.
- To use a different [model](/azure/cognitive-services/openai/concepts/models#model-summary-table-and-region-availability), replace `text-davinci-002` with the ID of another [deployment](/azure/cognitive-services/openai/how-to/create-resource#deploy-a-model). Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in [Azure OpenAI Studio](https://oai.azure.com/).
- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses may be filtered if harmful content is detected. For more information, see the [content filtering](/azure/cognitive-services/openai/concepts/content-filter) article.
## Clean up resources

[!INCLUDE [Delete resource](../../common/delete-resource.md)]
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: "Azure OpenAI speech to speech chat - Speech service"
titleSuffix: Azure Cognitive Services
description: In this how-to guide, you can use Speech to converse with Azure OpenAI. The text recognized by the Speech service is sent to Azure OpenAI. The text response from Azure OpenAI is then synthesized by the Speech service.
services: cognitive-services
author: eric-urban
manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: how-to
ms.date: 03/07/2023
ms.author: eur
ms.devlang: python
keywords: speech to text, openai
---

# Azure OpenAI speech to speech chat

[!INCLUDE [Python include](./includes/quickstarts/openai-speech/python.md)]

## Next steps

- [Learn more about Speech](overview.md)
- [Learn more about Azure OpenAI](/azure/cognitive-services/openai/overview)

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -238,6 +238,8 @@ items:
   - name: How to use Pronunciation Assessment
     href: how-to-pronunciation-assessment.md
     displayName: pronounce, learn language, assess pron
+  - name: Azure OpenAI speech to speech chat
+    href: openai-speech.md
   - name: Conversation transcription
     items:
     - name: Conversation Transcription overview