| 1 | +--- |
| 2 | +author: eric-urban |
| 3 | +ms.service: cognitive-services |
| 4 | +ms.topic: include |
| 5 | +ms.date: 03/07/2023 |
| 6 | +ms.author: eur |
| 7 | +--- |
| 8 | + |
| 9 | +[!INCLUDE [Header](../../common/python.md)] |
| 10 | + |
| 11 | +[!INCLUDE [Introduction](intro.md)] |
| 12 | + |
| 13 | +## Prerequisites |
| 14 | + |
| 15 | +[!INCLUDE [Prerequisites](../../common/azure-prerequisites-openai.md)] |
| 16 | + |
| 17 | +## Set up the environment |
| 18 | + |
| 19 | +The Speech SDK for Python is available as a [Python Package Index (PyPI) module](https://pypi.org/project/azure-cognitiveservices-speech/). The Speech SDK for Python is compatible with Windows, Linux, and macOS. |
| 20 | +- On Windows, you must install the [Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, 2019, and 2022](/cpp/windows/latest-supported-vc-redist?view=msvc-170&preserve-view=true) for your platform. Installing this package for the first time might require a restart. |
| 21 | +- On Linux, you must use the x64 target architecture. |
| 22 | + |
| 23 | +Install a version of [Python from 3.7 to 3.10](https://www.python.org/downloads/). First check the [SDK installation guide](../../../quickstarts/setup-platform.md?pivots=programming-language-python) for any more requirements. |
| 24 | + |
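| 25 | +Of the Python libraries `os`, `requests`, and `json`, only `requests` needs to be installed (`pip install requests`); `os` and `json` are part of the Python standard library. |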
| 26 | + |
| 27 | +### Set environment variables |
| 28 | + |
| 29 | +This example requires environment variables named `OPEN_AI_KEY`, `OPEN_AI_ENDPOINT`, `SPEECH_KEY`, and `SPEECH_REGION`. |
| 30 | + |
| 31 | +[!INCLUDE [Environment variables](../../common/environment-variables-openai.md)] |
| 32 | + |
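Before running the sample, it can help to confirm that all four variables are visible to Python. The following is a minimal sketch, not part of the quickstart sample; the helper name `missing_environment_variables` is hypothetical, and only the standard `os` module is used.

```python
import os

# The four variable names that this quickstart expects.
REQUIRED_VARIABLES = ("OPEN_AI_KEY", "OPEN_AI_ENDPOINT", "SPEECH_KEY", "SPEECH_REGION")


def missing_environment_variables(environ=os.environ):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `environ` can be any mapping,
    which makes the check easy to try without touching the real environment.
    """
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_environment_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If any name is reported as missing, set it and open a new command prompt so that the updated value is picked up.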
| 33 | +## Recognize speech from a microphone |
| 34 | + |
| 35 | +Follow these steps to create a new console application. |
| 36 | + |
| 37 | +1. Open a command prompt where you want the new project, and create a new file named `openai-speech.py`. |
| 38 | +1. Run this command to install the Speech SDK: |
| 39 | + ```console |
| 40 | + pip install azure-cognitiveservices-speech |
| 41 | + ``` |
| 42 | +1. Run this command to install the OpenAI SDK: |
| 43 | + ```console |
| 44 | + pip install openai |
| 45 | + ``` |
| 46 | + > [!NOTE] |
| 47 | + > This library is maintained by OpenAI (not Microsoft Azure). Refer to the [release history](https://github.com/openai/openai-python/releases) or the [version.py commit history](https://github.com/openai/openai-python/commits/main/openai/version.py) to track the latest updates to the library. |
| 48 | + |
| 49 | +1. Copy the following code into `openai-speech.py`: |
| 50 | + |
| 51 | + ```Python |
| 52 | + import os |
| 53 | + import azure.cognitiveservices.speech as speechsdk |
| 54 | + import openai |
| 55 | + |
| 56 | + # This example requires environment variables named "OPEN_AI_KEY" and "OPEN_AI_ENDPOINT" |
| 57 | + # Your endpoint should look like the following https://YOUR_OPEN_AI_RESOURCE_NAME.openai.azure.com/ |
| 58 | + openai.api_key = os.environ.get('OPEN_AI_KEY') |
| 59 | + openai.api_base = os.environ.get('OPEN_AI_ENDPOINT') |
| 60 | + openai.api_type = 'azure' |
| 61 | + openai.api_version = '2022-12-01' |
| 62 | + |
| 63 | + # This will correspond to the custom name you chose for your deployment when you deployed a model. |
| 64 | + deployment_id='text-davinci-002' |
| 65 | + |
| 66 | + # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION" |
| 67 | + speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'), region=os.environ.get('SPEECH_REGION')) |
| 68 | + audio_output_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True) |
| 69 | + audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True) |
| 70 | + |
| 71 | + # Should be the locale for the speaker's language. |
| 72 | + speech_config.speech_recognition_language="en-US" |
| 73 | + speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config) |
| 74 | + |
| 75 | + # The language of the voice that responds on behalf of Azure OpenAI. |
| 76 | + speech_config.speech_synthesis_voice_name='en-US-JennyMultilingualNeural' |
| 77 | + speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_output_config) |
| 78 | + |
| 79 | + # Prompts Azure OpenAI with a request and synthesizes the response. |
| 80 | + def ask_openai(prompt): |
| 81 | + |
| 82 | + # Ask Azure OpenAI |
| 83 | + response = openai.Completion.create(engine=deployment_id, prompt=prompt, max_tokens=100) |
| 84 | + text = response['choices'][0]['text'].replace('\n', '').replace(' .', '.').strip() |
| 85 | + print('Azure OpenAI response:' + text) |
| 86 | + |
| 87 | + # Azure text-to-speech output |
| 88 | + speech_synthesis_result = speech_synthesizer.speak_text_async(text).get() |
| 89 | + |
| 90 | + # Check result |
| 91 | + if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted: |
| 92 | + print("Speech synthesized to speaker for text [{}]".format(text)) |
| 93 | + elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled: |
| 94 | + cancellation_details = speech_synthesis_result.cancellation_details |
| 95 | + print("Speech synthesis canceled: {}".format(cancellation_details.reason)) |
| 96 | + if cancellation_details.reason == speechsdk.CancellationReason.Error: |
| 97 | + print("Error details: {}".format(cancellation_details.error_details)) |
| 98 | + |
| 99 | + # Continuously listens for speech input to recognize and send as text to Azure OpenAI |
| 100 | + def chat_with_open_ai(): |
| 101 | + while True: |
| 102 | + print("Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation.") |
| 103 | + try: |
| 104 | + # Get audio from the microphone and then send it to the speech to text service. |
| 105 | + speech_recognition_result = speech_recognizer.recognize_once_async().get() |
| 106 | + |
| 107 | + # If speech is recognized, send it to Azure OpenAI and listen for the response. |
| 108 | + if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech: |
| 109 | + if speech_recognition_result.text == "Stop.": |
| 110 | + print("Conversation ended.") |
| 111 | + break |
| 112 | + print("Recognized speech: {}".format(speech_recognition_result.text)) |
| 113 | + ask_openai(speech_recognition_result.text) |
| 114 | + elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch: |
| 115 | + print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details)) |
| 116 | + break |
| 117 | + elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled: |
| 118 | + cancellation_details = speech_recognition_result.cancellation_details |
| 119 | + print("Speech Recognition canceled: {}".format(cancellation_details.reason)) |
| 120 | + if cancellation_details.reason == speechsdk.CancellationReason.Error: |
| 121 | + print("Error details: {}".format(cancellation_details.error_details)) |
| 122 | + print("Did you set the speech resource key and region values?") |
| 123 | + except EOFError: |
| 124 | + break |
| 125 | + |
| 126 | + # Main |
| 127 | + |
| 128 | + try: |
| 129 | + chat_with_open_ai() |
| 130 | + except Exception as err: |
| 131 | + print("Encountered exception. {}".format(err)) |
| 132 | + ``` |
| 133 | +1. To increase or decrease the number of tokens returned by Azure OpenAI, change the `max_tokens` parameter. For more information about tokens and cost implications, see [Azure OpenAI tokens](/azure/cognitive-services/openai/overview#tokens) and [Azure OpenAI pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/). |
| 134 | + |
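When a response reaches the `max_tokens` limit, it can end mid-sentence. One way to notice this is to look at the `finish_reason` field of the first completion choice, which is `length` when the token limit cut the output short. The following is a minimal sketch under that assumption; the helper name `was_truncated` is hypothetical, and the dictionary is a hand-built stand-in for a completion response rather than real service output.

```python
def was_truncated(response):
    """Return True if the first completion choice stopped at the token limit.

    Hypothetical helper for illustration. `response` is any dict-like object
    shaped like a completion response, with a "choices" list.
    """
    choices = response.get("choices") or []
    if not choices:
        return False
    return choices[0].get("finish_reason") == "length"


# Hand-built stand-in for a response, for illustration only.
sample_response = {
    "choices": [{"text": "Africa, Antarctica, Asia", "finish_reason": "length"}]
}

if was_truncated(sample_response):
    print("The response was cut off. Consider a larger max_tokens value.")
```

A check like this could run inside `ask_openai` before the text is sent to the speech synthesizer.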
| 135 | +Run your new console application to start speech recognition from a microphone: |
| 136 | + |
| 137 | +```console |
| 138 | +python openai-speech.py |
| 139 | +``` |
| 140 | + |
| 141 | +> [!IMPORTANT] |
| 142 | +> Make sure that you set the `OPEN_AI_KEY`, `OPEN_AI_ENDPOINT`, `SPEECH_KEY`, and `SPEECH_REGION` environment variables as described [previously](#set-environment-variables). If you don't set these variables, the sample will fail with an error message. |
| 143 | + |
| 144 | +Speak into your microphone when prompted. The console output includes the prompt for you to begin speaking, then your request as text, and then the response from Azure OpenAI as text. The response from Azure OpenAI should be converted from text to speech and then output to the default speaker. |
| 145 | + |
| 146 | +```console |
| 147 | +PS C:\dev\openai\python> python.exe .\openai-speech.py |
| 148 | +Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation. |
| 149 | +Recognized speech: Make a comma separated list of all continents. |
| 150 | +Azure OpenAI response:Africa, Antarctica, Asia, Australia, Europe, North America, South America |
| 151 | +Speech synthesized to speaker for text [Africa, Antarctica, Asia, Australia, Europe, North America, South America] |
| 152 | +Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation. |
| 153 | +Recognized speech: Make a comma separated list of 1 Astronomical observatory for each continent. A list should include each continent name in parentheses. |
| 154 | +Azure OpenAI response:Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica) |
| 155 | +Speech synthesized to speaker for text [Mauna Kea Observatories (North America), La Silla Observatory (South America), Tenerife Observatory (Europe), Siding Spring Observatory (Australia), Beijing Xinglong Observatory (Asia), Naukluft Plateau Observatory (Africa), Rutherford Appleton Laboratory (Antarctica)] |
| 156 | +Azure OpenAI is listening. Say 'Stop' or press Ctrl-Z to end the conversation. |
| 157 | +Conversation ended. |
| 158 | +PS C:\dev\openai\python> |
| 159 | +``` |
| 160 | + |
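The sample ends the conversation only when the recognized text is exactly `Stop.`, so a result such as `stop` or `Stop!` keeps the loop running. The following is a minimal sketch of a more tolerant comparison; the helper name `is_stop_command` is hypothetical and isn't part of the Speech SDK.

```python
import string


def is_stop_command(recognized_text, stop_word="stop"):
    """Return True if the recognized text is only the stop word.

    Hypothetical helper for illustration. Case, surrounding whitespace, and
    leading or trailing punctuation are ignored.
    """
    normalized = recognized_text.strip().strip(string.punctuation).strip().lower()
    return normalized == stop_word


if __name__ == "__main__":
    for phrase in ("Stop.", "stop", "Stop the music."):
        print(phrase, "->", is_stop_command(phrase))
```

In `chat_with_open_ai`, the comparison `speech_recognition_result.text == "Stop."` could be replaced with `is_stop_command(speech_recognition_result.text)`.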
| 161 | +## Remarks |
| 162 | +Now that you've completed the quickstart, here are some more considerations: |
| 163 | + |
| 164 | +- To change the speech recognition language, replace `en-US` with another [supported language](~/articles/cognitive-services/speech-service/supported-languages.md). For example, use `es-ES` for Spanish (Spain). The default language is `en-US` if you don't specify a language. For details about how to identify one of multiple languages that might be spoken, see [language identification](~/articles/cognitive-services/speech-service/language-identification.md). |
| 165 | +- To change the voice that you hear, replace `en-US-JennyMultilingualNeural` with another [supported voice](~/articles/cognitive-services/speech-service/supported-languages.md#prebuilt-neural-voices). If the voice doesn't speak the language of the text returned from Azure OpenAI, the Speech service doesn't output synthesized audio. |
| 166 | +- To use a different [model](/azure/cognitive-services/openai/concepts/models#model-summary-table-and-region-availability), replace `text-davinci-002` with the ID of another [deployment](/azure/cognitive-services/openai/how-to/create-resource#deploy-a-model). Keep in mind that the deployment ID isn't necessarily the same as the model name. You named your deployment when you created it in [Azure OpenAI Studio](https://oai.azure.com/). |
| 167 | +- Azure OpenAI also performs content moderation on the prompt inputs and generated outputs. The prompts or responses may be filtered if harmful content is detected. For more information, see the [content filtering](/azure/cognitive-services/openai/concepts/content-filter) article. |
| 168 | + |
| 169 | +## Clean up resources |
| 170 | + |
| 171 | +[!INCLUDE [Delete resource](../../common/delete-resource.md)] |