Commit 4e1b3ec

[OpenAI] Add Whisper (Azure#27109)
### Packages impacted by this PR

@azure/openai

### Issues associated with this PR

None for Whisper, but this PR includes a rudimentary fix for Azure#26953.

### Describe the problem that is addressed by this PR

Adds support for speech to text capabilities. See the changelog entry and the samples for more details about the addition.

A few notes:

- Bring Your Own Data tests are skipped because the new version deployment doesn't support it yet; hopefully support will be there soon.
- @azure/core-rest-pipeline's `formDataPolicy` doesn't support file uploads. I added a custom version of the policy in openai that supports file uploads and uses an actively maintained 3rd party library.
- Adds a fix for Azure#26953 that doesn't rely on core changes (see the changes in the `src/api/getSSE.ts` and `src/api/getSSE.browser.ts` files). A better fix is in Azure#27000 but that is still being reviewed.

### What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

N/A

### Are there test cases added in this PR? _(If not, why?)_

Yes

### Provide a list of related PRs _(if any)_

N/A

### Command used to generate this PR: _(Applicable only to SDK release request PRs)_

### Checklists

- [x] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)

---------

Co-authored-by: Minh-Anh Phan <[email protected]>
1 parent 245548f commit 4e1b3ec
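
The PR description notes that file uploads need a multipart form body, which the stock `formDataPolicy` did not handle. As a rough illustration of what such a body looks like on the wire, here is a minimal sketch of a multipart/form-data encoder. It is not the policy added in this PR (which, per the description, relies on a third-party library); the function name `buildMultipartBody`, the field names, and the boundary string are all hypothetical.

```js
// Minimal multipart/form-data encoder for text fields plus one file.
// Illustrative only: this is NOT the custom policy added in the PR.
function buildMultipartBody(fields, file, boundary) {
  const chunks = [];
  // Each text field becomes its own part.
  for (const [name, value] of Object.entries(fields)) {
    chunks.push(
      Buffer.from(
        `--${boundary}\r\nContent-Disposition: form-data; name="${name}"\r\n\r\n${value}\r\n`
      )
    );
  }
  // The file part carries a filename and a content type ahead of the raw bytes.
  chunks.push(
    Buffer.from(
      `--${boundary}\r\n` +
        `Content-Disposition: form-data; name="${file.fieldName}"; filename="${file.fileName}"\r\n` +
        `Content-Type: application/octet-stream\r\n\r\n`
    )
  );
  chunks.push(Buffer.from(file.content));
  // The closing delimiter ends the body.
  chunks.push(Buffer.from(`\r\n--${boundary}--\r\n`));
  return Buffer.concat(chunks);
}

// Hypothetical field names and boundary, chosen only for the example.
const exampleBody = buildMultipartBody(
  { response_format: "json" },
  { fieldName: "file", fileName: "sample.wav", content: new Uint8Array([1, 2, 3]) },
  "example-boundary"
);
console.log(`Encoded ${exampleBody.length} bytes`);
```

A request using such a body would also need a `Content-Type: multipart/form-data; boundary=...` header carrying the same boundary string.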

47 files changed: +1783 -197 lines changed

common/config/rush/pnpm-lock.yaml

Lines changed: 31 additions & 7 deletions

sdk/openai/openai/CHANGELOG.md

Lines changed: 7 additions & 6 deletions
@@ -1,17 +1,18 @@
 # Release History

-## 1.0.0-beta.6 (Unreleased)
+## 1.0.0-beta.6 (2023-09-21)

 ### Features Added

-### Breaking Changes
+- Introduces speech to text and translation capabilities for a wide variety of audio file formats.
+- Adds `getAudioTranscription` and `getAudioTranslation` methods for transcribing and translating audio files. The result can be either a simple JSON structure with just a `text` field or a more detailed JSON structure containing the text alongside additional information. In addition, VTT (Web Video Text Tracks), SRT (SubRip Text), and plain text formats are also supported. The type of the result depends on the `format` parameter if specified, otherwise, a simple JSON output is assumed. The methods could take as input an optional text prompt to guide the model's style or continue a previous audio segment. The language of the prompt should match that of the audio file.
+- The available model at the time of this release supports the following list of audio file formats: m4a, mp3, wav, ogg, flac, webm, mp4, mpga, mpeg, and oga.

 ### Bugs Fixed

-- Return `usage` information when available.
-- Return `error` information in `ContentFilterResults` when available.
-
-### Other Changes
+- Returns `usage` information when available.
+- Fixes a bug where errors weren't properly being thrown from the streaming methods.
+- Returns `error` information in `ContentFilterResults` when available.

 ## 1.0.0-beta.5 (2023-08-25)
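
The changelog entry above lists the audio formats the model accepts and says the result is either a JSON structure with a `text` field or a text-based format. The following is a minimal sketch of two small helpers built on those statements. Both helpers are hypothetical and not part of `@azure/openai`, and the assumption that plain text, SRT, and VTT results arrive as plain strings is inferred from the changelog wording rather than confirmed by it.

```js
const path = require("path");

// File extensions listed in the 1.0.0-beta.6 changelog entry.
const SUPPORTED_EXTENSIONS = new Set([
  "m4a", "mp3", "wav", "ogg", "flac", "webm", "mp4", "mpga", "mpeg", "oga",
]);

// Hypothetical helper: checks a file name against the list above.
function isSupportedAudioFile(fileName) {
  const extension = path.extname(fileName).slice(1).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(extension);
}

// Hypothetical helper: returns the transcript text whether the result is a
// plain string (assumed for text, SRT, and VTT output) or a JSON structure
// with a `text` field.
function transcriptText(result) {
  return typeof result === "string" ? result : result.text;
}

console.log(isSupportedAudioFile("meeting.MP3")); // true
console.log(isSupportedAudioFile("notes.txt")); // false
console.log(transcriptText({ text: "hello" })); // hello
```

Checking the extension before uploading avoids a round trip for files the service would reject, although the list may change as new model versions are deployed.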

sdk/openai/openai/README.md

Lines changed: 61 additions & 7 deletions
@@ -6,10 +6,12 @@ non-Azure OpenAI inference endpoint, making it a great choice for even non-Azure

 Use the client library for Azure OpenAI to:

-* [Create a completion for text][msdocs_openai_completion]
-* [Create a chat completion with ChatGPT][msdocs_openai_chat_completion]
+* [Create a completion for text][get_completions_sample]
+* [Create a chat completion with ChatGPT][list_chat_completion_sample]
 * [Create a text embedding for comparisons][msdocs_openai_embedding]
-* [Use your own data with Azure OpenAI][msdocs_openai_custom_data]
+* [Use your own data with Azure OpenAI][byod_sample]
+* [Generate images][get_images_sample]
+* [Transcribe and Translate audio files][transcribe_audio_sample]

 Azure OpenAI is a managed service that allows developers to deploy, tune, and generate content from OpenAI models on Azure resources.

@@ -20,6 +22,7 @@ Checkout the following examples:
 - [Summarize Text](#summarize-text-with-completion)
 - [Generate Images](#generate-images-with-dall-e-image-generation-models)
 - [Analyze Business Data](#analyze-business-data)
+- [Transcribe and Translate audio files](#transcribe-and-translate-audio-files)

 Key links:

@@ -140,6 +143,10 @@ async function main(){
     console.log(choice.text);
   }
 }
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
 ```

 ## Examples
@@ -179,6 +186,10 @@ async function main(){
     }
   }
 }
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
 ```

 ### Generate Multiple Completions With Subscription Key
@@ -212,6 +223,10 @@ async function main(){
     console.log(`Chatbot: ${completion}`);
   }
 }
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
 ```

 ### Summarize Text with Completion
@@ -254,6 +269,9 @@ async function main(){
   console.log(`Summarization: ${completion}`);
 }

+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
 ```

 ### Generate images with DALL-E image generation models

@@ -276,6 +294,10 @@ async function main() {
     console.log(`Image generation result URL: ${image.url}`);
   }
 }
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
 ```

 ### Analyze Business Data
@@ -285,7 +307,7 @@ This example generates chat responses to input chat questions about your busines

 ```javascript
 const { OpenAIClient } = require("@azure/openai");
-const { DefaultAzureCredential } = require("@azure/identity")
+const { DefaultAzureCredential } = require("@azure/identity");

 async function main(){
   const endpoint = "https://myaccount.openai.azure.com/";
@@ -323,6 +345,36 @@ async function main(){
     }
   }
 }
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
+```
+
+### Transcribe and translate audio files
+
+The speech to text and translation capabilities of Azure OpenAI can be used to transcribe and translate a wide variety of audio file formats. The following example shows how to use the `getAudioTranscription` method to transcribe audio into the language the audio is in. You can also translate and transcribe the audio into English using the `getAudioTranslation` method.
+
+The audio file can be loaded into memory using the NodeJS file system APIs. In the browser, the file can be loaded using the `FileReader` API and the output of `arrayBuffer` instance method can be passed to the `getAudioTranscription` method.
+
+```js
+const { OpenAIClient, AzureKeyCredential } = require("@azure/openai");
+const fs = require("fs/promises");
+
+async function main() {
+  console.log("== Transcribe Audio Sample ==");
+
+  const client = new OpenAIClient(endpoint, new AzureKeyCredential(azureApiKey));
+  const deploymentName = "whisper-deployment";
+  const audio = await fs.readFile("< path to an audio file >");
+  const result = await client.getAudioTranscription(deploymentName, audio);
+
+  console.log(`Transcription: ${result.text}`);
+}
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+});
 ```

 ## Troubleshooting
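
The README sample added in this hunk uses `endpoint` and `azureApiKey` without defining them, and it only calls `getAudioTranscription`. The sketch below shows one way to call both methods named in the README text. The client is passed in as a parameter so that the endpoint and key stay outside the function. The helper name `transcribeAndTranslate` and the environment variable names in the comments are hypothetical.

```js
// `client` is expected to be an OpenAIClient from "@azure/openai"; it is a
// parameter here so the sketch does not hard-code an endpoint or a key.
async function transcribeAndTranslate(client, deploymentName, audio) {
  // Transcription keeps the language spoken in the audio.
  const transcription = await client.getAudioTranscription(deploymentName, audio);
  // Translation produces English text, per the README description.
  const translation = await client.getAudioTranslation(deploymentName, audio);
  return { transcription: transcription.text, translation: translation.text };
}

// Possible wiring, assuming the (hypothetical) environment variables
// AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY are set:
//
//   const { OpenAIClient, AzureKeyCredential } = require("@azure/openai");
//   const fs = require("fs/promises");
//   const client = new OpenAIClient(
//     process.env.AZURE_OPENAI_ENDPOINT,
//     new AzureKeyCredential(process.env.AZURE_OPENAI_KEY)
//   );
//   const audio = await fs.readFile("< path to an audio file >");
//   const result = await transcribeAndTranslate(client, "whisper-deployment", audio);
```

Passing the client in also makes the helper easy to exercise with a stub object in tests, without a live deployment.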
@@ -340,9 +392,11 @@ setLogLevel("info");
 For more detailed instructions on how to enable logs, you can look at the [@azure/logger package docs](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/core/logger).

 <!-- LINKS -->
-[msdocs_openai_completion]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/completions.js
-[msdocs_openai_chat_completion]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/listChatCompletions.js
-[msdocs_openai_custom_data]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples-dev/bringYourOwnData.ts
+[get_completions_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/completions.js
+[list_chat_completion_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/listChatCompletions.js
+[byod_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/bringYourOwnData.js
+[get_images_sample]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/openai/openai/samples/v1-beta/javascript/getImages.js
+[transcribe_audio_sample]: https://github.com/Azure/azure-sdk-for-js/tree/openai/add-whisper/sdk/openai/openai/samples-dev/audioTranscription.ts
 [msdocs_openai_embedding]: https://learn.microsoft.com/azure/cognitive-services/openai/concepts/understand-embeddings
 [azure_openai_completions_docs]: https://learn.microsoft.com/azure/cognitive-services/openai/how-to/completions
 [defaultazurecredential]: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/identity/identity#defaultazurecredential
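
The README text in this diff says that, in the browser, the audio can be loaded and the output of the `arrayBuffer` instance method passed to `getAudioTranscription`. Note that `arrayBuffer()` is a method of `Blob` and `File` objects, while `FileReader` exposes `readAsArrayBuffer` instead. A minimal sketch of the `arrayBuffer()` route follows. The helper name `fileToBytes` is hypothetical, and wrapping the buffer in a `Uint8Array` is an assumption about the input type the client method expects.

```js
// Converts a browser File or Blob (anything exposing arrayBuffer()) into a
// Uint8Array of its raw bytes.
async function fileToBytes(file) {
  const buffer = await file.arrayBuffer();
  return new Uint8Array(buffer);
}

// Possible browser usage, assuming `input` is an <input type="file"> element
// and `client` is an OpenAIClient instance:
//
//   const bytes = await fileToBytes(input.files[0]);
//   const result = await client.getAudioTranscription("whisper-deployment", bytes);
```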

sdk/openai/openai/assets.json

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
   "AssetsRepo": "Azure/azure-sdk-assets",
   "AssetsRepoPrefixPath": "js",
   "TagPrefix": "js/openai/openai",
-  "Tag": "js/openai/openai_353545d522"
+  "Tag": "js/openai/openai_85d9317957"
 }
Binary file not shown (347 KB).
Binary file not shown (162 KB).
Binary file not shown (471 KB).
Binary file not shown (134 KB).
Binary file not shown (600 KB).
Binary file not shown (471 KB).
