### Packages impacted by this PR
@azure/openai
### Issues associated with this PR
None for Whisper, but this PR includes a rudimentary fix for Azure#26953.
### Describe the problem that is addressed by this PR
Adds speech to text support. See the changelog entry and the samples for more details about the addition.
A few notes:
- Bring Your Own Data tests are skipped because the deployment for the new API version doesn't support it yet; support should hopefully land soon.
- `@azure/core-rest-pipeline`'s `formDataPolicy` doesn't support file uploads. This PR adds a custom version of the policy in `@azure/openai` that supports file uploads using an actively maintained third-party library (see the sketch after this list).
- Adds a fix for Azure#26953 that doesn't rely on core changes (see the changes in the `src/api/getSSE.ts` and `src/api/getSSE.browser.ts` files, and the second sketch below). A better fix is proposed in Azure#27000, but that PR is still under review.
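
A minimal sketch of what the custom form-data policy can look like. It assumes the `formdata-node` and `form-data-encoder` packages and a `Uint8Array` representation for file contents; the actual library choice and field handling in this PR may differ:

```ts
import {
  PipelinePolicy,
  PipelineRequest,
  PipelineResponse,
  SendRequest,
} from "@azure/core-rest-pipeline";
import { FormDataEncoder } from "form-data-encoder";
import { FormData, File } from "formdata-node";
import { Readable } from "stream";

// Replaces the built-in formDataPolicy so binary entries (e.g. audio files)
// become multipart file parts instead of being stringified.
export const fileFormDataPolicy: PipelinePolicy = {
  name: "fileFormDataPolicy",
  async sendRequest(
    request: PipelineRequest,
    next: SendRequest
  ): Promise<PipelineResponse> {
    if (request.formData) {
      const form = new FormData();
      const entries = Object.entries(request.formData) as [string, unknown][];
      for (const [name, value] of entries) {
        for (const item of Array.isArray(value) ? value : [value]) {
          if (item instanceof Uint8Array) {
            // File contents are appended as a proper file part.
            form.append(name, new File([item], name));
          } else {
            form.append(name, String(item));
          }
        }
      }
      // The encoder computes the multipart boundary and streams the body.
      const encoder = new FormDataEncoder(form);
      request.headers.set("Content-Type", encoder.contentType);
      request.body = () => Readable.from(encoder.encode());
      request.formData = undefined;
    }
    return next(request);
  },
};
```

Such a policy can be swapped in for the built-in one through the client's pipeline, e.g. `pipeline.removePolicy({ name: "formDataPolicy" })` followed by `pipeline.addPolicy(fileFormDataPolicy)`.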
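
And a purely hypothetical illustration of the shape of the `getSSE` fix (not the PR's actual code): check the HTTP status before handing the body to the SSE parser, so service errors surface as thrown exceptions instead of being swallowed by the stream. Sketched against the browser `fetch` `Response` type:

```ts
// Hypothetical sketch: fail fast on error responses instead of feeding
// them to the SSE event parser.
async function getResponseStream(
  response: Response
): Promise<ReadableStream<Uint8Array>> {
  if (!response.ok || response.body === null) {
    // Read the error payload so it can be included in the thrown error.
    const errorBody = await response.text();
    throw new Error(
      `The service returned status ${response.status}: ${errorBody}`
    );
  }
  return response.body;
}
```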
### What are the possible designs available to address the problem? If
there are more than one possible design, why was the one in this PR
chosen?
N/A
### Are there test cases added in this PR? _(If not, why?)_
Yes
### Provide a list of related PRs _(if any)_
N/A
### Command used to generate this PR _(Applicable only to SDK release request PRs)_
### Checklists
- [x] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so, create an Issue in the [Autorest/typescript](https://github.com/Azure/autorest.typescript) repository and link it here)_
- [x] Added a changelog (if necessary)
---------
Co-authored-by: Minh-Anh Phan <[email protected]>
**sdk/openai/openai/CHANGELOG.md** (7 additions & 6 deletions):

````diff
@@ -1,17 +1,18 @@
 # Release History
 
-## 1.0.0-beta.6 (Unreleased)
+## 1.0.0-beta.6 (2023-09-21)
 
 ### Features Added
 
-### Breaking Changes
+- Introduces speech to text and translation capabilities for a wide variety of audio file formats.
+- Adds `getAudioTranscription` and `getAudioTranslation` methods for transcribing and translating audio files. The result can be either a simple JSON structure with just a `text` field or a more detailed JSON structure containing the text alongside additional information. In addition, VTT (Web Video Text Tracks), SRT (SubRip Text), and plain text formats are also supported. The type of the result depends on the `format` parameter if specified, otherwise, a simple JSON output is assumed. The methods could take as input an optional text prompt to guide the model's style or continue a previous audio segment. The language of the prompt should match that of the audio file.
+- The available model at the time of this release supports the following list of audio file formats: m4a, mp3, wav, ogg, flac, webm, mp4, mpga, mpeg, and oga.
 
 ### Bugs Fixed
 
-- Return `usage` information when available.
-- Return `error` information in `ContentFilterResults` when available.
-
-### Other Changes
+- Returns `usage` information when available.
+- Fixes a bug where errors weren't properly being thrown from the streaming methods.
+- Returns `error` information in `ContentFilterResults` when available.
````
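
To make the changelog entry concrete, here is a minimal sketch of the format-dependent result it describes. The endpoint, key, deployment name, file path, and the `"vtt"` format value are illustrative assumptions rather than verified constants:

```ts
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";
import { readFile } from "node:fs/promises";

async function main() {
  const client = new OpenAIClient(
    "https://<resource-name>.openai.azure.com",
    new AzureKeyCredential("<api-key>")
  );
  const audio = await readFile("./audio.mp3");

  // Default: a simple JSON result with just a `text` field.
  const simple = await client.getAudioTranscription("<whisper-deployment>", audio);
  console.log(simple.text);

  // With a subtitle format such as "vtt", the result is the subtitle text itself.
  const vtt = await client.getAudioTranscription("<whisper-deployment>", audio, "vtt");
  console.log(vtt);
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
```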
console.error("The sample encountered an error:", err);
351
+
});
352
+
```
353
+
354
+
### Transcribe and translate audio files
355
+
356
+
The speech to text and translation capabilities of Azure OpenAI can be used to transcribe and translate a wide variety of audio file formats. The following example shows how to use the `getAudioTranscription` method to transcribe audio into the language the audio is in. You can also translate and transcribe the audio into English using the `getAudioTranslation` method.
357
+
358
+
The audio file can be loaded into memory using the NodeJS file system APIs. In the browser, the file can be loaded using the `FileReader` API and the output of `arrayBuffer` instance method can be passed to the `getAudioTranscription` method.
console.error("The sample encountered an error:", err);
377
+
});
326
378
```
327
379
328
380
## Troubleshooting
@@ -340,9 +392,11 @@ setLogLevel("info");
340
392
For more detailed instructions on how to enable logs, you can look at the [@azure/logger package docs](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/core/logger).
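
Since the new sample's body is elided above, here is a minimal sketch of what it plausibly looks like, following the surrounding README text; the endpoint, key, deployment name, and file path are placeholders:

```ts
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";
import { readFile } from "node:fs/promises";

async function main() {
  const client = new OpenAIClient(
    "https://<resource-name>.openai.azure.com",
    new AzureKeyCredential("<api-key>")
  );
  // Load the audio file into memory using the NodeJS file system APIs.
  const audio = await readFile("./audio.mp3");

  // Transcription keeps the language spoken in the audio...
  const transcription = await client.getAudioTranscription("<deployment>", audio);
  console.log(`Transcription: ${transcription.text}`);

  // ...while translation produces English text.
  const translation = await client.getAudioTranslation("<deployment>", audio);
  console.log(`Translation: ${translation.text}`);
}

main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
```

In the browser, the same bytes can come from `new Uint8Array(await file.arrayBuffer())` on a `File` obtained from an `<input type="file">` element, as the README paragraph above suggests.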