
Commit 0cfc2e5

bug: Allow passing options to useSpeechToText model.stream() (#648)
## Description

When using a multilingual transcription model (e.g. `WHISPER_TINY`), a language must be specified, but there is no way to pass it to the `model.stream()` method. Calling it fails with:

```
[Error: Model is multilingual, provide a language]
```

IDE:

![Screenshot 2025-10-14 at 22 21 32](https://github.com/user-attachments/assets/9f9ff6d1-ea4a-4571-8a64-4aed7a4200a5)

`SpeechToTextModule.stream()` already accepts an options object with a language, but the wrapper function returned by `useSpeechToText()` does not, which makes streaming transcription impossible with a multilingual model.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [ ] Android

### Checklist

- [x] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

---------

Co-authored-by: Michał Pałys-Dudek <[email protected]>
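To show why the missing parameter matters, here is a minimal, self-contained sketch of the failure. `FakeModule`, `streamBefore` and `streamAfter` are hypothetical stand-ins, not library code. `DecodingOptions` is assumed to carry an optional `language` field, and the fake module is assumed to reject multilingual streaming without one, as the error above suggests.

```typescript
// Assumed shape: the real DecodingOptions type lives in the library's types/stt.
interface DecodingOptions {
  language?: string;
}

interface StreamChunk {
  committed: string;
  nonCommitted: string;
}

// Hypothetical stand-in for SpeechToTextModule: stream() is an async generator
// that refuses to run a multilingual model without a language.
class FakeModule {
  readonly multilingual: boolean;

  constructor(multilingual: boolean) {
    this.multilingual = multilingual;
  }

  async *stream(options?: DecodingOptions): AsyncGenerator<StreamChunk> {
    if (this.multilingual && !options?.language) {
      throw new Error('Model is multilingual, provide a language');
    }
    yield { committed: 'hola ', nonCommitted: 'mun' };
    yield { committed: 'mundo', nonCommitted: '' };
  }
}

// Before the fix: the wrapper takes no arguments, so options never reach the module.
async function streamBefore(stt: FakeModule): Promise<string> {
  let transcription = '';
  for await (const { committed } of stt.stream()) {
    transcription += committed;
  }
  return transcription;
}

// After the fix: the caller's options are forwarded unchanged.
async function streamAfter(
  stt: FakeModule,
  options?: DecodingOptions
): Promise<string> {
  let transcription = '';
  for await (const { committed } of stt.stream(options)) {
    transcription += committed;
  }
  return transcription;
}
```

`streamBefore(new FakeModule(true))` rejects with the same error as in the report, while `streamAfter(new FakeModule(true), { language: 'es' })` resolves to `'hola mundo'`. An English-only model works either way, which is why the bug only surfaces with multilingual models.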
1 parent a31d565 commit 0cfc2e5


2 files changed: +24 / -19 lines


docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -78,7 +78,7 @@ For more information on loading resources, take a look at [loading models](../..
 | Field | Type | Description |
 | --------------------------- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `transcribe` | `(waveform: Float32Array \| number[], options?: DecodingOptions \| undefined) => Promise<string>` | Starts a transcription process for a given input array, which should be a waveform at 16kHz. The second argument is an options object, e.g. `{ language: 'es' }` for multilingual models. Resolves a promise with the output transcription when the model is finished. Passing `number[]` is deprecated. |
-| `stream` | `() => Promise<string>` | Starts a streaming transcription process. Use in combination with `streamInsert` to feed audio chunks and `streamStop` to end the stream. Updates `committedTranscription` and `nonCommittedTranscription` as transcription progresses. |
+| `stream` | `(options?: DecodingOptions \| undefined) => Promise<string>` | Starts a streaming transcription process. Use in combination with `streamInsert` to feed audio chunks and `streamStop` to end the stream. The argument is an options object, e.g. `{ language: 'es' }` for multilingual models. Updates `committedTranscription` and `nonCommittedTranscription` as transcription progresses. |
 | `streamInsert` | `(waveform: Float32Array \| number[]) => void` | Inserts a chunk of audio data (sampled at 16kHz) into the ongoing streaming transcription. Call this repeatedly as new audio data becomes available. Passing `number[]` is deprecated. |
 | `streamStop` | `() => void` | Stops the ongoing streaming transcription process. |
 | `encode` | `(waveform: Float32Array \| number[]) => Promise<Float32Array>` | Runs the encoding part of the model on the provided waveform. Passing `number[]` is deprecated. |
```
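The documented call order is: start `stream(options)`, feed audio with `streamInsert`, then end with `streamStop`, after which the promise returned by `stream` presumably resolves with the full transcription. The sketch below models that contract with a toy class, `FakeStreamingTranscriber`, which is hypothetical: instead of running a model it records each chunk's sample count, and it behaves like a multilingual model that rejects when `language` is missing.

```typescript
// Assumed shape of the options object; the real type may carry more fields.
interface DecodingOptions {
  language?: string;
}

// Hypothetical toy that mimics the stream / streamInsert / streamStop contract.
// It "transcribes" each chunk as [language:sampleCount] instead of running a model.
class FakeStreamingTranscriber {
  private chunks: Float32Array[] = [];
  private stopped = false;
  private wake: (() => void) | null = null;

  async stream(options?: DecodingOptions): Promise<string> {
    const language = options?.language;
    if (language === undefined) {
      throw new Error('Model is multilingual, provide a language');
    }
    this.stopped = false;
    let transcription = '';
    while (true) {
      const chunk = this.chunks.shift();
      if (chunk !== undefined) {
        transcription += `[${language}:${chunk.length}]`;
      } else if (this.stopped) {
        return transcription;
      } else {
        // Park until streamInsert() or streamStop() signals new work.
        await new Promise<void>((resolve) => {
          this.wake = () => resolve();
        });
      }
    }
  }

  streamInsert(waveform: Float32Array): void {
    this.chunks.push(waveform);
    this.notify();
  }

  streamStop(): void {
    this.stopped = true;
    this.notify();
  }

  private notify(): void {
    const wake = this.wake;
    this.wake = null;
    wake?.();
  }
}

// Usage: start the stream first, then push 16 kHz chunks, then stop.
async function demo(): Promise<string> {
  const stt = new FakeStreamingTranscriber();
  const result = stt.stream({ language: 'es' });
  stt.streamInsert(new Float32Array(16000)); // 1 s of audio at 16 kHz
  stt.streamInsert(new Float32Array(8000)); // 0.5 s
  stt.streamStop(); // ends the stream; `result` then settles
  return result;
}
```

`demo()` resolves to `'[es:16000][es:8000]'`. Starting `stream` before inserting audio matters in this toy because the loop only drains chunks once it is running; whether the real module buffers chunks inserted earlier is not stated in the docs.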

packages/react-native-executorch/src/hooks/natural_language_processing/useSpeechToText.ts

Lines changed: 23 additions & 18 deletions
```diff
@@ -1,7 +1,7 @@
 import { useEffect, useCallback, useState } from 'react';
 import { ETError, getError } from '../../Error';
 import { SpeechToTextModule } from '../../modules/natural_language_processing/SpeechToTextModule';
-import { SpeechToTextModelConfig } from '../../types/stt';
+import { DecodingOptions, SpeechToTextModelConfig } from '../../types/stt';
 
 export const useSpeechToText = ({
   model,
@@ -65,24 +65,29 @@ export const useSpeechToText = ({
     [isReady, isGenerating, modelInstance]
   );
 
-  const stream = useCallback(async () => {
-    if (!isReady) throw new Error(getError(ETError.ModuleNotLoaded));
-    if (isGenerating) throw new Error(getError(ETError.ModelGenerating));
-    setIsGenerating(true);
-    setCommittedTranscription('');
-    setNonCommittedTranscription('');
-    let transcription = '';
-    try {
-      for await (const { committed, nonCommitted } of modelInstance.stream()) {
-        setCommittedTranscription((prev) => prev + committed);
-        setNonCommittedTranscription(nonCommitted);
-        transcription += committed;
+  const stream = useCallback(
+    async (options?: DecodingOptions) => {
+      if (!isReady) throw new Error(getError(ETError.ModuleNotLoaded));
+      if (isGenerating) throw new Error(getError(ETError.ModelGenerating));
+      setIsGenerating(true);
+      setCommittedTranscription('');
+      setNonCommittedTranscription('');
+      let transcription = '';
+      try {
+        for await (const { committed, nonCommitted } of modelInstance.stream(
+          options
+        )) {
+          setCommittedTranscription((prev) => prev + committed);
+          setNonCommittedTranscription(nonCommitted);
+          transcription += committed;
+        }
+      } finally {
+        setIsGenerating(false);
       }
-    } finally {
-      setIsGenerating(false);
-    }
-    return transcription;
-  }, [isReady, isGenerating, modelInstance]);
+      return transcription;
+    },
+    [isReady, isGenerating, modelInstance]
+  );
 
   const wrapper = useCallback(
     <T extends (...args: any[]) => any>(fn: T) => {
```
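Stripped of React, the fixed callback reduces to the loop below: committed text is appended, the non-committed tail is replaced on every step, and a `try`/`finally` clears the generating flag even if the model throws. `runStream`, `echoSource` and the plain `TranscriptionState` object are hypothetical substitutes for the hook's `useCallback` and `useState` setters; only the chunk shape `{ committed, nonCommitted }` is taken from the diff.

```typescript
interface DecodingOptions {
  language?: string; // assumed field, as in the `{ language: 'es' }` example
}

interface StreamChunk {
  committed: string;
  nonCommitted: string;
}

// Plain object standing in for the hook's useState values (hypothetical).
interface TranscriptionState {
  isGenerating: boolean;
  committedTranscription: string;
  nonCommittedTranscription: string;
}

// Mirrors the fixed `stream` callback: options are forwarded to the source.
async function runStream(
  source: (options?: DecodingOptions) => AsyncIterable<StreamChunk>,
  state: TranscriptionState,
  options?: DecodingOptions
): Promise<string> {
  if (state.isGenerating) throw new Error('Model is already generating');
  state.isGenerating = true;
  state.committedTranscription = '';
  state.nonCommittedTranscription = '';
  let transcription = '';
  try {
    for await (const { committed, nonCommitted } of source(options)) {
      state.committedTranscription += committed; // committed text only grows
      state.nonCommittedTranscription = nonCommitted; // tentative tail is replaced
      transcription += committed;
    }
  } finally {
    state.isGenerating = false; // reset even when the source throws
  }
  return transcription;
}

// Example source that echoes the requested language (hypothetical).
async function* echoSource(
  options?: DecodingOptions
): AsyncGenerator<StreamChunk> {
  const lang = options?.language ?? '??';
  yield { committed: `${lang}: hola `, nonCommitted: 'mun' };
  yield { committed: 'mundo', nonCommitted: '' };
}
```

Calling `runStream(echoSource, state, { language: 'es' })` resolves to `'es: hola mundo'` and leaves `state.isGenerating` as `false`. Without the `finally`, a source that throws would leave the flag stuck at `true`, and every later call would be rejected as "already generating".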
