
Commit f81b465

Remove Moonshine from docs etc. (#574)
## Description

Now we have removed Moonshine support in favor of Whisper. However, there are a few places where it was still mentioned. This PR removes all artifacts related to Moonshine.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings
Parent: 37ca6d0

File tree

5 files changed: +5 −23 lines

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -101,7 +101,7 @@ const handleGenerate = async () => {
 We currently host a few example [apps](https://github.com/software-mansion/react-native-executorch/tree/main/apps) demonstrating use cases of our library:
 
 - `llm` - Chat application showcasing use of LLMs
-- `speech-to-text` - Whisper and Moonshine models ready for transcription tasks
+- `speech-to-text` - Whisper model ready for transcription tasks
 - `computer-vision` - Computer vision related tasks
 - `text-embeddings` - Computing text representations for semantic search
```
apps/llm/app/voice_chat/index.tsx

Lines changed: 1 addition & 1 deletion

```diff
@@ -108,7 +108,7 @@ function VoiceChatScreen() {
     >
       <View style={styles.topContainer}>
         <SWMIcon width={45} height={45} />
-        <Text style={styles.textModelName}>Qwen 3 x Moonshine</Text>
+        <Text style={styles.textModelName}>Qwen 3 x Whisper</Text>
       </View>
       {llm.messageHistory.length || speechToText.committedTranscription ? (
         <View style={styles.chatContainer}>
```

docs/docs/04-benchmarks/inference-time.md

Lines changed: 1 addition & 10 deletions

```diff
@@ -64,13 +64,10 @@ Times presented in the tables are measured as consecutive runs of the model. Ini
 
 ### Streaming mode
 
-Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. That's why for the lowest latency and the fastest transcription we suggest using `Moonshine` model, if you still want to proceed with `Whisper` use preferably the `balanced` mode.
+Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. If you believe that this might be a problem for you, prefer `balanced` mode instead.
 
 | Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] |
 | ------------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (fast) | 0.8s \| 19.0t/s | 1.5s \| 11.3t/s | 1.5s \| 10.4t/s | 2.0s \| 8.8t/s | 1.6s \| 12.5t/s |
-| Moonshine-tiny (balanced) | 2.0s \| 20.0t/s | 3.2s \| 12.4t/s | 3.7s \| 10.4t/s | 4.6s \| 11.2t/s | 3.4s \| 14.6t/s |
-| Moonshine-tiny (quality) | 4.3s \| 16.8t/s | 6.6s \| 10.8t/s | 8.0s \| 8.9t/s | 7.7s \| 11.1t/s | 6.8s \| 13.1t/s |
 | Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s |
 | Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s |
 | Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s |
@@ -81,9 +78,6 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 99 | 95 | 115 | 284 | 277 |
-| Moonshine-tiny (10s) | 178 | 177 | 204 | 555 | 528 |
-| Moonshine-tiny (30s) | 580 | 576 | 689 | 1726 | 1617 |
 | Whisper-tiny (30s) | 1034 | 1344 | 1269 | 2916 | 2143 |
 
 ### Decoding
@@ -92,9 +86,6 @@ Average time for decoding one token in sequence of 100 tokens, with encoding con
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 48.98 | 47.98 | 46.86 | 36.70 | 29.03 |
-| Moonshine-tiny (10s) | 54.24 | 51.74 | 55.07 | 46.31 | 32.41 |
-| Moonshine-tiny (30s) | 76.38 | 76.19 | 87.37 | 65.61 | 45.04 |
 | Whisper-tiny (30s) | 128.03 | 113.65 | 141.63 | 89.08 | 84.49 |
 
 ## Text Embeddings
```
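The paragraph kept in the diff above notes that Whisper consumes fixed 30-second audio chunks, with shorter audio automatically padded with silence. A minimal sketch of that padding step (the `padToChunk` helper and the 16 kHz sample rate are illustrative assumptions, not the library's actual API):

```typescript
// Whisper's encoder expects a fixed-length input window. Shorter audio is
// padded with silence (zeros); this is an illustrative sketch only.
const SAMPLE_RATE = 16000; // assumed: Whisper-style models take 16 kHz mono audio
const CHUNK_SECONDS = 30;
const CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS; // 480000 samples

function padToChunk(samples: Float32Array): Float32Array {
  if (samples.length >= CHUNK_SAMPLES) {
    // Longer recordings would be split into chunks upstream; truncate here
    // just to keep the sketch self-contained.
    return samples.slice(0, CHUNK_SAMPLES);
  }
  const padded = new Float32Array(CHUNK_SAMPLES); // zero-filled = silence
  padded.set(samples); // copy the real audio to the front of the window
  return padded;
}
```

This is why `fast` mode pays a fixed per-chunk cost even for short utterances: the encoder always processes a full 30-second window.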

docs/src/pages/index.tsx

Lines changed: 1 addition & 1 deletion

```diff
@@ -20,7 +20,7 @@ const Home = () => {
       <Head>
         <meta
           name="keywords"
-          content="react native ai, react native llm, react native qwen, on-device ai, mobile ai, mobile machine learning, on-device inference, edge ai, llama, llm, whisper, ocr, moonshine, speech to text, qwen"
+          content="react native ai, react native llm, react native qwen, on-device ai, mobile ai, mobile machine learning, on-device inference, edge ai, llama, llm, whisper, ocr, speech to text, qwen"
         />
       </Head>
       <div className={styles.container}>
```

docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md

Lines changed: 1 addition & 10 deletions

```diff
@@ -64,13 +64,10 @@ Times presented in the tables are measured as consecutive runs of the model. Ini
 
 ### Streaming mode
 
-Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. That's why for the lowest latency and the fastest transcription we suggest using `Moonshine` model, if you still want to proceed with `Whisper` use preferably the `balanced` mode.
+Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. If you believe that this might be a problem for you, prefer `balanced` mode instead.
 
 | Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] |
 | ------------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (fast) | 0.8s \| 19.0t/s | 1.5s \| 11.3t/s | 1.5s \| 10.4t/s | 2.0s \| 8.8t/s | 1.6s \| 12.5t/s |
-| Moonshine-tiny (balanced) | 2.0s \| 20.0t/s | 3.2s \| 12.4t/s | 3.7s \| 10.4t/s | 4.6s \| 11.2t/s | 3.4s \| 14.6t/s |
-| Moonshine-tiny (quality) | 4.3s \| 16.8t/s | 6.6s \| 10.8t/s | 8.0s \| 8.9t/s | 7.7s \| 11.1t/s | 6.8s \| 13.1t/s |
 | Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s |
 | Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s |
 | Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s |
@@ -81,9 +78,6 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 99 | 95 | 115 | 284 | 277 |
-| Moonshine-tiny (10s) | 178 | 177 | 204 | 555 | 528 |
-| Moonshine-tiny (30s) | 580 | 576 | 689 | 1726 | 1617 |
 | Whisper-tiny (30s) | 1034 | 1344 | 1269 | 2916 | 2143 |
 
 ### Decoding
@@ -92,9 +86,6 @@ Average time for decoding one token in sequence of 100 tokens, with encoding con
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 48.98 | 47.98 | 46.86 | 36.70 | 29.03 |
-| Moonshine-tiny (10s) | 54.24 | 51.74 | 55.07 | 46.31 | 32.41 |
-| Moonshine-tiny (30s) | 76.38 | 76.19 | 87.37 | 65.61 | 45.04 |
 | Whisper-tiny (30s) | 128.03 | 113.65 | 141.63 | 89.08 | 84.49 |
 
 ## Text Embeddings
```
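The streaming table reports two numbers per mode: latency (time to the first token) and sustained throughput (tokens/s). A rough way to compare modes for a given utterance length is total time ≈ latency + tokens / throughput. A hedged sketch of that arithmetic (the `estimateSeconds` helper is illustrative, not part of the library; the inputs below are the Whisper-tiny iPhone 16 Pro figures from the table):

```typescript
// Crude model of end-to-end streaming transcription time: wait `latencyS`
// for the first token, then tokens arrive at roughly `tokensPerSecond`.
// Illustrative arithmetic over the benchmark table; not a library API.
function estimateSeconds(
  latencyS: number,
  tokensPerSecond: number,
  totalTokens: number,
): number {
  return latencyS + totalTokens / tokensPerSecond;
}

// Whisper-tiny, iPhone 16 Pro, a hypothetical 50-token transcription:
const fast = estimateSeconds(2.8, 5.5, 50); // ≈ 11.9 s
const balanced = estimateSeconds(5.6, 7.9, 50); // ≈ 11.9 s
```

Under this simple model, `fast` wins on short utterances (latency dominates) while `balanced` pulls ahead as the transcription grows, which matches the doc's advice to prefer `balanced` when throughput matters.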
