
Commit f81b465

Remove Moonshine from docs etc. (#574)
## Description

Now we have removed Moonshine support in favor of Whisper. However, there are a few places where it was still mentioned. This PR removes all artifacts related to Moonshine.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings
Parent: 37ca6d0

File tree

5 files changed: +5 −23 lines

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -101,7 +101,7 @@ const handleGenerate = async () => {
 We currently host a few example [apps](https://github.com/software-mansion/react-native-executorch/tree/main/apps) demonstrating use cases of our library:
 
 - `llm` - Chat application showcasing use of LLMs
-- `speech-to-text` - Whisper and Moonshine models ready for transcription tasks
+- `speech-to-text` - Whisper model ready for transcription tasks
 - `computer-vision` - Computer vision related tasks
 - `text-embeddings` - Computing text representations for semantic search
```
apps/llm/app/voice_chat/index.tsx

Lines changed: 1 addition & 1 deletion

```diff
@@ -108,7 +108,7 @@ function VoiceChatScreen() {
     >
       <View style={styles.topContainer}>
         <SWMIcon width={45} height={45} />
-        <Text style={styles.textModelName}>Qwen 3 x Moonshine</Text>
+        <Text style={styles.textModelName}>Qwen 3 x Whisper</Text>
       </View>
       {llm.messageHistory.length || speechToText.committedTranscription ? (
         <View style={styles.chatContainer}>
```

docs/docs/04-benchmarks/inference-time.md

Lines changed: 1 addition & 10 deletions

```diff
@@ -64,13 +64,10 @@ Times presented in the tables are measured as consecutive runs of the model. Ini
 
 ### Streaming mode
 
-Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. That's why for the lowest latency and the fastest transcription we suggest using `Moonshine` model, if you still want to proceed with `Whisper` use preferably the `balanced` mode.
+Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. If you believe that this might be a problem for you, prefer `balanced` mode instead.
 
 | Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] |
 | ------------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (fast) | 0.8s \| 19.0t/s | 1.5s \| 11.3t/s | 1.5s \| 10.4t/s | 2.0s \| 8.8t/s | 1.6s \| 12.5t/s |
-| Moonshine-tiny (balanced) | 2.0s \| 20.0t/s | 3.2s \| 12.4t/s | 3.7s \| 10.4t/s | 4.6s \| 11.2t/s | 3.4s \| 14.6t/s |
-| Moonshine-tiny (quality) | 4.3s \| 16.8t/s | 6.6s \| 10.8t/s | 8.0s \| 8.9t/s | 7.7s \| 11.1t/s | 6.8s \| 13.1t/s |
 | Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s |
 | Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s |
 | Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s |
@@ -81,9 +78,6 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 99 | 95 | 115 | 284 | 277 |
-| Moonshine-tiny (10s) | 178 | 177 | 204 | 555 | 528 |
-| Moonshine-tiny (30s) | 580 | 576 | 689 | 1726 | 1617 |
 | Whisper-tiny (30s) | 1034 | 1344 | 1269 | 2916 | 2143 |
 
 ### Decoding
@@ -92,9 +86,6 @@ Average time for decoding one token in sequence of 100 tokens, with encoding con
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 48.98 | 47.98 | 46.86 | 36.70 | 29.03 |
-| Moonshine-tiny (10s) | 54.24 | 51.74 | 55.07 | 46.31 | 32.41 |
-| Moonshine-tiny (30s) | 76.38 | 76.19 | 87.37 | 65.61 | 45.04 |
 | Whisper-tiny (30s) | 128.03 | 113.65 | 141.63 | 89.08 | 84.49 |
 
 ## Text Embeddings
```
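The paragraph kept in the diff above notes that Whisper consumes fixed 30-second audio chunks, with shorter audio automatically padded with silence. A minimal sketch of that padding step (the `padToChunk` helper and the 16 kHz sample rate are illustrative assumptions, not the library's actual API):

```typescript
// Whisper's encoder expects a fixed-length input window. Shorter audio is
// padded with silence (zeros); this is an illustrative sketch only.
const SAMPLE_RATE = 16000; // assumed: Whisper-style models take 16 kHz mono audio
const CHUNK_SECONDS = 30;
const CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS; // 480000 samples

function padToChunk(samples: Float32Array): Float32Array {
  if (samples.length >= CHUNK_SAMPLES) {
    // Longer recordings would be split into chunks upstream; truncate here
    // just to keep the sketch self-contained.
    return samples.slice(0, CHUNK_SAMPLES);
  }
  const padded = new Float32Array(CHUNK_SAMPLES); // zero-filled = silence
  padded.set(samples); // copy the real audio to the front of the window
  return padded;
}
```

This is why `fast` mode pays a fixed per-chunk cost even for short utterances: the encoder always processes a full 30-second window.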

docs/src/pages/index.tsx

Lines changed: 1 addition & 1 deletion

```diff
@@ -20,7 +20,7 @@ const Home = () => {
       <Head>
         <meta
           name="keywords"
-          content="react native ai, react native llm, react native qwen, on-device ai, mobile ai, mobile machine learning, on-device inference, edge ai, llama, llm, whisper, ocr, moonshine, speech to text, qwen"
+          content="react native ai, react native llm, react native qwen, on-device ai, mobile ai, mobile machine learning, on-device inference, edge ai, llama, llm, whisper, ocr, speech to text, qwen"
         />
       </Head>
       <div className={styles.container}>
```

docs/versioned_docs/version-0.5.x/04-benchmarks/inference-time.md

Lines changed: 1 addition & 10 deletions

```diff
@@ -64,13 +64,10 @@ Times presented in the tables are measured as consecutive runs of the model. Ini
 
 ### Streaming mode
 
-Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. That's why for the lowest latency and the fastest transcription we suggest using `Moonshine` model, if you still want to proceed with `Whisper` use preferably the `balanced` mode.
+Notice than for `Whisper` model which has to take as an input 30 seconds audio chunks (for shorter audio it is automatically padded with silence to 30 seconds) `fast` mode has the lowest latency (time from starting transcription to first token returned, caused by streaming algorithm), but the slowest speed. If you believe that this might be a problem for you, prefer `balanced` mode instead.
 
 | Model (mode) | iPhone 16 Pro (XNNPACK) [latency \| tokens/s] | iPhone 14 Pro (XNNPACK) [latency \| tokens/s] | iPhone SE 3 (XNNPACK) [latency \| tokens/s] | Samsung Galaxy S24 (XNNPACK) [latency \| tokens/s] | OnePlus 12 (XNNPACK) [latency \| tokens/s] |
 | ------------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (fast) | 0.8s \| 19.0t/s | 1.5s \| 11.3t/s | 1.5s \| 10.4t/s | 2.0s \| 8.8t/s | 1.6s \| 12.5t/s |
-| Moonshine-tiny (balanced) | 2.0s \| 20.0t/s | 3.2s \| 12.4t/s | 3.7s \| 10.4t/s | 4.6s \| 11.2t/s | 3.4s \| 14.6t/s |
-| Moonshine-tiny (quality) | 4.3s \| 16.8t/s | 6.6s \| 10.8t/s | 8.0s \| 8.9t/s | 7.7s \| 11.1t/s | 6.8s \| 13.1t/s |
 | Whisper-tiny (fast) | 2.8s \| 5.5t/s | 3.7s \| 4.4t/s | 4.4s \| 3.4t/s | 5.5s \| 3.1t/s | 5.3s \| 3.8t/s |
 | Whisper-tiny (balanced) | 5.6s \| 7.9t/s | 7.0s \| 6.3t/s | 8.3s \| 5.0t/s | 8.4s \| 6.7t/s | 7.7s \| 7.2t/s |
 | Whisper-tiny (quality) | 10.3s \| 8.3t/s | 12.6s \| 6.8t/s | 7.8s \| 8.9t/s | 13.5s \| 7.1t/s | 12.9s \| 7.5t/s |
@@ -81,9 +78,6 @@ Average time for encoding audio of given length over 10 runs. For `Whisper` mode
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 99 | 95 | 115 | 284 | 277 |
-| Moonshine-tiny (10s) | 178 | 177 | 204 | 555 | 528 |
-| Moonshine-tiny (30s) | 580 | 576 | 689 | 1726 | 1617 |
 | Whisper-tiny (30s) | 1034 | 1344 | 1269 | 2916 | 2143 |
 
 ### Decoding
@@ -92,9 +86,6 @@ Average time for decoding one token in sequence of 100 tokens, with encoding con
 
 | Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 14 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
 | -------------------- | :---: | :---: | :---: | :---: | :---: |
-| Moonshine-tiny (5s) | 48.98 | 47.98 | 46.86 | 36.70 | 29.03 |
-| Moonshine-tiny (10s) | 54.24 | 51.74 | 55.07 | 46.31 | 32.41 |
-| Moonshine-tiny (30s) | 76.38 | 76.19 | 87.37 | 65.61 | 45.04 |
 | Whisper-tiny (30s) | 128.03 | 113.65 | 141.63 | 89.08 | 84.49 |
 
 ## Text Embeddings
```
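The streaming table reports two numbers per mode: latency (time to the first token) and sustained throughput (tokens/s). A rough way to compare modes for a given utterance length is total time ≈ latency + tokens / throughput. A hedged sketch of that arithmetic (the `estimateSeconds` helper is illustrative, not part of the library; the inputs below are the Whisper-tiny iPhone 16 Pro figures from the table):

```typescript
// Crude model of end-to-end streaming transcription time: wait `latencyS`
// for the first token, then tokens arrive at roughly `tokensPerSecond`.
// Illustrative arithmetic over the benchmark table; not a library API.
function estimateSeconds(
  latencyS: number,
  tokensPerSecond: number,
  totalTokens: number,
): number {
  return latencyS + totalTokens / tokensPerSecond;
}

// Whisper-tiny, iPhone 16 Pro, a hypothetical 50-token transcription:
const fast = estimateSeconds(2.8, 5.5, 50); // ≈ 11.9 s
const balanced = estimateSeconds(5.6, 7.9, 50); // ≈ 11.9 s
```

Under this simple model, `fast` wins on short utterances (latency dominates) while `balanced` pulls ahead as the transcription grows, which matches the doc's advice to prefer `balanced` when throughput matters.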
