Commit 9efd924

Remove benchmarks from hook API (#743)
## Description

As in the title, plus moving the `Benchmark` section up in the sidebar so the benchmarks are visible immediately.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [ ] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [ ] iOS
- [ ] Android

### Checklist

- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [ ] My changes generate no new warnings
1 parent cc852b2 · commit 9efd924

42 files changed: +20 -397 lines. (Large commits have some content hidden by default; only a subset of the changed files is shown below.)

docs/docs/01-fundamentals/03-frequently-asked-questions.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -10,11 +10,11 @@ Each hook documentation subpage (useClassification, useLLM, etc.) contains a sup
 
 ### How can I run my own AI model?
 
-To run your own model, you need to directly access the underlying [ExecuTorch Module API](https://pytorch.org/executorch/stable/extension-module.html). We provide an experimental [React hook](../02-hooks/03-executorch-bindings/useExecutorchModule.md) along with a [TypeScript alternative](../03-typescript-api/03-executorch-bindings/ExecutorchModule.md), which serve as a way to use the aforementioned API without the need of diving into native code. In order to get a model in a format runnable by the runtime, you'll need to get your hands dirty with some ExecuTorch knowledge. For more guides on exporting models, please refer to the [ExecuTorch tutorials](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html). Once you obtain your model in a `.pte` format, you can run it with `useExecuTorchModule` and `ExecuTorchModule`.
+To run your own model, you need to directly access the underlying [ExecuTorch Module API](https://pytorch.org/executorch/stable/extension-module.html). We provide an experimental [React hook](../03-hooks/03-executorch-bindings/useExecutorchModule.md) along with a [TypeScript alternative](../04-typescript-api/03-executorch-bindings/ExecutorchModule.md), which serve as a way to use the aforementioned API without the need of diving into native code. In order to get a model in a format runnable by the runtime, you'll need to get your hands dirty with some ExecuTorch knowledge. For more guides on exporting models, please refer to the [ExecuTorch tutorials](https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html). Once you obtain your model in a `.pte` format, you can run it with `useExecuTorchModule` and `ExecuTorchModule`.
 
 ### Can you do function calling with useLLM?
 
-If your model supports tool calling (i.e. its chat template can process tools) you can use the method explained on the [useLLM page](../02-hooks/01-natural-language-processing/useLLM.md).
+If your model supports tool calling (i.e. its chat template can process tools) you can use the method explained on the [useLLM page](../03-hooks/01-natural-language-processing/useLLM.md).
 
 If your model doesn't support it, you can still work around it using context. For details, refer to [this comment](https://github.com/software-mansion/react-native-executorch/issues/173#issuecomment-2775082278).
```
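To make the FAQ answer above concrete, here is a minimal sketch of loading and running a custom `.pte` model with the experimental hook. The model URL is a placeholder, and the exact return shape of `useExecutorchModule` and the `forward()` signature are assumptions for illustration; check the executorch-bindings docs linked in the answer for the actual API.

```tsx
import { useEffect } from 'react';
import { useExecutorchModule } from 'react-native-executorch';

function MyModelRunner() {
  // Hypothetical .pte file exported following the ExecuTorch tutorials.
  const module = useExecutorchModule({
    modelSource: 'https://example.com/my-model.pte',
  });

  useEffect(() => {
    if (!module.isReady) return;
    // Assumed signature: forward(inputs, shapes) -> Promise of output tensors.
    module
      .forward([[1.0, 2.0, 3.0, 4.0]], [[1, 4]])
      .then((output) => console.log('Model output:', output))
      .catch((e) => console.error(e));
  }, [module.isReady]);

  return null;
}
```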

docs/docs/02-hooks/01-natural-language-processing/_category_.json renamed to docs/docs/03-hooks/01-natural-language-processing/_category_.json

File renamed without changes.

docs/docs/02-hooks/01-natural-language-processing/useLLM.md renamed to docs/docs/03-hooks/01-natural-language-processing/useLLM.md

Lines changed: 0 additions & 37 deletions
```diff
@@ -498,40 +498,3 @@ Depending on selected model and the user's device generation speed can be above
 | [Phi 4 Mini](https://huggingface.co/software-mansion/react-native-executorch-phi-4-mini) | 4B ||
 | [SmolLM 2](https://huggingface.co/software-mansion/react-native-executorch-smolLm-2) | 135M, 360M, 1.7B ||
 | [LLaMA 3.2](https://huggingface.co/software-mansion/react-native-executorch-llama-3.2) | 1B, 3B ||
-
-## Benchmarks
-
-### Model size
-
-| Model | XNNPACK [GB] |
-| --------------------- | :----------: |
-| LLAMA3_2_1B | 2.47 |
-| LLAMA3_2_1B_SPINQUANT | 1.14 |
-| LLAMA3_2_1B_QLORA | 1.18 |
-| LLAMA3_2_3B | 6.43 |
-| LLAMA3_2_3B_SPINQUANT | 2.55 |
-| LLAMA3_2_3B_QLORA | 2.65 |
-
-### Memory usage
-
-| Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
-| --------------------- | :--------------------: | :----------------: |
-| LLAMA3_2_1B | 3.2 | 3.1 |
-| LLAMA3_2_1B_SPINQUANT | 1.9 | 2 |
-| LLAMA3_2_1B_QLORA | 2.2 | 2.5 |
-| LLAMA3_2_3B | 7.1 | 7.3 |
-| LLAMA3_2_3B_SPINQUANT | 3.7 | 3.8 |
-| LLAMA3_2_3B_QLORA | 4 | 4.1 |
-
-### Inference time
-
-| Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
-| --------------------- | :--------------------------------: | :--------------------------------: | :------------------------------: | :-------------------------------------: | :-----------------------------: |
-| LLAMA3_2_1B | 16.1 | 11.4 | ❌ | 15.6 | 19.3 |
-| LLAMA3_2_1B_SPINQUANT | 40.6 | 16.7 | 16.5 | 40.3 | 48.2 |
-| LLAMA3_2_1B_QLORA | 31.8 | 11.4 | 11.2 | 37.3 | 44.4 |
-| LLAMA3_2_3B | ❌ | ❌ | ❌ | ❌ | 7.1 |
-| LLAMA3_2_3B_SPINQUANT | 17.2 | 8.2 | ❌ | 16.2 | 19.4 |
-| LLAMA3_2_3B_QLORA | 14.5 | ❌ | ❌ | 14.8 | 18.1 |
-
-❌ - Insufficient RAM.
```
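Since the FAQ above points to this page for tool calling, a rough sketch of how that might look with the hook follows. The `LLAMA3_2_1B` constant matches the model names in the removed tables, but the `toolsConfig` option, its `executeToolCallback`, and `sendMessage` are assumptions about the useLLM API rather than verified signatures; the renamed useLLM page remains the source of truth.

```tsx
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

// Hypothetical tool definition; the schema the chat template expects
// is an assumption here, not taken from the docs.
const tools = [
  {
    name: 'get_weather',
    description: 'Returns the current weather for a given city',
    parameters: { city: { type: 'string' } },
  },
];

function Assistant() {
  const llm = useLLM({
    model: LLAMA3_2_1B,
    // Assumed option names: toolsConfig / executeToolCallback.
    toolsConfig: {
      tools,
      // Called when the model emits a tool call; return the tool's result.
      executeToolCallback: async (call: any) =>
        call?.toolName === 'get_weather' ? 'Sunny, 21°C' : null,
    },
  });

  // Wire `ask` to a button in real UI; llm.response holds the final answer.
  const ask = () => llm.sendMessage('What is the weather in Kraków?');
  return null;
}
```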

docs/docs/02-hooks/01-natural-language-processing/useSpeechToText.md renamed to docs/docs/03-hooks/01-natural-language-processing/useSpeechToText.md

Lines changed: 0 additions & 19 deletions
```diff
@@ -322,22 +322,3 @@ function App() {
 | [whisper-base](https://huggingface.co/openai/whisper-base) | Multilingual |
 | [whisper-small.en](https://huggingface.co/openai/whisper-small.en) | English |
 | [whisper-small](https://huggingface.co/openai/whisper-small) | Multilingual |
-
-## Benchmarks
-
-### Model size
-
-| Model | XNNPACK [MB] |
-| ---------------- | :----------: |
-| WHISPER_TINY_EN | 151 |
-| WHISPER_TINY | 151 |
-| WHISPER_BASE_EN | 290.6 |
-| WHISPER_BASE | 290.6 |
-| WHISPER_SMALL_EN | 968 |
-| WHISPER_SMALL | 968 |
-
-### Memory usage
-
-| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
-| ------------ | :--------------------: | :----------------: |
-| WHISPER_TINY | 410 | 375 |
```

docs/docs/02-hooks/01-natural-language-processing/useTextEmbeddings.md renamed to docs/docs/03-hooks/01-natural-language-processing/useTextEmbeddings.md

Lines changed: 0 additions & 40 deletions
```diff
@@ -116,43 +116,3 @@ function App() {
 :::info
 For the supported models, the returned embedding vector is normalized, meaning that its length is equal to 1. This allows for easier comparison of vectors using cosine similarity, just calculate the dot product of two vectors to get the cosine similarity score.
 :::
-
-## Benchmarks
-
-### Model size
-
-| Model | XNNPACK [MB] |
-| -------------------------- | :----------: |
-| ALL_MINILM_L6_V2 | 91 |
-| ALL_MPNET_BASE_V2 | 438 |
-| MULTI_QA_MINILM_L6_COS_V1 | 91 |
-| MULTI_QA_MPNET_BASE_DOT_V1 | 438 |
-| CLIP_VIT_BASE_PATCH32_TEXT | 254 |
-
-### Memory usage
-
-| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
-| -------------------------- | :--------------------: | :----------------: |
-| ALL_MINILM_L6_V2 | 95 | 110 |
-| ALL_MPNET_BASE_V2 | 405 | 455 |
-| MULTI_QA_MINILM_L6_COS_V1 | 120 | 140 |
-| MULTI_QA_MPNET_BASE_DOT_V1 | 435 | 455 |
-| CLIP_VIT_BASE_PATCH32_TEXT | 200 | 280 |
-
-### Inference time
-
-:::warning
-Times presented in the tables are measured as consecutive runs of the model. Initial run times may be up to 2x longer due to model loading and initialization.
-:::
-
-| Model | iPhone 17 Pro (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
-| -------------------------- | :--------------------------: | :-----------------------: |
-| ALL_MINILM_L6_V2 | 7 | 21 |
-| ALL_MPNET_BASE_V2 | 24 | 90 |
-| MULTI_QA_MINILM_L6_COS_V1 | 7 | 19 |
-| MULTI_QA_MPNET_BASE_DOT_V1 | 24 | 88 |
-| CLIP_VIT_BASE_PATCH32_TEXT | 14 | 39 |
-
-:::info
-Benchmark times for text embeddings are highly dependent on the sentence length. The numbers above are based on a sentence of around 80 tokens. For shorter or longer sentences, inference time may vary accordingly.
-:::
```
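The `:::info` note kept by this diff says supported models return normalized (unit-length) vectors, so cosine similarity reduces to a dot product. A small self-contained TypeScript sketch of that computation (the example vectors are made up):

```ts
// For unit-length embeddings, the dot product equals cosine similarity.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same dimensionality');
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Hypothetical normalized embeddings, e.g. from useTextEmbeddings' forward().
const embeddingA = [0.6, 0.8, 0.0];
const embeddingB = [0.8, 0.6, 0.0];

console.log(dotProduct(embeddingA, embeddingB)); // 0.96 -> very similar
```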

docs/docs/02-hooks/01-natural-language-processing/useTokenizer.md renamed to docs/docs/03-hooks/01-natural-language-processing/useTokenizer.md

File renamed without changes.
