Merged
4 changes: 3 additions & 1 deletion apps/computer-vision/app/ocr/index.tsx
@@ -16,7 +16,9 @@ export default function OCRScreen() {
height: number;
}>();

const model = useOCR({ model: OCR_ENGLISH });
const model = useOCR({
model: OCR_ENGLISH,
});
const { setGlobalGenerating } = useContext(GeneratingContext);
useEffect(() => {
setGlobalGenerating(model.isGenerating);
80 changes: 29 additions & 51 deletions docs/docs/02-hooks/02-computer-vision/useOCR.md
@@ -30,12 +30,6 @@ function App() {
<summary>Type definitions</summary>

```typescript
interface RecognizerSources {
recognizerLarge: string | number;
recognizerMedium: string | number;
recognizerSmall: string | number;
}

type OCRLanguage =
| 'abq'
| 'ady'
@@ -121,9 +115,7 @@ interface OCRDetection {
**`model`** - Object containing the detector source, recognizer sources, and language.

- **`detectorSource`** - A string that specifies the location of the detector binary.
- **`recognizerLarge`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
- **`recognizerMedium`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
- **`recognizerSmall`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.
- **`recognizerSource`** - A string that specifies the location of the recognizer binary.
- **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

**`preventLoad?`** - Boolean that can prevent automatic model loading (and downloading the data if you load it for the first time) after running the hook.
@@ -186,23 +178,18 @@ function App() {
}
```

## Language-Specific Recognizers
## Alphabet-Specific Recognizers

Each supported language requires its own set of recognizer models.
The built-in constants such as `RECOGNIZER_EN_CRNN_512`, `RECOGNIZER_PL_CRNN_256`, etc., point to specific models trained for a particular language.
Each supported alphabet requires its own recognizer model. The built-in constants, such as `RECOGNIZER_LATIN_CRNN` or `RECOGNIZER_CYRILLIC_CRNN`, point to specific models trained for a particular alphabet.

> For example:
>
> - To recognize **English** text, use:
> - `RECOGNIZER_EN_CRNN_512`
> - `RECOGNIZER_EN_CRNN_256`
> - `RECOGNIZER_EN_CRNN_128`
> - To recognize **Polish** text, use:
> - `RECOGNIZER_PL_CRNN_512`
> - `RECOGNIZER_PL_CRNN_256`
> - `RECOGNIZER_PL_CRNN_128`
> - To recognize text in languages using the **Latin** alphabet (like Polish or German), use:
> - `RECOGNIZER_LATIN_CRNN`
> - To recognize text in languages using the **Cyrillic** alphabet (like Russian or Ukrainian), use:
> - `RECOGNIZER_CYRILLIC_CRNN`

You need to make sure the recognizer models you pass in `recognizerSources` match the `language` you specify.
You need to make sure the recognizer model you pass in `recognizerSource` matches the alphabet of the `language` you specify.
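The constraint above can be sketched as a small helper. The language codes, alphabet groupings, and file names below are illustrative assumptions, not the library's actual exports; in practice you would pass the `RECOGNIZER_LATIN_CRNN` / `RECOGNIZER_CYRILLIC_CRNN` constants shipped with the package:

```typescript
// Sketch: pick the recognizer constant that matches the alphabet of the
// target language. The language→alphabet table is illustrative, not exhaustive.
const ALPHABET_BY_LANGUAGE: Record<string, 'latin' | 'cyrillic'> = {
  en: 'latin',
  pl: 'latin',
  de: 'latin',
  ru: 'cyrillic',
  uk: 'cyrillic',
};

// Placeholder values standing in for the constants exported by the library.
const RECOGNIZER_LATIN_CRNN = 'recognizer_latin_crnn.pte';
const RECOGNIZER_CYRILLIC_CRNN = 'recognizer_cyrillic_crnn.pte';

function recognizerFor(language: string): string {
  const alphabet = ALPHABET_BY_LANGUAGE[language];
  if (alphabet === undefined) {
    throw new Error(`Unsupported language: ${language}`);
  }
  return alphabet === 'latin' ? RECOGNIZER_LATIN_CRNN : RECOGNIZER_CYRILLIC_CRNN;
}
```

A helper like this keeps the `recognizerSource`/`language` pair consistent in one place instead of relying on each call site to match them by hand.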

## Supported languages

@@ -275,33 +262,27 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc

## Supported models

| Model | Type |
| ------------------------------------------------------- | :--------: |
| [CRAFT_800\*](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN_512\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_256\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_128\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

\* - The number following the underscore (\_) indicates the input image width used during model export.
| Model | Type |
| ------------------------------------------------- | :--------: |
| [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ------------------------------ | :----------: |
| Detector (CRAFT_800_QUANTIZED) | 19.8 |
| Recognizer (CRNN_512) | 15 - 18\* |
| Recognizer (CRNN_256) | 16 - 18\* |
| Recognizer (CRNN_128) | 17 - 19\* |
| Model | XNNPACK [MB] |
| -------------------------- | :-----------: |
| Detector (CRAFT_QUANTIZED) | 20.9 |
| Recognizer (CRNN) | 18.5 - 25.2\* |

\* - The model weights vary depending on the language.

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------------------------------------------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT_800_QUANTIZED) + Recognizer (CRNN_512) + Recognizer (CRNN_256) + Recognizer (CRNN_128) | 1400 | 1320 |
| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT) + Recognizer (CRNN) | 1400 | 1320 |

### Inference time

@@ -317,16 +298,13 @@ Times presented in the tables are measured as consecutive runs of the model. Ini

**Time measurements:**

| Metric | iPhone 17 Pro <br /> [ms] | iPhone 16 Pro <br /> [ms] | iPhone SE 3 | Samsung Galaxy S24 <br /> [ms] | OnePlus 12 <br /> [ms] |
| ---------------------------------- | ------------------------- | ------------------------- | ----------- | ------------------------------ | ---------------------- |
| **Total Inference Time** | 652 | 600 | 2855 | 1092 | 1034 |
| **Detector (CRAFT_800_QUANTIZED)** | 220 | 221 | 1740 | 521 | 492 |
| **Recognizer (CRNN_512)** | | | | | |
| ├─ Average Time | 45 | 38 | 110 | 40 | 38 |
| ├─ Total Time (3 runs) | 135 | 114 | 330 | 120 | 114 |
| **Recognizer (CRNN_256)** | | | | | |
| ├─ Average Time | 21 | 18 | 54 | 20 | 19 |
| ├─ Total Time (7 runs) | 147 | 126 | 378 | 140 | 133 |
| **Recognizer (CRNN_128)** | | | | | |
| ├─ Average Time | 11 | 9 | 27 | 10 | 10 |
| ├─ Total Time (7 runs) | 77 | 63 | 189 | 70 | 70 |
Note that the recognizer model was executed between 3 and 7 times during a single recognition.
The values below are averages across all runs on the benchmark image.

| Model                           | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| ------------------------------- | ------------------ | ------------------ | ----------- | ----------------------- | --------------- |
| **Total Inference Time** | 652 | 600 | 2855 | 1092 | 1034 |
| Detector (CRAFT) `forward_800` | 220 | 221 | 1740 | 521 | 492 |
| Recognizer (CRNN) `forward_512` | 45 | 38 | 110 | 40 | 38 |
| Recognizer (CRNN) `forward_256` | 21 | 18 | 54 | 20 | 19 |
| Recognizer (CRNN) `forward_128` | 11 | 9 | 27 | 10 | 10 |
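The per-model rows can be combined into a rough end-to-end estimate. The run counts below (3× `forward_512`, 7× `forward_256`, 7× `forward_128` for the benchmark image) are taken from the earlier per-run breakdown and vary with the number and size of detected text boxes; this is a back-of-the-envelope sketch, not a benchmark methodology:

```typescript
// Sketch: estimate total OCR time on iPhone 17 Pro from per-run averages.
const detectorMs = 220; // CRAFT forward_800, one run

// Recognizer averages and run counts for the benchmark image.
const recognizerRuns: Array<{ avgMs: number; runs: number }> = [
  { avgMs: 45, runs: 3 }, // forward_512
  { avgMs: 21, runs: 7 }, // forward_256
  { avgMs: 11, runs: 7 }, // forward_128
];

const modelTimeMs =
  detectorMs + recognizerRuns.reduce((sum, r) => sum + r.avgMs * r.runs, 0);
// modelTimeMs = 579
```

The measured total (652 ms) is higher than the summed model time, since it also includes pre- and post-processing outside the model `forward` calls.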
67 changes: 27 additions & 40 deletions docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md
@@ -129,12 +129,10 @@ interface OCRDetection {

### Arguments

**`model`** - Object containing the detector sources, recognizer sources, and language.
**`model`** - Object containing the detector source, recognizer source, and language.

- **`detectorLarge`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 1280 pixels.
- **`detectorNarrow`** - A string that specifies the location of the detector binary file which accepts input images with a width of 320 pixels.
- **`recognizerLarge`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
- **`recognizerSmall`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 64 pixels.
- **`detectorSource`** - A string that specifies the location of the detector binary.
- **`recognizerSource`** - A string that specifies the location of the recognizer binary.
- **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

**`independentCharacters`** – A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text.
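The two scanning modes differ only in this flag. As a sketch, the config shape below mirrors the documented arguments, but the interface name and source paths are placeholders (assumptions), not the library's actual types or asset names:

```typescript
// Sketch of the model config used in the two scanning modes.
interface VerticalOCRConfig {
  model: {
    detectorSource: string;
    recognizerSource: string;
    language: string;
  };
  independentCharacters: boolean;
}

// Continuous vertical text, read as a whole (e.g. a top-to-bottom sentence).
const continuousText: VerticalOCRConfig = {
  model: {
    detectorSource: 'detector_craft.pte',       // placeholder path
    recognizerSource: 'recognizer_latin_crnn.pte', // placeholder path
    language: 'en',
  },
  independentCharacters: false,
};

// Random character sequences (e.g. serial numbers): scan each glyph alone.
const randomCharacters: VerticalOCRConfig = {
  ...continuousText,
  independentCharacters: true,
};
```

Set `independentCharacters: true` only when the characters carry no sequential context; for ordinary sentences, reading them as continuous text generally gives better results.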
@@ -202,21 +200,18 @@ function App() {
}
```

## Language-Specific Recognizers
## Alphabet-Specific Recognizers

Each supported language requires its own set of recognizer models.
The built-in constants such as `RECOGNIZER_EN_CRNN_512`, `RECOGNIZER_PL_CRNN_64`, etc., point to specific models trained for a particular language.
Each supported alphabet requires its own recognizer model. The built-in constants, such as `RECOGNIZER_LATIN_CRNN` or `RECOGNIZER_CYRILLIC_CRNN`, point to specific models trained for a particular alphabet.

> For example:
>
> - To recognize **English** text, use:
> - `RECOGNIZER_EN_CRNN_512`
> - `RECOGNIZER_EN_CRNN_64`
> - To recognize **Polish** text, use:
> - `RECOGNIZER_PL_CRNN_512`
> - `RECOGNIZER_PL_CRNN_64`
> - To recognize text in languages using the **Latin** alphabet (like Polish or German), use:
> - `RECOGNIZER_LATIN_CRNN`
> - To recognize text in languages using the **Cyrillic** alphabet (like Russian or Ukrainian), use:
> - `RECOGNIZER_CYRILLIC_CRNN`

You need to make sure the recognizer models you pass in `recognizerSources` match the `language` you specify.
You need to make sure the recognizer model you pass in `recognizerSource` matches the alphabet of the `language` you specify.

## Supported languages

@@ -289,14 +284,10 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc

## Supported models

| Model | Type |
| -------------------------------------------------------- | ---------- |
| [CRAFT_1280\*](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRAFT_320\*](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN_512\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_64\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

\* - The number following the underscore (\_) indicates the input image width used during model export.
| Model | Type |
| ------------------------------------------------- | :--------: |
| [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

@@ -313,10 +304,9 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------------------------------------------------------------- | :--------------------: | :----------------: |
| Detector (CRAFT_1280) + Detector (CRAFT_320) + Recognizer (CRNN_512) | 1540 | 1470 |
| Detector(CRAFT_1280) + Detector(CRAFT_320) + Recognizer (CRNN_64) | 1070 | 1000 |
| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT) + Recognizer (CRNN) | 1000 - 1600            | 1000 - 1500        |

### Inference time

@@ -332,16 +322,13 @@ Times presented in the tables are measured as consecutive runs of the model. Ini

**Time measurements:**

| Metric | iPhone 17 Pro <br /> [ms] | iPhone 16 Pro <br /> [ms] | iPhone SE 3 | Samsung Galaxy S24 <br /> [ms] | OnePlus 12 <br /> [ms] |
| -------------------------------------------------------------------------- | ------------------------- | ------------------------- | ----------- | ------------------------------ | ---------------------- |
| **Total Inference Time** | 1104 | 1113 | 8840 | 2845 | 2640 |
| **Detector (CRAFT_1280_QUANTIZED)** | 501 | 507 | 4317 | 1405 | 1275 |
| **Detector (CRAFT_320_QUANTIZED)** | | | | | |
| ├─ Average Time | 125 | 121 | 1060 | 338 | 299 |
| ├─ Total Time (4 runs) | 500 | 484 | 4240 | 1352 | 1196 |
| **Recognizer (CRNN_64)** <br /> (_With Flag `independentChars == true`_) | | | | | |
| ├─ Average Time | 5 | 6 | 14 | 7 | 6 |
| ├─ Total Time (21 runs) | 105 | 126 | 294 | 147 | 126 |
| **Recognizer (CRNN_512)** <br /> (_With Flag `independentChars == false`_) | | | | | |
| ├─ Average Time | 46 | 42 | 109 | 47 | 37 |
| ├─ Total Time (4 runs) | 184 | 168 | 436 | 188 | 148 |
Note that the recognizer model, as well as the detector's `forward_320` method, was executed between 4 and 21 times during a single recognition.
The values below are averages across all runs on the benchmark image.

| Model                           | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| ------------------------------- | ------------------ | ------------------ | ----------- | ----------------------- | --------------- |
| **Total Inference Time** | 1104 | 1113 | 8840 | 2845 | 2640 |
| Detector (CRAFT) `forward_1280` | 501 | 507 | 4317 | 1405 | 1275 |
| Detector (CRAFT) `forward_320` | 125 | 121 | 1060 | 338 | 299 |
| Recognizer (CRNN) `forward_512` | 46 | 42 | 109 | 47 | 37 |
| Recognizer (CRNN) `forward_64` | 5 | 6 | 14 | 7 | 6 |