Merged
Changes from 4 commits
4 changes: 3 additions & 1 deletion apps/computer-vision/app/ocr/index.tsx
@@ -16,7 +16,9 @@ export default function OCRScreen() {
height: number;
}>();

const model = useOCR({ model: OCR_ENGLISH });
const model = useOCR({
model: OCR_ENGLISH,
});
const { setGlobalGenerating } = useContext(GeneratingContext);
useEffect(() => {
setGlobalGenerating(model.isGenerating);
80 changes: 29 additions & 51 deletions docs/docs/02-hooks/02-computer-vision/useOCR.md
@@ -30,12 +30,6 @@ function App() {
<summary>Type definitions</summary>

```typescript
interface RecognizerSources {
recognizerLarge: string | number;
recognizerMedium: string | number;
recognizerSmall: string | number;
}

type OCRLanguage =
| 'abq'
| 'ady'
@@ -121,9 +115,7 @@ interface OCRDetection {
**`model`** - Object containing the detector source, recognizer sources, and language.

- **`detectorSource`** - A string that specifies the location of the detector binary.
- **`recognizerLarge`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
- **`recognizerMedium`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 256 pixels.
- **`recognizerSmall`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 128 pixels.
- **`recognizerSource`** - A string that specifies the location of the recognizer binary.
- **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

**`preventLoad?`** - Boolean that can prevent automatic model loading (and downloading the data if you load it for the first time) after running the hook.
@@ -186,23 +178,18 @@ function App() {
}
```

## Language-Specific Recognizers
## Alphabet-Specific Recognizers

Each supported language requires its own set of recognizer models.
The built-in constants such as `RECOGNIZER_EN_CRNN_512`, `RECOGNIZER_PL_CRNN_256`, etc., point to specific models trained for a particular language.
Each supported alphabet requires its own recognizer model. The built-in constants, such as `RECOGNIZER_LATIN_CRNN` or `RECOGNIZER_CYRILLIC_CRNN`, point to specific models trained for a particular alphabet.

> For example:
>
> - To recognize **English** text, use:
> - `RECOGNIZER_EN_CRNN_512`
> - `RECOGNIZER_EN_CRNN_256`
> - `RECOGNIZER_EN_CRNN_128`
> - To recognize **Polish** text, use:
> - `RECOGNIZER_PL_CRNN_512`
> - `RECOGNIZER_PL_CRNN_256`
> - `RECOGNIZER_PL_CRNN_128`
> - To recognize text in languages using the **Latin** alphabet (like Polish or German), use:
> - `RECOGNIZER_LATIN_CRNN`
> - To recognize text in languages using the **Cyrillic** alphabet (like Russian or Ukrainian), use:
> - `RECOGNIZER_CYRILLIC_CRNN`

You need to make sure the recognizer models you pass in `recognizerSources` match the `language` you specify.
You need to make sure the recognizer model you pass in `recognizerSource` matches the alphabet of the `language` you specify.
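
The language-to-recognizer pairing above can be enforced with a small lookup. This is an illustrative sketch: the constant names match the docs, but the `ALPHABET_BY_LANGUAGE` table and the `pickRecognizer` helper are hypothetical, not part of the library.

```typescript
// Hypothetical helper: keeps `recognizerSource` and `language` from
// drifting apart. The language-to-alphabet table is illustrative and
// covers only a handful of the supported languages.
const ALPHABET_BY_LANGUAGE: Record<string, 'latin' | 'cyrillic'> = {
  en: 'latin',
  pl: 'latin',
  de: 'latin',
  ru: 'cyrillic',
  uk: 'cyrillic',
};

const RECOGNIZER_BY_ALPHABET = {
  latin: 'RECOGNIZER_LATIN_CRNN',
  cyrillic: 'RECOGNIZER_CYRILLIC_CRNN',
} as const;

// Pick the recognizer constant that matches a language's alphabet.
function pickRecognizer(language: string): string {
  const alphabet = ALPHABET_BY_LANGUAGE[language];
  if (!alphabet) {
    throw new Error(`Unsupported language: ${language}`);
  }
  return RECOGNIZER_BY_ALPHABET[alphabet];
}
```

A lookup like this fails fast at configuration time instead of producing garbage recognitions from a mismatched model.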

## Supported languages

@@ -275,33 +262,27 @@

## Supported models

| Model | Type |
| ------------------------------------------------------- | :--------: |
| [CRAFT_800\*](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN_512\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_256\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_128\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

\* - The number following the underscore (\_) indicates the input image width used during model export.
| Model | Type |
| ------------------------------------------------- | :--------: |
| [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ------------------------------ | :----------: |
| Detector (CRAFT_800_QUANTIZED) | 19.8 |
| Recognizer (CRNN_512) | 15 - 18\* |
| Recognizer (CRNN_256) | 16 - 18\* |
| Recognizer (CRNN_128) | 17 - 19\* |
| Model | XNNPACK [MB] |
| -------------------------- | :-----------: |
| Detector (CRAFT_QUANTIZED) | 20.9 |
| Recognizer (CRNN) | 18.5 - 25.2\* |

\* - The model weights vary depending on the language.

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------------------------------------------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT_800_QUANTIZED) + Recognizer (CRNN_512) + Recognizer (CRNN_256) + Recognizer (CRNN_128) | 1400 | 1320 |
| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT) + Recognizer (CRNN) | 1400 | 1320 |

### Inference time

@@ -317,16 +298,13 @@ Times presented in the tables are measured as consecutive runs of the model. Ini

**Time measurements:**

| Metric | iPhone 17 Pro <br /> [ms] | iPhone 16 Pro <br /> [ms] | iPhone SE 3 <br /> [ms] | Samsung Galaxy S24 <br /> [ms] | OnePlus 12 <br /> [ms] |
| ---------------------------------- | ------------------------- | ------------------------- | ----------- | ------------------------------ | ---------------------- |
| **Total Inference Time** | 652 | 600 | 2855 | 1092 | 1034 |
| **Detector (CRAFT_800_QUANTIZED)** | 220 | 221 | 1740 | 521 | 492 |
| **Recognizer (CRNN_512)** | | | | | |
| ├─ Average Time | 45 | 38 | 110 | 40 | 38 |
| ├─ Total Time (3 runs) | 135 | 114 | 330 | 120 | 114 |
| **Recognizer (CRNN_256)** | | | | | |
| ├─ Average Time | 21 | 18 | 54 | 20 | 19 |
| ├─ Total Time (7 runs) | 147 | 126 | 378 | 140 | 133 |
| **Recognizer (CRNN_128)** | | | | | |
| ├─ Average Time | 11 | 9 | 27 | 10 | 10 |
| ├─ Total Time (7 runs) | 77 | 63 | 189 | 70 | 70 |
Notice that the recognizer models were executed between 3 and 7 times during a single recognition.
The values below represent the averages across all runs for the benchmark image.

| Model | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| ------------------------------- | ------------------ | ------------------ | ----------- | ----------------------- | --------------- |
| **Total Inference Time** | 652 | 600 | 2855 | 1092 | 1034 |
| Detector (CRAFT) `forward_800` | 220 | 221 | 1740 | 521 | 492 |
| Recognizer (CRNN) `forward_512` | 45 | 38 | 110 | 40 | 38 |
| Recognizer (CRNN) `forward_256` | 21 | 18 | 54 | 20 | 19 |
| Recognizer (CRNN) `forward_128` | 11 | 9 | 27 | 10 | 10 |
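
As a sanity check, the per-run averages roughly reconstruct the measured total. The run counts (3× `forward_512`, 7× `forward_256`, 7× `forward_128`) are the ones recorded for this benchmark image; the variable names below are illustrative.

```typescript
// Lower-bound estimate of total inference time from the per-run
// averages above (iPhone 17 Pro). Run counts are specific to the
// benchmark image; real counts depend on the detected text regions.
const detectorMs = 220; // Detector (CRAFT) forward_800, one pass

const recognizerRuns = [
  { avgMs: 45, count: 3 }, // forward_512
  { avgMs: 21, count: 7 }, // forward_256
  { avgMs: 11, count: 7 }, // forward_128
];

const recognizerMs = recognizerRuns.reduce(
  (sum, r) => sum + r.avgMs * r.count,
  0
);

// 220 + 135 + 147 + 77 = 579 ms; the measured total (652 ms) also
// includes pre- and post-processing between model calls.
const estimateMs = detectorMs + recognizerMs;
```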
67 changes: 27 additions & 40 deletions docs/docs/02-hooks/02-computer-vision/useVerticalOCR.md
@@ -129,12 +129,10 @@ interface OCRDetection {

### Arguments

**`model`** - Object containing the detector sources, recognizer sources, and language.
**`model`** - Object containing the detector source, recognizer source, and language.

- **`detectorLarge`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 1280 pixels.
- **`detectorNarrow`** - A string that specifies the location of the detector binary file which accepts input images with a width of 320 pixels.
- **`recognizerLarge`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 512 pixels.
- **`recognizerSmall`** - A string that specifies the location of the recognizer binary file which accepts input images with a width of 64 pixels.
- **`detectorSource`** - A string that specifies the location of the detector binary.
- **`recognizerSource`** - A string that specifies the location of the recognizer binary.
- **`language`** - A parameter that specifies the language of the text to be recognized by the OCR.

**`independentCharacters`** – A boolean parameter that indicates whether the text in the image consists of a random sequence of characters. If set to true, the algorithm will scan each character individually instead of reading them as continuous text.
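
The flag can be thought of as choosing between the recognizer's narrow per-character pass and its full-line pass. The sketch below is a hypothetical model of that dispatch (the `DetectedRegion` type and `recognizerRuns` helper are not library APIs); the 64 px / 512 px widths come from the benchmark section.

```typescript
// Illustrative sketch, not the library's internals: the flag decides
// whether the recognizer reads per-character crops or whole lines.
interface DetectedRegion {
  text: string; // characters detected in one vertical text region
}

function recognizerRuns(
  regions: DetectedRegion[],
  independentCharacters: boolean
): { method: 'forward_64' | 'forward_512'; runs: number } {
  if (independentCharacters) {
    // one narrow (64 px) pass per individual character
    const chars = regions.reduce((n, r) => n + r.text.length, 0);
    return { method: 'forward_64', runs: chars };
  }
  // one wide (512 px) pass per detected text region
  return { method: 'forward_512', runs: regions.length };
}
```

This is why `independentCharacters: true` trades many cheap recognizer passes for fewer expensive ones, as the benchmark tables below reflect.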
@@ -202,21 +200,18 @@ function App() {
}
```

## Language-Specific Recognizers
## Alphabet-Specific Recognizers

Each supported language requires its own set of recognizer models.
The built-in constants such as `RECOGNIZER_EN_CRNN_512`, `RECOGNIZER_PL_CRNN_64`, etc., point to specific models trained for a particular language.
Each supported alphabet requires its own recognizer model. The built-in constants, such as `RECOGNIZER_LATIN_CRNN` or `RECOGNIZER_CYRILLIC_CRNN`, point to specific models trained for a particular alphabet.

> For example:
>
> - To recognize **English** text, use:
> - `RECOGNIZER_EN_CRNN_512`
> - `RECOGNIZER_EN_CRNN_64`
> - To recognize **Polish** text, use:
> - `RECOGNIZER_PL_CRNN_512`
> - `RECOGNIZER_PL_CRNN_64`
> - To recognize text in languages using the **Latin** alphabet (like Polish or German), use:
> - `RECOGNIZER_LATIN_CRNN`
> - To recognize text in languages using the **Cyrillic** alphabet (like Russian or Ukrainian), use:
> - `RECOGNIZER_CYRILLIC_CRNN`

You need to make sure the recognizer models you pass in `recognizerSources` match the `language` you specify.
You need to make sure the recognizer model you pass in `recognizerSource` matches the alphabet of the `language` you specify.

## Supported languages

@@ -289,14 +284,10 @@ You need to make sure the recognizer models you pass in `recognizerSources` matc

## Supported models

| Model | Type |
| -------------------------------------------------------- | ---------- |
| [CRAFT_1280\*](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRAFT_320\*](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN_512\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |
| [CRNN_64\*](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

\* - The number following the underscore (\_) indicates the input image width used during model export.
| Model | Type |
| ------------------------------------------------- | :--------: |
| [CRAFT](https://github.com/clovaai/CRAFT-pytorch) | Detector |
| [CRNN](https://www.jaided.ai/easyocr/modelhub/) | Recognizer |

## Benchmarks

@@ -313,10 +304,9 @@

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| -------------------------------------------------------------------- | :--------------------: | :----------------: |
| Detector (CRAFT_1280) + Detector (CRAFT_320) + Recognizer (CRNN_512) | 1540 | 1470 |
| Detector(CRAFT_1280) + Detector(CRAFT_320) + Recognizer (CRNN_64) | 1070 | 1000 |
| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------------ | :--------------------: | :----------------: |
| Detector (CRAFT) + Recognizer (CRNN) | 1000 - 1600 | 1000 - 1500 |

### Inference time

@@ -332,16 +322,13 @@ Times presented in the tables are measured as consecutive runs of the model. Ini

**Time measurements:**

| Metric | iPhone 17 Pro <br /> [ms] | iPhone 16 Pro <br /> [ms] | iPhone SE 3 <br /> [ms] | Samsung Galaxy S24 <br /> [ms] | OnePlus 12 <br /> [ms] |
| -------------------------------------------------------------------------- | ------------------------- | ------------------------- | ----------- | ------------------------------ | ---------------------- |
| **Total Inference Time** | 1104 | 1113 | 8840 | 2845 | 2640 |
| **Detector (CRAFT_1280_QUANTIZED)** | 501 | 507 | 4317 | 1405 | 1275 |
| **Detector (CRAFT_320_QUANTIZED)** | | | | | |
| ├─ Average Time | 125 | 121 | 1060 | 338 | 299 |
| ├─ Total Time (4 runs) | 500 | 484 | 4240 | 1352 | 1196 |
| **Recognizer (CRNN_64)** <br /> (_With Flag `independentChars == true`_) | | | | | |
| ├─ Average Time | 5 | 6 | 14 | 7 | 6 |
| ├─ Total Time (21 runs) | 105 | 126 | 294 | 147 | 126 |
| **Recognizer (CRNN_512)** <br /> (_With Flag `independentChars == false`_) | | | | | |
| ├─ Average Time | 46 | 42 | 109 | 47 | 37 |
| ├─ Total Time (4 runs) | 184 | 168 | 436 | 188 | 148 |
Notice that the recognizer models, as well as the detector's `forward_320` method, were executed between 4 and 21 times during a single recognition.
The values below represent the averages across all runs for the benchmark image.

| Model | iPhone 17 Pro [ms] | iPhone 16 Pro [ms] | iPhone SE 3 [ms] | Samsung Galaxy S24 [ms] | OnePlus 12 [ms] |
| ------------------------------- | ------------------ | ------------------ | ----------- | ----------------------- | --------------- |
| **Total Inference Time** | 1104 | 1113 | 8840 | 2845 | 2640 |
| Detector (CRAFT) `forward_1280` | 501 | 507 | 4317 | 1405 | 1275 |
| Detector (CRAFT) `forward_320` | 125 | 121 | 1060 | 338 | 299 |
| Recognizer (CRNN) `forward_512` | 46 | 42 | 109 | 47 | 37 |
| Recognizer (CRNN) `forward_64` | 5 | 6 | 14 | 7 | 6 |
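
Again, the per-run averages roughly add up to the measured total. The run counts (one `forward_1280` pass, 4× `forward_320`, 21× `forward_64` with `independentCharacters: true`) are the ones recorded for this benchmark image.

```typescript
// Reconstructing the measured total (iPhone 17 Pro, 1104 ms) from the
// per-run averages above. Run counts are specific to the benchmark
// image; real counts depend on the detected regions and characters.
const estimateMs =
  501 + // Detector (CRAFT) forward_1280, one pass
  4 * 125 + // Detector (CRAFT) forward_320, 4 runs
  21 * 5; // Recognizer (CRNN) forward_64, 21 runs
// 501 + 500 + 105 = 1106 ms, within rounding of the measured 1104 ms.
```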