<pre class="metadata">
Title: Accelerated Text Detection in Images
Repository: wicg/shape-detection-api
Status: CG-DRAFT
ED: https://wicg.github.io/shape-detection-api
Shortname: text-detection-api
Level: 1
Editor: Miguel Casas-Sanchez 82825, Google LLC https://www.google.com, mcasas@google.com
Editor: Reilly Grant 83788, Google LLC https://www.google.com, reillyg@google.com
Abstract: This document describes an API providing access to accelerated text detectors for still images and/or live image feeds.
Group: wicg
Markup Shorthands: markdown yes
!Participate: <a href="https://www.w3.org/community/wicg/">Join the W3C Community Group</a>
!Participate: <a href="https://github.com/WICG/shape-detection-api">Fix the text through GitHub</a>
</pre>

<style>
table {
  border-collapse: collapse;
  border-left-style: hidden;
  border-right-style: hidden;
  text-align: left;
}
table caption {
  font-weight: bold;
  padding: 3px;
  text-align: left;
}
table td, table th {
  border: 1px solid black;
  padding: 3px;
}
</style>

# Introduction # {#introduction}

Photos and images make up a large share of Web content, and many include recognisable features such as human faces, QR codes or text. Detecting these features is computationally expensive, but enables interesting use cases, e.g. face tagging or web URL redirection. This document deals with text detection, whereas the sister document [[SHAPE-DETECTION-API]] specifies the face and barcode detection cases and APIs.

## Text detection use cases ## {#use-cases}

Please see the <a href="https://github.com/WICG/shape-detection-api/blob/gh-pages/README.md">Readme/Explainer</a> in the repository.

# Text Detection API # {#api}

Individual browsers MAY provide a {{TextDetector}} to perform text detection in images,
potentially leveraging hardware acceleration or additional dependent libraries.
The {{TextDetector/availability()}} method allows developers to check for the availability
of these capabilities and specific language support.

## Image sources for detection ## {#image-sources-for-detection}

Please refer to [[SHAPE-DETECTION-API#image-sources-for-detection]].

## Text Detection API ## {#text-detection-api}

{{TextDetector}} represents an underlying accelerated platform component for detecting, in images, Latin-1 text as defined in [[iso8859-1]]. It provides a single {{TextDetector/detect()}} operation on an {{ImageBitmapSource}}, which returns a Promise. This method must reject this Promise in the cases detailed in [[#image-sources-for-detection]]; otherwise it may queue a task using the OS/Platform resources to resolve the Promise with a sequence of {{DetectedText}}s, each one essentially consisting of a {{DetectedText/rawValue}} and delimited by a {{DetectedText/boundingBox}} and a series of {{Point2D}}s.


<xmp class="idl">
dictionary TextDetectorOptions {
    required sequence<DOMString> languages;
};

dictionary TextDetectorCreateOptions {
    AbortSignal signal;
    sequence<DOMString> languages;
};

[
    Exposed=(Window,Worker),
    SecureContext
] interface TextDetector {
    constructor();
    static Promise<Availability> availability(TextDetectorOptions options);
    static Promise<TextDetector> create(optional TextDetectorCreateOptions options = {});
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};
</xmp>

<dl class="domintro">
  <dt><dfn constructor for="TextDetector">`TextDetector()`</dfn></dt>
  <dd>
    <div class="note">
    Detectors may potentially allocate and hold significant resources. Where possible, reuse the same {{TextDetector}} for several detections.
    </div>
  </dd>
  <dt><dfn method for="TextDetector"><code>availability(TextDetectorOptions |options|)</code></dfn></dt>
  <dd>
    <p>
      Returns a {{Promise}} that resolves with an {{Availability}} value
      indicating the overall text detection availability for the languages
      specified in |options|.
    </p>
    <p>
      The returned {{Availability}} value is determined by the following precedence,
      applied across all requested languages:
    </p>
    <ul>
      <li>If any requested language is <code>"unavailable"</code>, the method returns
      <code>"unavailable"</code>.</li>
      <li>Otherwise, if any requested language is <code>"downloadable"</code>, the method
      returns <code>"downloadable"</code>.</li>
      <li>Otherwise, if any requested language is <code>"downloading"</code>, the method
      returns <code>"downloading"</code>.</li>
      <li>Otherwise, all requested languages are <code>"available"</code>, and the method
      returns <code>"available"</code>.</li>
    </ul>
    <div class="note">
      This method allows developers to check for specific language support before
      attempting to create a {{TextDetector}} instance.
    </div>
  </dd>
  <dt><dfn method for="TextDetector"><code>create(optional TextDetectorCreateOptions |options|)</code></dfn></dt>
  <dd>
    <p>
      Returns a {{Promise}} that resolves with a new {{TextDetector}}
      instance.
    </p>
    <div class="note">
      This factory method handles the asynchronous initialization of the
      text detector, including downloading necessary resources. It is recommended
      to use this asynchronous method over the synchronous constructor
      to accommodate potential delays from dependency downloads or initialization,
      ensuring a smoother user experience.
    </div>
  </dd>
  <dt><dfn method for="TextDetector"><code>detect(ImageBitmapSource |image|)</code></dfn></dt>
  <dd>Tries to detect text blocks in the {{ImageBitmapSource}} |image|.</dd>
</dl>
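
<div class="example">
The precedence rules described for {{TextDetector/availability()}} can be sketched as a small pure function. This is a minimal illustration only: `combineAvailability` and `PRECEDENCE` are hypothetical names that are not part of this API, and the sketch assumes the per-language statuses are already known as strings.

```js
// Hypothetical helper, not part of the API: folds per-language statuses into
// one overall status, following the precedence listed for availability().
const PRECEDENCE = ['unavailable', 'downloadable', 'downloading', 'available'];

function combineAvailability(statuses) {
  // The first level in precedence order that appears for any language wins.
  for (const level of PRECEDENCE) {
    if (statuses.includes(level)) {
      return level;
    }
  }
  // Assumption: with no requested languages, "all requested languages are
  // available" holds vacuously, so the sketch reports 'available'.
  return 'available';
}
```

For instance, `combineAvailability(['available', 'downloading'])` evaluates to `'downloading'`, and any list containing `'unavailable'` evaluates to `'unavailable'`.
</div>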


### {{DetectedText}} ### {#detectedtext-section}

<xmp class="idl">
dictionary DetectedText {
  required DOMRectReadOnly boundingBox;
  required DOMString rawValue;
  required sequence<Point2D> cornerPoints;
};
</xmp>

<dl class="domintro">
  <dt><dfn dict-member for="DetectedText">`boundingBox`</dfn></dt>
  <dd>A rectangle, aligned to the image, indicating the position and extent of a detected feature.</dd>

  <dt><dfn dict-member for="DetectedText">`rawValue`</dfn></dt>
  <dd>Raw string detected from the image, where characters are drawn from [[iso8859-1]].</dd>

  <dt><dfn dict-member for="DetectedText">`cornerPoints`</dfn></dt>
  <dd>A <a>sequence</a> of corner points of the detected feature, in clockwise direction and starting with top-left. This is not necessarily a rectangle due to possible perspective distortions.</dd>
</dl>
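
<div class="example">
Because `cornerPoints` may describe a distorted quadrilateral, callers sometimes want the axis-aligned rectangle that encloses it. The following is a minimal sketch under stated assumptions: `enclosingRect` is a hypothetical helper, each point is assumed to expose numeric `x` and `y` members, and this document does not state that the result equals `boundingBox`.

```js
// Hypothetical helper, not part of the API: computes the axis-aligned
// rectangle enclosing a non-empty list of corner points.
function enclosingRect(cornerPoints) {
  const xs = cornerPoints.map(point => point.x);
  const ys = cornerPoints.map(point => point.y);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    x: left,
    y: top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// Usage with made-up coordinates, clockwise from top-left.
const rect = enclosingRect([
  { x: 10, y: 20 },
  { x: 110, y: 30 },
  { x: 100, y: 80 },
  { x: 5, y: 70 },
]);
// rect is { x: 5, y: 20, width: 105, height: 60 }
```
</div>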

# Examples # {#examples}

<i>This section is non-normative.</i>


## Platform support for a text detector ## {#example-feature-detection}

<div class="example">
```js
if (!('TextDetector' in window)) {
  console.error('Text Detection not supported on this platform');
} else {
  const languages = ['en', 'es']; // English and Spanish
  TextDetector.availability({ languages: languages }).then(availability => {
    if (availability === 'unavailable') {
      console.log('Not all of the requested languages are supported.');
      return;
    }

    if (availability === 'downloadable') {
      console.log('Languages need to be downloaded first.');
    } else if (availability === 'downloading') {
      console.log('Languages are currently being downloaded.');
    } else {
      console.log('All requested languages are supported.');
    }

    // Now you can create a TextDetector with the supported languages.
    // If the status was 'downloadable' or 'downloading', create() will wait
    // for the download to finish before resolving.
    TextDetector.create({ languages: languages }).then(detector => {
      // ... use the detector
    });
  });
}
```
</div>

## Text Detection ## {#example-text-detection}


<div class="example">
```js
(async () => {
  // Assuming |theImage| is e.g. an <img> element or a Blob.
  try {
    // The legacy synchronous constructor is still supported,
    // but the async create() method is recommended.
    // let textDetector = new TextDetector();

    let textDetector = await TextDetector.create();

    const detectedTextBlocks = await textDetector.detect(theImage);
    for (const textBlock of detectedTextBlocks) {
      console.log(
          `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
          `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
    }
  } catch (e) {
    console.error("Text Detection failed, boo.", e);
  }
})();
```
</div>
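
<div class="example">
Since detectors may hold significant resources, one instance can be reused for several detections. The following is a minimal sketch of that pattern: `readTextFromImages` is a hypothetical helper, and `detector` is assumed to be any object with a `detect()` method that resolves with a list of results carrying `rawValue`, such as a {{TextDetector}}.

```js
// Hypothetical helper, not part of the API: runs one detector over several
// images in turn and collects the raw strings found in each image.
async function readTextFromImages(detector, images) {
  const results = [];
  for (const image of images) {
    // Images are processed one at a time so that only a single detection is
    // in flight; this is a design choice of the sketch, not a requirement.
    const textBlocks = await detector.detect(image);
    results.push(textBlocks.map(textBlock => textBlock.rawValue));
  }
  return results;
}
```

A caller would create the detector once, e.g. with `TextDetector.create()`, and pass it to the helper together with the images to scan.
</div>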

<pre class="link-defaults">
spec: html
    type: dfn
        text: allowed to show a popup
        text: in parallel
        text: incumbent settings object
spec: writing-assistance-apis
    type: enum
        text: Availability
</pre>

<pre class="biblio">
{
  "iso8859-1": {
      "href": "https://www.iso.org/standard/28245.html",
      "title": "Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1",
      "publisher": "ISO/IEC",
      "date": "April 1998"
  }
}
</pre>