articles/cognitive-services/Computer-vision/concept-generating-thumbnails.md (3 additions, 0 deletions)
@@ -49,6 +49,9 @@ The following table illustrates thumbnails defined by smart-cropping for the exa
The Computer Vision smart-cropping utility takes a given aspect ratio (or several) and returns the bounding box coordinates (in pixels) of the region(s) identified. Your app can then crop and return the image using those coordinates.
+
+> [!IMPORTANT]
+> This feature uses face detection to help determine important regions in the image. The detection does not involve distinguishing one face from another face, predicting or classifying facial attributes, or creating a facial template (a unique set of numbers generated from an image that represents the distinctive features of a face).
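For illustration, a minimal Python sketch of the client-side crop step described above. It assumes Pillow is installed, and the bounding-box values and field names (`x`, `y`, `w`, `h`) are placeholders rather than real service output:

```python
from PIL import Image

# Hypothetical smart-crop result for one aspect ratio (illustrative values,
# not real service output).
bounding_box = {"x": 40, "y": 0, "w": 1280, "h": 720}

image = Image.open("photo.jpg")
left, top = bounding_box["x"], bounding_box["y"]
right, bottom = left + bounding_box["w"], top + bounding_box["h"]

# Crop to the region the service identified and save the thumbnail.
image.crop((left, top, right, bottom)).save("thumbnail.jpg")
```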
articles/cognitive-services/Computer-vision/concept-ocr.md (2 additions, 3 deletions)
@@ -18,14 +18,13 @@ ms.author: pafarley
Version 4.0 of Image Analysis offers the ability to extract text from images. Contextual information like line number and position is also returned. Text reading is also available through the [OCR service](overview-ocr.md), but the latest model version is available through Image Analysis. This version is optimized for image inputs as opposed to documents.
-> [!IMPORTANT]
-> you need Image Analysis version 4.0 to use this feature. Version 4.0 is currently available to resources in the following Azure regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
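As a sketch of how an app might call this text-extraction capability through REST: the endpoint shape, `api-version` value, and `readResult` field below are assumptions modeled on the 4.0 preview and should be checked against the current API reference:

```python
import requests

# Assumed endpoint shape and api-version for the Image Analysis 4.0 preview;
# confirm both against the current API reference before relying on them.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
url = f"{endpoint}/computervision/imageanalysis:analyze"
params = {"api-version": "2022-10-12-preview", "features": "read"}
headers = {"Ocp-Apim-Subscription-Key": "<your-key>"}
body = {"url": "https://example.com/street-sign.jpg"}

response = requests.post(url, params=params, headers=headers, json=body)
response.raise_for_status()

# The readResult shape below is an assumption modeled on the preview docs.
for page in response.json().get("readResult", {}).get("pages", []):
    for line in page.get("lines", []):
        print(line.get("content"))
```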
articles/cognitive-services/Computer-vision/concept-people-detection.md (2 additions, 2 deletions)
@@ -19,13 +19,13 @@ ms.author: pafarley
Version 4.0 of Image Analysis offers the ability to detect people appearing in images. The bounding box coordinates of each detected person are returned, along with a confidence score.
> [!IMPORTANT]
-> you need Image Analysis version 4.0 to use this feature. Version 4.0 is currently available to resources in the following Azure regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
+> We built this model by enhancing our object detection model for person detection scenarios. People detection does not involve distinguishing one face from another face, predicting or classifying facial attributes, or creating a facial template (a unique set of numbers generated from an image that represents the distinctive features of a face).
## People detection example
The following JSON response illustrates what the Analyze API returns when describing the example image based on its visual features.
-.
+
articles/cognitive-services/Computer-vision/faq.yml (4 additions, 17 deletions)
@@ -20,22 +20,17 @@ summary: |
sections:
-  - name: General Computer Vision questions
+  - name: Computer Vision API frequently asked questions
    questions:
      - question: |
          How can I increase the transactions-per-second (TPS) allowed by the service?
        answer: |
-          The free (S0) tier only allows 20 transaction per minute. Upgrade to the S1 tier to get up to 30 transactions per second. If you're seeing the error code 429 and the "Too many requests" error message, [submit an Azure support ticket](https://azure.microsoft.com/support/create-ticket/) to raise your TPS to 50 or higher with a brief business justification. [Computer Vision pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/#pricing).
+          The free (S0) tier only allows 20 transactions per minute. Upgrade to the S1 tier to get up to 30 transactions per second. If you're seeing the error code 429 and the "Too many requests" error message, [submit an Azure support ticket](https://azure.microsoft.com/support/create-ticket/) to raise your TPS to 50 or higher with a brief business justification. [Computer Vision pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/#pricing).
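Until a higher limit is granted, a client can also absorb occasional 429 responses with retries and exponential backoff. A minimal sketch using the `requests` library (the helper name and defaults are illustrative):

```python
import time
import requests

def post_with_backoff(url, headers, body, max_retries=5):
    """POST to the service, retrying on HTTP 429 with exponential backoff."""
    response = None
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=body)
        if response.status_code != 429:
            return response
        # Honor Retry-After when the service sends it; otherwise back off
        # 1s, 2s, 4s, ... between attempts.
        time.sleep(float(response.headers.get("Retry-After", 2 ** attempt)))
    return response
```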
      - question: |
          The service is throwing an error because my image file is too large. How can I work around this?
        answer: |
-          The file size limit for most Computer Vision features is 4 MB, but the client library SDKs can handle files up to 6 MB. For Optical Character Recognition (OCR) that handles multi-page documents, the maximum file size is 50 MB. For more information, see the Image [Analysis inputs limits](overview-image-analysis.md#image-requirements) and [OCR input limits](how-to/call-read-api.md#input-requirements).
-
-      - question: |
-          How can I process multi-page documents with OCR in a single call?
-        answer: |
-          Optical Character Recognition, specifically the Read operation, supports multi-page documents as the API input. If you call the API with a 10-page document, you'll be billed for 10 pages, with each page counted as a billable transaction. If you have the free (S0) tier, it can only process two pages at a time.
+          The file size limit for most Computer Vision features is 4 MB for the 3.2 version of the API and 20 MB for the 4.0 preview version, and the client library SDKs can handle files up to 6 MB. For more information, see the [Image Analysis input limits](overview-image-analysis.md#image-requirements).
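A client-side workaround is to downscale oversized images before upload. A rough sketch, assuming Pillow and the 4 MB cap of the 3.2 API:

```python
import os
from PIL import Image

MAX_BYTES = 4 * 1024 * 1024  # 4 MB cap for most features on the 3.2 API

def shrink_to_limit(src, dst="resized.jpg"):
    """Halve the image dimensions until the saved copy fits under the cap."""
    if os.path.getsize(src) <= MAX_BYTES:
        return src
    image = Image.open(src).convert("RGB")
    while True:
        image = image.resize((image.width // 2, image.height // 2))
        image.save(dst, quality=85)
        if os.path.getsize(dst) <= MAX_BYTES:
            return dst
```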
      - question: |
          Can I send multiple images in a single API call to the Computer Vision service?
@@ -46,19 +41,11 @@ sections:
        answer: |
          See the [Language support](language-support.md) page for the list of languages covered by Image Analysis and OCR.

-  - name: OCR service questions
-    questions:
-      - question: |
-          How can I process multi-page documents with OCR in a single call?
-        answer: |
-          Optical Character Recognition, specifically the Read operation, supports multi-page documents as the API input. If you call the API with a 10-page document, you'll be billed for 10 pages, with each page counted as a billable transaction. Note that if you have the free (S0) tier, it can only process two pages at a time.
      - question: |
          Can I deploy the OCR (Read) capability on-premises?
        answer: |
-          Yes, the OCR (Read) cloud API is also available as a Docker container for on-premises deployment. Learn [how to deploy the OCR containers](./computer-vision-how-to-install-containers.md).
+          Yes, the Computer Vision 3.2 OCR (Read) cloud API is also available as a Docker container for on-premises deployment. Learn [how to deploy the OCR containers](./computer-vision-how-to-install-containers.md).
-  - name: Image Analysis service questions
-    questions:
      - question: |
          Can I train Computer Vision API to use custom tags? For example, I would like to feed in pictures of cat breeds to 'train' the AI, then receive the breed value on an AI request.
articles/cognitive-services/Computer-vision/how-to/call-analyze-image.md (39 additions, 48 deletions)
@@ -76,25 +76,19 @@ The Analyze API gives you access to all of the service's image analysis features
#### [REST](#tab/rest)
-You can specify which features you want to use by setting the URL query parameters of the [Analyze API](https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f21b). A parameter can have multiple values, separated by commas. Each feature you specify will require more computation time, so only specify what you need.
+You can specify which features you want to use by setting the URL query parameters of the [Analyze API](https://aka.ms/vision-4-0-ref). A parameter can have multiple values, separated by commas. Each feature you specify will require more computation time, so only specify what you need.
|URL parameter | Value | Description|
|---|---|--|
-|`visualFeatures`|`Adult`| detects if the image is pornographic in nature (depicts nudity or a sex act), or is gory (depicts extreme violence or blood). Sexually suggestive content ("racy" content) is also detected.|
-|`visualFeatures`|`Brands`| detects various brands within an image, including the approximate location. The Brands argument is only available in English.|
-|`visualFeatures`|`Categories`| categorizes image content according to a taxonomy defined in documentation. This value is the default value of `visualFeatures`.|
-|`visualFeatures`|`Color`| determines the accent color, dominant color, and whether an image is black&white.|
-|`visualFeatures`|`Description`| describes the image content with a complete sentence in supported languages.|
-|`visualFeatures`|`Faces`| detects if faces are present. If present, generate coordinates, gender and age.|
-|`visualFeatures`|`ImageType`| detects if image is clip art or a line drawing.|
-|`visualFeatures`|`Objects`| detects various objects within an image, including the approximate location. The Objects argument is only available in English.|
-|`visualFeatures`|`Tags`| tags the image with a detailed list of words related to the image content.|
-|`details`|`Celebrities`| identifies celebrities if detected in the image.|
-|`details`|`Landmarks`|identifies landmarks if detected in the image.|
+|`features`|`Read`| reads the visible text in the image and outputs it as structured JSON data.|
+|`features`|`Description`| describes the image content with a complete sentence in supported languages.|
+|`features`|`SmartCrops`| finds the rectangle coordinates that would crop the image to a desired aspect ratio while preserving the area of interest.|
+|`features`|`Objects`| detects various objects within an image, including the approximate location. The Objects argument is only available in English.|
+|`features`|`Tags`| tags the image with a detailed list of words related to the image content.|
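To make the parameter usage concrete, here is a hedged Python sketch of a REST call that requests several features at once. The endpoint shape and `api-version` value are assumptions; verify them against the [API reference](https://aka.ms/vision-4-0-ref):

```python
import requests

# Assumed endpoint shape and api-version for the 4.0 preview; verify both
# against the API reference before use.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
url = f"{endpoint}/computervision/imageanalysis:analyze"
params = {
    "api-version": "2022-10-12-preview",
    # Comma-separated list; request only the features you need.
    "features": "tags,objects,smartCrops",
}
headers = {"Ocp-Apim-Subscription-Key": "<your-key>"}
body = {"url": "https://example.com/sample.jpg"}

response = requests.post(url, params=params, headers=headers, json=body)
print(response.json())
```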
@@ -198,44 +192,41 @@ This section shows you how to parse the results of the API call. It includes the
The service returns a `200` HTTP response, and the body contains the returned data in the form of a JSON string. The following text is an example of a JSON response.
```json
-{
-   "tags":[
-      {
-         "name":"outdoor",
-         "score":0.976
+{
+    "metadata":
+    {
+        "width": 300,
+        "height": 200
    },
-   {
-      "name":"bird",
-      "score":0.95
+    "tagsResult":
+    {
+        "values":
+        [
+            {
+                "name": "grass",
+                "confidence": 0.9960499405860901
+            },
+            {
+                "name": "outdoor",
+                "confidence": 0.9956876635551453
+            },
+            {
+                "name": "building",
+                "confidence": 0.9893627166748047
+            },
+            {
+                "name": "property",
+                "confidence": 0.9853052496910095
+            },
+            {
+                "name": "plant",
+                "confidence": 0.9791355729103088
+            }
+        ]
    }
-   ],
-   "description":{
-      "tags":[
-         "outdoor",
-         "bird"
-      ],
-      "captions":[
-         {
-            "text":"partridge in a pear tree",
-            "confidence":0.96
-         }
-      ]
-   }
}
```
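A minimal sketch of pulling the image dimensions and tag confidences out of this response (the dictionary below abbreviates the JSON above to two tags):

```python
# Response body from the example above, abbreviated to two tags.
analysis = {
    "metadata": {"width": 300, "height": 200},
    "tagsResult": {
        "values": [
            {"name": "grass", "confidence": 0.9960499405860901},
            {"name": "outdoor", "confidence": 0.9956876635551453},
        ]
    },
}

meta = analysis["metadata"]
print(f"Image size: {meta['width']}x{meta['height']}")

# Each entry pairs a tag name with a confidence between 0 and 1.
for tag in analysis["tagsResult"]["values"]:
    print(f"{tag['name']}: {tag['confidence']:.3f}")
```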
-See the following table for explanations of the fields in this example:
-
-Field | Type | Content
-------|------|------|
-Tags | `object` | The top-level object for an array of tags.
-tags[].Name | `string` | The keyword from the tags classifier.
-tags[].Score | `number` | The confidence score, between 0 and 1.
-description | `object` | The top-level object for an image description.
-description.tags[] | `string` | The list of tags. If there is insufficient confidence in the ability to produce a caption, the tags might be the only information available to the caller.
-description.captions[].text | `string` | A phrase describing the image.
-description.captions[].confidence | `number` | The confidence score for the phrase.
-
### Error codes
See the following list of possible errors and their causes:
@@ -292,4 +283,4 @@ The following code calls the Image Analysis API and prints the results to the co
## Next steps
* Explore the [concept articles](../concept-object-detection.md) to learn more about each feature.
-* See the [API reference](https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f21b) to learn more about the API functionality.
+* See the [API reference](https://aka.ms/vision-4-0-ref) to learn more about the API functionality.
articles/cognitive-services/Computer-vision/how-to/call-read-api.md

In this guide, you'll learn how to call the v3.2 GA Read API to extract text from images. You'll learn the different ways you can configure the behavior of this API to meet your needs. This guide assumes you have already <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision" title="Create a Computer Vision resource" target="_blank">created a Computer Vision resource</a> and obtained a key and endpoint URL. If you haven't, follow a [quickstart](../quickstarts-sdk/client-library.md) to get started.
The **Read** call takes images and documents as its input. They have the following requirements:
@@ -43,7 +43,7 @@ When using the Read operation, use the following values for the optional `model-
| latest | Latest GA model|
|[2022-04-30](../whats-new.md#may-2022)| Latest GA model. 164 languages for print text and 9 languages for handwritten text along with several enhancements on quality and performance |
|[2022-01-30-preview](../whats-new.md#february-2022)| Preview model adds print text support for Hindi, Arabic and related languages. For handwritten text, adds support for Japanese and Korean. |
-|[2021-09-30-preview](../whats-new.md#september-2021)| Preview model adds print text support for Russian and other Cyrillic languages, For handwritten text, adds support for Chinese Simplified, French, German, Italian, Portuguese, and Spanish. |
+|[2021-09-30-preview](../whats-new.md#september-2021)| Preview model adds print text support for Russian and other Cyrillic languages. For handwritten text, adds support for Chinese Simplified, French, German, Italian, Portuguese, and Spanish. |
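For context, a sketch of passing `model-version` on a v3.2 Read call and polling for the result. The URL path and response field names follow the documented v3.2 flow, but verify them against the Read API reference:

```python
import time
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
headers = {"Ocp-Apim-Subscription-Key": "<your-key>"}

# Submit the image; `model-version` pins one of the models in the table above.
submit = requests.post(
    f"{endpoint}/vision/v3.2/read/analyze",
    params={"model-version": "2022-04-30"},
    headers=headers,
    json={"url": "https://example.com/handwritten-note.jpg"},
)
submit.raise_for_status()
operation_url = submit.headers["Operation-Location"]

# Read is asynchronous: poll the operation URL until it completes.
while True:
    result = requests.get(operation_url, headers=headers).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)

for page in result.get("analyzeResult", {}).get("readResults", []):
    for line in page["lines"]:
        print(line["text"])
```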
-> | General in-the-wild images with single image at a time | labels, street signs, and posters | [Image Analysis Read(preview)](/azure/cognitive-services/computer-vision/how-to/concept-ocr) | Optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR powered experiences in your workflows.
+> | General in-the-wild images with single image at a time | labels, street signs, and posters | [Image Analysis Read (preview)](/azure/cognitive-services/computer-vision/concept-ocr) | Optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR powered experiences in your workflows.
> | Scanned document images, digital and scanned documents including embedded images| books, reports, and forms | [Form Recognizer Read](/azure/applied-ai-services/form-recognizer/concept-read) | Optimized for text-heavy scanned and digital document scenarios with asynchronous API to allow processing large documents in your workflows.
The Computer Vision Image Analysis service can extract a wide variety of visual features from your images. For example, it can determine whether an image contains adult content, find specific brands or objects, or find human faces.
-The latest version of Image Analysis, 4.0, has new features like OCR and people detection, and it uses updated models that have achieved human parity in certain recognition tasks. If your resource belongs to one of the regions enabled for 4.0 (East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US), we recommend you use this version going forward.
+The latest version of Image Analysis, 4.0, which is now in public preview, has new features like synchronous OCR and people detection. We recommend you use this version going forward.

-You can use Image Analysis through a client library SDK or by calling the [REST API](https://westcentralus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-ga/operations/5d986960601faab4bf452005) directly. Follow the [quickstart](quickstarts-sdk/image-analysis-client-library.md) to get started.
+You can use Image Analysis through a client library SDK or by calling the [REST API](https://aka.ms/vision-4-0-ref) directly. Follow the [quickstart](quickstarts-sdk/image-analysis-client-library.md) to get started.