Commit 4d9781c
Merge pull request #214172 from PatrickFarley/comvis-4-rest
[cog svcs] Comvis 4 rest
2 parents 9d1d9a1 + 738e6f5 commit 4d9781c

12 files changed: +70 −82 lines

articles/cognitive-services/Computer-vision/computer-vision-how-to-install-containers.md

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 ---
-title: Install Read OCR Docker containers from Computer Vision
+title: Computer Vision 3.2 GA Read OCR container
 titleSuffix: Azure Cognitive Services
-description: Use the Read OCR Docker containers from Computer Vision to extract text from images and documents, on-premises.
+description: Use the Read 3.2 OCR containers from Computer Vision to extract text from images and documents, on-premises.
 services: cognitive-services
 author: PatrickFarley
 manager: nitinme
@@ -14,7 +14,7 @@ ms.custom: seodec18, cog-serv-seo-aug-2020
 keywords: on-premises, OCR, Docker, container
 ---

-# Install Read OCR Docker containers
+# Install Computer Vision 3.2 GA Read OCR container

 [!INCLUDE [container hosting on the Microsoft Container Registry](../containers/includes/gated-container-hosting.md)]

articles/cognitive-services/Computer-vision/concept-generating-thumbnails.md

Lines changed: 3 additions & 0 deletions
@@ -49,6 +49,9 @@ The following table illustrates thumbnails defined by smart-cropping for the exa
 The Computer Vision smart-cropping utility takes a given aspect ratio (or several) and returns the bounding box coordinates (in pixels) of the region(s) identified. Your app can then crop and return the image using those coordinates.

+> [!IMPORTANT]
+> This feature uses face detection to help determine important regions in the image. The detection does not involve distinguishing one face from another face, predicting or classifying facial attributes, or creating a facial template (a unique set of numbers generated from an image that represents the distinctive features of a face).
+
 ---

 ## Use the API
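The smart-cropping utility described above returns the region in pixels; the client then crops with those coordinates. A minimal sketch of that conversion step follows. The field names `x`, `y`, `w`, `h` are illustrative assumptions about the response shape, not the exact API schema.

```python
def crop_box(x: int, y: int, w: int, h: int) -> tuple:
    """Convert a smart-crop region (origin plus width/height, in pixels)
    into the (left, top, right, bottom) tuple most imaging libraries
    (for example Pillow's Image.crop) expect."""
    return (x, y, x + w, y + h)

# Hypothetical region returned for a 16:9 aspect-ratio request.
print(crop_box(40, 10, 320, 180))  # (40, 10, 360, 190)
```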

articles/cognitive-services/Computer-vision/concept-ocr.md

Lines changed: 2 additions & 3 deletions
@@ -18,14 +18,13 @@ ms.author: pafarley
 Version 4.0 of Image Analysis offers the ability to extract text from images. Contextual information like line number and position is also returned. Text reading is also available through the [OCR service](overview-ocr.md), but the latest model version is available through Image Analysis. This version is optimized for image inputs as opposed to documents.

-> [!IMPORTANT]
-> you need Image Analysis version 4.0 to use this feature. Version 4.0 is currently available to resources in the following Azure regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
+[!INCLUDE [read-editions](./includes/read-editions.md)]

 ## Reading text example

 The following JSON response illustrates what the Analyze API returns when reading text in the given image.

-![Photo of a sticky note with writing on it.](./Images/handwritten-note.jpg).
+![Photo of a sticky note with writing on it.](./Images/handwritten-note.jpg)

 ```json
 {

articles/cognitive-services/Computer-vision/concept-people-detection.md

Lines changed: 2 additions & 2 deletions
@@ -19,13 +19,13 @@ ms.author: pafarley
 Version 4.0 of Image Analysis offers the ability to detect people appearing in images. The bounding box coordinates of each detected person are returned, along with a confidence score.

 > [!IMPORTANT]
-> you need Image Analysis version 4.0 to use this feature. Version 4.0 is currently available to resources in the following Azure regions: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US.
+> We built this model by enhancing our object detection model for person detection scenarios. People detection does not involve distinguishing one face from another face, predicting or classifying facial attributes, or creating a facial template (a unique set of numbers generated from an image that represents the distinctive features of a face).

 ## People detection example

 The following JSON response illustrates what the Analyze API returns when describing the example image based on its visual features.

-![Photo of a woman in a kitchen.](./Images/windows-kitchen.jpg).
+![Photo of a woman in a kitchen.](./Images/windows-kitchen.jpg)

 ```json
 {

articles/cognitive-services/Computer-vision/faq.yml

Lines changed: 4 additions & 17 deletions
@@ -20,22 +20,17 @@ summary: |
 sections:
-  - name: General Computer Vision questions
+  - name: Computer Vision API frequently asked questions
     questions:
       - question: |
           How can I increase the transactions-per-second (TPS) allowed by the service?
         answer: |
-          The free (S0) tier only allows 20 transactions per minute. Upgrade to the S1 tier to get up to 30 transactions per second. If you're seeing the error code 429 and the "Too many requests" error message, [submit an Azure support ticket](https://azure.microsoft.com/support/create-ticket/) to raise your TPS to 50 or higher with a brief business justification. [Computer Vision pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/#pricing).
+          The free (S0) tier only allows 20 transactions per minute. Upgrade to the S1 tier to get up to 30 transactions per second. If you're seeing the error code 429 and the "Too many requests" error message, [submit an Azure support ticket](https://azure.microsoft.com/support/create-ticket/) to raise your TPS to 50 or higher with a brief business justification. [Computer Vision pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/#pricing).

       - question: |
           The service is throwing an error because my image file is too large. How can I work around this?
         answer: |
-          The file size limit for most Computer Vision features is 4 MB, but the client library SDKs can handle files up to 6 MB. For Optical Character Recognition (OCR) that handles multi-page documents, the maximum file size is 50 MB. For more information, see the [Image Analysis input limits](overview-image-analysis.md#image-requirements) and [OCR input limits](how-to/call-read-api.md#input-requirements).
-
-      - question: |
-          How can I process multi-page documents with OCR in a single call?
-        answer: |
-          Optical Character Recognition, specifically the Read operation, supports multi-page documents as the API input. If you call the API with a 10-page document, you'll be billed for 10 pages, with each page counted as a billable transaction. If you have the free (S0) tier, it can only process two pages at a time.
+          The file size limit for most Computer Vision features is 4 MB for the 3.2 version of the API and 20 MB for the 4.0 preview version, and the client library SDKs can handle files up to 6 MB. For more information, see the [Image Analysis input limits](overview-image-analysis.md#image-requirements).

       - question: |
           Can I send multiple images in a single API call to the Computer Vision service?
@@ -46,19 +41,11 @@ sections:
         answer: |
           See the [Language support](language-support.md) page for the list of languages covered by Image Analysis and OCR.

-  - name: OCR service questions
-    questions:
-      - question: |
-          How can I process multi-page documents with OCR in a single call?
-        answer: |
-          Optical Character Recognition, specifically the Read operation, supports multi-page documents as the API input. If you call the API with a 10-page document, you'll be billed for 10 pages, with each page counted as a billable transaction. Note that if you have the free (S0) tier, it can only process two pages at a time.
-
       - question: |
           Can I deploy the OCR (Read) capability on-premises?
         answer: |
-          Yes, the OCR (Read) cloud API is also available as a Docker container for on-premises deployment. Learn [how to deploy the OCR containers](./computer-vision-how-to-install-containers.md).
+          Yes, the Computer Vision 3.2 OCR (Read) cloud API is also available as a Docker container for on-premises deployment. Learn [how to deploy the OCR containers](./computer-vision-how-to-install-containers.md).

-  - name: Image Analysis service questions
-    questions:
       - question: |
           Can I train Computer Vision API to use custom tags? For example, I would like to feed in pictures of cat breeds to 'train' the AI, then receive the breed value on an AI request.
         answer: |
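The 429 "Too many requests" behavior covered in the TPS question is typically handled client-side with retry and backoff while a quota increase is pending. A generic sketch, not specific to any Azure SDK; `call` stands in for whatever function issues the request:

```python
import time

def with_backoff(call, max_retries=4, base_delay=1.0):
    """Retry `call` with exponential backoff while it returns HTTP 429.
    `call` is any zero-argument function returning (status, body)."""
    for attempt in range(max_retries):
        status, body = call()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return call()  # final attempt, returned as-is

# Simulated service that throttles the first two calls, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = with_backoff(lambda: next(responses), base_delay=0.01)
print(status, body)  # 200 ok
```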

articles/cognitive-services/Computer-vision/how-to/call-analyze-image.md

Lines changed: 39 additions & 48 deletions
@@ -76,25 +76,19 @@ The Analyze API gives you access to all of the service's image analysis features

 #### [REST](#tab/rest)

-You can specify which features you want to use by setting the URL query parameters of the [Analyze API](https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f21b). A parameter can have multiple values, separated by commas. Each feature you specify will require more computation time, so only specify what you need.
+You can specify which features you want to use by setting the URL query parameters of the [Analyze API](https://aka.ms/vision-4-0-ref). A parameter can have multiple values, separated by commas. Each feature you specify will require more computation time, so only specify what you need.

 |URL parameter | Value | Description|
 |---|---|--|
-|`visualFeatures`|`Adult` | detects if the image is pornographic in nature (depicts nudity or a sex act), or is gory (depicts extreme violence or blood). Sexually suggestive content ("racy" content) is also detected.|
-|`visualFeatures`|`Brands` | detects various brands within an image, including the approximate location. The Brands argument is only available in English.|
-|`visualFeatures`|`Categories` | categorizes image content according to a taxonomy defined in documentation. This value is the default value of `visualFeatures`.|
-|`visualFeatures`|`Color` | determines the accent color, dominant color, and whether an image is black&white.|
-|`visualFeatures`|`Description` | describes the image content with a complete sentence in supported languages.|
-|`visualFeatures`|`Faces` | detects if faces are present. If present, generate coordinates, gender and age.|
-|`visualFeatures`|`ImageType` | detects if image is clip art or a line drawing.|
-|`visualFeatures`|`Objects` | detects various objects within an image, including the approximate location. The Objects argument is only available in English.|
-|`visualFeatures`|`Tags` | tags the image with a detailed list of words related to the image content.|
-|`details`| `Celebrities` | identifies celebrities if detected in the image.|
-|`details`|`Landmarks` |identifies landmarks if detected in the image.|
+|`features`|`Read` | reads the visible text in the image and outputs it as structured JSON data.|
+|`features`|`Description` | describes the image content with a complete sentence in supported languages.|
+|`features`|`SmartCrops` | finds the rectangle coordinates that would crop the image to a desired aspect ratio while preserving the area of interest.|
+|`features`|`Objects` | detects various objects within an image, including the approximate location. The Objects argument is only available in English.|
+|`features`|`Tags` | tags the image with a detailed list of words related to the image content.|

 A populated URL might look like this:

-`https://{endpoint}/vision/v2.1/analyze?visualFeatures=Description,Tags&details=Celebrities`
+`https://{endpoint}/computervision/imageanalysis:analyze?api-version=2022-10-12-preview&features=Tags`

 #### [C#](#tab/csharp)
@@ -143,7 +137,7 @@ The following URL query parameter specifies the language. The default value is `

 A populated URL might look like this:

-`https://{endpoint}/vision/v2.1/analyze?visualFeatures=Description,Tags&details=Celebrities&language=en`
+`https://{endpoint}/computervision/imageanalysis:analyze?api-version=2022-10-12-preview&features=Tags&language=en`

 #### [C#](#tab/csharp)
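The populated 4.0 URLs shown in this file can be assembled programmatically; a minimal standard-library sketch follows. The endpoint value is a placeholder, and the surrounding request details (a POST with an `Ocp-Apim-Subscription-Key` header and a JSON body naming the image) follow the usual Azure Cognitive Services pattern and should be confirmed against the API reference.

```python
from urllib.parse import urlencode

def build_analyze_url(endpoint, features, language=None):
    """Build an Image Analysis 4.0 (preview) Analyze URL.
    `features` is a list joined with commas, matching the query format
    shown in the populated URL examples."""
    params = {"api-version": "2022-10-12-preview",
              "features": ",".join(features)}
    if language:
        params["language"] = language
    # safe="," keeps the comma-separated feature list unescaped
    query = urlencode(params, safe=",")
    return f"{endpoint}/computervision/imageanalysis:analyze?{query}"

print(build_analyze_url("https://{endpoint}", ["Tags"], language="en"))
```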
@@ -198,44 +192,41 @@ This section shows you how to parse the results of the API call. It includes the
 The service returns a `200` HTTP response, and the body contains the returned data in the form of a JSON string. The following text is an example of a JSON response.

 ```json
-{
-    "tags":[
-        {
-            "name":"outdoor",
-            "score":0.976
+{
+    "metadata":
+    {
+        "width": 300,
+        "height": 200
     },
-        {
-            "name":"bird",
-            "score":0.95
+    "tagsResult":
+    {
+        "values":
+        [
+            {
+                "name": "grass",
+                "confidence": 0.9960499405860901
+            },
+            {
+                "name": "outdoor",
+                "confidence": 0.9956876635551453
+            },
+            {
+                "name": "building",
+                "confidence": 0.9893627166748047
+            },
+            {
+                "name": "property",
+                "confidence": 0.9853052496910095
+            },
+            {
+                "name": "plant",
+                "confidence": 0.9791355729103088
+            }
+        ]
     }
-    ],
-    "description":{
-        "tags":[
-            "outdoor",
-            "bird"
-        ],
-        "captions":[
-            {
-                "text":"partridge in a pear tree",
-                "confidence":0.96
-            }
-        ]
-    }
 }
 ```

-See the following table for explanations of the fields in this example:
-
-Field | Type | Content
-------|------|------|
-Tags | `object` | The top-level object for an array of tags.
-tags[].Name | `string` | The keyword from the tags classifier.
-tags[].Score | `number` | The confidence score, between 0 and 1.
-description | `object` | The top-level object for an image description.
-description.tags[] | `string` | The list of tags. If there is insufficient confidence in the ability to produce a caption, the tags might be the only information available to the caller.
-description.captions[].text | `string` | A phrase describing the image.
-description.captions[].confidence | `number` | The confidence score for the phrase.

 ### Error codes

 See the following list of possible errors and their causes:
@@ -292,4 +283,4 @@ The following code calls the Image Analysis API and prints the results to the console

 ## Next steps

 * Explore the [concept articles](../concept-object-detection.md) to learn more about each feature.
-* See the [API reference](https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f21b) to learn more about the API functionality.
+* See the [API reference](https://aka.ms/vision-4-0-ref) to learn more about the API functionality.
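The 4.0 response in this file nests tags under `tagsResult.values`, each with a `confidence` score. A short sketch of pulling them out after deserializing the body; the sample values are abridged from the example response:

```python
import json

# Sample shaped like the 4.0 tagsResult payload shown in the diff above.
response_body = json.loads("""
{
  "metadata": {"width": 300, "height": 200},
  "tagsResult": {"values": [
    {"name": "grass", "confidence": 0.996},
    {"name": "outdoor", "confidence": 0.9957},
    {"name": "plant", "confidence": 0.9791}
  ]}
}
""")

# Keep only tags above a confidence threshold.
tags = [t["name"] for t in response_body["tagsResult"]["values"]
        if t["confidence"] >= 0.99]
print(tags)  # ['grass', 'outdoor']
```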

articles/cognitive-services/Computer-vision/how-to/call-read-api.md

Lines changed: 3 additions & 3 deletions
@@ -16,10 +16,10 @@ ms.author: pafarley

 # Call the Computer Vision 3.2 GA Read API

-[!INCLUDE [read-editions](../includes/read-editions.md)]
-
 In this guide, you'll learn how to call the v3.2 GA Read API to extract text from images. You'll learn the different ways you can configure the behavior of this API to meet your needs. This guide assumes you have already <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesComputerVision" title="created a Computer Vision resource" target="_blank">created a Computer Vision resource</a> and obtained a key and endpoint URL. If you haven't, follow a [quickstart](../quickstarts-sdk/client-library.md) to get started.

+[!INCLUDE [read-editions](../includes/read-editions.md)]
+
 ## Input requirements

 The **Read** call takes images and documents as its input. They have the following requirements:
@@ -43,7 +43,7 @@ When using the Read operation, use the following values for the optional `model-version` parameter.
 | latest | Latest GA model|
 | [2022-04-30](../whats-new.md#may-2022) | Latest GA model. 164 languages for print text and 9 languages for handwritten text along with several enhancements on quality and performance |
 | [2022-01-30-preview](../whats-new.md#february-2022) | Preview model adds print text support for Hindi, Arabic and related languages. For handwritten text, adds support for Japanese and Korean. |
-| [2021-09-30-preview](../whats-new.md#september-2021) | Preview model adds print text support for Russian and other Cyrillic languages, For handwritten text, adds support for Chinese Simplified, French, German, Italian, Portuguese, and Spanish. |
+| [2021-09-30-preview](../whats-new.md#september-2021) | Preview model adds print text support for Russian and other Cyrillic languages. For handwritten text, adds support for Chinese Simplified, French, German, Italian, Portuguese, and Spanish. |
 | 2021-04-12 | 2021 GA model |

 ### Input language
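The optional `model-version` values tabulated above are passed as a query parameter on the Read call. A minimal sketch of pinning a specific model; the resource endpoint is a placeholder, and the `/vision/v3.2/read/analyze` path should be checked against the 3.2 API reference.

```python
from urllib.parse import urlencode

def read_analyze_url(endpoint, model_version="latest"):
    """Build a Read 3.2 request URL pinned to a model version
    from the table above."""
    query = urlencode({"model-version": model_version})
    return f"{endpoint}/vision/v3.2/read/analyze?{query}"

print(read_analyze_url("https://my-resource.cognitiveservices.azure.com",
                       "2022-04-30"))
# https://my-resource.cognitiveservices.azure.com/vision/v3.2/read/analyze?model-version=2022-04-30
```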

articles/cognitive-services/Computer-vision/includes/read-editions.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ ms.author: pafarley
 >
 > | Input | Examples | Suggested API | Benefits |
 > |----------|--------------|-------------------------|-------------------------|
-> | General in-the-wild images with single image at a time | labels, street signs, and posters | [Image&nbsp;Analysis Read (preview)](/azure/cognitive-services/computer-vision/how-to/concept-ocr) | Optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR powered experiences in your workflows.
+> | General in-the-wild images with single image at a time | labels, street signs, and posters | [Image&nbsp;Analysis Read&nbsp;(preview)](/azure/cognitive-services/computer-vision/concept-ocr) | Optimized for general, non-document images with a performance-enhanced synchronous API that makes it easier to embed OCR powered experiences in your workflows.
 > | Scanned document images, digital and scanned documents including embedded images| books, reports, and forms | [Form&nbsp;Recognizer Read](/azure/applied-ai-services/form-recognizer/concept-read) | Optimized for text-heavy scanned and digital document scenarios with asynchronous API to allow processing large documents in your workflows.
 >
 > **Computer Vision 3.2 GA Read**

articles/cognitive-services/Computer-vision/index.yml

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ conceptualContent:
         url: /training/modules/analyze-images-computer-vision/
       - itemType: reference
         text: Image Analysis API reference
-        url: https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f21b
+        url: https://aka.ms/vision-4-0-ref
       footerLink:
         text: More
         url: index-image-analysis.yml

articles/cognitive-services/Computer-vision/overview-image-analysis.md

Lines changed: 2 additions & 2 deletions
@@ -19,9 +19,9 @@ keywords: computer vision, computer vision applications, computer vision service

 The Computer Vision Image Analysis service can extract a wide variety of visual features from your images. For example, it can determine whether an image contains adult content, find specific brands or objects, or find human faces.

-The latest version of Image Analysis, 4.0, has new features like OCR and people detection, and it uses updated models that have achieved human parity in certain recognition tasks. If your resource belongs to one of the regions enabled for 4.0 (East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US), we recommend you use this version going forward.
+The latest version of Image Analysis, 4.0, which is now in public preview, has new features like synchronous OCR and people detection. We recommend you use this version going forward.

-You can use Image Analysis through a client library SDK or by calling the [REST API](https://westcentralus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-ga/operations/5d986960601faab4bf452005) directly. Follow the [quickstart](quickstarts-sdk/image-analysis-client-library.md) to get started.
+You can use Image Analysis through a client library SDK or by calling the [REST API](https://aka.ms/vision-4-0-ref) directly. Follow the [quickstart](quickstarts-sdk/image-analysis-client-library.md) to get started.

 > [!div class="nextstepaction"]
 > [Quickstart](quickstarts-sdk/image-analysis-client-library.md)
