
Commit eee3200

Merge pull request #224596 from PatrickFarley/comvis-4
[cog svcs] Comvis 4
2 parents 0fb7704 + 16bd5ec commit eee3200

88 files changed (+3713, -653 lines)


.openpublishing.publish.config.json

Lines changed: 6 additions & 0 deletions
````diff
@@ -914,6 +914,12 @@
       "branch": "main",
       "branch_mapping": {}
     },
+    {
+      "path_to_root": "azure-ai-vision-sdk",
+      "url": "https://github.com/Azure-Samples/azure-ai-vision-sdk",
+      "branch": "main",
+      "branch_mapping": {}
+    },
     {
       "path_to_root": "azure-cache-redis-samples",
       "url": "https://github.com/Azure-Samples/azure-cache-redis-samples",
````
Lines changed: 61 additions & 0 deletions
---
title: Background removal - Image Analysis
titleSuffix: Azure Cognitive Services
description: Learn about background removal, an operation of Image Analysis
services: cognitive-services
author: PatrickFarley
manager: nitinme

ms.service: cognitive-services
ms.subservice: computer-vision
ms.topic: conceptual
ms.date: 03/02/2023
ms.author: pafarley
ms.custom: references_regions
---

# Background removal (version 4.0 preview)

The Image Analysis service can divide images into multiple segments or regions to help the user identify different objects or parts of the image. Background removal creates an alpha matte that separates the foreground object from the background in an image.

> [!div class="nextstepaction"]
> [Call the Background removal API](./how-to/background-removal.md)

This feature provides two possible outputs based on the customer's needs:

- The foreground object of the image without the background. This edited image shows the foreground object and makes the background transparent, allowing the foreground to be placed on a new background.
- An alpha matte that shows the opacity of the detected foreground object. This matte can be used to separate the foreground object from the background for further processing.
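The compositing step implied by the second output can be sketched in plain Python. The alpha-weighted blending formula is standard; the flat pixel-list representation here is purely for illustration.

```python
def composite(foreground, matte, background):
    """Blend a foreground onto a new background using an alpha matte.

    foreground, background: lists of (R, G, B) pixel tuples.
    matte: list of alpha values in [0.0, 1.0], one per pixel
    (1.0 means fully foreground).
    """
    out = []
    for (fr, fg, fb), a, (br, bg, bb) in zip(foreground, matte, background):
        out.append((
            round(a * fr + (1 - a) * br),
            round(a * fg + (1 - a) * bg),
            round(a * fb + (1 - a) * bb),
        ))
    return out

# A fully opaque matte value keeps the foreground pixel unchanged.
pixels = composite([(200, 10, 10)], [1.0], [(0, 0, 255)])
```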

This service is currently in preview, and the API may change in the future.

## Background removal examples

The following example images illustrate what the Image Analysis service returns when removing the background of an image and creating an alpha matte.

|Original image |With background removed |Alpha matte |
|---------|---------|---------|
| :::image type="content" source="media/background-removal/building-1.png" alt-text="Photo of a city near water."::: | :::image type="content" source="media/background-removal/building-1-result.png" alt-text="Photo of a city near water; sky is transparent."::: | :::image type="content" source="media/background-removal/building-1-matte.png" alt-text="Alpha matte of a city skyline."::: |
| :::image type="content" source="media/background-removal/person-5.png" alt-text="Photo of a group of people using a tablet."::: | :::image type="content" source="media/background-removal/person-5-result.png" alt-text="Photo of a group of people using a tablet; background is transparent."::: | :::image type="content" source="media/background-removal/person-5-matte.png" alt-text="Alpha matte of a group of people."::: |
| :::image type="content" source="media/background-removal/bears.png" alt-text="Photo of a group of bears in the woods."::: | :::image type="content" source="media/background-removal/bears-result.png" alt-text="Photo of a group of bears; background is transparent."::: | :::image type="content" source="media/background-removal/bears-alpha.png" alt-text="Alpha matte of a group of bears."::: |

## Limitations

It's important to note the limitations of background removal:

* Background removal works best for categories such as people and animals, buildings and environmental structures, furniture, vehicles, food, text and graphics, and personal belongings.
* Objects that aren't prominent in the foreground may not be identified as part of the foreground.
* Images with thin and detailed structures, like hair or fur, may show some artifacts when overlaid on backgrounds with strong contrast to the original background.
* The latency of the background removal operation is higher for large images and can reach several seconds. We suggest you experiment with integrating both modes into your workflow to find the best usage for your needs (for instance, calling background removal on the original image, versus calling foreground matting on a downsampled version of the image, then resizing the alpha matte to the original size and applying it to the original image).
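The downsampled-matting workflow in the last bullet depends on resizing the alpha matte back to the original resolution. A minimal nearest-neighbor resize, assuming the matte is stored as a flat row-major list, might look like this:

```python
def resize_matte(matte, w, h, new_w, new_h):
    """Nearest-neighbor resize of a flat row-major alpha matte.

    matte: list of alpha values for a w x h matte.
    Returns a new flat list of size new_w x new_h.
    """
    return [matte[(y * h // new_h) * w + (x * w // new_w)]
            for y in range(new_h) for x in range(new_w)]

# Upscale a 2x2 matte to 4x4; each source value covers a 2x2 block.
upscaled = resize_matte([0, 1, 2, 3], 2, 2, 4, 4)
```
In practice you would then pass the upscaled matte and the original-resolution image to a compositing step; a real application would likely use an image library's resampling instead of this sketch.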

## Use the API

The background removal feature is available through the [Image Analysis - Segment](https://aka.ms/vision-4-0-ref) API (`imageanalysis:segment`). You can call this API through REST. See the [Background removal how-to guide](./how-to/background-removal.md) for more information.
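A sketch of assembling such a REST call follows. The `imageanalysis:segment` operation name comes from this article; the api-version value, the `mode` parameter, and the request body shape are illustrative assumptions to check against the API reference.

```python
from urllib.parse import urlencode

def build_segment_request(endpoint, key, mode="backgroundRemoval"):
    # api-version and mode are assumed values; verify against the API reference.
    query = urlencode({"api-version": "2023-02-01-preview", "mode": mode})
    url = f"{endpoint}/computervision/imageanalysis:segment?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # standard Cognitive Services key header
        "Content-Type": "application/json",
    }
    body = {"url": "https://example.com/photo.jpg"}  # image to process (placeholder)
    return url, headers, body

url, headers, body = build_segment_request(
    "https://myresource.cognitiveservices.azure.com", "<your-key>")
```

You would POST `body` as JSON to `url` with `headers` using any HTTP client; the response body contains the edited image or the alpha matte.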

## Next steps

* [Call the background removal API](./how-to/background-removal.md)
Lines changed: 153 additions & 0 deletions
---
title: Image captions - Image Analysis 4.0
titleSuffix: Azure Cognitive Services
description: Concepts related to the image captioning feature of the Image Analysis 4.0 API.
services: cognitive-services
author: PatrickFarley
manager: nitinme

ms.service: cognitive-services
ms.subservice: computer-vision
ms.topic: conceptual
ms.date: 01/24/2023
ms.author: pafarley
ms.custom: seodec18, ignite-2022, references_regions
---

# Image captions (version 4.0 preview)

Image captions in Image Analysis 4.0 (preview) are available through the Caption and Dense Captions features.

Caption generates a one-sentence description of the entire image's contents. Dense Captions provides more detail by generating one-sentence descriptions of up to 10 regions of the image in addition to describing the whole image, and it returns the bounding box coordinates of each described region. Both features use the latest Florence-based AI models.

At this time, image captioning is available in English only.

### Gender-neutral captions

By default, captions contain the gender terms "man", "woman", "boy", and "girl". You can replace these terms with "person" in your results to receive gender-neutral captions: set the optional **gender-neutral-caption** request parameter to `true` in the request URL.
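For illustration, the parameter can be combined with the other query parameters like this. The **gender-neutral-caption** name comes from this article; the api-version value and the `features` value are assumptions to verify against the API reference.

```python
from urllib.parse import urlencode

# Illustrative query string for a gender-neutral caption request.
params = {
    "api-version": "2023-02-01-preview",  # assumed preview version
    "features": "caption",                # request the Caption feature
    "gender-neutral-caption": "true",     # replace gendered terms with "person"
}
query = urlencode(params)
```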

> [!IMPORTANT]
> Image captioning in Image Analysis 4.0 is only available in the following Azure data center regions at this time: East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US. You must use a Computer Vision resource located in one of these regions to get results from the Caption and Dense Captions features.
>
> If you must use a Computer Vision resource outside these regions to generate image captions, use [Image Analysis 3.2](concept-describing-images.md), which is available in all Computer Vision regions.

Try out the image captioning features quickly and easily in your browser using Vision Studio.

> [!div class="nextstepaction"]
> [Try Vision Studio](https://portal.vision.cognitive.azure.com/)

## Caption example

#### [Caption](#tab/image)

The following JSON response illustrates what the Analysis 4.0 API returns when describing the example image based on its visual features.

![Photo of a man pointing at a screen](./Media/quickstarts/presentation.png)

```json
"captions": [
    {
        "text": "a man pointing at a screen",
        "confidence": 0.4891590476036072
    }
]
```

#### [Dense Captions](#tab/dense)

The following JSON response illustrates what the Analysis 4.0 API returns when generating dense captions for the example image.

![Photo of a tractor on a farm](./Images/farm.png)

```json
{
  "denseCaptionsResult": {
    "values": [
      {
        "text": "a man driving a tractor in a farm",
        "confidence": 0.535620927810669,
        "boundingBox": { "x": 0, "y": 0, "w": 850, "h": 567 }
      },
      {
        "text": "a man driving a tractor in a field",
        "confidence": 0.5428450107574463,
        "boundingBox": { "x": 132, "y": 266, "w": 209, "h": 219 }
      },
      {
        "text": "a blurry image of a tree",
        "confidence": 0.5139822363853455,
        "boundingBox": { "x": 147, "y": 126, "w": 76, "h": 131 }
      },
      {
        "text": "a man riding a tractor",
        "confidence": 0.4799223840236664,
        "boundingBox": { "x": 206, "y": 264, "w": 64, "h": 97 }
      },
      {
        "text": "a blue sky above a hill",
        "confidence": 0.35495415329933167,
        "boundingBox": { "x": 0, "y": 0, "w": 837, "h": 166 }
      },
      {
        "text": "a tractor in a field",
        "confidence": 0.47338250279426575,
        "boundingBox": { "x": 0, "y": 243, "w": 838, "h": 311 }
      }
    ]
  },
  "modelVersion": "2023-02-01-preview",
  "metadata": {
    "width": 850,
    "height": 567
  }
}
```

---

## Use the API

#### [Image captions](#tab/image)

The image captioning feature is part of the [Analyze Image](https://aka.ms/vision-4-0-ref) API. Include `Caption` in the **features** query parameter. Then, when you get the full JSON response, parse the string for the contents of the `"captionResult"` section.
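Extracting the caption from that section might look like the following sketch. The `"captionResult"` key is named in this article; the inner `text` and `confidence` fields mirror the Caption example above and should be verified against the API reference.

```python
import json

# Assumed response shape based on the "captionResult" section described here.
response_text = '''
{
  "captionResult": {
    "text": "a man pointing at a screen",
    "confidence": 0.4891590476036072
  }
}
'''
result = json.loads(response_text)
caption = result["captionResult"]
print(f"{caption['text']} (confidence {caption['confidence']:.2f})")
```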

#### [Dense captions](#tab/dense)

The dense captioning feature is part of the [Analyze Image](https://aka.ms/vision-4-0-ref) API. You can call this API using REST. Include `denseCaptions` in the **features** query parameter. Then, when you get the full JSON response, parse the string for the contents of the `"denseCaptionsResult"` section.
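A sketch of walking the `"denseCaptionsResult"` section, using a trimmed version of the example response shown earlier:

```python
# Two entries copied from the Dense Captions example response in this article.
response = {
    "denseCaptionsResult": {
        "values": [
            {"text": "a man driving a tractor in a farm",
             "confidence": 0.535620927810669,
             "boundingBox": {"x": 0, "y": 0, "w": 850, "h": 567}},
            {"text": "a blurry image of a tree",
             "confidence": 0.5139822363853455,
             "boundingBox": {"x": 147, "y": 126, "w": 76, "h": 131}},
        ]
    }
}

# List each region caption with its bounding box origin and size.
for item in response["denseCaptionsResult"]["values"]:
    box = item["boundingBox"]
    print(f"{item['text']}: ({box['x']}, {box['y']}) {box['w']}x{box['h']}")
```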

---

## Next steps

* Learn the related concept of [object detection](concept-object-detection-40.md).
* [Quickstart: Image Analysis REST API or client libraries](./quickstarts-sdk/image-analysis-client-library-40.md?pivots=programming-language-csharp)
* [Call the Analyze Image API](./how-to/call-analyze-image-40.md)

articles/cognitive-services/Computer-vision/concept-describing-images.md

Lines changed: 1 addition & 32 deletions
````diff
@@ -14,7 +14,7 @@ ms.author: pafarley
 ms.custom: seodec18, ignite-2022
 ---
 
-# Image description generation
+# Image descriptions
 
 Computer Vision can analyze an image and generate a human-readable phrase that describes its contents. The algorithm returns several descriptions based on different visual features, and each description is given a confidence score. The final output is a list of descriptions ordered from highest to lowest confidence.
 
````
````diff
@@ -31,8 +29,6 @@ The following JSON response illustrates what the Analyze API returns when descri
 
 ![A black and white picture of buildings in Manhattan](./Images/bw_buildings.png)
 
-#### [Version 3.2](#tab/3-2)
-
 ```json
 {
   "description":{
@@ -57,41 +55,12 @@ The following JSON response illustrates what the Analyze API returns when descri
   "modelVersion":"2021-05-01"
 }
 ```
-#### [Version 4.0](#tab/4-0)
-
-```json
-{
-  "metadata":
-  {
-    "width": 239,
-    "height": 300
-  },
-  "descriptionResult":
-  {
-    "values":
-    [
-      {
-        "text": "a city with tall buildings",
-        "confidence": 0.3551448881626129
-      }
-    ]
-  }
-}
-```
----
 
 ## Use the API
 
-#### [Version 3.2](#tab/3-2)
 
 The image description feature is part of the [Analyze Image](https://westcentralus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f21b) API. You can call this API through a native SDK or through REST calls. Include `Description` in the **visualFeatures** query parameter. Then, when you get the full JSON response, parse the string for the contents of the `"description"` section.
 
-#### [Version 4.0](#tab/4-0)
-
-The image description feature is part of the [Analyze Image](https://aka.ms/vision-4-0-ref) API. You can call this API using REST. Include `Description` in the **features** query parameter. Then, when you get the full JSON response, parse the string for the contents of the `"description"` section.
-
----
-
 * [Quickstart: Image Analysis REST API or client libraries](./quickstarts-sdk/image-analysis-client-library.md?pivots=programming-language-csharp)
 
 ## Next steps
````
Lines changed: 44 additions & 0 deletions
---
title: Smart-cropped thumbnails - Image Analysis 4.0
titleSuffix: Azure Cognitive Services
description: Concepts related to generating thumbnails for images using the Image Analysis 4.0 API.
services: cognitive-services
author: PatrickFarley
manager: nitinme

ms.service: cognitive-services
ms.subservice: computer-vision
ms.topic: conceptual
ms.date: 01/24/2023
ms.author: pafarley
ms.custom: seodec18, ignite-2022
---

# Smart-cropped thumbnails (version 4.0 preview)

A thumbnail is a reduced-size representation of an image. Thumbnails are used to represent images and other data in a more economical, layout-friendly way. The Computer Vision API uses smart cropping to create intuitive image thumbnails that include the most important regions of an image, with priority given to any detected faces.

The Computer Vision smart-cropping utility takes one or more aspect ratios in the range [0.75, 1.80] and returns the bounding box coordinates (in pixels) of the region(s) identified. Your app can then crop and return the image using those coordinates.

> [!IMPORTANT]
> This feature uses face detection to help determine important regions in the image. The detection does not involve distinguishing one face from another face, predicting or classifying facial attributes, or creating a facial template (a unique set of numbers generated from an image that represents the distinctive features of a face).

## Examples

The generated bounding box can vary widely depending on the aspect ratio you specify, as shown in the following images.

| Aspect ratio | Bounding box |
|-------|-----------|
| original | :::image type="content" source="Images/cropped-original.png" alt-text="Photo of a man with a dog at a table."::: |
| 0.75 | :::image type="content" source="Images/cropped-075-bb.png" alt-text="Photo of a man with a dog at a table. A 0.75 ratio bounding box is drawn."::: |
| 1.00 | :::image type="content" source="Images/cropped-1-0-bb.png" alt-text="Photo of a man with a dog at a table. A 1.00 ratio bounding box is drawn."::: |
| 1.50 | :::image type="content" source="Images/cropped-150-bb.png" alt-text="Photo of a man with a dog at a table. A 1.50 ratio bounding box is drawn."::: |

## Use the API

The smart cropping feature is available through the [Analyze Image API](https://aka.ms/vision-4-0-ref). Include `SmartCrops` in the **features** query parameter. Also include a **smartcrops-aspect-ratios** query parameter, and set it to one or more decimal aspect ratio values (defined as width / height) in the range [0.75, 1.80]; separate multiple values with commas. If no aspect ratio value is provided, the API returns a crop with an aspect ratio that best preserves the image's most important region.
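The query construction and the local crop using a returned bounding box can be wired together as in this sketch. The `features` and `smartcrops-aspect-ratios` parameter names come from this article; the api-version value and the flat pixel-list crop helper are illustrative assumptions.

```python
from urllib.parse import urlencode

def build_query(ratios):
    """Build the query string for a smart-crop request."""
    return urlencode({
        "api-version": "2023-02-01-preview",  # assumed preview version
        "features": "smartCrops",
        "smartcrops-aspect-ratios": ",".join(str(r) for r in ratios),
    })

def crop(pixels, width, box):
    """Crop a flat row-major pixel list using a bounding box in pixels."""
    x, y, w, h = box["x"], box["y"], box["w"], box["h"]
    return [pixels[(y + row) * width + x : (y + row) * width + x + w]
            for row in range(h)]
```

A real application would crop with an image library; the helper just shows how the returned `x`, `y`, `w`, `h` coordinates map onto pixel data.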

## Next steps

* [Call the Analyze Image API](./how-to/call-analyze-image-40.md)

articles/cognitive-services/Computer-vision/concept-generating-thumbnails.md

Lines changed: 0 additions & 27 deletions
````diff
@@ -18,7 +18,6 @@ ms.custom: seodec18, ignite-2022
 
 A thumbnail is a reduced-size representation of an image. Thumbnails are used to represent images and other data in a more economical, layout-friendly way. The Computer Vision API uses smart cropping to create intuitive image thumbnails that include the most important regions of an image with priority given to any detected faces.
 
-#### [Version 3.2](#tab/3-2)
 The Computer Vision thumbnail generation algorithm works as follows:
 
 1. Remove distracting elements from the image and identify the _area of interest_—the area of the image in which the main object(s) appears.
@@ -45,37 +44,11 @@ The following table illustrates thumbnails defined by smart-cropping for the exa
 |![A white flower with a green background](./Images/flower.png) | ![Vision Analyze Flower thumbnail](./Images/flower_thumbnail.png) |
 |![A woman on the roof of an apartment building](./Images/woman_roof.png) | ![thumbnail of a woman on the roof of an apartment building](./Images/woman_roof_thumbnail.png) |
 
-#### [Version 4.0](#tab/4-0)
-
-The Computer Vision smart-cropping utility takes one or more aspect ratios in the range [0.75, 1.80] and returns the bounding box coordinates (in pixels) of the region(s) identified. Your app can then crop and return the image using those coordinates.
-
-> [!IMPORTANT]
-> This feature uses face detection to help determine important regions in the image. The detection does not involve distinguishing one face from another face, predicting or classifying facial attributes, or creating a facial template (a unique set of numbers generated from an image that represents the distinctive features of a face).
-
-## Examples
-
-The generated bounding box can vary widely depending on what you specify for aspect ratio, as shown in the following images.
-
-| Aspect ratio | Bounding box |
-|-------|-----------|
-| original | :::image type="content" source="Images/cropped-original.png" alt-text="Photo of a man with a dog at a table."::: |
-| 0.75 | :::image type="content" source="Images/cropped-075-bb.png" alt-text="Photo of a man with a dog at a table. A 0.75 ratio bounding box is drawn."::: |
-| 1.00 | :::image type="content" source="Images/cropped-1-0-bb.png" alt-text="Photo of a man with a dog at a table. A 1.00 ratio bounding box is drawn."::: |
-| 1.50 | :::image type="content" source="Images/cropped-150-bb.png" alt-text="Photo of a man with a dog at a table. A 1.50 ratio bounding box is drawn."::: |
-
-
----
 
 ## Use the API
 
-#### [Version 3.2](#tab/3-2)
 
 The generate thumbnail feature is available through the [Get Thumbnail](https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/56f91f2e778daf14a499f20c) and [Get Area of Interest](https://westus.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/b156d0f5e11e492d9f64418d) APIs. You can call this API through a native SDK or through REST calls.
 
-#### [Version 4.0](#tab/4-0)
-
-The smart cropping feature is available through the [Analyze](https://aka.ms/vision-4-0-ref) API. You can call this API using REST. Include `SmartCrops` in the **visualFeatures** query parameter. Also include a **smartcrops-aspect-ratios** query parameter, and set it to a decimal value for the aspect ratio you want (defined as width / height). Multiple aspect ratio values should be comma-separated.
-
----
 
 * [Generate a thumbnail (how-to)](./how-to/generate-thumbnail.md)
````
