articles/ai-services/computer-vision/concept-describe-images-40.md (+12 −10)

@@ -8,35 +8,37 @@ manager: nitinme
ms.service: azure-ai-vision
ms.topic: conceptual
-ms.date: 01/19/2024
+ms.date: 09/25/2024
ms.author: pafarley
---

# Image captions (version 4.0)

Image captions in Image Analysis 4.0 are available through the **Caption** and **Dense Captions** features.

-Caption generates a one-sentence description for all image contents. Dense Captions provides more detail by generating one-sentence descriptions of up to 10 regions of the image in addition to describing the whole image. Dense Captions also returns bounding box coordinates of the described image regions. Both these features use the latest groundbreaking Florence-based AI models.
+The Caption feature generates a one-sentence description of all the image contents. Dense Captions provides more detail by generating one-sentence descriptions of up to 10 different regions of the image in addition to describing the whole image. Dense Captions also returns bounding box coordinates of the described image regions. Both of these features use the latest Florence-based AI models.

-At this time, image captioning is available in English only.
+Image captioning is available in English only.

> [!IMPORTANT]
-> Image captioning in Image Analysis 4.0 is only available in certain Azure data center regions: see [Region availability](./overview-image-analysis.md#region-availability). You must use a Vision resource located in one of these regions to get results from Caption and Dense Captions features.
+> Image captioning in Image Analysis 4.0 is only available in certain Azure data center regions: see [Region availability](./overview-image-analysis.md#region-availability). You must use an Azure AI Vision resource located in one of these regions to get results from the Caption and Dense Captions features.
>
-> If you have to use a Vision resource outside these regions to generate image captions, please use [Image Analysis 3.2](concept-describing-images.md) which is available in all Azure AI Vision regions.
+> If you need to use a Vision resource outside these regions to generate image captions, use [Image Analysis 3.2](concept-describing-images.md), which is available in all Azure AI Vision regions.

Try out the image captioning features quickly and easily in your browser using Vision Studio.
-Captions contain gender terms ("man", "woman", "boy" and "girl") by default. You have the option to replace these terms with "person" in your results and receive gender-neutral captions. You can do so by setting the optional API request parameter, **gender-neutral-caption** to `true` in the request URL.
+## Gender-neutral captions
+
+By default, captions contain gender terms ("man", "woman", "boy", and "girl"). You have the option to replace these terms with "person" in your results and receive gender-neutral captions. You can do so by setting the optional API request parameter `gender-neutral-caption` to `true` in the request URL.

## Caption and Dense Captions examples

#### [Caption](#tab/image)

-The following JSON response illustrates what the Analysis 4.0 API returns when describing the example image based on its visual features.
+The following JSON response illustrates what the Image Analysis 4.0 API returns when describing the example image based on its visual features.



@@ -51,7 +53,7 @@ The following JSON response illustrates what the Analysis 4.0 API returns when d

#### [Dense Captions](#tab/dense)

-The following JSON response illustrates what the Analysis 4.0 API returns when generating dense captions for the example image.
+The following JSON response illustrates what the Image Analysis 4.0 API returns when generating dense captions for the example image.



articles/ai-services/computer-vision/concept-describing-images.md (+4 −4)

@@ -8,15 +8,15 @@ manager: nitinme
ms.service: azure-ai-vision
ms.topic: conceptual
-ms.date: 04/30/2024
+ms.date: 09/25/2024
ms.author: pafarley
---

# Image descriptions

-Azure AI Vision can analyze an image and generate a human-readable phrase that describes its contents. The algorithm returns several descriptions based on different visual features, and each description is given a confidence score. The final output is a list of descriptions ordered from highest to lowest confidence.
+Azure AI Vision can analyze an image and generate a human-readable phrase that describes its contents. The service returns several descriptions based on different visual features, and each description is given a confidence score. The final output is a list of descriptions ordered from highest to lowest confidence.

-At this time, English is the only supported language for image description.
+English is the only supported language for image descriptions.

Try out the image captioning features quickly and easily in your browser using Vision Studio.
@@ -25,7 +25,7 @@ Try out the image captioning features quickly and easily in your browser using V

## Image description example

-The following JSON response illustrates what the Analyze API returns when describing the example image based on its visual features.
+The following JSON response illustrates what the Analyze Image API returns when describing the example image based on its visual features.



-description: Concepts related to image vectorization using the Image Analysis 4.0 API.
+description: Learn about concepts related to image vectorization and search/retrieval using the Image Analysis 4.0 API.
#services: cognitive-services
author: PatrickFarley
manager: nitinme

ms.service: azure-ai-vision
ms.topic: conceptual
-ms.date: 02/20/2024
+ms.date: 09/25/2024
ms.author: pafarley
---

# Multimodal embeddings (version 4.0)

-Multimodal embedding is the process of generating a numerical representation of an image that captures its features and characteristics in a vector format. These vectors encode the content and context of an image in a way that is compatible with text search over the same vector space.
+Multimodal embedding is the process of generating a vector representation of an image that captures its features and characteristics. These vectors encode the content and context of an image in a way that is compatible with text search over the same vector space.

-Image retrieval systems have traditionally used features extracted from the images, such as content labels, tags, and image descriptors, to compare images and rank them by similarity. However, vector similarity search is gaining more popularity due to a number of benefits over traditional keyword-based search and is becoming a vital component in popular content search services.
+Image retrieval systems have traditionally used features extracted from the images, such as content labels, tags, and image descriptors, to compare images and rank them by similarity. However, vector similarity search offers a number of benefits over traditional keyword-based search and is becoming a vital component in popular content search services.

-## What's the difference between vector search and keyword-based search?
+## Differences between vector search and keyword search

Keyword search is the most basic and traditional method of information retrieval. In that approach, the search engine looks for exact matches of the keywords or phrases entered by the user in the search query and compares them with the labels and tags provided for the images. The search engine then returns images that contain those exact keywords as content tags and image labels. Keyword search relies heavily on the user's ability to use relevant and specific search terms.
@@ -50,18 +50,17 @@ Each dimension of the vector corresponds to a different feature or attribute of

The following are the main steps of the image retrieval process using Multimodal embeddings.

-:::image type="content" source="media/image-retrieval.png" alt-text="Diagram of image retrieval process.":::
+:::image type="content" source="media/image-retrieval.png" alt-text="Diagram of the multimodal embedding and image retrieval process.":::

1. Vectorize Images and Text: the Multimodal embeddings APIs, **VectorizeImage** and **VectorizeText**, can be used to extract feature vectors from an image or text, respectively. The APIs return a single feature vector representing the entire input. (A sketch of these steps follows the list.)

   > [!NOTE]
   > Multimodal embedding does not do any biometric processing of human faces. For face detection and identification, see the [Azure AI Face service](./overview-identity.md).

1. Measure similarity: Vector search systems typically use distance metrics, such as cosine distance or Euclidean distance, to compare vectors and rank them by similarity. The [Vision studio](https://portal.vision.cognitive.azure.com/) demo uses [cosine distance](./how-to/image-retrieval.md#calculate-vector-similarity) to measure similarity.
1. Retrieve Images: Use the top _N_ vectors similar to the search query and retrieve images corresponding to those vectors from your photo library to provide as the final result.
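
Here's a hedged sketch of the three steps above, using the `retrieval:vectorizeImage` and `retrieval:vectorizeText` REST operations and a cosine-similarity ranking in Python. The `api-version` and `model-version` strings, the placeholders, and the image URLs are assumptions; treat this as an outline rather than the official sample.

```python
# Sketch of the three retrieval steps: vectorize, measure similarity, retrieve.
# Endpoint paths follow the Multimodal embeddings REST pattern; the api-version
# and model-version values, placeholders (<...>), and image URLs are assumptions.
import math
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}
PARAMS = {"api-version": "2024-02-01", "model-version": "2023-04-15"}

def vectorize_image(image_url: str) -> list[float]:
    r = requests.post(f"{ENDPOINT}/computervision/retrieval:vectorizeImage",
                      params=PARAMS, headers=HEADERS, json={"url": image_url})
    r.raise_for_status()
    return r.json()["vector"]

def vectorize_text(text: str) -> list[float]:
    r = requests.post(f"{ENDPOINT}/computervision/retrieval:vectorizeText",
                      params=PARAMS, headers=HEADERS, json={"text": text})
    r.raise_for_status()
    return r.json()["vector"]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# 1. Vectorize a small photo library and a text query (URLs are hypothetical).
library = {url: vectorize_image(url) for url in ["https://example.com/a.jpg",
                                                 "https://example.com/b.jpg"]}
query = vectorize_text("a dog playing in the snow")

# 2.-3. Rank by cosine similarity and keep the top N matches.
top_n = sorted(library, key=lambda u: cosine_similarity(query, library[u]), reverse=True)[:10]
print(top_n)
```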

### Relevance score

-The image and video retrieval services return a field called "relevance." The term "relevance" denotes a measure of similarity score between a query and image or video frame embeddings. The relevance score is composed of two parts:
+The image and video retrieval services return a field called "relevance." The term "relevance" denotes a measure of similarity between a query and image or video frame embeddings. The relevance score is composed of two parts:

1. The cosine similarity (that falls in the range of [0,1]) between the query and image or video frame embeddings.
1. A metadata score, which reflects the similarity between the query and the metadata associated with the image or video frame.

articles/ai-services/computer-vision/how-to/mitigate-latency.md (+1 −1)

@@ -64,7 +64,7 @@ The quality of the input images affects both the accuracy and the latency of the

To achieve the optimal balance between accuracy and speed, follow these tips to optimize your input data.

- For face detection and recognition operations, see [input data for face detection](../concept-face-detection.md#input-requirements) and [input data for face recognition](../concept-face-recognition.md#input-requirements).
-- For liveness detection, see the [tutorial](../Tutorials/liveness.md#select-a-good-reference-image).
+- For liveness detection, see the [tutorial](../Tutorials/liveness.md#select-a-reference-image).

articles/ai-services/computer-vision/language-support.md (+3 −3)

@@ -7,7 +7,7 @@ author: PatrickFarley
manager: nitinme
ms.service: azure-ai-vision
ms.topic: conceptual
-ms.date: 03/11/2024
+ms.date: 09/25/2024
ms.author: pafarley
---
@@ -125,9 +125,9 @@ The following table lists the OCR supported languages for print text by the most
|Kazakh (Latin) |`kk-latn`|Zhuang |`za`|
|Khaling |`klr`|Zulu |`zu`|

-## Analyze image
+## Image Analysis

-Some features of the [Analyze - Image](/rest/api/computervision/analyze-image?view=rest-computervision-v3.2) API can return results in other languages, specified with the `language` query parameter. Other actions return results in English regardless of what language is specified, and others throw an exception for unsupported languages. Actions are specified with the `visualFeatures` and `details` query parameters; see the [Overview](overview-image-analysis.md) for a list of all the actions you can do with the Analyze API, or follow the [How-to guide](/azure/ai-services/computer-vision/how-to/call-analyze-image-40) to try them out.
+Some features of the [Analyze - Image](/rest/api/computervision/analyze-image?view=rest-computervision-v3.2) API can return results in other languages, specified with the `language` query parameter. Other features return results in English regardless of what language is specified, and others throw an exception for unsupported languages. Features are specified with the `visualFeatures` and `details` query parameters; see the [Overview](overview-image-analysis.md) for a list of all the actions you can do with the [Analyze - Image](/rest/api/computervision/analyze-image?view=rest-computervision-v3.2) API, or follow the [How-to guide](/azure/ai-services/computer-vision/how-to/call-analyze-image-40) to try them out.
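
As a brief sketch, passing the `language` query parameter on a v3.2 Analyze Image call might look like the following in Python (placeholders and the sample image URL are assumptions):

```python
# Sketch: request Spanish-language results from the v3.2 Analyze Image API.
# Placeholders (<...>) and the sample image URL are assumptions; features that
# don't support the requested language return English or raise an error.
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-key>"

response = requests.post(
    f"{endpoint}/vision/v3.2/analyze",
    params={"visualFeatures": "Description,Tags", "language": "es"},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/photo.jpg"},
)
response.raise_for_status()
print([tag["name"] for tag in response.json()["tags"]])  # tag names in Spanish
```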

articles/ai-services/computer-vision/tutorials/liveness.md (+8 −11)

@@ -7,21 +7,19 @@ ms.service: azure-ai-vision
ms.custom:
  - ignite-2023
ms.topic: tutorial
-ms.date: 11/06/2023
+ms.date: 09/25/2024
---

# Tutorial: Detect liveness in faces

-Face Liveness detection can be used to determine if a face in an input video stream is real (live) or fake (spoofed). It's an important building block in a biometric authentication system to prevent imposters from gaining access to the system using a photograph, video, mask, or other means to impersonate another person.
+Face liveness detection is used to determine if a face in an input video stream is real (live) or fake (spoofed). It's an important building block in a biometric authentication system that prevents imposters from gaining access to the system using a photograph, video, mask, or other means to impersonate another person.

-The goal of liveness detection is to ensure that the system is interacting with a physically present live person at the time of authentication. Such systems are increasingly important with the rise of digital finance, remote access control, and online identity verification processes.
+The goal of liveness detection is to ensure that the system is interacting with a physically present, live person at the time of authentication. These systems are increasingly important with the rise of digital finance, remote access control, and online identity verification processes.

-The Azure AI Face liveness detection solution successfully defends against various spoof types ranging from paper printouts, 2d/3d masks, and spoof presentations on phones and laptops. Liveness detection is an active area of research, with continuous improvements being made to counteract increasingly sophisticated spoofing attacks over time. Continuous improvements will be rolled out to the client and the service components over time as the overall solution gets more robust to new types of attacks.
+The Azure AI Face liveness detection solution successfully defends against various spoof types, ranging from paper printouts and 2D/3D masks to spoof presentations on phones and laptops. Liveness detection is an active area of research, with continuous improvements being made to counteract increasingly sophisticated spoofing attacks. Improvements are rolled out to the client and the service components over time as the overall solution gets more robust to new types of attacks.
The liveness solution integration involves two distinct components: a frontend mobile/web application and an app server/orchestrator.
@@ -31,7 +29,7 @@ The liveness solution integration involves two distinct components: a frontend m
- **Frontend application**: The frontend application receives authorization from the app server to initiate liveness detection. Its primary objective is to activate the camera and guide end-users accurately through the liveness detection process.
- **App server**: The app server serves as a backend server to create liveness detection sessions and obtain an authorization token from the Face service for a particular session. This token authorizes the frontend application to perform liveness detection. The app server's objectives are to manage the sessions, to grant authorization for the frontend application, and to view the results of the liveness detection process.

-Additionally, we combine face verification with liveness detection to verify whether the person is the specific person you designated. The following table help describe details of the liveness detection features:
+Additionally, we combine face verification with liveness detection to verify whether the person is the specific person you designated. The following table describes details of the liveness detection features:

| Feature | Description |
| -- | -- |
@@ -40,7 +38,6 @@ Additionally, we combine face verification with liveness detection to verify whe

This tutorial demonstrates how to operate a frontend application and an app server to perform [liveness detection](#perform-liveness-detection) and [liveness detection with face verification](#perform-liveness-detection-with-face-verification) across various language SDKs.

## Prerequisites

- Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services/)
@@ -71,7 +68,7 @@ The app server/orchestrator is responsible for controlling the lifecycle of a li
- For Python, follow the instructions in the [Python readme](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/face/azure-ai-vision-face/README.md)
- For JavaScript, follow the instructions in the [JavaScript readme](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/face/ai-vision-face-rest/README.md)
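
To make the app server's role concrete, here's a rough sketch of creating a liveness session with the `azure-ai-vision-face` Python package. Class and parameter names are based on the preview SDK and may differ between versions; treat this as an assumption-laden outline rather than the official sample from the readme above.

```python
# Sketch: app server creates a liveness session and hands the short-lived
# auth token to the frontend. Based on the azure-ai-vision-face preview SDK;
# names may differ between SDK versions. Placeholders (<...>) are assumptions.
import uuid

from azure.core.credentials import AzureKeyCredential
from azure.ai.vision.face import FaceSessionClient
from azure.ai.vision.face.models import (
    CreateLivenessSessionContent,
    LivenessOperationMode,
)

session_client = FaceSessionClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

# Create a session; the returned auth token authorizes one frontend attempt.
session = session_client.create_liveness_session(
    CreateLivenessSessionContent(
        liveness_operation_mode=LivenessOperationMode.PASSIVE,
        device_correlation_id=str(uuid.uuid4()),  # ties the session to a device
    )
)
print(session.session_id)  # keep server-side to query results later
print(session.auth_token)  # send to the frontend application
```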
@@ -415,7 +412,7 @@ There are two parts to integrating liveness with verification:

:::image type="content" source="../media/liveness/liveness-verify-diagram.jpg" alt-text="Diagram of the liveness-with-face-verification workflow of Azure AI Face." lightbox="../media/liveness/liveness-verify-diagram.jpg":::