Commit 5b453c6

Merge pull request #231946 from Juliako/notes
moved TN files
2 parents b00c642 + 0029c0a commit 5b453c6

12 files changed: +1467 -10 lines changed
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
---
title: Introduction to Azure Video Indexer audio effects detection
titleSuffix: Azure Video Indexer
description: An introduction to the Azure Video Indexer audio effects detection component and how to use it responsibly.
author: Juliako
ms.author: juliako
manager: femila
ms.service: azure-video-indexer
ms.date: 06/15/2022
ms.topic: article
---

# Audio effects detection

Audio effects detection is an Azure Video Indexer feature that detects various acoustic events and classifies them into acoustic categories, such as laughter, crowd reactions, and alarms or sirens.

When working on the website, the instances are displayed in the Insights tab. They can also be generated in a categorized list in a JSON file that includes the category ID, type, and name, together with the instances per category, their specific timeframes, and their confidence scores.

## Prerequisites

Review the [transparency note overview](/legal/azure-video-indexer/transparency-note?context=/azure/azure-video-indexer/context/context).

## General principles

This article discusses audio effects detection and the key considerations for making use of this technology responsibly. There are many things you need to consider when deciding how to use and implement an AI-powered feature:

* Does this feature perform well in my scenario? Before deploying audio effects detection into your scenario, test how it performs using real-life data and make sure it can deliver the accuracy you need.
* Are we equipped to identify and respond to errors? AI-powered products and features won't be 100% accurate, so consider how you'll identify and respond to any errors that may occur.

## View the insight

To see the instances on the website, do the following:

1. When uploading the media file, go to Video + Audio Indexing, or go to Audio Only or Video + Audio and select Advanced.
1. After the file is uploaded and indexed, go to Insights and scroll to audio effects.

To display the JSON file, do the following:

1. Select Download -> Insights (JSON).
1. Copy the `audioEffects` element, under `insights`, and paste it into an online JSON viewer.

```json
"audioEffects": [
    {
        "id": 1,
        "type": "Silence",
        "instances": [
            {
                "confidence": 0,
                "adjustedStart": "0:01:46.243",
                "adjustedEnd": "0:01:50.434",
                "start": "0:01:46.243",
                "end": "0:01:50.434"
            }
        ]
    },
    {
        "id": 2,
        "type": "Speech",
        "instances": [
            {
                "confidence": 0,
                "adjustedStart": "0:00:00",
                "adjustedEnd": "0:01:43.06",
                "start": "0:00:00",
                "end": "0:01:43.06"
            }
        ]
    }
],
```

To download the JSON file via the API, use the [Azure Video Indexer developer portal](https://api-portal.videoindexer.ai/).

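As an illustration, the following is a minimal Python sketch that retrieves the index JSON with a Get Video Index request; the location, account ID, video ID, and access token are hypothetical placeholders, and the `videos[0].insights` path mirrors the structure the `audioEffects` element above is copied from.

```python
import requests

# Hypothetical placeholder values; substitute your own.
LOCATION = "trial"               # Azure region, or "trial"
ACCOUNT_ID = "<account-id>"
VIDEO_ID = "<video-id>"
ACCESS_TOKEN = "<access-token>"  # obtained from the Get Video Access Token API

# Get Video Index returns the full insights document, including audioEffects.
url = (
    f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}"
    f"/Videos/{VIDEO_ID}/Index"
)
response = requests.get(url, params={"accessToken": ACCESS_TOKEN})
response.raise_for_status()

insights = response.json()["videos"][0]["insights"]
audio_effects = insights.get("audioEffects", [])
print(f"Detected {len(audio_effects)} audio effect categories")
```
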
## Audio effects detection components

During the audio effects detection procedure, audio in a media file is processed, as follows:

|Component|Definition|
|---|---|
|Source file | The user uploads the source file for indexing. |
|Segmentation| The audio is analyzed, nonspeech audio is identified, and the audio is then split into short overlapping intervals. |
|Classification| An AI process analyzes each segment and classifies its contents into event categories such as crowd reaction or laughter. A probability list is then created for each event category according to domain-specific rules. |
|Confidence level| The estimated confidence level of each audio effect is calculated as a range of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.|

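As an illustration of how the confidence level can be used downstream, here is a small sketch that keeps only instances at or above a caller-chosen threshold; the 0.8 cutoff and the sample data are assumptions for the example, mirroring the JSON structure shown earlier.

```python
CONFIDENCE_THRESHOLD = 0.8  # example cutoff; tune for your scenario

def confident_events(audio_effects, threshold=CONFIDENCE_THRESHOLD):
    """Yield (type, start, end, confidence) for sufficiently confident instances."""
    for effect in audio_effects:
        for instance in effect["instances"]:
            if instance["confidence"] >= threshold:
                yield (effect["type"], instance["start"],
                       instance["end"], instance["confidence"])

# Hypothetical sample shaped like the audioEffects element above.
audio_effects = [
    {"id": 3, "type": "Laughter", "instances": [
        {"confidence": 0.92, "start": "0:02:01.1", "end": "0:02:04.5"}]},
]
for event in confident_events(audio_effects):
    print(event)  # ('Laughter', '0:02:01.1', '0:02:04.5', 0.92)
```
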
## Example use cases

- Companies with a large video archive can improve accessibility by offering more context to a hearing-impaired audience through transcription of nonspeech effects.
- Improved efficiency when creating raw data for content creators. For example, in media and entertainment, important moments in promos and trailers, such as laughter, crowd reactions, gunshots, or explosions, can be identified.
- Detecting and classifying gunshots, explosions, and glass shattering in a smart-city system, or in other public environments that include cameras and microphones, to offer fast and accurate detection of violent incidents.

## Considerations and limitations when choosing a use case

- Avoid using short or low-quality audio. Audio effects detection provides probabilistic and partial data on detected nonspeech audio events. For accuracy, audio effects detection requires at least 2 seconds of clear nonspeech audio. Voice commands or singing aren't supported.
- Avoid using audio with loud background music, or music with repetitive and/or linearly scanned frequency. Audio effects detection is designed for nonspeech audio only and therefore can't classify events in loud music. Music with repetitive and/or linearly scanned frequency may be incorrectly classified as an alarm or siren.
- Carefully consider the methods of usage in law enforcement and similar institutions. To promote more accurate probabilistic data, carefully review the following (a filtering sketch follows this list):

    - Audio effects can be detected in nonspeech segments only.
    - The duration of a nonspeech section should be at least 2 seconds.
    - Low-quality audio might impact the detection results.
    - Events in loud background music aren't classified.
    - Music with repetitive and/or linearly scanned frequency might be incorrectly classified as an alarm or siren.
    - Knocking on a door or slamming a door might be labeled as a gunshot or explosion.
    - Prolonged shouting or sounds of physical human effort might be incorrectly classified.
    - A group of people laughing might be classified as both laughter and crowd.
    - Only natural, nonsynthetic gunshot and explosion sounds are supported.

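To make the 2-second limitation concrete, here is a minimal sketch that drops instances shorter than that minimum before any downstream use; the timestamp parsing assumes the `H:MM:SS[.fff]` format shown in the JSON example above.

```python
from datetime import timedelta

def parse_timestamp(value: str) -> timedelta:
    """Parse a Video Indexer H:MM:SS[.fff] timestamp into a timedelta."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))

MIN_DURATION = timedelta(seconds=2)  # minimum reliable nonspeech duration

def usable_instances(audio_effects):
    """Yield (type, instance) pairs whose duration meets the 2-second minimum."""
    for effect in audio_effects:
        for instance in effect["instances"]:
            duration = (parse_timestamp(instance["end"])
                        - parse_timestamp(instance["start"]))
            if duration >= MIN_DURATION:
                yield effect["type"], instance
```
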
When used responsibly and carefully, Azure Video Indexer is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:

- Always respect an individual’s right to privacy, and only ingest audio for lawful and justifiable purposes.
- Don't purposely disclose inappropriate audio of young children or family members of celebrities, or other content that may be detrimental or pose a threat to an individual’s personal freedom.
- Commit to respecting and promoting human rights in the design and deployment of your analyzed audio.
- When using third-party materials, be aware of any existing copyrights or permissions required before distributing content derived from them.
- Always seek legal advice when using audio from unknown sources.
- Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing audio containing people.
- Keep a human in the loop. Don't use any solution as a replacement for human oversight and decision-making.
- Fully examine and review the potential of any AI model you're using to understand its capabilities and limitations.

## Next steps

- [Microsoft Responsible AI principles](https://www.microsoft.com/ai/responsible-ai?activetab=pivot1%3aprimaryr6)
- [Microsoft Responsible AI resources](https://www.microsoft.com/ai/responsible-ai-resources)
- [Microsoft Azure Learning courses on Responsible AI](/training/paths/responsible-ai-business-principles/)
- [Microsoft Global Human Rights Statement](https://www.microsoft.com/corporate-responsibility/human-rights-statement?activetab=pivot_1:primaryr5)

### Contact us

## Azure Video Indexer insights

- [Face detection](face-detection.md)
- [OCR](ocr.md)
- [Keywords extraction](keywords.md)
- [Transcription, translation & language identification](transcription-translation-lid.md)
- [Labels identification](labels-identification.md)
- [Named entities](named-entities.md)
- [Observed people tracking & matched faces](observed-matched-people.md)
- [Topics inference](topics-inference.md)

articles/azure-video-indexer/concepts-overview.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -6,9 +6,9 @@ ms.date: 12/01/2022
 ms.author: juliako
 ---
 
-# Azure Video Indexer terminology & concepts
+# Azure Video Indexer terminology & concepts
 
-This article gives a brief overview of Azure Video Indexer terminology and concepts.
+This article gives a brief overview of Azure Video Indexer terminology and concepts. Also, review [transparency note overview](/legal/azure-video-indexer/transparency-note?context=/azure/azure-video-indexer/context/context)
 
 ## Artifact files
```

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
---
title: Azure Video Indexer face detection overview
titleSuffix: Azure Video Indexer
description: This article gives an overview of Azure Video Indexer face detection.
author: juliako
ms.author: juliako
manager: femila
ms.service: azure-video-indexer
ms.date: 06/15/2022
ms.topic: article
---

# Face detection

> [!IMPORTANT]
> Access to the face identification, customization, and celebrity recognition features is limited based on eligibility and usage criteria in order to support our Responsible AI principles. The face identification, customization, and celebrity recognition features are only available to Microsoft managed customers and partners. Use the [Face Recognition intake form](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR7en2Ais5pxKtso_Pz4b1_xUQjA5SkYzNDM4TkcwQzNEOE1NVEdKUUlRRCQlQCN0PWcu) to apply for access.

Face detection is an Azure Video Indexer AI feature that automatically detects faces in a media file and aggregates instances of similar faces into the same group. The celebrity recognition module is then run to recognize celebrities. This module covers approximately one million faces and is based on commonly requested data sources. Faces that aren't recognized by Azure Video Indexer are still detected but are left unnamed. Customers can build their own custom [Person models](/azure/azure-video-indexer/customize-person-model-overview) so that Azure Video Indexer recognizes faces that aren't recognized by default.

The resulting insights are generated in a categorized list in a JSON file that includes a thumbnail and either a name or an ID for each face. Clicking a face’s thumbnail displays information like the name of the person (if they were recognized), the percentage of their appearances in the video, and their biography if they're a celebrity. It also enables scrolling between the instances in the video.

## Prerequisites

Review the [transparency note overview](/legal/azure-video-indexer/transparency-note?context=/azure/azure-video-indexer/context/context).

## General principles

This article discusses face detection and the key considerations for making use of this technology responsibly. There are many things you need to consider when deciding how to use and implement an AI-powered feature:

- Will this feature perform well in my scenario? Before deploying face detection into your scenario, test how it performs using real-life data and make sure it can deliver the accuracy you need.
- Are we equipped to identify and respond to errors? AI-powered products and features won't be 100% accurate, so consider how you'll identify and respond to any errors that may occur.

## Key terms

|Term|Definition|
|---|---|
|Insight |The information and knowledge derived from the processing and analysis of video and audio files. Insights can include detected objects, people, faces, animated characters, keyframes, and translations or transcriptions. |
|Face recognition |The analysis of images to identify the faces that appear in the images. This process is implemented via the Azure Cognitive Services Face API. |
|Template |Enrolled images of people are converted to templates, which are then used for facial recognition. Machine-interpretable features are extracted from one or more images of an individual to create that individual’s template. The enrollment or probe images aren't stored by Face API, and the original images can't be reconstructed based on a template. Template quality is a key determinant of the accuracy of your results. |
|Enrollment |The process of enrolling images of individuals for template creation so they can be recognized. When a person is enrolled to a verification system used for authentication, their template is also associated with a primary identifier that is used to determine which template to compare with the probe template. High-quality images and images representing natural variations in how a person looks (for instance, wearing glasses and not wearing glasses) generate high-quality enrollment templates. |
|Deep search |The ability to retrieve only relevant video and audio files from a video library by searching for specific terms within the extracted insights.|

## View the insight

To see the instances on the website, do the following:

1. When uploading the media file, go to Video + Audio Indexing, or go to Audio Only or Video + Audio and select Advanced.
1. After the file is uploaded and indexed, go to Insights and scroll to People.

To see the face detection insight in the JSON file, do the following:

1. Select Download -> Insights (JSON).
1. Copy the `faces` element, under `insights`, and paste it into your JSON viewer.

```json
"faces": [
    {
        "id": 1785,
        "name": "Emily Tran",
        "confidence": 0.7855,
        "description": null,
        "thumbnailId": "fd2720f7-b029-4e01-af44-3baf4720c531",
        "knownPersonId": "92b25b4c-944f-4063-8ad4-f73492e42e6f",
        "title": null,
        "imageUrl": null,
        "thumbnails": [
            {
                "id": "4d182b8c-2adf-48a2-a352-785e9fcd1fcf",
                "fileName": "FaceInstanceThumbnail_4d182b8c-2adf-48a2-a352-785e9fcd1fcf.jpg",
                "instances": [
                    {
                        "adjustedStart": "0:00:00",
                        "adjustedEnd": "0:00:00.033",
                        "start": "0:00:00",
                        "end": "0:00:00.033"
                    }
                ]
            },
            {
                "id": "feff177b-dabf-4f03-acaf-3e5052c8be57",
                "fileName": "FaceInstanceThumbnail_feff177b-dabf-4f03-acaf-3e5052c8be57.jpg",
                "instances": [
                    {
                        "adjustedStart": "0:00:05",
                        "adjustedEnd": "0:00:05.033",
                        "start": "0:00:05",
                        "end": "0:00:05.033"
                    }
                ]
            }
        ]
    }
]
```

To download the JSON file via the API, use the [Azure Video Indexer developer portal](https://api-portal.videoindexer.ai/).

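As an illustration, here is a small Python sketch that walks the `faces` element above and summarizes who appears when; the `insights.json` file name is a hypothetical local copy of the downloaded insights file.

```python
import json

def summarize_faces(faces):
    """Print each detected face with its confidence and appearance times."""
    for face in faces:
        name = face.get("name") or f"Face {face.get('id')}"
        print(f"{name} (confidence {face.get('confidence', 0):.2f})")
        for thumbnail in face.get("thumbnails", []):
            for instance in thumbnail.get("instances", []):
                print(f"  appears {instance['start']} - {instance['end']}")

# "insights.json" is a hypothetical local copy of the downloaded file.
with open("insights.json") as f:
    index = json.load(f)
summarize_faces(index["videos"][0]["insights"].get("faces", []))
```
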
## Face detection components

During the face detection procedure, images in a media file are processed, as follows:

|Component|Definition|
|---|---|
|Source file | The user uploads the source file for indexing. |
|Detection and aggregation |The face detector identifies the faces in each frame. The faces are then aggregated and grouped. |
|Recognition |The celebrities module runs over the aggregated groups to recognize celebrities. If the customer has created their own **persons** module, it's also run to recognize people. When people aren't recognized, they're labeled Unknown1, Unknown2, and so on. |
|Confidence value |Where applicable, for well-known faces or for faces identified in the customizable list, the estimated confidence level of each label is calculated as a range of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.|

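To show how the recognized and Unknown labels might be consumed downstream, here is a small sketch that separates recognized people from the unnamed groups; the sample face groups are hypothetical.

```python
def split_recognized(faces):
    """Separate recognized faces from the Unknown placeholder groups."""
    recognized, unknown = [], []
    for face in faces:
        name = face.get("name") or ""
        (unknown if name.startswith("Unknown") else recognized).append(face)
    return recognized, unknown

# Hypothetical face groups shaped like the JSON above.
faces = [{"id": 1785, "name": "Emily Tran"}, {"id": 1786, "name": "Unknown1"}]
recognized, unknown = split_recognized(faces)
print(len(recognized), "recognized;", len(unknown), "unknown")  # 1 recognized; 1 unknown
```
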
## Example use cases

* Summarizing where an actor appears in a movie, or reusing footage by deep searching organizational archives for specific faces for insight on a specific celebrity.
* Improved efficiency when creating feature stories at a news or sports agency, for example by deep searching for a celebrity or a football player in organizational archives.
* Using faces appearing in a video to create promos, trailers, or highlights. Azure Video Indexer can assist by adding keyframes, scene markers, timestamps, and labeling so that content editors invest less time reviewing numerous files.

## Considerations when choosing a use case

* Carefully consider the accuracy of the results. To promote more accurate detections, check the quality of the video; low-quality video might impact the detected insights.
* Carefully consider when using for law enforcement. People might not be detected if they're small, sitting, crouching, or obstructed by objects or other people. To ensure fair and high-quality decisions, combine face detection-based automation with human oversight.
* Don't use face detection for decisions that may have serious adverse impacts. Decisions based on incorrect output could have serious adverse impacts. Additionally, it's advisable to include human review of decisions that have the potential for serious impacts on individuals.

When used responsibly and carefully, face detection is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:

* Always respect an individual’s right to privacy, and only ingest videos for lawful and justifiable purposes.
* Don't purposely disclose inappropriate content about young children or family members of celebrities, or other content that may be detrimental or pose a threat to an individual’s personal freedom.
* Commit to respecting and promoting human rights in the design and deployment of your analyzed media.
* When using third-party materials, be aware of any existing copyrights or permissions required before distributing content derived from them.
* Always seek legal advice when using content from unknown sources.
* Always obtain appropriate legal and professional advice to ensure that your uploaded videos are secured and have adequate controls to preserve the integrity of your content and to prevent unauthorized access.
* Provide a feedback channel that allows users and individuals to report issues with the service.
* Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing media containing people.
* Keep a human in the loop. Don't use any solution as a replacement for human oversight and decision-making.
* Fully examine and review the potential of any AI model you're using to understand its capabilities and limitations.

## Next steps

### Learn more about Responsible AI

- [Microsoft Responsible AI principles](https://www.microsoft.com/ai/responsible-ai?activetab=pivot1%3aprimaryr6)
- [Microsoft Responsible AI resources](https://www.microsoft.com/ai/responsible-ai-resources)
- [Microsoft Azure Learning courses on Responsible AI](/training/paths/responsible-ai-business-principles/)
- [Microsoft Global Human Rights Statement](https://www.microsoft.com/corporate-responsibility/human-rights-statement?activetab=pivot_1:primaryr5)

### Contact us

## Azure Video Indexer insights

- [Audio effects detection](audio-effects-detection.md)
- [OCR](ocr.md)
- [Keywords extraction](keywords.md)
- [Transcription, translation & language identification](transcription-translation-lid.md)
- [Labels identification](labels-identification.md)
- [Named entities](named-entities.md)
- [Observed people tracking & matched persons](observed-matched-people.md)
- [Topics inference](topics-inference.md)
