Commit d74a416: moved TN files

---
title: Introduction to Azure Video Indexer audio effects detection transparency note
titleSuffix: Azure Video Indexer
description: An introduction to the Azure Video Indexer audio effects detection component and how to use it responsibly.
author: Juliako
ms.author: juliako
manager: femila
ms.service: azure-video-indexer
ms.date: 06/15/2022
ms.topic: article
---

# Audio effects detection - transparency note

Audio effects detection is an Azure Video Indexer feature that detects a variety of acoustic events and classifies them into acoustic categories, such as laughter, crowd reactions, and alarms or sirens.

On the website, the detected instances are displayed in the Insights tab. They can also be generated as a categorized list in a JSON file, which includes the category ID, type, and name, and the instances per category together with each instance's time frame and confidence score.

## Prerequisites

Review the [transparency note overview](/legal/azure-video-indexer/transparency-note?context=/azure/azure-video-indexer/context/context).

## General principles

This transparency note discusses audio effects detection and the key considerations for using this technology responsibly. There are a number of things you need to consider when deciding how to use and implement an AI-powered feature:

* Will this feature perform well in my scenario? Before deploying audio effects detection into your scenario, test how it performs using real-life data, and make sure that it can deliver the accuracy you need.
* Are we equipped to identify and respond to errors? AI-powered products and features won't be 100% accurate, so consider how you'll identify and respond to any errors that might occur.

## View the insight

To see the instances on the website, do the following:

1. When uploading the media file, select Video + Audio Indexing. Or, select Audio Only or Video + Audio, and then select Advanced.
1. After the file is uploaded and indexed, go to Insights and scroll to audio effects.

To display the JSON file, do the following:

1. Click Download -> Insights (JSON).
1. Copy the `audioEffects` element, under `insights`, and paste it into an online JSON viewer.

```json
"audioEffects": [
    {
        "id": 1,
        "type": "Silence",
        "instances": [
            {
                "confidence": 0,
                "adjustedStart": "0:01:46.243",
                "adjustedEnd": "0:01:50.434",
                "start": "0:01:46.243",
                "end": "0:01:50.434"
            }
        ]
    },
    {
        "id": 2,
        "type": "Speech",
        "instances": [
            {
                "confidence": 0,
                "adjustedStart": "0:00:00",
                "adjustedEnd": "0:01:43.06",
                "start": "0:00:00",
                "end": "0:01:43.06"
            }
        ]
    }
],
```

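For illustration, the following minimal Python sketch walks an `audioEffects` element like the one above and prints each instance with its duration. The file name `audio_effects.json` is hypothetical, and the sketch assumes you saved the element wrapped in braces so that it's valid JSON:

```python
import json

# Timestamps in the insights JSON use the form "H:MM:SS" with an optional
# fractional second, for example "0:01:46.243" or "0:00:00".
def to_seconds(timestamp: str) -> float:
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# "audio_effects.json" is a hypothetical file that holds the element above,
# wrapped as: { "audioEffects": [ ... ] }
with open("audio_effects.json") as f:
    audio_effects = json.load(f)["audioEffects"]

for effect in audio_effects:
    for instance in effect["instances"]:
        duration = to_seconds(instance["end"]) - to_seconds(instance["start"])
        print(f'{effect["type"]}: {instance["start"]} to {instance["end"]} '
              f'({duration:.2f} s, confidence {instance["confidence"]})')
```
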
To download the JSON file via the API, use the [Azure Video Indexer developer portal](https://api-portal.videoindexer.ai/).

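As a rough sketch of that route, the following Python snippet calls the Get Video Index operation, which returns the insights JSON. The location, account ID, video ID, and access token values are placeholders that you obtain through the developer portal, and the snippet assumes the common index shape in which insights sit under each entry of a `videos` array:

```python
import json

import requests  # third-party: pip install requests

# Placeholders: obtain real values from the Azure Video Indexer developer
# portal (https://api-portal.videoindexer.ai/).
LOCATION = "trial"          # or an Azure region, such as "eastus"
ACCOUNT_ID = "<account-id>"
VIDEO_ID = "<video-id>"
ACCESS_TOKEN = "<access-token>"

# Get Video Index returns the full insights JSON for an indexed video.
url = (f"https://api.videoindexer.ai/{LOCATION}"
       f"/Accounts/{ACCOUNT_ID}/Videos/{VIDEO_ID}/Index")
response = requests.get(url, params={"accessToken": ACCESS_TOKEN})
response.raise_for_status()

# Assumption: insights sit under each entry of the "videos" array.
insights = response.json()["videos"][0]["insights"]
print(json.dumps(insights.get("audioEffects", []), indent=2))
```
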
## Audio effects detection components

During the audio effects detection procedure, audio in a media file is processed, as follows:

|Component|Definition|
|---|---|
|Source file|The user uploads the source file for indexing.|
|Segmentation|The audio is analyzed, non-speech audio is identified, and the audio is then split into short overlapping intervals.|
|Classification|An AI process analyzes each segment and classifies its contents into event categories, such as crowd reaction or laughter. A probability list is then created for each event category according to department-specific rules.|
|Confidence level|The estimated confidence level of each audio effect is calculated on a scale of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.|

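Because each instance carries a confidence score between 0 and 1, downstream logic can drop low-certainty detections before acting on them. A minimal sketch, where both the inline sample data and the 0.7 threshold are illustrative rather than taken from the service:

```python
# Stand-in for the parsed audioEffects list from the earlier sketch;
# the types and scores below are made up for illustration.
audio_effects = [
    {"type": "Siren", "instances": [{"confidence": 0.82, "start": "0:00:05", "end": "0:00:09"}]},
    {"type": "Laughter", "instances": [{"confidence": 0.41, "start": "0:00:12", "end": "0:00:14"}]},
]

# The 0.7 threshold is illustrative; tune it against your own test data.
CONFIDENCE_THRESHOLD = 0.7

# Keep only (event type, instance) pairs the model is reasonably certain about.
confident_events = [
    (effect["type"], instance)
    for effect in audio_effects
    for instance in effect["instances"]
    if instance["confidence"] >= CONFIDENCE_THRESHOLD
]
print(confident_events)  # here: only the 0.82-confidence siren remains
```
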
## Example use cases

- Companies with a large video archive can improve accessibility by offering more context to hearing-impaired audiences through transcription of non-speech effects.
- Improved efficiency when creating raw data for content creators. In media and entertainment, for example, important moments in promos and trailers, such as laughter, crowd reactions, gunshots, or explosions, can be identified automatically.
- Detecting and classifying gunshots, explosions, and glass shattering in a smart-city system, or in other public environments that include cameras and microphones, offers fast and accurate detection of violent incidents.

## Considerations and limitations when choosing a use case

- Avoid using very short or low-quality audio. Audio effects detection provides probabilistic and partial data on detected non-speech audio events. For accuracy, audio effects detection requires at least 2 seconds of clear non-speech audio. Voice commands and singing aren't supported.
- Avoid using audio with very loud background music, or music with repetitive and/or linearly scanned frequency. Audio effects detection is designed for non-speech audio only, and therefore can't classify events in loud music. Music with repetitive and/or linearly scanned frequency may be incorrectly classified as an alarm or siren.
- Carefully consider the methods of usage in law enforcement and similar institutions. To promote more accurate probabilistic data, review the following:

  - Audio effects can be detected in non-speech segments only.
  - The duration of a non-speech section should be at least 2 seconds.
  - Low-quality audio might affect the detection results.
  - Events in loud background music aren't classified.
  - Music with repetitive and/or linearly scanned frequency might be incorrectly classified as an alarm or siren.
  - Knocking on a door or slamming a door might be labeled as a gunshot or explosion.
  - Prolonged shouting or sounds of physical human effort might be incorrectly classified.
  - A group of people laughing might be classified as both laughter and crowd.
  - Natural and non-synthetic gunshot and explosion sounds are supported.

When used responsibly and carefully, Azure Video Indexer is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:

- Always respect an individual's right to privacy, and only ingest audio for lawful and justifiable purposes.
- Don't purposely disclose inappropriate audio of young children, family members of celebrities, or other content that might be detrimental or pose a threat to an individual's personal freedom.
- Commit to respecting and promoting human rights in the design and deployment of your analyzed audio.
- When using third-party materials, be aware of any existing copyrights or permissions required before distributing content derived from them.
- Always seek legal advice when using audio from unknown sources.
- Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing audio that contains people.
- Keep a human in the loop. Don't use any solution as a replacement for human oversight and decision-making.
- Fully examine and review the potential of any AI model that you're using to understand its capabilities and limitations.

## Next steps

- [Microsoft Responsible AI principles](https://www.microsoft.com/ai/responsible-ai?activetab=pivot1%3aprimaryr6)
- [Microsoft Responsible AI resources](https://www.microsoft.com/ai/responsible-ai-resources)
- [Microsoft Azure Learning courses on Responsible AI](/training/paths/responsible-ai-business-principles/)
- [Microsoft Global Human Rights Statement](https://www.microsoft.com/corporate-responsibility/human-rights-statement?activetab=pivot_1:primaryr5)

### Contact us

## Azure Video Indexer insights

- [Face detection](/legal/azure-video-indexer/face-detection-transparency-note?context=/azure/azure-video-indexer/context/context)
- [OCR](/legal/azure-video-indexer/ocr-transparency-note?context=/azure/azure-video-indexer/context/context)
- [Keywords extraction](/legal/azure-video-indexer/keywords-transparency-note?context=/azure/azure-video-indexer/context/context)
- [Transcription, translation & language identification](/legal/azure-video-indexer/transcription-translation-lid-transparency-note?context=/azure/azure-video-indexer/context/context)
- [Labels identification](/legal/azure-video-indexer/labels-identification-transparency-note?context=/azure/azure-video-indexer/context/context)
- [Named entities](/legal/azure-video-indexer/named-entities-transparency-note?context=/azure/azure-video-indexer/context/context)
- [Observed people tracking & matched faces](/legal/azure-video-indexer/observed-matched-people-transparency-note?context=/azure/azure-video-indexer/context/context)
- [Topics inference](/legal/azure-video-indexer/topics-inference-transparency-note?context=/azure/azure-video-indexer/context/context)
