articles/azure-video-indexer/audio-effects-detection-overview.md
# Audio effects detection
Audio effects detection is an Azure Video Indexer feature that detects insights on various acoustic events and classifies them into acoustic categories, such as laughter, crowd reactions, alarms, and sirens.
When working on the website, the instances are displayed in the Insights tab. They can also be generated in a categorized list in a JSON file that includes the category ID, type, name, and instances per category, together with the specific timeframes and confidence score.
This article discusses audio effects detection and the key considerations for making use of this technology responsibly. There are many things you need to consider when deciding how to use and implement an AI-powered feature:
* Will this feature perform well in my scenario? Before deploying audio effects detection into your scenario, test how it performs using real-life data and make sure it can deliver the accuracy you need.
* Are we equipped to identify and respond to errors? AI-powered products and features won't be 100% accurate, so consider how you'll identify and respond to any errors that may occur.
To display the JSON file, do the following:
1. Select Download -> Insights (JSON).
1. Copy the `audioEffects` element, under `insights`, and paste it into an online JSON viewer.
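For illustration, an `audioEffects` element has roughly the following shape. The field names and values here (`Siren`, `instances`, `confidence`, the timestamp fields) are assumptions based on the description in this article, not a verbatim API response:

```json
{
  "audioEffects": [
    {
      "id": 1,
      "type": "Siren",
      "instances": [
        {
          "confidence": 0.87,
          "start": "0:00:00",
          "end": "0:00:03"
        }
      ]
    }
  ]
}
```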
During the audio effects detection procedure, audio in a media file is processed as follows:

|Component|Definition|
|---|---|
|Source file|The user uploads the source file for indexing.|
|Segmentation|The audio is analyzed, non-speech audio is identified and then split into short overlapping intervals.|
|Classification|An AI process analyzes each segment and classifies its contents into event categories such as crowd reaction or laughter. A probability list is then created for each event category according to department-specific rules.|
|Confidence level|The estimated confidence level of each audio effect is calculated as a range of 0 to 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.|
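As an illustration of how the confidence score might be consumed downstream, here's a minimal Python sketch that filters detected effects by a confidence threshold. The JSON field names (`audioEffects`, `type`, `instances`, `confidence`) are assumptions based on this article's description, not a verbatim API contract:

```python
import json

# Illustrative insights payload. The field names ("audioEffects", "type",
# "instances", "confidence") are assumptions based on the article's
# description, not a verbatim Video Indexer response.
INSIGHTS = json.loads("""
{
  "audioEffects": [
    {"id": 1, "type": "Siren",
     "instances": [{"confidence": 0.87, "start": "0:00:00", "end": "0:00:03"}]},
    {"id": 2, "type": "Laughter",
     "instances": [{"confidence": 0.45, "start": "0:01:10", "end": "0:01:12"}]}
  ]
}
""")

def effects_above(insights: dict, threshold: float) -> list:
    """Return (type, start, end) for each instance at or above the threshold."""
    hits = []
    for effect in insights.get("audioEffects", []):
        for instance in effect.get("instances", []):
            if instance["confidence"] >= threshold:
                hits.append((effect["type"], instance["start"], instance["end"]))
    return hits

# Keep only detections the model is at least 80% certain about.
print(effects_above(INSIGHTS, 0.8))
```

Because the score is probabilistic, a threshold like this is a policy decision: higher values trade recall for precision.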
## Example use cases
- Companies with a large video archive can improve accessibility by transcribing nonspeech effects, offering more context for hearing-impaired audiences.

- Improved efficiency when creating raw data for content creators. For example, in media and entertainment, important moments in promos and trailers such as laughter, crowd reactions, gunshots, or explosions can be identified.
- Detecting and classifying gunshots, explosions, and glass shattering in a smart-city system, or in other public environments that include cameras and microphones, to offer fast and accurate detection of violent incidents.
## Considerations and limitations when choosing a use case
- Avoid using very short or low-quality audio. Audio effects detection provides probabilistic and partial data on detected nonspeech audio events. For accuracy, audio effects detection requires at least 2 seconds of clear nonspeech audio. Voice commands or singing aren't supported.

- Avoid using audio with loud background music, or music with repetitive and/or linearly scanned frequency. Audio effects detection is designed for nonspeech audio only and therefore can't classify events in loud music. Music with repetitive and/or linearly scanned frequency may be incorrectly classified as an alarm or siren.
- Carefully consider the methods of usage in law enforcement and similar institutions. To promote more accurate probabilistic data, review the following:

    - Audio effects can be detected in nonspeech segments only.
    - The duration of a nonspeech section should be at least 2 seconds.
    - Low-quality audio might impact the detection results.
    - Events in loud background music aren't classified.
    - Music with repetitive and/or linearly scanned frequency might be incorrectly classified as an alarm or siren.
    - Knocking on a door or slamming a door might be labeled as a gunshot or explosion.
    - Prolonged shouting or sounds of physical human effort might be incorrectly classified.
    - A group of people laughing might be classified as both laughter and crowd.
    - Only natural, nonsynthetic gunshot and explosion sounds are supported.
When used responsibly and carefully, Azure Video Indexer is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:
- Always respect an individual’s right to privacy, and only ingest audio for lawful and justifiable purposes.
- Don't purposely disclose inappropriate audio of young children or family members of celebrities, or other content that may be detrimental or pose a threat to an individual’s personal freedom.
- Commit to respecting and promoting human rights in the design and deployment of your analyzed audio.
- When using third-party materials, be aware of any existing copyrights or permissions required before distributing content derived from them.
- Always seek legal advice when using audio from unknown sources.
- Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing audio containing people.
- Keep a human in the loop. Don't use any solution as a replacement for human oversight and decision-making.
- Fully examine and review the potential of any AI model you're using to understand its capabilities and limitations.
articles/azure-video-indexer/insights-overview.md
When a video is indexed, Azure Video Indexer analyzes the video and audio content by running 30+ AI models, generating rich insights. Insights contain an aggregated view of the data: transcripts, optical character recognition elements (OCRs), face, topics, emotions, etc. Once the video is indexed and analyzed, Azure Video Indexer produces a JSON content that contains details of the video insights. For example, each insight type includes instances of time ranges that show when the insight appears in the video.
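As a sketch of how such a JSON result might be traversed, the following Python assumes a simplified shape. The names used here (`insights`, `instances`, `start`, `end`) mirror the description above but are assumptions, not a verbatim API response:

```python
import json

# Simplified insights document; the field names are assumptions modeled on
# the article's description, not a verbatim Video Indexer response.
PAYLOAD = json.loads("""
{
  "insights": {
    "topics": [
      {"name": "Weather",
       "instances": [{"start": "0:00:05", "end": "0:00:20"},
                     {"start": "0:01:00", "end": "0:01:10"}]}
    ],
    "emotions": [
      {"type": "Joy", "instances": [{"start": "0:00:30", "end": "0:00:45"}]}
    ]
  }
}
""")

def time_ranges(insights: dict, insight_type: str) -> list:
    """Collect (start, end) pairs for every instance of one insight type."""
    ranges = []
    for item in insights.get(insight_type, []):
        for instance in item.get("instances", []):
            ranges.append((instance["start"], instance["end"]))
    return ranges

# When does the "topics" insight appear in the video?
print(time_ranges(PAYLOAD["insights"], "topics"))
```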
This article discusses Keywords and the key considerations for making use of this technology responsibly. There are many things you need to consider when deciding how to use and implement an AI-powered feature:
- Will this feature perform well in my scenario? Before deploying Keywords Extraction into your scenario, test how it performs using real-life data and make sure it can deliver the accuracy you need.
- Are we equipped to identify and respond to errors? AI-powered products and features won't be 100% accurate, so consider how you'll identify and respond to any errors that may occur.
## View the insight
Below are some considerations to keep in mind when using keywords extraction:
- When uploading a file, always use high-quality video content. The recommended maximum frame size is HD and the recommended frame rate is 30 FPS. A frame should contain no more than 10 people. When outputting frames from videos to AI models, only send around 2 or 3 frames per second. Processing 10 or more frames might delay the AI result.
- When uploading a file, always use high-quality audio and video content. At least 1 minute of spontaneous conversational speech is required to perform analysis. Audio effects are detected in non-speech segments only. The minimal duration of a non-speech section is 2 seconds. Voice commands and singing aren't supported.
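The frame-rate guidance above amounts to simple stride arithmetic. This hypothetical helper (not part of any Video Indexer SDK) shows one way to pick a sampling stride that downsamples a source to roughly 2-3 frames per second:

```python
def sampling_stride(source_fps: float, target_fps: float) -> int:
    """Frames to skip between samples so sampling approximates target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never return 0: a stride of at least 1 means "send every frame".
    return max(1, round(source_fps / target_fps))

# A 30 FPS source sampled down to ~3 FPS keeps every 10th frame,
# i.e. frame indices 0, 10, 20 within each second of video.
stride = sampling_stride(30, 3)
frames_in_one_second = list(range(0, 30, stride))
print(stride, frames_in_one_second)
```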
When used responsibly and carefully, Keywords is a valuable tool for many industries. To respect the privacy and safety of others, and to comply with local and global regulations, we recommend the following:
- Always respect an individual’s right to privacy, and only ingest media for lawful and justifiable purposes.
- Don't purposely disclose inappropriate media showing young children or family members of celebrities, or other content that may be detrimental or pose a threat to an individual’s personal freedom.
- Commit to respecting and promoting human rights in the design and deployment of your analyzed media.
- When using third-party materials, be aware of any existing copyrights or permissions required before distributing content derived from them.
- Always seek legal advice when using media from unknown sources.
- Always obtain appropriate legal and professional advice to ensure that your uploaded media is secured and has adequate controls to preserve the integrity of your content and to prevent unauthorized access.
- Provide a feedback channel that allows users and individuals to report issues with the service.
- Be aware of any applicable laws or regulations that exist in your area regarding processing, analyzing, and sharing media containing people.
- Keep a human in the loop. Don't use any solution as a replacement for human oversight and decision-making.
- Fully examine and review the potential of any AI model you're using to understand its capabilities and limitations.