|
| 1 | +--- |
| 2 | +title: Content Safety in Azure AI Studio overview |
| 3 | +titleSuffix: Azure AI Studio |
| 4 | +description: Learn how to use Azure AI Content Safety in Azure AI Studio to detect harmful user-generated and AI-generated content in applications and services. |
| 5 | +manager: nitinme |
| 6 | +ms.service: azure-ai-studio |
| 7 | +ms.custom: |
| 8 | + - ignite-2024 |
| 9 | +ms.topic: overview |
| 10 | +ms.date: 11/09/2024 |
| 11 | +ms.author: pafarley |
| 12 | +author: PatrickFarley |
| 13 | +--- |
| 14 | + |
| 15 | +# Content Safety in Azure AI Studio |
| 16 | + |
| 17 | +Azure AI Content Safety is an AI service that detects harmful user-generated and AI-generated content in applications and services. Azure AI Content Safety includes various APIs that allow you to detect and prevent the output of harmful content. The interactive Content Safety **try out** page in AI Studio allows you to view, explore, and try out sample code for detecting harmful content across different modalities. |
| 18 | + |
| 19 | +## Features |
| 20 | + |
| 21 | +You can use Azure AI Content Safety for many scenarios: |
| 22 | + |
| 23 | +**Text content**: |
| 24 | +- Moderate text content: This feature scans and moderates text content, identifying and categorizing it based on different levels of severity to ensure appropriate responses. |
| 25 | +- Groundedness detection: This filter determines if the AI's responses are based on trusted, user-provided sources, ensuring that the answers are "grounded" in the intended material. Groundedness detection is helpful for improving the reliability and factual accuracy of responses. |
| 26 | +- Protected material detection for text: This feature identifies protected text material, such as known song lyrics, articles, or other content, ensuring that the AI doesn’t output this content without permission. |
| 27 | +- Protected material detection for code: Detects code segments in the model's output that match known code from public repositories, helping to prevent uncredited or unauthorized reproduction of source code. |
| 28 | +- Prompt shields: This feature provides a unified API to address "Jailbreak" and "Indirect Attacks": |
| 29 | + - Jailbreak Attacks: Attempts by users to manipulate the AI into bypassing its safety protocols or ethical guidelines. Examples include prompts designed to trick the AI into giving inappropriate responses or performing tasks it was programmed to avoid. |
| 30 | + - Indirect Attacks: Also known as Cross-Domain Prompt Injection Attacks, indirect attacks involve embedding malicious prompts within documents that the AI might process. For example, if a document contains hidden instructions, the AI might inadvertently follow them, leading to unintended or unsafe outputs. |
| 31 | + |
| 32 | +**Image content**: |
| 33 | +- Moderate image content: Similar to text moderation, this feature filters and assesses image content to detect inappropriate or harmful visuals. |
| 34 | +- Moderate multimodal content: This is designed to handle a combination of text and images, assessing the overall context and any potential risks across multiple types of content. |
| 35 | + |
| 36 | +**Customize your own categories**: |
| 37 | +- Custom categories: Allows users to define specific categories for moderating and filtering content, tailoring safety protocols to unique needs. |
| 38 | +- Safety system message: Provides a method for setting up a "System Message" to instruct the AI on desired behavior and limitations, reinforcing safety boundaries and helping prevent unwanted outputs. |
| 39 | + |
| 40 | +## Understand harm categories |
| 41 | + |
| 42 | +### Harm categories |
| 43 | + |
| 44 | +| Category | Description |API term | |
| 45 | +| --------- | ------------------- | --- | |
| 46 | +| Hate and Fairness | Hate and fairness harms refer to any content that attacks or uses discriminatory language with reference to a person or identity group based on certain differentiating attributes of these groups. <br><br>This includes, but is not limited to:<ul><li>Race, ethnicity, nationality</li><li>Gender identity groups and expression</li><li>Sexual orientation</li><li>Religion</li><li>Personal appearance and body size</li><li>Disability status</li><li>Harassment and bullying</li></ul> | `Hate` | |
| 47 | +| Sexual | Sexual describes language related to anatomical organs and genitals, romantic relationships and sexual acts, acts portrayed in erotic or affectionate terms, including those portrayed as an assault or a forced sexual violent act against one’s will. <br><br> This includes but is not limited to:<ul><li>Vulgar content</li><li>Prostitution</li><li>Nudity and Pornography</li><li>Abuse</li><li>Child exploitation, child abuse, child grooming</li></ul> | `Sexual` | |
| 48 | +| Violence | Violence describes language related to physical actions intended to hurt, injure, damage, or kill someone or something; describes weapons, guns, and related entities. <br><br>This includes, but isn't limited to: <ul><li>Weapons</li><li>Bullying and intimidation</li><li>Terrorist and violent extremism</li><li>Stalking</li></ul> | `Violence` | |
| 49 | +| Self-Harm | Self-harm describes language related to physical actions intended to purposely hurt, injure, damage one’s body or kill oneself. <br><br> This includes, but isn't limited to: <ul><li>Eating Disorders</li><li>Bullying and intimidation</li></ul> | `SelfHarm` | |
| 50 | + |
| 51 | +### Severity levels |
| 52 | + |
| 53 | +| Level | Description | |
| 54 | +| --- | ---| |
| 55 | +|Safe |Content might be related to violence, self-harm, sexual, or hate categories but the terms are used in general, journalistic, scientific, medical, and similar professional contexts, which are appropriate for most audiences. | |
| 56 | +|Low |Content that expresses prejudiced, judgmental, or opinionated views, includes offensive use of language, stereotyping, use cases exploring a fictional world (for example, gaming, literature) and depictions at low intensity.| |
| 57 | +|Medium |Content that uses offensive, insulting, mocking, intimidating, or demeaning language towards specific identity groups, includes depictions of seeking and executing harmful instructions, fantasies, glorification, promotion of harm at medium intensity. | |
| 58 | +|High |Content that displays explicit and severe harmful instructions, actions, damage, or abuse; includes endorsement, glorification, or promotion of severe harmful acts, extreme or illegal forms of harm, radicalization, or nonconsensual power exchange or abuse. | |
| 59 | + |
| 60 | +## Other Content Safety features |
| 61 | + |
| 62 | +| Feature | Functionality | Concepts guide | |
| 63 | +|:--- |:--- | ---| |
| 64 | +| [Groundedness detection](/rest/api/contentsafety/text-groundedness-detection-operations/detect-groundedness-options) (preview) | Detects whether the text responses of large language models (LLMs) are grounded in the source materials provided by the users. | [Groundedness detection concepts](/azure/ai-services/content-safety/concepts/groundedness)| |
| 65 | +| [Protected material text detection](/rest/api/contentsafety/text-operations/detect-text-protected-material) | Scans AI-generated text for known text content (for example, song lyrics, articles, recipes, selected web content). | [Protected material concepts](/azure/ai-services/content-safety/concepts/protected-material)| |
| 66 | +| Custom categories (standard) API (preview) | Lets you create and train your own custom content categories and scan text for matches. | [Custom categories concepts](/azure/ai-services/content-safety/concepts/custom-categories)| |
| 67 | +| Custom categories (rapid) API (preview) | Lets you define emerging harmful content patterns and scan text and images for matches. | [Custom categories concepts](/azure/ai-services/content-safety/concepts/custom-categories)| |
| 68 | + |
| 69 | +Refer to the [Content Safety overview](/azure/ai-services/content-safety/overview) for supported regions, rate limits, and input requirements for all features. Refer to the [Language support](/azure/ai-services/content-safety/language-support) page for supported languages. |
| 70 | + |
| 71 | +## Next step |
| 72 | + |
| 73 | +Get started using Azure AI Content Safety in Azure AI Studio by following the [How-to guide](./how-to/content-safety.md). |
0 commit comments