You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/ai-services/content-safety/quickstart-jailbreak.md
+7-8Lines changed: 7 additions & 8 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -13,7 +13,6 @@ ms.author: pafarley
13
13
14
14
# Quickstart: Prompt Shields (preview)
15
15
16
-
## Introduction
17
16
"Prompt Shields" in Azure AI Content Safety are specifically designed to safeguard generative AI systems from generating harmful or inappropriate content. These shields detect and mitigate risks associated with both User Prompt Attacks (malicious or harmful user-generated inputs) and Document Attacks (inputs containing harmful content embedded within documents). The use of "Prompt Shields" is crucial in environments where GenAI is employed, ensuring that AI outputs remain safe, compliant, and trustworthy.
18
17
19
18
The primary objectives of the "Prompt Shields" feature for GenAI applications are:
@@ -27,28 +26,28 @@ For more information on Prompt Shields, see the [Prompt Shields concept page](./
27
26
28
27
29
28
30
-
## User Scenarios
31
-
### 1. AI Content Creation Platforms: Detecting Harmful Prompts
29
+
## User scenarios
30
+
### 1. AI content creation platforms: Detecting harmful prompts
32
31
- Scenario: An AI content creation platform uses generative AI models to produce marketing copy, social media posts, and articles based on user-provided prompts. To prevent the generation of harmful or inappropriate content, the platform integrates "Prompt Shields."
33
32
- User: Content creators, platform administrators, and compliance officers.
34
33
- Action: The platform uses Azure AI Content Safety's "Prompt Shields" to analyze user prompts before generating content. If a prompt is detected as potentially harmful or likely to lead to policy-violating outputs (e.g., prompts asking for defamatory content or hate speech), the shield blocks the prompt and alerts the user to modify their input.
35
34
- Outcome: The platform ensures all AI-generated content is safe, ethical, and compliant with community guidelines, enhancing user trust and protecting the platform's reputation.
36
-
### 2. AI-Powered Chatbots: Mitigating Risk from User Prompt Attacks
35
+
### 2. AI-powered chatbots: Mitigating risk from user prompt attacks
37
36
- Scenario: A customer service provider uses AI-powered chatbots for automated support. To safeguard against user prompts that could lead the AI to generate inappropriate or unsafe responses, the provider uses "Prompt Shields."
38
37
- User: Customer service agents, chatbot developers, and compliance teams.
39
38
- Action: The chatbot system integrates "Prompt Shields" to monitor and evaluate user inputs in real-time. If a user prompt is identified as potentially harmful or designed to exploit the AI (e.g., attempting to provoke inappropriate responses or extract sensitive information), the shield intervenes by blocking the response or redirecting the query to a human agent.
40
39
- Outcome: The customer service provider maintains high standards of interaction safety and compliance, preventing the chatbot from generating responses that could harm users or breach policies.
- Scenario: An e-learning platform employs GenAI to generate personalized educational content based on student inputs and reference documents. To avoid generating inappropriate or misleading educational content, the platform utilizes "Prompt Shields."
43
42
- User: Educators, content developers, and compliance officers.
44
43
- Action: The platform uses "Prompt Shields" to analyze both user prompts and uploaded documents for content that could lead to unsafe or policy-violating AI outputs. If a prompt or document is detected as likely to generate inappropriate educational content, the shield blocks it and suggests alternative, safe inputs.
45
44
- Outcome: The platform ensures that all AI-generated educational materials are appropriate and compliant with academic standards, fostering a safe and effective learning environment.
46
-
### 4. Healthcare AI Assistants: Blocking Unsafe Prompts and Document Inputs
45
+
### 4. Healthcare AI assistants: Blocking unsafe prompts and document inputs
47
46
- Scenario: A healthcare provider uses AI assistants to offer preliminary medical advice based on user inputs and uploaded medical documents. To ensure the AI does not generate unsafe or misleading medical advice, the provider implements "Prompt Shields."
48
47
- User: Healthcare providers, AI developers, and compliance teams.
49
-
Action: The AI assistant employs "Prompt Shields" to analyze patient prompts and uploaded medical documents for harmful or misleading content. If a prompt or document is identified as potentially leading to unsafe medical advice, the shield prevents the AI from generating a response and redirects the patient to a human healthcare professional.
48
+
-Action: The AI assistant employs "Prompt Shields" to analyze patient prompts and uploaded medical documents for harmful or misleading content. If a prompt or document is identified as potentially leading to unsafe medical advice, the shield prevents the AI from generating a response and redirects the patient to a human healthcare professional.
50
49
- Outcome: The healthcare provider ensures that AI-generated medical advice remains safe and accurate, protecting patient safety and maintaining compliance with healthcare regulations.
51
-
### 5. Generative AI for Creative Writing: Protecting Against Prompt Manipulation
50
+
### 5. Generative AI for creative writing: Protecting against prompt manipulation
52
51
- Scenario: A creative writing platform uses GenAI to assist writers in generating stories, poetry, and scripts based on user inputs. To prevent the generation of inappropriate or offensive content, the platform incorporates "Prompt Shields."
53
52
- User: Writers, platform moderators, and content reviewers.
54
53
- Action: The platform integrates "Prompt Shields" to evaluate user prompts for creative writing. If a prompt is detected as likely to produce offensive, defamatory, or otherwise inappropriate content, the shield blocks the AI from generating such content and suggests revisions to the user.
0 commit comments