You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
title: Introduction to red teaming large language models (LLMs)
2
+
title: Planning red teaming for large language models (LLMs) and their applications
3
3
titleSuffix: Azure OpenAI Service
4
-
description: Learn about how red teaming and adversarial testing is an essential practice in the responsible development of systems and features using large language models (LLMs)
4
+
description: Learn about how red teaming and adversarial testing are an essential practice in the responsible development of systems and features using large language models (LLMs)
5
5
ms.service: azure-ai-openai
6
6
ms.topic: conceptual
7
-
ms.date: 05/18/2023
7
+
ms.date: 11/03/2023
8
8
ms.custom:
9
9
manager: nitinme
10
10
author: mrbullwinkle
@@ -13,67 +13,134 @@ recommendations: false
13
13
keywords:
14
14
---
15
15
16
-
# Introduction to red teaming large language models (LLMs)
16
+
# Planning red teaming for large language models (LLMs) and their applications
17
+
18
+
This guide offers some potential strategies for planning how to set up and manage red teaming for responsible AI (RAI) risks throughout the large language model (LLM) product life cycle.
19
+
20
+
## What is red teaming?
17
21
18
22
The term *red teaming* has historically described systematic adversarial attacks for testing security vulnerabilities. With the rise of LLMs, the term has extended beyond traditional cybersecurity and evolved in common usage to describe many kinds of probing, testing, and attacking of AI systems. With LLMs, both benign and adversarial usage can produce potentially harmful outputs, which can take many forms, including harmful content such as hate speech, incitement or glorification of violence, or sexual content.
19
23
20
-
**Red teaming is an essential practice in the responsible development of systems and features using LLMs**. While not a replacement for systematic [measurement and mitigation](/legal/cognitive-services/openai/overview?context=/azure/ai-services/openai/context/context) work, red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.
24
+
## Why is RAI red teaming an important practice?
25
+
26
+
Red teaming is a best practice in the responsible development of systems and features using LLMs. While not a replacement for systematic measurement and mitigation work, red teamers help to uncover and identify harms and, in turn, enable measurement strategies to validate the effectiveness of mitigations.
21
27
22
-
Microsoft has conducted red teaming exercises and implemented safety systems (including [content filters](content-filter.md) and other [mitigation strategies](prompt-engineering.md)) for its Azure OpenAI Service models (see this [Responsible AI Overview](/legal/cognitive-services/openai/overview?context=/azure/ai-services/openai/context/context)). However, the context of your LLM application will be unique and you also should conduct red teaming to:
28
+
While Microsoft has conducted red teaming exercises and implemented safety systems (including [content filters](./content-filter.md) and other [mitigation strategies](./prompt-engineering.md)) for its Azure OpenAI Service models (see this [Overview of responsible AI practices](/legal/cognitive-services/openai/overview)), the context of each LLM application will be unique and you also should conduct red teaming to:
29
+
30
+
- Test the LLM base model and determine whether there are gaps in the existing safety systems, given the context of your application.
23
31
24
-
- Test the LLM base model and determine whether there are gaps in the existing safety systems, given the context of your application system.
25
32
- Identify and mitigate shortcomings in the existing default filters or mitigation strategies.
26
-
- Provide feedback on failures so we can make improvements.
27
33
28
-
Here is how you can get started in your process of red teaming LLMs. Advance planning is critical to a productive red teaming exercise.
34
+
- Provide feedback on failures in order to make improvements.
35
+
36
+
- Note that red teaming is not a replacement for systematic measurement. A best practice is to complete an initial round of manual red teaming before conducting systematic measurements and implementing mitigations. As highlighted above, the goal of RAI red teaming is to identify harms, understand the risk surface, and develop the list of harms that can inform what needs to be measured and mitigated.
29
37
30
-
## Getting started
38
+
Here is how you can get started and plan your process of red teaming LLMs. Advance planning is critical to a productive red teaming exercise.
31
39
32
-
### Managing your red team
40
+
##Before testing
33
41
34
-
**Assemble a diverse group of red teamers.**
42
+
### Plan: Who will do the testing
35
43
36
-
LLM red teamers should be a mix of people with diverse social and professional backgrounds, demographic groups, and interdisciplinary expertise that fits the deployment context of your AI system. For example, if you’re designing a chatbot to help health care providers, medical experts can help identify risks in that domain.
44
+
**Assemble a diverse group of red teamers**
37
45
38
-
**Recruit red teamers with both benign and adversarial mindsets.**
46
+
Determine the ideal composition of red teamers in terms of people’s experience, demographics, and expertise across disciplines (for example, experts in AI, social sciences, security) for your product’s domain. For example, if you’re designing a chatbot to help health care providers, medical experts can help identify risks in that domain.
47
+
48
+
**Recruit red teamers with both benign and adversarial mindsets**
39
49
40
50
Having red teamers with an adversarial mindset and security-testing experience is essential for understanding security risks, but red teamers who are ordinary users of your application system and haven’t been involved in its development can bring valuable perspectives on harms that regular users might encounter.
41
51
42
-
**Remember that handling potentially harmful content can be mentally taxing.**
52
+
**Assign red teamers to harms and/or product features**
53
+
54
+
- Assign RAI red teamers with specific expertise to probe for specific types of harms (for example, security subject matter experts can probe for jailbreaks, meta prompt extraction, and content related to cyberattacks).
55
+
56
+
- For multiple rounds of testing, decide whether to switch red teamer assignments in each round to get diverse perspectives on each harm and maintain creativity. If switching assignments, allow time for red teamers to get up to speed on the instructions for their newly assigned harm.
57
+
58
+
- In later stages, when the application and its UI are developed, you might want to assign red teamers to specific parts of the application (i.e., features) to ensure coverage of the entire application.
59
+
60
+
- Consider how much time and effort each red teamer should dedicate (for example, those testing for benign scenarios might need less time than those testing for adversarial scenarios).
61
+
62
+
It can be helpful to provide red teamers with:
63
+
- Clear instructions that could include:
64
+
- An introduction describing the purpose and goal of the given round of red teaming; the product and features that will be tested and how to access them; what kinds of issues to test for; red teamers’ focus areas, if the testing is more targeted; how much time and effort each red teamer should spend on testing; how to record results; and who to contact with questions.
65
+
- A file or location for recording their examples and findings, including information such as:
66
+
- The date an example was surfaced; a unique identifier for the input/output pair if available, for reproducibility purposes; the input prompt; a description or screenshot of the output.
67
+
68
+
### Plan: What to test
69
+
70
+
Because an application is developed using a base model, you may need to test at several different layers:
71
+
72
+
- The LLM base model with its safety system in place to identify any gaps that may need to be addressed in the context of your application system. (Testing is usually done through an API endpoint.)
73
+
74
+
- Your application. (Testing is best done through a UI.)
75
+
76
+
- Both the LLM base model and your application, before and after mitigations are in place.
77
+
78
+
The following recommendations help you choose what to test at various points during red teaming:
79
+
80
+
- You can begin by testing the base model to understand the risk surface, identify harms, and guide the development of RAI mitigations for your product.
81
+
82
+
- Test versions of your product iteratively with and without RAI mitigations in place to assess the effectiveness of RAI mitigations. (Note, manual red teaming might not be sufficient assessment—use systematic measurements as well, but only after completing an initial round of manual red teaming.)
83
+
84
+
- Conduct testing of application(s) on the production UI as much as possible because this most closely resembles real-world usage.
85
+
86
+
When reporting results, make clear which endpoints were used for testing. When testing was done in an endpoint other than product, consider testing again on the production endpoint or UI in future rounds.
87
+
88
+
### Plan: How to test
89
+
90
+
**Conduct open-ended testing to uncover a wide range of harms.**
91
+
92
+
The benefit of RAI red teamers exploring and documenting any problematic content (rather than asking them to find examples of specific harms) enables them to creatively explore a wide range of issues, uncovering blind spots in your understanding of the risk surface.
93
+
94
+
**Create a list of harms from the open-ended testing.**
95
+
96
+
- Consider creating a list of harms, with definitions and examples of the harms.
97
+
- Provide this list as a guideline to red teamers in later rounds of testing.
98
+
99
+
**Conduct guided red teaming and iterate: Continue probing for harms in the list; identify new harms that surface.**
100
+
101
+
Use a list of harms if available and continue testing for known harms and the effectiveness of their mitigations. In the process, you will likely identify new harms. Integrate these into the list and be open to shifting measurement and mitigation priorities to address the newly identified harms.
102
+
103
+
Plan which harms to prioritize for iterative testing. Several factors can inform your prioritization, including, but not limited to, the severity of the harms and the context in which they are more likely to surface.
104
+
105
+
### Plan: How to record data
106
+
107
+
**Decide what data you need to collect and what data is optional.**
108
+
109
+
- Decide what data the red teamers will need to record (for example, the input they used; the output of the system; a unique ID, if available, to reproduce the example in the future; and other notes.)
110
+
111
+
- Be strategic with what data you are collecting to avoid overwhelming red teamers, while not missing out on critical information.
112
+
113
+
**Create a structure for data collection**
114
+
115
+
A shared Excel spreadsheet is often the simplest method for collecting red teaming data. A benefit of this shared file is that red teamers can review each other’s examples to gain creative ideas for their own testing and avoid duplication of data.
116
+
117
+
## During testing
43
118
44
-
You will need to take care of your red teamers, not only by limiting the amount of time they spend on an assignment, but also by letting them know they can opt out at any time. Also, avoid burnout by switching red teamers’ assignments to different focus areas.
119
+
**Plan to be on active standby while red teaming is ongoing**
45
120
46
-
### Planning your red teaming
121
+
- Be prepared to assist red teamers with instructions and access issues.
122
+
- Monitor progress on the spreadsheet and send timely reminders to red teamers.
47
123
48
-
#### Where to test
124
+
##After each round of testing
49
125
50
-
Because a system is developed using a LLM base model, you may need to test at several different layers:
126
+
**Report data**
51
127
52
-
- The LLM base model with its [safety system](./content-filter.md) in place to identify any gaps that may need to be addressed in the context of your application system. (Testing is usually through an API endpoint.)
53
-
- Your application system. (Testing is usually through a UI.)
54
-
- Both the LLM base model and your application system before and after mitigations are in place.
128
+
Share a short report on a regular interval with key stakeholders that:
55
129
56
-
#### How to test
130
+
1. Lists the top identified issues.
57
131
58
-
Consider conducting iterative red teaming in at least two phases:
132
+
2. Provides a link to the raw data.
59
133
60
-
1. Open-ended red teaming, where red teamers are encouraged to discover a variety of harms. This can help you develop a taxonomy of harms to guide further testing. Note that developing a taxonomy of undesired LLM outputs for your application system is crucial to being able to measure the success of specific mitigation efforts.
61
-
2. Guided red teaming, where red teamers are assigned to focus on specific harms listed in the taxonomy while staying alert for any new harms that may emerge. Red teamers can also be instructed to focus testing on specific features of a system for surfacing potential harms.
134
+
3. Previews the testing plan for the upcoming rounds.
62
135
63
-
Be sure to:
136
+
4. Acknowledges red teamers.
64
137
65
-
- Provide your red teamers with clear instructions for what harms or system features they will be testing.
66
-
- Give your red teamers a place for recording their findings. For example, this could be a simple spreadsheet specifying the types of data that red teamers should provide, including basics such as:
67
-
- The type of harm that was surfaced.
68
-
- The input prompt that triggered the output.
69
-
- An excerpt from the problematic output.
70
-
- Comments about why the red teamer considered the output problematic.
71
-
- Maximize the effort of responsible AI red teamers who have expertise for testing specific types of harms or undesired outputs. For example, have security subject matter experts focus on jailbreaks, metaprompt extraction, and content related to aiding cyberattacks.
138
+
5. Provides any other relevant information.
72
139
73
-
### Reporting red teaming findings
140
+
**Differentiate between identification and measurement**
74
141
75
-
You will want to summarize and report red teaming top findings at regular intervals to key stakeholders, including teams involved in the measurement and mitigation of LLM failures so that the findings can inform critical decision making and prioritizations.
142
+
In the report, be sure to clarify that the role of RAI red teaming is to expose and raise understanding of risk surface and is not a replacement for systematic measurement and rigorous mitigation work. It is important that people do not interpret specific examples as a metric for the pervasiveness of that harm.
76
143
77
-
## Next steps
144
+
Additionally, if the report contains problematic content and examples, consider including a content warning.
78
145
79
-
[Learn about other mitigation strategies like prompt engineering](./prompt-engineering.md)
146
+
The guidance in this document is not intended to be, and should not be construed as providing, legal advice. The jurisdiction in which you're operating may have various regulatory or legal requirements that apply to your AI system. Be aware that not all of these recommendations are appropriate for every scenario and, conversely, these recommendations may be insufficient for some scenarios.
0 commit comments