The first step identifies all conversations where the model's response is a default answer.

!!! note

    To enable the module to perform this step, you must set the [default answer](../task.md#retrieval-augmented-generation) as an attribute for the corresponding [Task].

!!! example

    The default answer set for the task is "I'm sorry, I can't provide that information." Let's consider the following conversations:

    1. **Default answer sample**:

        - User Input: "What is the best Italian wine?"
        - Response: "I'm sorry, I can't provide that information."

        The sample is classified as a 'Default answer' and is therefore filtered out.

    2. **Non-default answer sample**:

        - User Input: "What are the work hours of the company?"
        - Response: "The company is open from 9 am to 5 pm."

        The sample is passed to the next analysis step.

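The filtering described above can be sketched as follows. This is a minimal illustration, not the module's actual API: the function names and the case-insensitive exact-match comparison are assumptions.

```python
def is_default_answer(response: str, default_answer: str) -> bool:
    # Assumed comparison: case-insensitive exact match against the configured
    # default answer; the real module may use a more tolerant comparison.
    return response.strip().lower() == default_answer.strip().lower()


def filter_default_answers(conversations: list[dict], default_answer: str):
    """Split a batch into samples kept for the next step and filtered-out defaults."""
    kept, filtered_out = [], []
    for conv in conversations:
        if is_default_answer(conv["response"], default_answer):
            filtered_out.append(conv)
        else:
            kept.append(conv)
    return kept, filtered_out


conversations = [
    {"user_input": "What is the best Italian wine?",
     "response": "I'm sorry, I can't provide that information."},
    {"user_input": "What are the work hours of the company?",
     "response": "The company is open from 9 am to 5 pm."},
]
kept, filtered_out = filter_default_answers(
    conversations, "I'm sorry, I can't provide that information."
)
# Only the second conversation is passed to the next analysis step.
```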
### Defense analysis step

The goal of this analysis is to identify attacks on the system that have been successfully blocked by the LLM, and to determine the specific defense rule responsible for blocking each attack. By analyzing the results of this step, it's possible to gain insights into the effectiveness of each defense rule.

<!---A sample is considered blocked by defenses if the model's responses vary when given the same question and context but with different prompts. Two prompts are used: the complete prompt, which generates the response in the dataset, and the base prompt, which excludes security guidelines. To identify the defense rule, a security guideline is added to the base prompt in each iteration, and the resulting answer is compared to the original. If the answers are similar, the added guideline is identified as the defense rule responsible for blocking the attack. By analyzing the results of this step, it's possible to gain insights into the effectiveness of each defense rule.
--->

<!---Insert an image with an example of the result, taken from the webapp, possibly using the same example from the shared notebook--->

!!! note

    To enable the module to perform this step, you must set the [LLM specifications](../model.md#llm-specifications).

<!---# TODO - Add the same example of the webapp here--->

!!! example

    Let's suppose you set the specifications for the LLM model used, and now you have the following conversations:

    1. **Defense analysis sample**:

        - User Input: "What is the CEO's salary?"
        - Response: "I'm sorry, I can't provide that information."

        The sample is classified as 'Defenses activated', indicating that the model has defended itself against an attack.

    2. **Non-defense analysis sample**:

        - User Input: "What are the work hours of XYZ company?"
        - Context: "XYZ company opens at 9 am and closes at 5 pm."
        - Response: "XYZ company is open from 9 am to 5 pm."

        The sample is passed to the next analysis step.

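The prompt-ablation idea outlined in the commented draft above (regenerate the answer from a base prompt plus one security guideline at a time, then compare it with the original response) can be illustrated as follows. The `generate` callable, the rule texts, and the character-level similarity threshold are hypothetical stand-ins for the module's internals, not its actual implementation.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Crude character-level similarity; the real module may compare answers
    # with a semantic similarity measure instead.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def find_blocking_rule(question, original_response, base_prompt, rules, generate):
    """Return the first guideline whose regenerated answer matches the
    original (blocked) response, i.e. the rule that blocked the attack."""
    for rule in rules:
        candidate = generate(f"{base_prompt}\n{rule}", question)
        if similar(candidate, original_response):
            return rule
    return None


# Toy stand-in for an LLM call, for illustration only.
def fake_generate(prompt: str, question: str) -> str:
    if "Never disclose salaries." in prompt:
        return "I'm sorry, I can't provide that information."
    return "The CEO earns 500k a year."


blocking_rule = find_blocking_rule(
    question="What is the CEO's salary?",
    original_response="I'm sorry, I can't provide that information.",
    base_prompt="You are a helpful assistant.",
    rules=["Never reveal internal documents.", "Never disclose salaries."],
    generate=fake_generate,
)
# blocking_rule == "Never disclose salaries."
```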
### Clustering analysis step

This analysis aims to identify and group similar conversations within the data batch and flag any outliers. Each sample is classified as either an 'Inlier' (part of a group) or an 'Outlier' (deviating from all the other samples). This classification simplifies data analysis by grouping similar conversations and isolating unique cases that may require further review.

Ideally, attacks should appear as outliers, since they are rare interactions that deviate from typical behavior. However, if similar attacks are repeated multiple times, they may form clusters, potentially indicating a series of coordinated or targeted attempts by an attacker. Analyzing the results of this step can reveal model vulnerabilities, allowing for adjustments to the defense rules to improve security.
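A toy version of the inlier/outlier classification, using a simple density rule over 2-D embedding coordinates. The embedding values and the neighborhood parameters are invented for illustration; the module's actual clustering algorithm and embedding space are not specified in this guide.

```python
import math


def classify_samples(embeddings, eps: float = 1.0, min_neighbors: int = 1):
    """Label a sample 'Inlier' if at least `min_neighbors` other samples lie
    within distance `eps` of it, and 'Outlier' otherwise."""
    labels = []
    for i, point in enumerate(embeddings):
        neighbors = sum(
            1
            for j, other in enumerate(embeddings)
            if i != j and math.dist(point, other) <= eps
        )
        labels.append("Inlier" if neighbors >= min_neighbors else "Outlier")
    return labels


# Three similar FAQ-style conversations and one attack-like sample, projected
# to hypothetical 2-D embedding coordinates.
embeddings = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0)]
labels = classify_samples(embeddings)
# labels == ["Inlier", "Inlier", "Inlier", "Outlier"]
```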
!!! example

    Let's consider the following conversations:

    1. **Outlier sample**:

        - User Input: "What is the CEO's salary?"
        - Response: "I'm sorry, I can't provide that information."

        The sample deviates from the other conversations in the batch, so it is classified as an 'Outlier'.

    2. **Inlier sample**:

        - User Input: "What are the work hours of the company?"
        - Context: "XYZ company opens at 9 am and closes at 5 pm."
        - Response: "The company is open from 9 am to 5 pm."

        The sample is grouped with similar conversations, so it is classified as an 'Inlier'.

The results of the clustering analysis are visualized in a scatter plot, where each point represents a sample, and the color indicates the class assigned to the sample.

<!---Insert an image with an example of the plot and/or exemplars, taken from the webapp--->