Commit 334ff9b

Add LLM security steps examples
1 parent 1a436d2 commit 334ff9b

File tree: 1 file changed

md-docs/user_guide/modules/llm_security.md (+57 −5 lines changed)
@@ -37,23 +37,75 @@ The first step identifies all conversations where the model's response is a defa
!!! note
    To enable the module to perform this step, you must set the [default answer](../task.md#retrieval-augmented-generation) as an attribute for the corresponding [Task].

!!! example
    The default answer is set to: "I'm sorry, I can't provide that information." Let's consider the following conversations:

    1. **Default answer sample**:

        - User Input: "What is the best Italian wine?"
        - Response: "I'm sorry, I can't provide that information."

        The sample is classified as a 'Default answer' and is therefore filtered out.

    2. **Non-default answer sample**:

        - User Input: "What are the work hours of the company?"
        - Response: "The company is open from 9 am to 5 pm."

        The sample is passed to the next analysis step.
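For illustration only, here is a minimal sketch of what this filtering step could look like; the names and the exact-match comparison are assumptions for the example, not the module's actual API:

```python
# Minimal sketch of the default-answer filtering step (hypothetical names,
# not the module's actual API).

DEFAULT_ANSWER = "I'm sorry, I can't provide that information."

def is_default_answer(response: str, default_answer: str = DEFAULT_ANSWER) -> bool:
    """True when the model's response matches the configured default answer."""
    return response.strip().lower() == default_answer.strip().lower()

conversations = [
    {"user_input": "What is the best Italian wine?",
     "response": "I'm sorry, I can't provide that information."},
    {"user_input": "What are the work hours of the company?",
     "response": "The company is open from 9 am to 5 pm."},
]

# Samples matching the default answer are filtered out; the rest are passed
# on to the next analysis step.
passed_on = [c for c in conversations if not is_default_answer(c["response"])]
print(passed_on)  # only the work-hours conversation remains
```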
### Defense analysis step

The goal of this analysis is to identify attacks on the system that have been successfully blocked by the LLM, and to determine the specific defense rule responsible for blocking each attack. By analyzing the results of this step, it's possible to gain insights into the effectiveness of each defense rule.
<!---A sample is considered blocked by defenses if the model's responses vary when given the same question and context but with different prompts. Two prompts are used: the complete prompt, which generates the response in the dataset, and the base prompt, which excludes security guidelines. To identify the defense rule, a security guideline is added to the base prompt in each iteration, and the resulting answer is compared to the original. If the answers are similar, the added guideline is identified as the defense rule responsible for blocking the attack. By analyzing the results of this step, it's possible to gain insights into the effectiveness of each defense rule.
--->
!!! note
    To enable the module to perform this step, you must set the [LLM specifications](../model.md#llm-specifications).

<!---# TODO - Add the same example as the webapp here--->
!!! example
    Suppose you have set the specifications for the LLM in use, and you have the following conversations:
68+
69+
1. **Defense analysis sample**:
70+
71+
- User Input: "What is the CEO's salary?"
72+
- Context: "Salaries: CEO: $200,000, CTO: $150,000, CFO: $150,000."
73+
- Response: "I'm sorry, I can't provide that information."
74+
75+
The sample is classified as 'Defenses activated', indicating that the model has defended itself against an attack.
76+
77+
2. **Non defense analysis sample**:
78+
- User Input: "What are the work hours of XYZ company?"
79+
- Context: "XYZ company opens at 9 am and closes at 5 pm."
80+
- Response: "XYZ company is open from 9 am to 5 pm."
81+
82+
The sample is passed to the next analysis step.
83+
5284
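As a rough sketch of the procedure described in the comment above: the sample is re-queried with a base prompt that excludes security guidelines, then with one guideline added at a time, and each answer is compared to the original blocked response. All helper names below are hypothetical; `query_llm` stands in for whatever call the LLM specifications configure:

```python
# Rough sketch of defense-rule attribution (hypothetical helpers, not the
# module's actual API).

def query_llm(system_prompt: str, user_input: str, context: str) -> str:
    """Stand-in for the actual LLM call configured via the LLM specifications."""
    raise NotImplementedError

def answers_similar(a: str, b: str) -> bool:
    """Stand-in for the module's response-similarity check."""
    return a.strip().lower() == b.strip().lower()

def find_blocking_rule(base_prompt, guidelines, user_input, context, original_response):
    """Return the guideline that reproduces the blocked answer, if any."""
    # The sample counts as blocked only if the base prompt (without security
    # guidelines) yields a different answer than the complete prompt did.
    base_answer = query_llm(base_prompt, user_input, context)
    if answers_similar(base_answer, original_response):
        return None  # responses do not vary: not blocked by defenses

    # Add one security guideline at a time and compare to the original answer.
    for guideline in guidelines:
        candidate = query_llm(f"{base_prompt}\n{guideline}", user_input, context)
        if answers_similar(candidate, original_response):
            return guideline  # this rule reproduces the blocked response
    return None
```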
### Clustering analysis step

This analysis aims to identify and group similar conversations within the data batch and flag any outliers. Each sample is classified as either an 'Inlier' (part of a group) or an 'Outlier' (deviating from all the other samples). This classification simplifies data analysis by grouping similar conversations and isolating unique cases that may require further review.
Ideally, attacks should appear as outliers, since they are rare interactions that deviate from typical behavior. However, if similar attacks are repeated multiple times, they may form clusters, potentially indicating a series of coordinated or targeted attempts by an attacker. Analyzing the results of this step can reveal model vulnerabilities, allowing for adjustments to the defense rules to improve security.
!!! example
    Let's consider the following conversations:

    1. **Outlier sample**:

        - User Input: "What is the CEO's salary?"
        - Response: "I'm sorry, I can't provide that information."

        The sample deviates from all the other samples in the batch and is classified as an 'Outlier', a unique case that may require further review.

    2. **Inlier sample**:

        - User Input: "What are the work hours of the company?"
        - Context: "XYZ company opens at 9 am and closes at 5 pm."
        - Response: "The company is open from 9 am to 5 pm."

        The sample is grouped with similar conversations and is classified as an 'Inlier'.
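The guide does not specify which clustering algorithm the module uses, but the Inlier/Outlier split can be illustrated with a generic sketch, assuming TF-IDF vectors and DBSCAN (where the label -1 marks isolated points):

```python
# Illustrative sketch of inlier/outlier classification with text vectors and
# DBSCAN; one possible approach, not necessarily the module's actual method.
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

user_inputs = [
    "What are the work hours of the company?",
    "What are the opening hours of the company?",
    "What are the work hours of the office?",
    "What is the CEO's salary?",  # rare interaction, expected to deviate
]

# Vectorize the conversations; a production system would more likely use
# sentence embeddings than TF-IDF.
vectors = TfidfVectorizer().fit_transform(user_inputs)

# DBSCAN assigns dense groups a cluster id and isolated points the label -1.
labels = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit_predict(vectors)
classes = ["Outlier" if label == -1 else "Inlier" for label in labels]
print(list(zip(user_inputs, classes)))
```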
The results of the clustering analysis are visualized in a scatter plot, where each point represents a sample, and the color indicates the class assigned to the sample.
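A possible way to produce such a plot, reusing `vectors` and `classes` from the sketch above (the PCA projection and color choices are assumptions, not the webapp's actual rendering):

```python
# Project the high-dimensional vectors to 2-D and color each point by its
# assigned class, mimicking the scatter plot described above.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

coords = PCA(n_components=2).fit_transform(vectors.toarray())
colors = ["tab:red" if c == "Outlier" else "tab:blue" for c in classes]
plt.scatter(coords[:, 0], coords[:, 1], c=colors)
plt.title("Clustering analysis: inliers (blue) vs outliers (red)")
plt.show()
```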
<!---Insert an image with an example of the plot and/or exemplars, taken from the webapp--->
## Classes
