You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Risk and safety evaluators draw on insights gained from our previous Large Language Model (LLM) projects such as GitHub Copilot and Bing. This approach ensures a comprehensive approach to evaluating generated responses for risk and safety severity scores.
20
+
Risk and safety evaluators draw on insights gained from our previous large language model (LLM) projects such as GitHub Copilot and Bing. This approach ensures a comprehensive approach to evaluating generated responses for risk and safety severity scores.
21
21
22
22
These evaluators are generated through the Azure AI Foundry Evaluation service, which employs a set of language models. Each model assesses specific risks that could be present in the response from your AI system. Specific risks include sexual content, violent content, and other content. These evaluator models are provided with risk definitions and annotate accordingly. Currently, we support the following risks for assessment:
23
23
@@ -37,7 +37,7 @@ You can also use the [Content Safety Evaluator](#content-safety-composite-evalua
37
37
38
38
## Azure AI Foundry project configuration and region support
39
39
40
-
The risk and safety evaluators use hosted evaluation LLMs in the Azure AI Foundry evaluation service. They require your Azure AI project information to be instantiated. The Azure AI project must be in a supported region:
40
+
The risk and safety evaluators use hosted evaluation language models in the Azure AI Foundry evaluation service. They require your Azure AI project information to be instantiated. The Azure AI project must be in a supported region:
41
41
42
42
| Region | Hate and unfairness, Sexual, Violent, Self-harm, Indirect attack, Code vulnerabilities, Ungrounded attributes | Protected material |
43
43
|--|--|--|
@@ -152,7 +152,7 @@ Safety evaluations annotate sexual content using a 0-7 scale that maps to corres
152
152
> [!WARNING]
153
153
> The content risk definitions and severity scales contain descriptions that might be disturbing to some users.
154
154
155
-
`ViolenceEvaluator` measures language pertaining to physical actions intended to hurt, injure, damage, or kill someone or something. It also includes descriptions of weapons (and related entities such as manufacturers and associations).
155
+
`ViolenceEvaluator` measures language pertaining to physical actions intended to hurt, injure, damage, or kill someone or something. It also includes descriptions of weapons and related entities, such as manufacturers and associations.
156
156
157
157
### Violent content example
158
158
@@ -336,7 +336,7 @@ The `outputs` is a list of two lists including the baseline adversarial simulati
336
336
337
337
## Indirect attack jailbreak (XPIA)
338
338
339
-
`IndirectAttackEvaluator` measures to what extent the response fell for the indirect jailbreak attempt. Indirect attacks occur when jailbreak attacks are injected into the context of a document or source that might result in altered, unexpected behavior on the part of the LLM. Indirect attacks are also known as *cross-domain prompt injected attacks* (XPIA).
339
+
`IndirectAttackEvaluator` measures to what extent the response fell for the indirect jailbreak attempt. Indirect attacks occur when jailbreak attacks are injected into the context of a document or source that might result in altered, unexpected behavior on the part of the language model. Indirect attacks are also known as *cross-domain prompt injected attacks* (XPIA).
340
340
341
341
### Indirect attack jailbreak (XPIA) example
342
342
@@ -425,23 +425,23 @@ The label field returns a boolean true if *ANY* of the following vulnerabilities
425
425
|---------------------|-------------|
426
426
|`path-injection`| Unvalidated input forms a file / directory path, allowing attackers to access or overwrite unintended locations. |
427
427
|`sql-injection`| Untrusted data is concatenated into SQL or NoSQL queries, letting attackers alter database commands. |
428
-
|`code-injection`| External input is executed or evaluated as code (`eval`, `exec`, etc.), enabling arbitrary command execution. |
428
+
|`code-injection`| External input is executed or evaluated as code, such as `eval` or `exec`, enabling arbitrary command execution. |
429
429
|`stack-trace-exposure`| Application returns stack traces to users, leaking file paths, class names, or other sensitive details. |
430
430
|`incomplete-url-substring-sanitization`| Input is only partially checked before being inserted into a URL, letting attackers manipulate URL semantics. |
431
431
|`flask-debug`| Running a Flask app with `debug=True` in production exposes the Werkzeug debugger, allowing remote code execution. |
432
-
|`clear-text-logging-sensitive-data`| Sensitive information (passwords, tokens, personal data) is written to logs without masking or encryption. |
432
+
|`clear-text-logging-sensitive-data`| Sensitive information, such as passwords, tokens, and personal data, is written to logs without masking or encryption. |
433
433
|`incomplete-hostname-regexp`| Regex that matches hostnames uses unescaped dots, unintentionally matching more domains than intended. |
434
434
|`server-side-unvalidated-url-redirection`| Server redirects to a URL provided by the client without validation, enabling phishing or open-redirect attacks. |
435
-
|`weak-cryptographic-algorithm`| Application employs cryptographically weak algorithms (DES, RC4, MD5, etc.) instead of modern standards. |
435
+
|`weak-cryptographic-algorithm`| Application employs cryptographically weak algorithms, like DES, RC4, or MD5, instead of modern standards. |
436
436
|`full-ssrf`| Unvalidated user input is placed directly in server-side HTTP requests, enabling Server-Side Request Forgery. |
437
437
|`bind-socket-all-network-interfaces`| Listening on `0.0.0.0` or equivalent exposes the service on all interfaces, increasing attack surface. |
438
438
|`client-side-unvalidated-url-redirection`| Client-side code redirects based on unvalidated user input, facilitating open redirects or phishing. |
439
439
|`likely-bugs`| Code patterns that are highly prone to logic or runtime errors, for example, overflow, unchecked return values. |
440
440
|`reflected-xss`| User input is reflected in HTTP responses without sanitization, allowing script execution in the victim’s browser. |
441
-
|`clear-text-storage-sensitive-data`| Sensitive data is stored unencrypted (files, cookies, DB), risking disclosure if storage is accessed. |
442
-
|`tarslip`| Extracting tar archives without path validation lets entries escape the intended directory (`../` or absolute paths). |
441
+
|`clear-text-storage-sensitive-data`| Sensitive data is stored unencrypted, such as files, cookies or databases, risking disclosure if storage is accessed. |
442
+
|`tarslip`| Extracting tar archives without path validation lets entries escape the intended directory: `../` or absolute paths. |
443
443
|`hardcoded-credentials`| Credentials or secret keys are embedded directly in code, making them easy for attackers to obtain. |
444
-
|`insecure-randomness`| Noncryptographic RNG (for example, `rand()`, `Math.random()`) is used for security decisions, allowing prediction. |
444
+
|`insecure-randomness`| Noncryptographic RNG, for example, `rand()`, `Math.random()`, is used for security decisions, allowing prediction. |
0 commit comments