articles/ai-foundry/concepts/evaluation-evaluators/custom-evaluators.md (7 additions, 3 deletions)
@@ -19,7 +19,7 @@ To start evaluating your application's generations, built-in evaluators are grea
## Code-based evaluators
- You don't need a large language model needed for certain evaluation metrics. Code-based evaluators can give you the flexibility to define metrics based on functions or callable class. You can build your own code-based evaluator, for example, by creating a simple Python class that calculates the length of an answer in `answer_length.py` under directory `answer_len/`, as in the following example.
+ You don't need a large language model for certain evaluation metrics. Code-based evaluators give you the flexibility to define metrics based on functions or callable classes. For example, you can build your own code-based evaluator by creating a simple Python class that calculates the length of an answer in `answer_length.py` under the directory `answer_len/`, as in the following example.
### Code-based evaluator example: Answer length
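The article's own example code isn't rendered in this diff, so here's a minimal sketch of what `answer_length.py` could look like. The class name `AnswerLengthEvaluator` is an illustrative assumption; only the file path and the call shown in the next hunk header come from the article.

```python
# answer_len/answer_length.py (sketch) -- a code-based evaluator needs no LLM:
# any function or callable class that returns a dictionary of metrics works.
class AnswerLengthEvaluator:
    def __call__(self, *, answer: str, **kwargs):
        # Return the metric as a dict so it merges cleanly into evaluation results.
        return {"answer_length": len(answer)}


# Usage, mirroring the call referenced in the article:
answer_length_evaluator = AnswerLengthEvaluator()
answer_length = answer_length_evaluator(answer="What is the speed of light?")
print(answer_length)  # {'answer_length': 27}
```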
@@ -49,11 +49,15 @@ answer_length = answer_length_evaluator(answer="What is the speed of light?")
## Prompt-based evaluators
- To build your own prompt-based large language model evaluator or AI-assisted annotator, you can create a custom evaluator based on a *Prompty* file. Prompty is a file with the `.prompty` extension for developing prompt template. The Prompty asset is a markdown file with a modified front matter. The front matter is in YAML format. It contains metadata fields that define model configuration and expected inputs of the Prompty. To measure friendliness of a response, you can create a custom evaluator `FriendlinessEvaluator`:
+ To build your own prompt-based large language model evaluator or AI-assisted annotator, you can create a custom evaluator based on a *Prompty* file.
+
+ Prompty is a file with the `.prompty` extension for developing prompt templates. The Prompty asset is a markdown file with modified front matter. The front matter is in YAML format. It contains metadata fields that define the model configuration and expected inputs of the Prompty.
+
+ To measure the friendliness of a response, you can create a custom evaluator `FriendlinessEvaluator`:
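The `FriendlinessEvaluator` code itself isn't shown in this diff. Below is a minimal sketch of how such a Prompty-backed evaluator is commonly wrapped in Python, assuming a `friendliness.prompty` file next to the class and the `promptflow` package's `load_flow` helper; treat the names and the JSON output shape as illustrative assumptions.

```python
# friendliness_evaluator.py (sketch) -- wraps a Prompty asset as a callable evaluator.
import json
import os

from promptflow.client import load_flow


class FriendlinessEvaluator:
    def __init__(self, model_config):
        # The .prompty file's YAML front matter declares the model configuration
        # and expected inputs; load_flow turns the asset into a callable flow.
        prompty_path = os.path.join(os.path.dirname(__file__), "friendliness.prompty")
        self._flow = load_flow(source=prompty_path, model={"configuration": model_config})

    def __call__(self, *, response: str, **kwargs):
        llm_output = self._flow(response=response)
        try:
            # Assumes the prompt asks the model to reply with JSON, for example {"score": 4}.
            return json.loads(llm_output)
        except json.JSONDecodeError:
            return {"score": None, "raw_output": llm_output}
```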
articles/ai-foundry/concepts/evaluation-evaluators/textual-similarity-evaluators.md (5 additions, 3 deletions)
@@ -15,7 +15,9 @@ ms.custom:
# Textual similarity evaluators
- It's important to compare how closely the textual response generated by your AI system matches the response you would expect. The expected response is the *ground truth*. Use a LLM-judge metric like [`SimilarityEvaluator`](#similarity) with a focus on the semantic similarity between the generated response and the ground truth. Or, use metrics from the field of natural language processing (NLP) including [F1 score](#f1-score), [BLEU](#bleu-score), [GLEU](#gleu-score), [ROUGE](#rouge-score), and [METEOR](#meteor-score) with a focus on the overlaps of tokens or n-grams between the two.
+ It's important to compare how closely the textual response generated by your AI system matches the response you would expect. The expected response is called the *ground truth*.
+
+ Use an LLM-judge metric like [`SimilarityEvaluator`](#similarity) with a focus on the semantic similarity between the generated response and the ground truth. Or, use metrics from the field of natural language processing (NLP), including [F1 score](#f1-score), [BLEU](#bleu-score), [GLEU](#gleu-score), [ROUGE](#rouge-score), and [METEOR](#meteor-score), with a focus on the overlaps of tokens or n-grams between the two.
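As a quick illustration of the second option, the NLP metrics need no model configuration. The sketch below assumes the `azure-ai-evaluation` Python package and uses the F1 score evaluator; the exact output keys may vary by package version.

```python
from azure.ai.evaluation import F1ScoreEvaluator

# Token-overlap metrics like F1 need no LLM or model configuration,
# unlike the LLM-judge SimilarityEvaluator, which requires a model_config.
f1_evaluator = F1ScoreEvaluator()
result = f1_evaluator(
    response="The capital of Japan is Tokyo.",
    ground_truth="Tokyo is the capital of Japan.",
)
print(result)  # for example {'f1_score': ...}, a 0-1 float; higher means more overlap
```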
- > We recommend using `o3-mini` for a balance of reasoning capability and cost efficiency.
+ > We recommend that you use `o3-mini` to balance reasoning capability and cost efficiency.
## Similarity
@@ -101,7 +103,7 @@ The numerical score is a 0-1 float. A higher score is better. Given a numerical
## BLEU score
- `BleuScoreEvaluator` computes the Bilingual Evaluation Understudy (BLEU) score commonly used in natural language processing (NLP) and machine translation. It measures how closely the generated text matches the reference text.
+ `BleuScoreEvaluator` computes the Bilingual Evaluation Understudy (BLEU) score commonly used in natural language processing and machine translation. It measures how closely the generated text matches the reference text.
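A short usage sketch, again assuming the `azure-ai-evaluation` package; the score key and example strings are illustrative.

```python
from azure.ai.evaluation import BleuScoreEvaluator

# BLEU compares n-gram overlap between the generated text and the reference text.
bleu_evaluator = BleuScoreEvaluator()
result = bleu_evaluator(
    response="The speed of light is about 300,000 kilometers per second.",
    ground_truth="Light travels at roughly 300,000 kilometers per second.",
)
print(result)  # for example {'bleu_score': ...}; values closer to 1 indicate a closer match
```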
articles/ai-foundry/how-to/prompt-flow-troubleshoot.md (5 additions, 5 deletions)
@@ -88,10 +88,10 @@ request_settings:
If you regenerate your Azure OpenAI key and manually update the connection used in a prompt flow, you might see error messages like "Unauthorized. Access token is missing, invalid, audience is incorrect or have expired." You might see these messages when you invoke an existing endpoint that was created before the key was regenerated.
- This error occurs because the connections used in the endpoints or deployments aren't automatically updated. You should manually update any change for a key or secrets in deployments, which aims to avoid affecting online production deployment because of unintentional offline operation.
+ This error occurs because the connections used in the endpoints or deployments aren't automatically updated. You need to manually update any key or secret change in your deployments. This design avoids affecting an online production deployment through an unintentional offline operation.
- - If the endpoint was deployed in the Azure AI Foundry portal, redeploy the flow to the existing endpoint by using the same deployment name.
- - If the endpoint was deployed by using the SDK or the Azure CLI, make a modification to the deployment definition, such as adding a dummy environment variable. Then use `az ml online-deployment update` to update your deployment.
+ - If you deployed the endpoint in the Azure AI Foundry portal, redeploy the flow to the existing endpoint by using the same deployment name.
+ - If you deployed the endpoint by using the SDK or the Azure CLI, make a modification to the deployment definition, such as adding a dummy environment variable. Then use `az ml online-deployment update` to update your deployment, as shown in the sketch that follows.
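A minimal sketch of the SDK route, assuming the `azure-ai-ml` and `azure-identity` packages; the values in angle brackets are placeholders, and the dummy environment variable exists only to force a redeployment that picks up the refreshed connection.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-name>",
)

# Fetch the existing deployment, touch its definition, and push the update.
deployment = ml_client.online_deployments.get(
    name="<deployment-name>", endpoint_name="<endpoint-name>"
)
env_vars = dict(deployment.environment_variables or {})
env_vars["DUMMY_REDEPLOY"] = "1"  # any harmless change triggers an update
deployment.environment_variables = env_vars
ml_client.online_deployments.begin_create_or_update(deployment).result()
```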
### How do I resolve vulnerability issues in prompt flow deployments?
@@ -100,7 +100,7 @@ For prompt flow runtime-related vulnerabilities, try the following approaches:
- Update the dependency packages in your `requirements.txt` file in your flow folder.
- If you use a customized base image for your flow, update the prompt flow runtime to the latest version and rebuild your base image. Then redeploy the flow.
- For any other vulnerabilities of managed online deployments, Azure AI fixes the issues in monthly.
+ For any other vulnerabilities in managed online deployments, Azure AI fixes the issues monthly.
### What do I do if I get "MissingDriverProgram" or "Could not find driver program in the request" errors?
@@ -152,7 +152,7 @@ inference_config:
### What do I do if my model response takes too long?
- You might notice that the deployment takes too long to respond. This delay can occur because of several factors:
+ You might notice that the deployment takes a long time to respond. This delay can occur because of several factors:
- The model used in the flow isn't powerful enough. For example, use GPT 3.5 instead of `text-ada`.
- The index query isn't optimized and takes too long.