
Commit fecf8b7

Additional edits.
1 parent 15f299c commit fecf8b7

3 files changed (+17, -11 lines)


articles/ai-foundry/concepts/evaluation-evaluators/custom-evaluators.md

Lines changed: 7 additions & 3 deletions
@@ -19,7 +19,7 @@ To start evaluating your application's generations, built-in evaluators are grea

 ## Code-based evaluators

-You don't need a large language model needed for certain evaluation metrics. Code-based evaluators can give you the flexibility to define metrics based on functions or callable class. You can build your own code-based evaluator, for example, by creating a simple Python class that calculates the length of an answer in `answer_length.py` under directory `answer_len/`, as in the following example.
+You don't need a large language model for certain evaluation metrics. Code-based evaluators can give you the flexibility to define metrics based on functions or callable classes. You can build your own code-based evaluator, for example, by creating a simple Python class that calculates the length of an answer in `answer_length.py` under directory `answer_len/`, as in the following example.

 ### Code-based evaluator example: Answer length
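
For context only (this sketch isn't part of the commit), the callable class described in the edited paragraph might look roughly like the following; the class name and result key are illustrative:

```python
# answer_len/answer_length.py -- minimal sketch of a code-based evaluator.
# A callable class that returns the length of an answer; no LLM is needed.
class AnswerLengthEvaluator:
    def __call__(self, *, answer: str, **kwargs):
        return {"answer_length": len(answer)}
```

An instance of such a class can then be called directly, which matches the `answer_length_evaluator(answer="What is the speed of light?")` call visible in the next hunk's header.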

@@ -49,11 +49,15 @@ answer_length = answer_length_evaluator(answer="What is the speed of light?")

 ## Prompt-based evaluators

-To build your own prompt-based large language model evaluator or AI-assisted annotator, you can create a custom evaluator based on a *Prompty* file. Prompty is a file with the `.prompty` extension for developing prompt template. The Prompty asset is a markdown file with a modified front matter. The front matter is in YAML format. It contains metadata fields that define model configuration and expected inputs of the Prompty. To measure friendliness of a response, you can create a custom evaluator `FriendlinessEvaluator`:
+To build your own prompt-based large language model evaluator or AI-assisted annotator, you can create a custom evaluator based on a *Prompty* file.
+
+Prompty is a file with the `.prompty` extension for developing prompt templates. The Prompty asset is a markdown file with modified front matter. The front matter is in YAML format. It contains metadata fields that define model configuration and expected inputs of the Prompty.
+
+To measure the friendliness of a response, you can create a custom evaluator `FriendlinessEvaluator`:

 ### Prompt-based evaluator example: Friendliness evaluator

-First, create a `friendliness.prompty` file that describes the definition of the friendliness metric and its grading rubric:
+First, create a `friendliness.prompty` file that defines the friendliness metric and its grading rubric:

 ```md
 ---
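
As background (not part of this change), the `FriendlinessEvaluator` that consumes a `friendliness.prompty` file is typically a thin Python wrapper around the template. The following sketch assumes prompt flow's `load_flow` client is used to bind the Prompty to a model configuration; the class name, file path, and inputs are illustrative:

```python
# Sketch of a prompt-based evaluator that wraps friendliness.prompty.
# Assumes promptflow's load_flow client; names and paths are illustrative.
import os

from promptflow.client import load_flow


class FriendlinessEvaluator:
    def __init__(self, model_config):
        prompty_path = os.path.join(os.path.dirname(__file__), "friendliness.prompty")
        # The Prompty front matter declares the model settings and expected inputs;
        # load_flow turns the template into a callable flow bound to model_config.
        self._flow = load_flow(source=prompty_path, model={"configuration": model_config})

    def __call__(self, *, response: str, **kwargs):
        # The template's grading rubric scores the friendliness of the response.
        return self._flow(response=response)
```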

articles/ai-foundry/concepts/evaluation-evaluators/textual-similarity-evaluators.md

Lines changed: 5 additions & 3 deletions
@@ -15,7 +15,9 @@ ms.custom:

 # Textual similarity evaluators

-It's important to compare how closely the textual response generated by your AI system matches the response you would expect. The expected response is the *ground truth*. Use a LLM-judge metric like [`SimilarityEvaluator`](#similarity) with a focus on the semantic similarity between the generated response and the ground truth. Or, use metrics from the field of natural language processing (NLP) including [F1 score](#f1-score), [BLEU](#bleu-score), [GLEU](#gleu-score), [ROUGE](#rouge-score), and [METEOR](#meteor-score) with a focus on the overlaps of tokens or n-grams between the two.
+It's important to compare how closely the textual response generated by your AI system matches the response you would expect. The expected response is called the *ground truth*.
+
+Use an LLM-judge metric like [`SimilarityEvaluator`](#similarity) with a focus on the semantic similarity between the generated response and the ground truth. Or, use metrics from the field of natural language processing (NLP), including [F1 score](#f1-score), [BLEU](#bleu-score), [GLEU](#gleu-score), [ROUGE](#rouge-score), and [METEOR](#meteor-score), with a focus on the overlaps of tokens or n-grams between the two.

 ## Model configuration for AI-assisted evaluators

@@ -36,7 +38,7 @@ model_config = AzureOpenAIModelConfiguration(
 ```

 > [!TIP]
-> We recommend using `o3-mini` for a balance of reasoning capability and cost efficiency.
+> We recommend that you use `o3-mini` to balance reasoning capability and cost efficiency.

 ## Similarity
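
For reference (not part of this commit), here's a sketch of how such a model configuration is typically passed to an AI-assisted evaluator like `SimilarityEvaluator`; the `azure-ai-evaluation` package is assumed, and the endpoint, key, and deployment values are placeholders read from the environment:

```python
# Sketch: pass an Azure OpenAI model configuration to an AI-assisted evaluator.
# Assumes the azure-ai-evaluation package; endpoint values are placeholders.
import os

from azure.ai.evaluation import AzureOpenAIModelConfiguration, SimilarityEvaluator

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
)

similarity = SimilarityEvaluator(model_config)
result = similarity(
    query="What is the speed of light?",
    response="Light travels at about 299,792 kilometers per second in a vacuum.",
    ground_truth="The speed of light in a vacuum is approximately 299,792 kilometers per second.",
)
```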

@@ -101,7 +103,7 @@ The numerical score is a 0-1 float. A higher score is better. Given a numerical

 ## BLEU score

-`BleuScoreEvaluator` computes the Bilingual Evaluation Understudy (BLEU) score commonly used in natural language processing (NLP) and machine translation. It measures how closely the generated text matches the reference text.
+`BleuScoreEvaluator` computes the Bilingual Evaluation Understudy (BLEU) score commonly used in natural language processing and machine translation. It measures how closely the generated text matches the reference text.

 ### BLEU example
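
Because BLEU is an n-gram overlap metric rather than an LLM judge, no model configuration is needed. A hedged usage sketch (not part of this commit), again assuming the `azure-ai-evaluation` package:

```python
# Sketch: BLEU compares n-gram overlap between the generated response and the
# ground truth; no LLM or model configuration is required.
from azure.ai.evaluation import BleuScoreEvaluator

bleu = BleuScoreEvaluator()
result = bleu(
    response="Light travels at about 299,792 kilometers per second.",
    ground_truth="The speed of light is approximately 299,792 kilometers per second.",
)
print(result)  # a dictionary containing the BLEU score
```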

articles/ai-foundry/how-to/prompt-flow-troubleshoot.md

Lines changed: 5 additions & 5 deletions
@@ -88,10 +88,10 @@ request_settings:

 If you regenerate your Azure OpenAI key and manually update the connection used in a prompt flow, you might see error messages like "Unauthorized. Access token is missing, invalid, audience is incorrect or have expired." You might see these messages when you invoke an existing endpoint that was created before the key was regenerated.

-This error occurs because the connections used in the endpoints or deployments aren't automatically updated. You should manually update any change for a key or secrets in deployments, which aims to avoid affecting online production deployment because of unintentional offline operation.
+This error occurs because the connections used in the endpoints/deployments aren't automatically updated. You should manually update any key or secret changes in deployments to avoid affecting the online production deployment through an unintentional offline operation.

-- If the endpoint was deployed in the Azure AI Foundry portal, redeploy the flow to the existing endpoint by using the same deployment name.
-- If the endpoint was deployed by using the SDK or the Azure CLI, make a modification to the deployment definition, such as adding a dummy environment variable. Then use `az ml online-deployment update` to update your deployment.
+- If you deployed the endpoint in the Azure AI Foundry portal, redeploy the flow to the existing endpoint by using the same deployment name.
+- If you deployed the endpoint by using the SDK or the Azure CLI, make a modification to the deployment definition, such as adding a dummy environment variable. Then use `az ml online-deployment update` to update your deployment.

 ### How do I resolve vulnerability issues in prompt flow deployments?

@@ -100,7 +100,7 @@ For prompt flow runtime-related vulnerabilities, try the following approaches:

 - Update the dependency packages in your `requirements.txt` file in your flow folder.
 - If you use a customized base image for your flow, update the prompt flow runtime to the latest version and rebuild your base image. Then redeploy the flow.

-For any other vulnerabilities of managed online deployments, Azure AI fixes the issues in monthly.
+For any other vulnerabilities of managed online deployments, Azure AI fixes the issues monthly.

 ### What do I do if I get "MissingDriverProgram" or "Could not find driver program in the request" errors?

@@ -152,7 +152,7 @@ inference_config:

 ### What do I do if my model response takes too long?

-You might notice that the deployment takes too long to respond. This delay can occur because of several factors:
+You might notice that the deployment takes a long time to respond. This delay can occur because of several factors:

 - The model used in the flow isn't powerful enough. For example, use GPT 3.5 instead of `text-ada`.
 - The index query isn't optimized and takes too long.
