articles/ai-studio/how-to/develop/evaluate-sdk.md
Built-in evaluators can accept *either* query and response pairs or a list of conversations:
- Ground truth: the response generated by a human as the true answer
- Conversation: a list of messages of user and assistant turns. See more in the next section.
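For instance, a single query-and-response row might look like the following. The values are made-up sample content, and the `context` field is an assumption here, used by grounding-style evaluators:

```python
# Illustrative query/response pair; field names follow the data
# requirements above, and the values are made-up sample content.
query_response = {
    "query": "Which tent is the most waterproof?",
    "context": "The Alpine Explorer Tent is the most waterproof tent in the catalog.",  # assumed optional field
    "response": "The Alpine Explorer Tent is the most waterproof.",
    "ground_truth": "The Alpine Explorer Tent is the most waterproof.",
}
```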
> [!NOTE]
> All evaluators except `SimilarityEvaluator` come with a reason field. They employ techniques such as chain-of-thought reasoning to generate an explanation for the score, so they consume more tokens during generation in exchange for improved evaluation quality. Specifically, `max_token` for evaluator generation is set to 800 for all AI-assisted evaluators (and 1600 for `RetrievalEvaluator` to accommodate longer inputs).
#### Evaluating multi-turn conversations
For evaluators that support conversations as input, you can pass the conversation directly into the evaluator:
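As a sketch of the conversation format (the message fields are based on the single-turn inputs above; `groundedness_eval` is an assumed, already-constructed evaluator instance built from your model configuration):

```python
# A conversation is a dict with a "messages" list of user and assistant
# turns; assistant turns may carry an optional "context" field.
conversation = {
    "messages": [
        {"role": "user", "content": "Which tent is the most waterproof?"},
        {
            "role": "assistant",
            "content": "The Alpine Explorer Tent is the most waterproof.",
            "context": "Per the product list, the Alpine Explorer Tent has the highest waterproof rating.",
        },
    ]
}

# Hypothetical call on an assumed evaluator instance:
# result = groundedness_eval(conversation=conversation)
```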
> `GroundednessEvaluator` (open-source, prompt-based) supports `query` as an optional input. If `query` is provided, the optimal scenario is Retrieval Augmented Generation question answering (RAG QA); otherwise, the optimal scenario is summarization. This differs from `GroundednessProEvaluator` (powered by Azure Content Safety), which requires `query`.
Here's an example of the result:
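A sketch of what such a result could look like (the scores, the `evaluation_per_turn` key, and the reason text are illustrative assumptions, not guaranteed output):

```python
# Made-up sample output for a conversation-mode evaluation: the top-level
# score aggregates the per-turn scores, and each turn carries a reason string.
result = {
    "groundedness": 4.0,
    "evaluation_per_turn": {
        "groundedness": [5.0, 3.0],
        "groundedness_reason": [
            "The response is fully grounded in the provided context.",
            "The response is only partially grounded in the provided context.",
        ],
    },
}
```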
> [!NOTE]
> We strongly recommend migrating your code to use the key without prefixes (for example, `groundedness.groundedness`) so that your code supports more evaluator models.
### Risk and safety evaluators
Built-in evaluators are great out of the box to start evaluating your applications.
### Code-based evaluators
Sometimes a large language model isn't needed for certain evaluation metrics, and code-based evaluators give you the flexibility to define metrics based on functions or a callable class. You can create your own code-based evaluator, for example, with a simple Python class that calculates the length of an answer in `answer_length.py` under the directory `answer_len/`:
```python
class AnswerLengthEvaluator:
    def __call__(self, *, answer: str, **kwargs):
        return {"answer_length": len(answer)}
```
Then run the evaluator on a row of data by importing the callable class:
```python
with open("answer_len/answer_length.py") as fin:
    print(fin.read())
```
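Putting the pieces together, here is a minimal self-contained sketch. The class body repeats `answer_length.py` so the snippet runs on its own, and the sample answer is arbitrary:

```python
class AnswerLengthEvaluator:
    """Code-based evaluator: no LLM involved, just a callable class."""

    def __call__(self, *, answer: str, **kwargs):
        return {"answer_length": len(answer)}

evaluator = AnswerLengthEvaluator()
result = evaluator(answer="What is the speed of light?")
print(result)  # {'answer_length': 27}
```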
After local evaluations of your generative AI applications, you may want to trigger remote evaluations.
### Installation Instructions
1. Create a **virtual Python environment of your choice**. To create one using conda, run the following command:
```bash
conda create -n remote-evaluation
conda activate remote-evaluation
```

```python
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator
```