diff --git a/docs/source/guide/manage_data.md b/docs/source/guide/manage_data.md
index 11fe3f406001..788f5164d512 100644
--- a/docs/source/guide/manage_data.md
+++ b/docs/source/guide/manage_data.md
@@ -140,7 +140,9 @@ These two columns allow you to see agreement scores at a task level.
 
 ### Agreement
 
-This is the average agreement score between all annotators for a particular task. Each annotation pair's agreement score will be calculated as new annotations are submitted. For example if there are three annotations for a task, there will be three unique annotation pairs, and the agreement column will show the average agreement score of those three pairs.
+The **Agreement** column displays the average agreement score between all annotators for a particular task.
+
+Each annotation pair's agreement score will be calculated as new annotations are submitted. For example, if there are three annotations for a task, there will be three unique annotation pairs, and the agreement column will show the average agreement score of those three pairs.
 
 Here is an example with a simple label config. Let's assume we are using ["Exact matching choices" agreement calculation](stats#Exact-matching-choices-example)
 
 ```xml
 ```
 
-Annotation 1: `Cat`
-Annotation 2: `Dog`
-Annotation 3: `Cat`
+Annotation 1: `Cat`
+Annotation 2: `Dog`
+Annotation 3: `Cat`
 
 The three unique pairs are
 1. Annotation 1 <> Annotation 2 - agreement score is `0`
 2. Annotation 1 <> Annotation 3 - agreement score is `1`
 3. Annotation 2 <> Annotation 3 - agreement score is `0`
 
-The agreement column for this task would show the average of all annotation pair's agreement score - `33%`
+The agreement column for this task would show the average of all annotation pairs' agreement scores:
+`33%`
 
 ### Agreement (Selected)
 
-The agreement (selected) column builds on top of the agreement column, allowing you to get agreement scores between annotators, ground truth, and model versions. The column header is a dropdown where you can make your selection of what to include in the calculation.
+The **Agreement (Selected)** column builds on top of the agreement column, allowing you to get agreement scores between annotators, ground truth, and model versions.
+
+The column header is a dropdown where you can select which pairs you want to include in the calculation.
 
-At least two selections need to be made before clicking Apply, which will calculate scores based on your selection and update the column with the appropriate scores.
+Under **Choose What To Calculate**, there are two options that support different use cases.
+
+#### Agreement Pairs
+
+This allows you to select specific annotators and/or models to compare.
+
+You must select at least two items to compare. This can be used in a variety of ways.
+
+**Subset of annotators**
+
+You can select a subset of annotators to compare. This is more precise than the **Agreement** column, which automatically includes all annotators in the score.
+
+This will then average all annotator vs annotator scores for only the selected annotators.
+
+**Subset of models**
+
+You can also select multiple models to see model consensus in your project. This will average all model vs model scores for the selected models.
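+
+Conceptually, both cases come down to averaging the scores of every unique pair among the selected items. The following is a minimal illustrative sketch of that averaging for "Exact matching choices". It is not Label Studio's internal implementation, and the function and variable names are made up for this example:
+
+```python
+from itertools import combinations
+
+def exact_match(a, b):
+    """Score one pair of results: 1.0 if the choices match exactly, otherwise 0.0."""
+    return 1.0 if a == b else 0.0
+
+def average_pairwise_agreement(results):
+    """Average the exact-match score over every unique pair of selected results."""
+    pairs = list(combinations(results, 2))
+    return sum(exact_match(a, b) for a, b in pairs) / len(pairs)
+
+# The three annotations from the example above (Cat, Dog, Cat) form three unique
+# pairs with scores 0, 1, 0, so the average is 0.33 (shown as 33%).
+print(average_pairwise_agreement(["Cat", "Dog", "Cat"]))
+```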
+
+**Subset of models and annotators**
+
-The available selections are
-- Ground truth
-- All Annotators
- - Any subset of annotators
-- All Model Versions
- - Any subset of model versions
-
+Other combinations are also possible, such as selecting one annotator and multiple models, multiple annotators and multiple models, and so on.
 
-There are three types of scores that can be aggregated here
-1. Annotation vs annotation agreement scores (e.g. selecting two or more annotators)
-2. Annotation vs model version scores (e.g. selecting at least one annotator AND at least one model version)
-3. Model version vs model version scores (e.g. selecting two or more model versions)
+* If multiple annotators are selected, all annotator vs annotator scores will be included in the average.
+* If multiple models are selected, all model vs model scores will be included in the average.
+* If one or more annotators are selected along with one or more models, all annotator vs model scores will be included in the average.
 
-If "Ground truth" is selected, all scores from pairs that include a ground truth annotation will also be included in the aggregate score displayed in the column.
+#### Ground Truth Match
+
+If your project contains ground truth annotations, this allows you to compare either a single annotator or a single model to those ground truth annotations.
+
+#### Limitations
+
+We currently only support calculating the **Agreement (Selected)** column for tasks with 20 or fewer annotations. If you have a task with more annotations than this threshold, you will see an info icon with a tooltip.
+
+#### Example Score Calculations
+
+Example using the same simple label config as above:
-Example using the same simple label config as above
 
 ```xml
 ```
@@ -205,14 +241,15 @@ Lets say for one task we have the following:
 
 Here is how the score would be calculated for various selections in the dropdown
 
-#### `All Annotators` selected, `Ground Truth` and `All Model Versions` unselected
-This will match the behavior of the `Agreement` column - all annotation pair's scores will be averaged
-1. Annotation 1 <> Annotation 2 - agreement score is `0`
+#### `Agreement Pairs` with `All Annotators` selected
+This will match the behavior of the **Agreement** column - all annotation pairs' scores will be averaged:
+
+1. Annotation 1 <> Annotation 2: Agreement score is `0`
 
 Score displayed in column for this task: `0%`
 
-#### `All Annotators` and `All Model Versions` selected, `Ground Truth` unselected
-This will average all annoations pair's scores, as well as all annotation <> model version pair's scores
+#### `Agreement Pairs` with `All Annotators` and `All Model Versions` selected
+This will average all annotation pairs' scores, as well as all annotation <> model version pairs' scores
 1. Annotation 1 <> Annotation 2 - agreement score is `0`
 4. Annotation 1 <> Prediction 1 - agreement score is `0`
 5. Annotation 1 <> Prediction 2 - agreement score is `1`
@@ -221,9 +258,11 @@ This will average all annoations pair's scores, as well as all annotation <> mod
 Score displayed in column for this task: `40%`
 
-#### `Ground Truth` and `model version 2` selected
-This will compare all ground truth annotations with all predictions from `model version 2`
-Annotation 1 is marked as ground truth and Prediction 2 is from `model version 2`
+#### `Ground Truth Match` with `model version 2` selected
+This will compare all ground truth annotations with all predictions from `model version 2`.
+
+In this example, Annotation 1 is marked as ground truth and Prediction 2 is from `model version 2`:
+
 1. Annotation 1 <> Prediction 2 - agreement score is `1`
 
 Score displayed in column for this task: `100%`
diff --git a/docs/themes/v2/source/images/project/agreement-selected-annotators.png b/docs/themes/v2/source/images/project/agreement-selected-annotators.png
new file mode 100644
index 000000000000..9a6dfeb6e8ff
Binary files /dev/null and b/docs/themes/v2/source/images/project/agreement-selected-annotators.png differ
diff --git a/docs/themes/v2/source/images/project/agreement-selected-gt.png b/docs/themes/v2/source/images/project/agreement-selected-gt.png
new file mode 100644
index 000000000000..1c8b19a8dc3f
Binary files /dev/null and b/docs/themes/v2/source/images/project/agreement-selected-gt.png differ
diff --git a/docs/themes/v2/source/images/project/agreement-selected-models.png b/docs/themes/v2/source/images/project/agreement-selected-models.png
new file mode 100644
index 000000000000..1ec90a83f73d
Binary files /dev/null and b/docs/themes/v2/source/images/project/agreement-selected-models.png differ
diff --git a/docs/themes/v2/source/images/project/agreement-selected-threshold.png b/docs/themes/v2/source/images/project/agreement-selected-threshold.png
new file mode 100644
index 000000000000..c538c52ca1c6
Binary files /dev/null and b/docs/themes/v2/source/images/project/agreement-selected-threshold.png differ
diff --git a/docs/themes/v2/source/images/project/agreement-selected.png b/docs/themes/v2/source/images/project/agreement-selected.png
index d91426c66d4d..a7f00efba475 100644
Binary files a/docs/themes/v2/source/images/project/agreement-selected.png and b/docs/themes/v2/source/images/project/agreement-selected.png differ
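For reference, the example score calculations in the patch above can be reproduced with a small sketch. This is purely illustrative and not Label Studio's implementation; the identifiers are invented, and the model vs model pair is omitted because its score is not listed in the example:

```python
from itertools import combinations

# Pair scores taken from the worked example above.
# Annotation 1 is marked as ground truth; Prediction 2 comes from model version 2.
# The last two pair scores are not shown in the excerpt and are assumed here,
# chosen so that the documented 40% average still holds.
pair_scores = {
    frozenset({"annotation_1", "annotation_2"}): 0.0,
    frozenset({"annotation_1", "prediction_1"}): 0.0,
    frozenset({"annotation_1", "prediction_2"}): 1.0,
    frozenset({"annotation_2", "prediction_1"}): 1.0,  # assumed
    frozenset({"annotation_2", "prediction_2"}): 0.0,  # assumed
}

def agreement_selected(selected):
    """Average the known scores of every unique pair among the selected items."""
    pairs = (frozenset(pair) for pair in combinations(selected, 2))
    scores = [pair_scores[pair] for pair in pairs if pair in pair_scores]
    return sum(scores) / len(scores)

# "Agreement Pairs" with all annotators selected: one pair, score 0 -> 0%
print(agreement_selected(["annotation_1", "annotation_2"]))
# "Agreement Pairs" with all annotators and all model versions: five listed pairs -> 40%
print(agreement_selected(["annotation_1", "annotation_2", "prediction_1", "prediction_2"]))
# "Ground Truth Match" with model version 2: ground truth vs that model's prediction -> 100%
print(agreement_selected(["annotation_1", "prediction_2"]))
```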