Commit b0a6cc8
edit pass: more-concept-articles
1 parent 7fb9a49 commit b0a6cc8

3 files changed: +53 -45 lines changed
articles/machine-learning/concept-counterfactual-analysis.md

Lines changed: 15 additions & 14 deletions
@@ -1,7 +1,7 @@
 ---
 title: Counterfactuals analysis and what-if
 titleSuffix: Azure Machine Learning
-description: Generate diverse counterfactual examples with feature perturbations to see minimal changes required to achieve desired prediction with the Responsible AI dashboard's integration of DiceML.
+description: Generate diverse counterfactual examples with feature perturbations to see minimal changes required to achieve desired prediction with the Responsible AI dashboard's integration of DiCE machine learning.
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: enterprise-readiness
@@ -14,33 +14,34 @@ ms.custom: responsible-ml, event-tier1-build-2022

 # Counterfactuals analysis and what-if (preview)

-What-if counterfactuals address the question of “what would the model predict if the action input is changed”, enabling understanding and debugging of a machine learning model in terms of how it reacts to input (feature) changes. Compared with approximating a machine learning model or ranking features by their predictive importance (which standard interpretability techniques do), counterfactual analysis “interrogates” a model to determine what changes to a particular datapoint would flip the model decision. Such an analysis helps in disentangling the impact of different correlated features in isolation or for acquiring a more nuanced understanding of how much of a feature change is needed to see a model decision flip for classification models and decision change for regression models.
+What-if counterfactuals address the question of what the model would predict if you changed the action input. They enable understanding and debugging of a machine learning model in terms of how it reacts to input (feature) changes.

-The Counterfactual Analysis and what-if component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) consists of two functionalities:
+Standard interpretability techniques approximate a machine learning model or rank features by their predictive importance. By contrast, counterfactual analysis "interrogates" a model to determine what changes to a particular datapoint would flip the model decision.

-- Generating a set of examples with minimal changes to a given point such that they change the model's prediction (showing the closest data points with opposite model predictions)
-- Enabling users to generate their own what-if perturbations to understand how the model reacts to features’ changes.
-
-One of the top differentiators of the Responsible AI dashboard's counterfactual analysis component is the fact that you can identify which features to vary and their permissible ranges for valid and logical counterfactual examples.
+Such an analysis helps in disentangling the impact of correlated features in isolation. It also helps you get a more nuanced understanding of how much of a feature change is needed to see a model decision flip for classification models and a decision change for regression models.

+The counterfactual analysis and what-if component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) has two functions:

+- Generate a set of examples with minimal changes to a particular point such that they change the model's prediction (showing the closest data points with opposite model predictions).
+- Enable users to generate their own what-if perturbations to understand how the model reacts to feature changes.

-The capabilities of this component are founded by the [DiCE](https://github.com/interpretml/DiCE) package.
+One of the top differentiators of the Responsible AI dashboard's counterfactual analysis component is the fact that you can identify which features to vary and their permissible ranges for valid and logical counterfactual examples.

+The capabilities of this component come from the [DiCE](https://github.com/interpretml/DiCE) package.

-Use What-If Counterfactuals when you need to:
+Use what-if counterfactuals when you need to:

-- Examine fairness and reliability criteria as a decision evaluator (by perturbing sensitive attributes such as gender, ethnicity, etc., and observing whether model predictions change).
+- Examine fairness and reliability criteria as a decision evaluator by perturbing sensitive attributes such as gender and ethnicity, and then observing whether model predictions change.
 - Debug specific input instances in depth.
-- Provide solutions to end users and determine what they can do to get a desirable outcome from the model next time.
+- Provide solutions to users and determine what they can do to get a desirable outcome from the model.

 ## How are counterfactual examples generated?

 To generate counterfactuals, DiCE implements a few model-agnostic techniques. These methods apply to any opaque-box classifier or regressor. They're based on sampling nearby points to an input point, while optimizing a loss function based on proximity (and optionally, sparsity, diversity, and feasibility). Currently supported methods are:

-- [Randomized Search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#1.-Independent-random-sampling-of-features): Samples points randomly near the given query point and returns counterfactuals as those points whose predicted label is the desired class.
-- [Genetic Search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#2.-Genetic-Algorithm): Samples points using a genetic algorithm, given the combined objective of optimizing proximity to the given query point, changing as few features as possible, and diversity among the counterfactuals generated.
-- [KD Tree Search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#3.-Querying-a-KD-Tree) (For counterfactuals from a given training dataset): This algorithm returns counterfactuals from the training dataset. It constructs a KD tree over the training data points based on a distance function and then returns the closest points to a given query point that yields the desired predicted label.
+- [Randomized search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#1.-Independent-random-sampling-of-features): This method samples points randomly near a query point and returns counterfactuals as points whose predicted label is the desired class.
+- [Genetic search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#2.-Genetic-Algorithm): This method samples points by using a genetic algorithm, given the combined objective of optimizing proximity to the query point, changing as few features as possible, and seeking diversity among the generated counterfactuals.
+- [KD tree search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#3.-Querying-a-KD-Tree): This algorithm returns counterfactuals from the training dataset. It constructs a KD tree over the training data points based on a distance function and then returns the closest points to a particular query point that yields the desired predicted label.
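The randomized-search idea in the first bullet can be sketched in a few lines. This is a minimal illustration of the technique, not the DiCE implementation; the model, dataset, and function names here are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

def random_counterfactuals(model, query, desired_class,
                           n_samples=5000, scale=1.0, k=3):
    """Return up to k nearby points that the model predicts as desired_class."""
    # Sample candidate points randomly around the query instance.
    noise = rng.normal(0.0, scale, size=(n_samples, query.shape[0]))
    candidates = query + noise
    # Keep only the candidates whose predicted label flips to the desired class.
    flipped = candidates[model.predict(candidates) == desired_class]
    # Rank the surviving candidates by distance to the query (proximity).
    order = np.argsort(np.linalg.norm(flipped - query, axis=1))
    return flipped[order[:k]]

query = X[0]
desired = 1 - model.predict(query.reshape(1, -1))[0]  # the "opposite" class
cfs = random_counterfactuals(model, query, desired)
```

DiCE's actual methods go further than this proximity-only ranking by also optimizing sparsity (few changed features), diversity among counterfactuals, and feasibility constraints such as permitted feature ranges.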

 ## Next steps

Lines changed: 10 additions & 8 deletions
@@ -1,7 +1,7 @@
 ---
 title: Understand your datasets
 titleSuffix: Azure Machine Learning
-description: Perform exploratory data analysis to understand feature biases and imbalances with the Responsible AI dashboard's Data Explorer.
+description: Perform exploratory data analysis to understand feature biases and imbalances by using the Responsible AI dashboard's data explorer.
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: enterprise-readiness
@@ -14,21 +14,23 @@ ms.custom: responsible-ml, event-tier1-build-2022

 # Understand your datasets (preview)

-Machine learning models "learn" from historical decisions and actions captured in training data. As a result, their performance in real-world scenarios is heavily influenced by the data they're trained on. When feature distribution in a dataset is skewed, it can cause a model to incorrectly predict data points belonging to an underrepresented group or to be optimized along an inappropriate metric. For example, while training a housing price prediction AI, the training set was representing 75% of newer houses that have less than median prices. As a result, it was much less accurate in successfully identifying more expensive historic houses. The fix was to add older and expensive houses to the training data and augment the features to include insights about the historic value of the house. Upon incorporating that data augmentation, results improved.
+Machine learning models "learn" from historical decisions and actions captured in training data. As a result, their performance in real-world scenarios is heavily influenced by the data they're trained on. When feature distribution in a dataset is skewed, it can cause a model to incorrectly predict data points that belong to an underrepresented group or to be optimized along an inappropriate metric.

-The Data Explorer component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) helps visualize datasets based on predicted and actual outcomes, error groups, and specific features. This enables you to identify issues of over- and under-representation and to see how data is clustered in the dataset. Data visualizations consist of aggregate plots or individual data points.
+For example, in training an AI system for predicting house prices, the training set represented 75 percent newer houses that had less than median prices. As a result, the model was much less accurate in identifying more expensive historic houses. The fix was to add older and expensive houses to the training data and augment the features to include insights about historic value. That data augmentation improved results.

-## When to use data explorer?
+The data explorer component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) helps visualize datasets based on predicted and actual outcomes, error groups, and specific features. It helps you identify issues of overrepresentation and underrepresentation and see how data is clustered in the dataset. Data visualizations consist of aggregate plots or individual data points.

-Use Data Explorer when you need to:
+## When to use the data explorer
+
+Use the data explorer when you need to:

 - Explore your dataset statistics by selecting different filters to slice your data into different dimensions (also known as cohorts).
 - Understand the distribution of your dataset across different cohorts and feature groups.
-- Determine whether your findings related to fairness, error analysis and causality (derived from other dashboard components) are a result of your datasets distribution.
-- Decide in which areas to collect more data to mitigate errors arising from representation issues, label noise, feature noise, label bias, etc.
+- Determine whether your findings related to fairness, error analysis, and causality (derived from other dashboard components) are a result of your dataset's distribution.
+- Decide in which areas to collect more data to mitigate errors that come from representation issues, label noise, feature noise, label bias, and similar factors.
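The housing example above amounts to slicing a dataset into cohorts and comparing their shares of the training data, which is the kind of skew the data explorer surfaces. A small sketch with made-up column names:

```python
import pandas as pd

# Hypothetical training data mirroring the housing example: 75 newer houses
# priced below the median, 25 historic houses priced above it.
df = pd.DataFrame({
    "house_age": ["new"] * 75 + ["historic"] * 25,
    "above_median_price": [0] * 75 + [1] * 25,
})

# Share of each cohort in the training data: historic houses are underrepresented.
cohort_share = df["house_age"].value_counts(normalize=True)

# Outcome rate per cohort: here the label is fully entangled with the cohort,
# so errors on historic houses would be hidden by aggregate accuracy.
outcome_by_cohort = df.groupby("house_age")["above_median_price"].mean()
```

Comparing `cohort_share` and `outcome_by_cohort` before training is a cheap check for the representation issues the article describes.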

 ## Next steps

-- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI](how-to-responsible-ai-dashboard-ui.md).
+- Learn how to generate the Responsible AI dashboard via [CLI and SDK](how-to-responsible-ai-dashboard-sdk-cli.md) or [Azure Machine Learning studio UI](how-to-responsible-ai-dashboard-ui.md).
 - Explore the [supported data explorer visualizations](how-to-responsible-ai-dashboard.md#data-explorer) of the Responsible AI dashboard.
 - Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md) based on the insights observed in the Responsible AI dashboard.
