Commit ae723ea

Merge pull request #209553 from ShawnJackson/more-concept-articles

edit pass: More ML conceptual articles

2 parents a6ab28b + 6a60d1d commit ae723ea
File tree

5 files changed: +125 -97 lines changed
Lines changed: 33 additions & 16 deletions
@@ -1,7 +1,7 @@
 ---
-title: Make data-driven policies and influence decision making
+title: Make data-driven policies and influence decision-making
 titleSuffix: Azure Machine Learning
-description: Make data-driven decisions and policies with the Responsible AI dashboard's integration of the Causal Analysis tool EconML.
+description: Make data-driven decisions and policies with the Responsible AI dashboard's integration of the causal analysis tool EconML.
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: enterprise-readiness
@@ -12,46 +12,63 @@ ms.date: 08/17/2022
 ms.custom: responsible-ml, event-tier1-build-2022
 ---
 
-# Make data-driven policies and influence decision making (preview)
+# Make data-driven policies and influence decision-making (preview)
 
-While machine learning models are powerful in identifying patterns in data and making predictions, they offer little support for estimating how the real-world outcome changes in the presence of an intervention. Practitioners have become increasingly focused on using historical data to inform their future decisions and business interventions. For example, how would the revenue be affected if a corporation pursues a new pricing strategy? Would a new medication improve a patient’s condition, all else equal?
+Machine learning models are powerful in identifying patterns in data and making predictions. But they offer little support for estimating how the real-world outcome changes in the presence of an intervention.
 
+Practitioners have become increasingly focused on using historical data to inform their future decisions and business interventions. For example, how would the revenue be affected if a corporation pursued a new pricing strategy? Would a new medication improve a patient's condition, all else equal?
 
-The Causal Inference component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) addresses these questions by estimating the effect of a feature on an outcome of interest on average, across a population or a cohort, and on an individual level. It also helps to construct promising interventions by simulating different feature responses to various interventions and creating rules to determine which population cohorts would benefit from a particular intervention. Collectively, these functionalities allow decision-makers to apply new policies and affect real-world change.
+The *causal inference* component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) addresses these questions by estimating the effect of a feature on an outcome of interest on average, across a population or a cohort, and on an individual level. It also helps construct promising interventions by simulating feature responses to various interventions and creating rules to determine which population cohorts would benefit from an intervention. Collectively, these functionalities allow decision-makers to apply new policies and effect real-world change.
 
-The capabilities of this component are founded by the [EconML](https://github.com/Microsoft/EconML) package, which estimates heterogeneous treatment effects from observational data via [double machine learning](https://econml.azurewebsites.net/spec/estimation/dml.html) technique.
+The capabilities of this component come from the [EconML](https://github.com/Microsoft/EconML) package. It estimates heterogeneous treatment effects from observational data via the [double machine learning](https://econml.azurewebsites.net/spec/estimation/dml.html) technique.
 
-Use Causal Inference when you need to:
+Use causal inference when you need to:
 
 - Identify the features that have the most direct effect on your outcome of interest.
 - Decide what overall treatment policy to take to maximize real-world impact on an outcome of interest.
 - Understand how individuals with certain feature values would respond to a particular treatment policy.
 
-
 ## How are causal inference insights generated?
 
 >[!NOTE]
-> Only historic data is required to generate causal insights. The causal effects computed based on the treatment features are purely a data property. Hence, a trained model is optional when computing the causal effects.
+> Only historical data is required to generate causal insights. The causal effects computed based on the treatment features are purely a data property. So, a trained model is optional when you're computing the causal effects.
+
+Double machine learning is a method for estimating heterogeneous treatment effects when all potential confounders/controls (factors that simultaneously had a direct effect on the treatment decision in the collected data and the observed outcome) are observed but either of the following problems exists:
 
-Double Machine Learning is a method for estimating (heterogeneous) treatment effects when all potential confounders/controls (factors that simultaneously had a direct effect on the treatment decision in the collected data and the observed outcome) are observed but are either too many (high-dimensional) for classical statistical approaches to be applicable or their effect on the treatment and outcome can't be satisfactorily modeled by parametric functions (non-parametric). Both latter problems can be addressed via machine learning techniques (to see an example, check out [Chernozhukov2016](https://econml.azurewebsites.net/spec/references.html#chernozhukov2016)).
+- There are too many for classical statistical approaches to be applicable. That is, they're *high-dimensional*.
+- Their effect on the treatment and outcome can't be satisfactorily modeled by parametric functions. That is, they're *non-parametric*.
 
-The method reduces the problem by first estimating two predictive tasks:
+You can use machine learning techniques to address both problems. For an example, see [Chernozhukov2016](https://econml.azurewebsites.net/spec/references.html#chernozhukov2016).
+
+Double machine learning reduces the problem by first estimating two predictive tasks:
 
 - Predicting the outcome from the controls
 - Predicting the treatment from the controls
 
-Then the method combines these two predictive models in a final stage estimation to create a model of the heterogeneous treatment effect. The approach allows for arbitrary machine learning algorithms to be used for the two predictive tasks while maintaining many favorable statistical properties related to the final model (for example, small mean squared error, asymptotic normality, and construction of confidence intervals).
+Then the method combines these two predictive models in a final-stage estimation to create a model of the heterogeneous treatment effect. This approach allows for arbitrary machine learning algorithms to be used for the two predictive tasks while maintaining many favorable statistical properties related to the final model. These properties include small mean squared error, asymptotic normality, and construction of confidence intervals.
 
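The two predictive tasks and the final-stage combination described in the added text above can be illustrated with a toy sketch. The following example is hypothetical: it uses simple least-squares fits in place of arbitrary machine learning models, and it omits the cross-fitting that real double ML performs, but it shows how residualizing both the outcome and the treatment on a confounder removes the bias that a naive regression suffers.

```python
import random

def fit_line(x, y):
    # Ordinary least squares for y = a + b*x; stands in for the ML models.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

random.seed(0)
n = 2000
true_effect = 1.5

# Confounder W drives both the treatment T and the outcome Y.
W = [random.gauss(0, 1) for _ in range(n)]
T = [2.0 * w + random.gauss(0, 1) for w in W]
Y = [true_effect * t + 3.0 * w + random.gauss(0, 1) for t, w in zip(T, W)]

# Naive regression of Y on T is biased because W is ignored.
_, naive = fit_line(T, Y)

# Task 1: predict the outcome from the controls.
# Task 2: predict the treatment from the controls.
# Keep the residuals of each.
aY, bY = fit_line(W, Y)
aT, bT = fit_line(W, T)
res_Y = [y - (aY + bY * w) for y, w in zip(Y, W)]
res_T = [t - (aT + bT * w) for t, w in zip(T, W)]

# Final stage: regress outcome residuals on treatment residuals.
_, effect = fit_line(res_T, res_Y)

print(f"naive: {naive:.2f}, double ML: {effect:.2f}")
```

The naive slope lands near 2.7 on this data, while the residual-on-residual slope recovers a value near the true effect of 1.5. In EconML, this pattern is available through estimators such as `LinearDML`, with configurable models for the two predictive tasks.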

 ## What other tools does Microsoft provide for causal inference?
 
-[Project Azua](https://www.microsoft.com/research/project/project_azua/) provides a novel framework focusing on end-to-end causal inference. Azua’s technology DECI (deep end-to-end causal inference) is a single model that can simultaneously do causal discovery and causal inference. We only require the user to provide data, and the model can output the causal relationships among all different variables. By itself, this can provide insights into the data and enables metrics such as individual treatment effect (ITE), average treatment effect (ATE), and conditional average treatment effect (CATE) to be calculated, which can then be used to make optimal decisions. The framework is scalable for large data, both in terms of the number of variables and the number of data points; it can also handle missing data entries with mixed statistical types.
+- [Project Azua](https://www.microsoft.com/research/project/project_azua/) provides a novel framework that focuses on end-to-end causal inference.
+
+  Azua's DECI (deep end-to-end causal inference) technology is a single model that can simultaneously do causal discovery and causal inference. The user provides data, and the model can output the causal relationships among all variables.
+
+  By itself, this approach can provide insights into the data. It enables the calculation of metrics such as individual treatment effect (ITE), average treatment effect (ATE), and conditional average treatment effect (CATE). You can then use these calculations to make optimal decisions.
+
+  The framework is scalable for large data, in terms of both the number of variables and the number of data points. It can also handle missing data entries with mixed statistical types.
+
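The three metrics named in the added text above are simple aggregates of per-individual effects. A minimal sketch with made-up numbers (the outcomes and groups are purely illustrative):

```python
# Hypothetical predicted outcomes for four individuals, with and without treatment.
outcome_treated = [5.0, 7.0, 6.0, 9.0]
outcome_control = [4.0, 5.0, 6.5, 6.0]
group = ["A", "A", "B", "B"]  # a feature value to condition on

# ITE: each individual's outcome difference between treatment and control.
ite = [t - c for t, c in zip(outcome_treated, outcome_control)]  # [1.0, 2.0, -0.5, 3.0]

# ATE: the ITE averaged over the whole population.
ate = sum(ite) / len(ite)  # 1.375

# CATE: the ITE averaged over a cohort that shares a feature value.
cate_a = sum(e for e, g in zip(ite, group) if g == "A") / group.count("A")  # 1.5
```

In practice the two potential outcomes are never both observed for the same individual, which is why tools like DECI and EconML estimate them from data rather than reading them off a table.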
+- [EconML](https://www.microsoft.com/research/project/econml/) powers the back end of the Responsible AI dashboard's causal inference component. It's a Python package that applies machine learning techniques to estimate individualized causal responses from observational or experimental data.
+
+  The suite of estimation methods in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
+
+- [DoWhy](https://py-why.github.io/dowhy/) is a Python library that aims to spark causal thinking and analysis. DoWhy provides a principled four-step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as much as possible.
 
-[EconML](https://www.microsoft.com/research/project/econml/) (powering the backend of the Responsible AI dashboard's causal inference component) is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
+  The key feature of DoWhy is its state-of-the-art refutation API that can automatically test causal assumptions for any estimation method. It makes inference more robust and accessible to non-experts.
 
-[DoWhy](https://py-why.github.io/dowhy/) is a Python library that aims to spark causal thinking and analysis. DoWhy provides a principled four-step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as much as possible. The key feature of DoWhy is its state-of-the-art refutation API that can automatically test causal assumptions for any estimation method, thus making inference more robust and accessible to non-experts. DoWhy supports estimation of the average causal effect for backdoor, front-door, instrumental variable, and other identification methods, and estimation of the conditional effect (CATE) through an integration with the EconML library.
+  DoWhy supports estimation of the average causal effect for back-door, front-door, instrumental variable, and other identification methods. It also supports estimation of the CATE through an integration with the EconML library.
 
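The refutation idea behind DoWhy can be illustrated without the library. The sketch below is hypothetical (all data made up, plain least squares in place of DoWhy's estimators): a placebo-style test replaces the real treatment with random noise and checks that the estimated effect collapses toward zero, which a spurious estimate would fail to do.

```python
import random

def ols_slope(x, y):
    # Slope of the least-squares line of y on x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

random.seed(1)
n = 1000
T = [random.gauss(0, 1) for _ in range(n)]
Y = [2.0 * t + random.gauss(0, 1) for t in T]  # true effect of T on Y is 2.0

estimate = ols_slope(T, Y)  # close to 2.0

# Placebo refuter: swap the treatment for pure noise. A sound causal estimate
# should vanish, because the placebo cannot have caused the outcome.
placebo_T = [random.gauss(0, 1) for _ in range(n)]
placebo_estimate = ols_slope(placebo_T, Y)  # close to 0.0
```

DoWhy automates this pattern (and several other refuters) for any estimation method plugged into its four-step interface.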

 ## Next steps
 
-- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI](how-to-responsible-ai-dashboard-ui.md).
+- Learn how to generate the Responsible AI dashboard via [CLI and SDK](how-to-responsible-ai-dashboard-sdk-cli.md) or [Azure Machine Learning studio UI](how-to-responsible-ai-dashboard-ui.md).
 - Explore the [supported causal inference visualizations](how-to-responsible-ai-dashboard.md#causal-analysis) of the Responsible AI dashboard.
 - Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md) based on the insights observed in the Responsible AI dashboard.

articles/machine-learning/concept-counterfactual-analysis.md

Lines changed: 15 additions & 14 deletions
@@ -1,7 +1,7 @@
 ---
 title: Counterfactuals analysis and what-if
 titleSuffix: Azure Machine Learning
-description: Generate diverse counterfactual examples with feature perturbations to see minimal changes required to achieve desired prediction with the Responsible AI dashboard's integration of DiceML.
+description: Generate diverse counterfactual examples with feature perturbations to see minimal changes required to achieve desired prediction with the Responsible AI dashboard's integration of DiCE machine learning.
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: enterprise-readiness
@@ -14,33 +14,34 @@ ms.custom: responsible-ml, event-tier1-build-2022
 
 # Counterfactuals analysis and what-if (preview)
 
-What-if counterfactuals address the question of “what would the model predict if the action input is changed”, enabling understanding and debugging of a machine learning model in terms of how it reacts to input (feature) changes. Compared with approximating a machine learning model or ranking features by their predictive importance (which standard interpretability techniques do), counterfactual analysis “interrogates” a model to determine what changes to a particular datapoint would flip the model decision. Such an analysis helps in disentangling the impact of different correlated features in isolation or for acquiring a more nuanced understanding of how much of a feature change is needed to see a model decision flip for classification models and decision change for regression models.
+What-if counterfactuals address the question of what the model would predict if you changed the action input. They enable understanding and debugging of a machine learning model in terms of how it reacts to input (feature) changes.
 
-The Counterfactual Analysis and what-if component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) consists of two functionalities:
+Standard interpretability techniques approximate a machine learning model or rank features by their predictive importance. By contrast, counterfactual analysis "interrogates" a model to determine what changes to a particular data point would flip the model decision.
 
-- Generating a set of examples with minimal changes to a given point such that they change the model's prediction (showing the closest data points with opposite model predictions)
-- Enabling users to generate their own what-if perturbations to understand how the model reacts to features’ changes.
-
-One of the top differentiators of the Responsible AI dashboard's counterfactual analysis component is the fact that you can identify which features to vary and their permissible ranges for valid and logical counterfactual examples.
+Such an analysis helps in disentangling the impact of correlated features in isolation. It also helps you get a more nuanced understanding of how much of a feature change is needed to see a model decision flip for classification models and a decision change for regression models.
 
+The *counterfactual analysis and what-if* component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) has two functions:
 
+- Generate a set of examples with minimal changes to a particular point such that they change the model's prediction (showing the closest data points with opposite model predictions).
+- Enable users to generate their own what-if perturbations to understand how the model reacts to feature changes.
 
-The capabilities of this component are founded by the [DiCE](https://github.com/interpretml/DiCE) package.
+One of the top differentiators of the Responsible AI dashboard's counterfactual analysis component is the fact that you can identify which features to vary and their permissible ranges for valid and logical counterfactual examples.
 
+The capabilities of this component come from the [DiCE](https://github.com/interpretml/DiCE) package.
 
-Use What-If Counterfactuals when you need to:
+Use what-if counterfactuals when you need to:
 
-- Examine fairness and reliability criteria as a decision evaluator (by perturbing sensitive attributes such as gender, ethnicity, etc., and observing whether model predictions change).
+- Examine fairness and reliability criteria as a decision evaluator by perturbing sensitive attributes such as gender and ethnicity, and then observing whether model predictions change.
 - Debug specific input instances in depth.
-- Provide solutions to end users and determine what they can do to get a desirable outcome from the model next time.
+- Provide solutions to users and determine what they can do to get a desirable outcome from the model.
 
 ## How are counterfactual examples generated?
 
 To generate counterfactuals, DiCE implements a few model-agnostic techniques. These methods apply to any opaque-box classifier or regressor. They're based on sampling nearby points to an input point, while optimizing a loss function based on proximity (and optionally, sparsity, diversity, and feasibility). Currently supported methods are:
 
-- [Randomized Search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#1.-Independent-random-sampling-of-features): Samples points randomly near the given query point and returns counterfactuals as those points whose predicted label is the desired class.
-- [Genetic Search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#2.-Genetic-Algorithm): Samples points using a genetic algorithm, given the combined objective of optimizing proximity to the given query point, changing as few features as possible, and diversity among the counterfactuals generated.
-- [KD Tree Search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#3.-Querying-a-KD-Tree) (For counterfactuals from a given training dataset): This algorithm returns counterfactuals from the training dataset. It constructs a KD tree over the training data points based on a distance function and then returns the closest points to a given query point that yields the desired predicted label.
+- [Randomized search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#1.-Independent-random-sampling-of-features): This method samples points randomly near a query point and returns counterfactuals as points whose predicted label is the desired class.
+- [Genetic search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#2.-Genetic-Algorithm): This method samples points by using a genetic algorithm, given the combined objective of optimizing proximity to the query point, changing as few features as possible, and seeking diversity among the generated counterfactuals.
+- [KD tree search](http://interpret.ml/DiCE/notebooks/DiCE_model_agnostic_CFs.html#3.-Querying-a-KD-Tree): This algorithm returns counterfactuals from the training dataset. It constructs a KD tree over the training data points based on a distance function and then returns the closest points to a particular query point that yields the desired predicted label.
 
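As an illustration of the first of those methods, the following hypothetical sketch (a toy two-feature approval model, not the DiCE API) samples points near a rejected query and keeps the closest ones that flip the prediction:

```python
import random

def model(x):
    # Toy classifier: approve (1) when income + 2 * credit score exceeds 10.
    income, credit = x
    return 1 if income + 2 * credit > 10 else 0

def random_counterfactuals(query, desired, n_samples=5000, scale=3.0, k=3):
    """Randomized search: sample near the query point, keep samples the model
    assigns the desired class, and return the k closest by Euclidean distance."""
    random.seed(0)
    hits = []
    for _ in range(n_samples):
        cand = [q + random.uniform(-scale, scale) for q in query]
        if model(cand) == desired:
            dist = sum((c - q) ** 2 for c, q in zip(cand, query)) ** 0.5
            hits.append((dist, cand))
    hits.sort(key=lambda h: h[0])
    return [cand for _, cand in hits[:k]]

query = [4.0, 2.0]  # model(query) == 0: rejected
cfs = random_counterfactuals(query, desired=1)
```

The other methods replace the random sampling with a genetic search or a KD-tree lookup over the training data, but the acceptance test (does the candidate get the desired predicted label) stays the same.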

 ## Next steps
 