Skip to content

Commit e796926

Browse files
authored
Merge pull request #208350 from lgayhardt/rai082022
RAI August doc updates
2 parents a409a6e + e7dfc7f commit e796926

22 files changed

+358
-224
lines changed

articles/machine-learning/concept-causal-inference.md

Lines changed: 16 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -8,50 +8,50 @@ ms.subservice: enterprise-readiness
88
ms.topic: how-to
99
ms.author: mesameki
1010
author: mesameki
11-
ms.date: 05/10/2022
11+
ms.date: 08/17/2022
1212
ms.custom: responsible-ml, event-tier1-build-2022
1313
---
1414

1515
# Make data-driven policies and influence decision making (preview)
1616

17-
While machine learning models are powerful in identifying patterns in data and making predictions, they offer little support for estimating how the real-world outcome changes in the presence of an intervention. Practitioners have become increasingly focused on using historical data to inform their future decisions and business interventions. For example, how would revenue be affected if a corporation pursues a new pricing strategy? Would a new medication improve a patient’s condition, all else equal?
17+
While machine learning models are powerful in identifying patterns in data and making predictions, they offer little support for estimating how the real-world outcome changes in the presence of an intervention. Practitioners have become increasingly focused on using historical data to inform their future decisions and business interventions. For example, how would the revenue be affected if a corporation pursues a new pricing strategy? Would a new medication improve a patient’s condition, all else equal?
1818

1919

20-
The Causal Inference component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) addresses these questions by estimating the effect of a feature on an outcome of interest on average, across a population or a cohort and on an individual level. It also helps to construct promising interventions by simulating different feature responses to various interventions and creating rules to determine which population cohorts would benefit from a particular intervention. Collectively, these functionalities allow decision makers to apply new policies and affect real-world change.
20+
The Causal Inference component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) addresses these questions by estimating the effect of a feature on an outcome of interest on average, across a population or a cohort, and on an individual level. It also helps to construct promising interventions by simulating different feature responses to various interventions and creating rules to determine which population cohorts would benefit from a particular intervention. Collectively, these functionalities allow decision-makers to apply new policies and affect real-world change.
2121

22-
The capabilities of this component are founded by [EconML](https://github.com/Microsoft/EconML) package, which estimates heterogeneous treatment effects from observational data via [double machine learning](https://econml.azurewebsites.net/spec/estimation/dml.html) technique.
22+
The capabilities of this component are founded by the [EconML](https://github.com/Microsoft/EconML) package, which estimates heterogeneous treatment effects from observational data via [double machine learning](https://econml.azurewebsites.net/spec/estimation/dml.html) technique.
2323

2424
Use Causal Inference when you need to:
2525

2626
- Identify the features that have the most direct effect on your outcome of interest.
2727
- Decide what overall treatment policy to take to maximize real-world impact on an outcome of interest.
2828
- Understand how individuals with certain feature values would respond to a particular treatment policy.
29-
- The causal effects computed based on the treatment features is purely a data property. Hence, a trained model is optional when computing the causal effects.
3029

31-
## How are causal inference insights generated?
3230

33-
> [!NOTE]
34-
> Only historic data is required to generate causal insights.
31+
## How are causal inference insights generated?
3532

33+
>[!NOTE]
34+
> Only historic data is required to generate causal insights. The causal effects computed based on the treatment features are purely a data property. Hence, a trained model is optional when computing the causal effects.
3635
37-
Double Machine Learning is a method for estimating (heterogeneous) treatment effects when all potential confounders/controls (factors that simultaneously had a direct effect on the treatment decision in the collected data and the observed outcome) are observed but are either too many (high-dimensional) for classical statistical approaches to be applicable or their effect on the treatment and outcome can't be satisfactorily modeled by parametric functions (non-parametric). Both latter problems can be addressed via machine learning techniques (for an example, see [Chernozhukov2016](https://econml.azurewebsites.net/spec/references.html#chernozhukov2016)).
36+
Double Machine Learning is a method for estimating (heterogeneous) treatment effects when all potential confounders/controls (factors that simultaneously had a direct effect on the treatment decision in the collected data and the observed outcome) are observed but are either too many (high-dimensional) for classical statistical approaches to be applicable or their effect on the treatment and outcome can't be satisfactorily modeled by parametric functions (non-parametric). Both latter problems can be addressed via machine learning techniques (to see an example, check out [Chernozhukov2016](https://econml.azurewebsites.net/spec/references.html#chernozhukov2016)).
3837

39-
The method reduces the problem to first estimating two predictive tasks:
38+
The method reduces the problem by first estimating two predictive tasks:
4039

4140
- Predicting the outcome from the controls
4241
- Predicting the treatment from the controls
4342

44-
Then the method combines these two predictive models in a final stage estimation to create a model of the heterogeneous treatment effect. The approach allows for arbitrary machine learning algorithms to be used for the two predictive tasks, while maintaining many favorable statistical properties related to the final model (for example, small mean squared error, asymptotic normality, construction of confidence intervals).
43+
Then the method combines these two predictive models in a final stage estimation to create a model of the heterogeneous treatment effect. The approach allows for arbitrary machine learning algorithms to be used for the two predictive tasks while maintaining many favorable statistical properties related to the final model (for example, small mean squared error, asymptotic normality, and construction of confidence intervals).
4544

4645
## What other tools does Microsoft provide for causal inference?
4746

48-
[Project Azua](https://www.microsoft.com/research/project/project_azua/) provides a novel framework focusing on end-to-end causal inference. Azua’s technology DECI (deep end-to-end causal inference) is a single model that can simultaneously do causal discovery and causal inference. We only require the user to provide data, and the model can output the causal relationships among all different variables. By itself, this can provide insights into the data and enables metrics such as individual treatment effect (ITE), average treatment effect (ATE) and conditional average treatment effect (CATE) to be calculated, which can then be used to make optimal decisions. The framework is scalable for large data, both in terms of the number of variables and the number of data points; it can also handle missing data entries with mixed statistical types.
47+
[Project Azua](https://www.microsoft.com/research/project/project_azua/) provides a novel framework focusing on end-to-end causal inference. Azua’s technology DECI (deep end-to-end causal inference) is a single model that can simultaneously do causal discovery and causal inference. We only require the user to provide data, and the model can output the causal relationships among all different variables. By itself, this can provide insights into the data and enables metrics such as individual treatment effect (ITE), average treatment effect (ATE), and conditional average treatment effect (CATE) to be calculated, which can then be used to make optimal decisions. The framework is scalable for large data, both in terms of the number of variables and the number of data points; it can also handle missing data entries with mixed statistical types.
4948

50-
[EconML](https://www.microsoft.com/research/project/econml/) (powering the backend of the Responsible AI dashboard) is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
49+
[EconML](https://www.microsoft.com/research/project/econml/) (powering the backend of the Responsible AI dashboard's causal inference component) is a Python package that applies the power of machine learning techniques to estimate individualized causal responses from observational or experimental data. The suite of estimation methods provided in EconML represents the latest advances in causal machine learning. By incorporating individual machine learning steps into interpretable causal models, these methods improve the reliability of what-if predictions and make causal analysis quicker and easier for a broad set of users.
5150

52-
[DoWhy](https://py-why.github.io/dowhy/) is a Python library that aims to spark causal thinking and analysis. DoWhy provides a principled four-step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as much as possible. The key feature of DoWhy is its state-of-the-art refutation API that can automatically test causal assumptions for any estimation method, thus making inference more robust and accessible to non-experts. DoWhy supports estimation of the average causal effect for backdoor, front-door, instrumental variable and other identification methods, and estimation of the conditional effect (CATE) through an integration with the EconML library.
51+
[DoWhy](https://py-why.github.io/dowhy/) is a Python library that aims to spark causal thinking and analysis. DoWhy provides a principled four-step interface for causal inference that focuses on explicitly modeling causal assumptions and validating them as much as possible. The key feature of DoWhy is its state-of-the-art refutation API that can automatically test causal assumptions for any estimation method, thus making inference more robust and accessible to non-experts. DoWhy supports estimation of the average causal effect for backdoor, front-door, instrumental variable, and other identification methods, and estimation of the conditional effect (CATE) through an integration with the EconML library.
5352

5453
## Next steps
5554

56-
- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI ](how-to-responsible-ai-dashboard-ui.md)
57-
- Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md)) based on the insights observed in the Responsible AI dashboard.
55+
- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI](how-to-responsible-ai-dashboard-ui.md).
56+
- Explore the [supported causal inference visualizations](how-to-responsible-ai-dashboard.md#causal-analysis) of the Responsible AI dashboard.
57+
- Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md) based on the insights observed in the Responsible AI dashboard.

articles/machine-learning/concept-counterfactual-analysis.md

Lines changed: 13 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -8,26 +8,31 @@ ms.subservice: enterprise-readiness
88
ms.topic: how-to
99
ms.author: mesameki
1010
author: mesameki
11-
ms.date: 05/10/2022
11+
ms.date: 08/17/2022
1212
ms.custom: responsible-ml, event-tier1-build-2022
1313
---
1414

1515
# Counterfactuals analysis and what-if (preview)
1616

17-
What-if counterfactuals address the question of “what would the model predict if the action input is changed”, enables understanding and debugging of a machine learning model in terms of how it reacts to input (feature) changes. Compared with approximating a machine learning model or ranking features by their predictive importance (which standard interpretability techniques do), counterfactual analysis “interrogates” a model to determine what changes to a particular datapoint would flip the model decision. Such an analysis helps in disentangling the impact of different correlated features in isolation or for acquiring a more nuanced understanding on how much of a feature change is needed to see a model decision flip for classification models and decision change for regression models.
17+
What-if counterfactuals address the question of “what would the model predict if the action input is changed”, enabling understanding and debugging of a machine learning model in terms of how it reacts to input (feature) changes. Compared with approximating a machine learning model or ranking features by their predictive importance (which standard interpretability techniques do), counterfactual analysis “interrogates” a model to determine what changes to a particular datapoint would flip the model decision. Such an analysis helps in disentangling the impact of different correlated features in isolation or for acquiring a more nuanced understanding of how much of a feature change is needed to see a model decision flip for classification models and decision change for regression models.
1818

1919
The Counterfactual Analysis and what-if component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) consists of two functionalities:
2020

21-
- Generating a set of examples with minimal changes to a given point such that they change the model's prediction (showing the closest datapoints with opposite model precisions)
21+
- Generating a set of examples with minimal changes to a given point such that they change the model's prediction (showing the closest data points with opposite model predictions)
2222
- Enabling users to generate their own what-if perturbations to understand how the model reacts to features’ changes.
2323

24-
The capabilities of this component are founded by the [DiCE](https://github.com/interpretml/DiCE) package, which implements counterfactual explanations that provide this information by showing feature-perturbed versions of the same datapoint who would have received a different model prediction (for example, Taylor would have received the loan if their income was higher by $10,000). The counterfactual analysis component enables you to identify which features to vary and their permissible ranges for valid and logical counterfactual examples.
24+
One of the top differentiators of the Responsible AI dashboard's counterfactual analysis component is the fact that you can identify which features to vary and their permissible ranges for valid and logical counterfactual examples.
25+
26+
27+
28+
The capabilities of this component are founded by the [DiCE](https://github.com/interpretml/DiCE) package.
29+
2530

2631
Use What-If Counterfactuals when you need to:
2732

2833
- Examine fairness and reliability criteria as a decision evaluator (by perturbing sensitive attributes such as gender, ethnicity, etc., and observing whether model predictions change).
2934
- Debug specific input instances in depth.
30-
- Provide solutions to end users and determining what they can do to get a desirable outcome from the model next time.
35+
- Provide solutions to end users and determine what they can do to get a desirable outcome from the model next time.
3136

3237
## How are counterfactual examples generated?
3338

@@ -39,5 +44,6 @@ To generate counterfactuals, DiCE implements a few model-agnostic techniques. Th
3944

4045
## Next steps
4146

42-
- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI ](how-to-responsible-ai-dashboard-ui.md)
43-
- Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md)) based on the insights observed in the Responsible AI dashboard.
47+
- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI](how-to-responsible-ai-dashboard-ui.md).
48+
- Explore the [supported counterfactual analysis and what-if perturbation visualizations](how-to-responsible-ai-dashboard.md#counterfactual-what-if) of the Responsible AI dashboard.
49+
- Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md) based on the insights observed in the Responsible AI dashboard.

articles/machine-learning/concept-data-analysis.md

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -8,17 +8,17 @@ ms.subservice: enterprise-readiness
88
ms.topic: how-to
99
ms.author: mesameki
1010
author: mesameki
11-
ms.date: 05/10/2022
11+
ms.date: 08/17/2022
1212
ms.custom: responsible-ml, event-tier1-build-2022
1313
---
1414

1515
# Understand your datasets (preview)
1616

17-
Machine learning models "learn" from historical decisions and actions captured in training data. As a result, their performance in real-world scenarios is heavily influenced by the data they're trained on. When feature distribution in a dataset is skewed, this can cause a model to incorrectly predict datapoints belonging to an underrepresented group or to be optimized along an inappropriate metric. For example, while training a housing price prediction AI, the training set was representing 75% of newer houses that have less than median prices. As a result, it was much less successful in successfully identifying more expensive historic houses. The fix was to add older and expensive houses to the training data and augment the features to include insights about the historic value of the house. Upon incorporating that data augmentation, results improved.
17+
Machine learning models "learn" from historical decisions and actions captured in training data. As a result, their performance in real-world scenarios is heavily influenced by the data they're trained on. When feature distribution in a dataset is skewed, it can cause a model to incorrectly predict data points belonging to an underrepresented group or to be optimized along an inappropriate metric. For example, while training a housing price prediction AI, the training set was representing 75% of newer houses that have less than median prices. As a result, it was much less accurate in successfully identifying more expensive historic houses. The fix was to add older and expensive houses to the training data and augment the features to include insights about the historic value of the house. Upon incorporating that data augmentation, results improved.
1818

19-
The Data Explorer component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) helps visualize datasets based on predicted and actual outcomes, error groups, and specific features. This enables you to identify issues of over- and underrepresentation and to see how data is clustered in the dataset. Data visualizations consist of aggregate plots or individual datapoints.
19+
The Data Explorer component of the [Responsible AI dashboard](concept-responsible-ai-dashboard.md) helps visualize datasets based on predicted and actual outcomes, error groups, and specific features. This enables you to identify issues of over- and under-representation and to see how data is clustered in the dataset. Data visualizations consist of aggregate plots or individual data points.
2020

21-
## When to use Data Explorer
21+
## When to use data explorer?
2222

2323
Use Data Explorer when you need to:
2424

@@ -29,5 +29,6 @@ Use Data Explorer when you need to:
2929

3030
## Next steps
3131

32-
- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI ](how-to-responsible-ai-dashboard-ui.md)
33-
- Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md) based on the insights observed in the Responsible AI dashboard.
32+
- Learn how to generate the Responsible AI dashboard via [CLIv2 and SDKv2](how-to-responsible-ai-dashboard-sdk-cli.md) or [studio UI](how-to-responsible-ai-dashboard-ui.md).
33+
- Explore the [supported data explorer visualizations](how-to-responsible-ai-dashboard.md#data-explorer) of the Responsible AI dashboard.
34+
- Learn how to generate a [Responsible AI scorecard](how-to-responsible-ai-scorecard.md) based on the insights observed in the Responsible AI dashboard.

0 commit comments

Comments
 (0)