
Commit ca6de35

Copilot authored and meta-codesync[bot] committed
Update evaluation_of_results.md with current balance package outputs (#313)
Summary:

## Plan to update evaluation_of_results.md based on quickstart tutorial

- [x] Update the print(adjusted) output to match current format (includes adjustment details section)
- [x] Update the summary() output to match current format (structured sections, includes KLD metrics, outcome weighted means, and confidence intervals)
- [x] Update the covars().mean().T output to match current column values
- [x] Update the covars().asmd().T output to match current values
- [x] Update the outcomes().summary() output to match current format (includes confidence intervals, weights impact, and more detailed response rates)
- [x] Update the design_effect() output value
- [x] Review all changes to ensure minimal modifications
- [x] Address PR review feedback:
  - Fixed typo: `adjust` → `adjusted` in outcomes example
  - Clarified that `.mean()` shows covariate means, not ASMD
  - Replaced tab characters with spaces for consistent formatting
  - Added note about when target outcome columns appear in output
  - Fixed grammar: "didn't get improved" → "didn't improve"

<details>
<summary>Original prompt</summary>

> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>[FEATURE] Update website/docs/docs/general_framework/evaluation_of_results.md</issue_title>
> <issue_description>Update text in:
> https://github.com/facebookresearch/balance/blob/main/website/docs/docs/general_framework/evaluation_of_results.md
> Based on updated output from here:
> https://import-balance.org/docs/tutorials/quickstart/</issue_description>
>
> ## Comments on the Issue (you are copilot in this section)
>
> <comments>
> </comments>

</details>

- Fixes #312

---

✨ Let Copilot coding agent [set things up for you](https://github.com/facebookresearch/balance/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot); coding agent works faster and does higher quality work when set up for your repo.

Pull Request resolved: #313

Reviewed By: omriharosh

Differential Revision: D92278385

Pulled By: talgalili

fbshipit-source-id: fd396ce4b6291ba4e830f487226a6e60109a3024
1 parent 8d58969 commit ca6de35


1 file changed: +86 -44 lines changed


website/docs/docs/general_framework/evaluation_of_results.md

Lines changed: 86 additions & 44 deletions
@@ -24,19 +24,28 @@ print(adjusted)
 Output:

 ```
-Adjusted balance Sample object with target set using ipw
-1000 observations x 3 variables: gender,age_group,income
-id_column: id, weight_column: weight,
-outcome_columns: happiness

-target:
+Adjusted balance Sample object with target set using ipw
+1000 observations x 3 variables: gender,age_group,income
+id_column: id, weight_column: weight,
+outcome_columns: happiness

-balance Sample object
-10000 observations x 3 variables: gender,age_group,income
-id_column: id, weight_column: weight,
-outcome_columns: None
+adjustment details:
+method: ipw
+weight trimming mean ratio: 20
+design effect (Deff): 1.880
+effective sample size proportion (ESSP): 0.532
+effective sample size (ESS): 531.9
+
+target:
+
+balance Sample object
+10000 observations x 3 variables: gender,age_group,income
+id_column: id, weight_column: weight,
+outcome_columns: happiness
+
+3 common variables: gender,age_group,income

-3 common variables: income,age_group,gender
 ```

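The three weight figures in the new `adjustment details` block are related to one another. The sketch below is a rough illustration, assuming Kish's approximate design effect (the commit does not say which formula `balance` uses) and a small set of made-up weights. The helper name `kish_design_effect` is hypothetical and not part of the `balance` API.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


# Hypothetical weights, for illustration only (not the tutorial data).
weights = [1.0, 1.0, 2.0, 4.0]
deff = kish_design_effect(weights)
ess = len(weights) / deff      # effective sample size
essp = ess / len(weights)      # effective sample size proportion, i.e. 1 / Deff

# Consistency check on the values reported in the diff (n = 1000, Deff = 1.880).
reported_n, reported_deff = 1000, 1.880
reported_essp = 1 / reported_deff           # rounds to 0.532
reported_ess = reported_n / reported_deff   # rounds to 531.9
```

Under this assumption, ESSP is the reciprocal of Deff and ESS is the sample size divided by Deff, which matches the 0.532 and 531.9 shown in the new output.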

@@ -47,17 +56,34 @@ print(adjusted.summary())
 ```

 This will return several results:
-- Covariate mean ASMD improvement: ASMD is "Absolute Standardized Mean Difference". For continuous variables, this measure is the same as taking the absolute value of [Cohen's d statistic](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d) (also related to [SSMD](https://en.wikipedia.org/wiki/Strictly_standardized_mean_difference)), when using the (weighted) standard deviation of the target population. For categorical variables it uses [one-hot encoding](https://en.wikipedia.org/wiki/One-hot).
-- [Design effect](https://en.wikipedia.org/wiki/Design_effect)
-- Covariate mean Adjusted Standardized Mean Deviation (ASMD) versus Unadjusted covariate mean ASMD
-- Model proportion deviance explained (if inverse propensity weighting method was used)
+- Adjustment details: method used and weight trimming parameters
+- Covariate diagnostics: ASMD is "Absolute Standardized Mean Difference". For continuous variables, this measure is the same as taking the absolute value of [Cohen's d statistic](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d) (also related to [SSMD](https://en.wikipedia.org/wiki/Strictly_standardized_mean_difference)), when using the (weighted) standard deviation of the target population. For categorical variables it uses [one-hot encoding](https://en.wikipedia.org/wiki/One-hot). Also includes KLD (Kullback-Leibler divergence) metrics.
+- Weight diagnostics: [Design effect](https://en.wikipedia.org/wiki/Design_effect), effective sample size proportion (ESSP), and effective sample size (ESS)
+- Outcome weighted means: means for each outcome variable across self (adjusted), target, and unadjusted samples
+- Model performance: Model proportion deviance explained (if inverse propensity weighting method was used)

 Output:

 ```
-Covar ASMD reduction: 62.3%, design effect: 2.249
-Covar ASMD (7 variables): 0.335 -> 0.126
-Model performance: Model proportion deviance explained: 0.174
+Adjustment details:
+method: ipw
+weight trimming mean ratio: 20
+Covariate diagnostics:
+Covar ASMD reduction: 63.4%
+Covar ASMD (7 variables): 0.327 -> 0.120
+Covar mean KLD reduction: 95.3%
+Covar mean KLD (3 variables): 0.071 -> 0.003
+Weight diagnostics:
+design effect (Deff): 1.880
+effective sample size proportion (ESSP): 0.532
+effective sample size (ESS): 531.9
+Outcome weighted means:
+            happiness
+source
+self           53.295
+target         56.278
+unadjusted     48.559
+Model performance: Model proportion deviance explained: 0.173
 ```

 Note that although we had 3 variables in our original data (age_group, gender, income), the asmd counts each level of the categorical variables as separate variable, and thus it considered 7 variables for the covar ASMD improvement.
@@ -74,18 +100,18 @@ adjusted.covars().mean().T
 To get:

 ```
-source                      self     target  unadjusted
-_is_na_gender[T.True]   0.103449   0.089800     0.08800
-age_group[T.25-34]      0.279072   0.297400     0.30900
-age_group[T.35-44]      0.290137   0.299200     0.17200
-age_group[T.45+]        0.150714   0.206300     0.04600
-gender[Female]          0.410664   0.455100     0.26800
-gender[Male]            0.485887   0.455100     0.64400
-gender[_NA]             0.103449   0.089800     0.08800
-income                  9.519935  12.737608     5.99102
+source                       self     target  unadjusted
+_is_na_gender[T.True]    0.086776   0.089800    0.088000
+age_group[T.25-34]       0.307355   0.297400    0.300000
+age_group[T.35-44]       0.273609   0.299200    0.156000
+age_group[T.45+]         0.137581   0.206300    0.053000
+gender[Female]           0.406337   0.455100    0.268000
+gender[Male]             0.506887   0.455100    0.644000
+gender[_NA]              0.086776   0.089800    0.088000
+income                  10.060068  12.737608    6.297302
 ```

-The `self` is the adjusted ASMD, while `unadjusted` is the unadjusted ASMD.
+Here, `self` is the adjusted (weighted) covariate mean, `target` is the target mean, and `unadjusted` is the unadjusted sample mean.


 And `.asmd()` to get ASMD:
@@ -98,18 +124,18 @@ To get:

 ```
 source                  self  unadjusted  unadjusted - self
-age_group[T.25-34]  0.040094    0.025375          -0.014719
-age_group[T.35-44]  0.019792    0.277771           0.257980
-age_group[T.45+]    0.137361    0.396127           0.258765
-gender[Female]      0.089228    0.375699           0.286472
-gender[Male]        0.061820    0.379314           0.317494
-gender[_NA]         0.047739    0.006296          -0.041444
-income              0.246918    0.517721           0.270802
-mean(asmd)          0.126310    0.334860           0.208551
+age_group[T.25-34]  0.021777    0.005688          -0.016090
+age_group[T.35-44]  0.055884    0.312711           0.256827
+age_group[T.45+]    0.169816    0.378828           0.209013
+gender[Female]      0.097916    0.375699           0.277783
+gender[Male]        0.103989    0.379314           0.275324
+gender[_NA]         0.010578    0.006296          -0.004282
+income              0.205469    0.494217           0.288748
+mean(asmd)          0.119597    0.326799           0.207202
 ```

 We can see that on average the ASMD improved from 0.33 to 0.12 thanks to the weights. We got improvements in income, gender, and age_group.
-Although we can see that `age_group[T.25-34]` didn't get improved.
+Although we can see that `age_group[T.25-34]` and `gender[_NA]` didn't improve.


 ## Understanding the model
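A note on the `mean(asmd)` row in the `.asmd()` hunk: it is not the plain average of the seven rows above it. The reported values are consistent with a two-level average, where the levels of each categorical variable are averaged first and the three per-variable results are then averaged. This is inferred from the numbers in the diff, not stated by the commit, so treat the sketch below as an assumption about the aggregation rather than a description of the `balance` implementation.

```python
# ASMD values copied from the new `.asmd()` output in the diff,
# grouped by the original variable each one-hot level belongs to.
asmd = {
    "age_group": {"self": [0.021777, 0.055884, 0.169816],
                  "unadjusted": [0.005688, 0.312711, 0.378828]},
    "gender": {"self": [0.097916, 0.103989, 0.010578],
               "unadjusted": [0.375699, 0.379314, 0.006296]},
    "income": {"self": [0.205469], "unadjusted": [0.494217]},
}


def mean(values):
    return sum(values) / len(values)


def mean_asmd(column):
    # Average the levels within each variable, then average across variables.
    return mean([mean(levels[column]) for levels in asmd.values()])


adjusted = mean_asmd("self")
unadjusted = mean_asmd("unadjusted")
reduction_pct = 100 * (unadjusted - adjusted) / unadjusted
```

With these inputs, `adjusted` and `unadjusted` agree with the reported 0.119597 and 0.326799 up to rounding of the inputs, and `reduction_pct` rounds to the 63.4% shown by `summary()`.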
@@ -166,7 +192,7 @@ Or calculate the design effect using:

 ```python
 adjusted.weights().design_effect()
-# 2.24937
+# 1.88
 ```

 ## Analyzing the outcome
@@ -179,21 +205,37 @@ print(adjusted.outcomes().summary())

 To get:
 ```
-
 1 outcomes: ['happiness']
-Mean outcomes:
-            happiness
-source
-self        54.221388
-unadjusted  48.392784
+Mean outcomes (with 95% confidence intervals):
+source       self  target  unadjusted           self_ci         target_ci     unadjusted_ci
+happiness  53.295  56.278      48.559  (52.096, 54.495)  (55.961, 56.595)  (47.669, 49.449)
+```
+
+Note: The `target` column and target-based response rates appear only when the target `Sample` has outcome data. If your target has no outcomes, you will only see `self` and `unadjusted` columns.
+
+```
+
+Weights impact on outcomes (t_test):
+           mean_yw0  mean_yw1  mean_diff  diff_ci_lower  diff_ci_upper  t_stat  p_value       n
+outcome
+happiness    48.559    53.295      4.736          1.312          8.161   2.714    0.007  1000.0

 Response rates (relative to number of respondents in sample):
    happiness
 n     1000.0
 %      100.0
+Response rates (relative to notnull rows in the target):
+   happiness
+n     1000.0
+%       10.0
+Response rates (in the target):
+   happiness
+n    10000.0
+%      100.0
+
 ```

-For example, we see that the estimated mean happiness according to our sample is 48 without any adjustment and 54 with adjustment. The following shows the distribution of happiness before and after applying the weights:
+For example, we see that the estimated mean happiness according to our sample is 48.6 without any adjustment and 53.3 with adjustment (compared to the target mean of 56.3). The following shows the distribution of happiness before and after applying the weights:

 ```python
 adjusted.outcomes().plot()
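The columns of the new `Weights impact on outcomes (t_test)` table can be cross-checked with simple arithmetic. The sketch below only tests whether the reported numbers agree with each other. It does not reproduce how `balance` runs the test, and the critical value is an assumption (two-sided 95% Student's t with n - 1 = 999 degrees of freedom).

```python
# Values copied from the new outcomes().summary() output in the diff.
mean_unadjusted = 48.559   # mean_yw0
mean_adjusted = 53.295     # mean_yw1, the `self` column
t_stat = 2.714
n_sample, n_target = 1000, 10000

mean_diff = mean_adjusted - mean_unadjusted   # reported as 4.736
std_err = mean_diff / t_stat                  # standard error implied by t_stat

# Assumed critical value: two-sided 95% Student's t, 999 degrees of freedom.
T_CRIT_95 = 1.9623
ci_lower = mean_diff - T_CRIT_95 * std_err
ci_upper = mean_diff + T_CRIT_95 * std_err

# Sample respondents as a percentage of non-null target rows.
response_rate_vs_target = 100 * n_sample / n_target
```

Under these assumptions the implied interval agrees with the reported `diff_ci_lower` and `diff_ci_upper` (1.312 and 8.161) to within rounding, and the response rate relative to the target comes out at the reported 10%.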
