Update evaluation_of_results.md with current balance package outputs (#313)
Summary:
## Plan to update evaluation_of_results.md based on quickstart tutorial
- [x] Update the print(adjusted) output to match current format (includes adjustment details section)
- [x] Update the summary() output to match current format (structured sections, includes KLD metrics, outcome weighted means, and confidence intervals)
- [x] Update the covars().mean().T output to match current column values
- [x] Update the covars().asmd().T output to match current values
- [x] Update the outcomes().summary() output to match current format (includes confidence intervals, weights impact, and more detailed response rates)
- [x] Update the design_effect() output value
- [x] Review all changes to ensure minimal modifications
- [x] Address PR review feedback:
  - Fixed typo: `adjust` → `adjusted` in outcomes example
  - Clarified that `.mean()` shows covariate means, not ASMD
  - Replaced tab characters with spaces for consistent formatting
  - Added note about when target outcome columns appear in output
  - Fixed grammar: "didn't get improved" → "didn't improve"
<details>
<summary>Original prompt</summary>
>
> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>[FEATURE] Update website/docs/docs/general_framework/evaluation_of_results.md</issue_title>
> <issue_description>Update text in:
> https://github.com/facebookresearch/balance/blob/main/website/docs/docs/general_framework/evaluation_of_results.md
> Based on updated output from here:
> https://import-balance.org/docs/tutorials/quickstart/</issue_description>
>
> ## Comments on the Issue (you are copilot in this section)
>
> <comments>
> </comments>
>
</details>
- Fixes #312
---
Pull Request resolved: #313
Reviewed By: omriharosh
Differential Revision: D92278385
Pulled By: talgalili
fbshipit-source-id: fd396ce4b6291ba4e830f487226a6e60109a3024
website/docs/docs/general_framework/evaluation_of_results.md
Lines changed: 86 additions & 44 deletions
@@ -24,19 +24,28 @@ print(adjusted)
 Output:
 
 ```
-Adjusted balance Sample object with target set using ipw
-1000 observations x 3 variables: gender,age_group,income
-id_column: id, weight_column: weight,
-outcome_columns: happiness
 
-target:
+Adjusted balance Sample object with target set using ipw
+1000 observations x 3 variables: gender,age_group,income
+id_column: id, weight_column: weight,
+outcome_columns: happiness
 
-balance Sample object
-10000 observations x 3 variables: gender,age_group,income
-id_column: id, weight_column: weight,
-outcome_columns: None
+adjustment details:
+method: ipw
+weight trimming mean ratio: 20
+design effect (Deff): 1.880
+effective sample size proportion (ESSP): 0.532
+effective sample size (ESS): 531.9
+
+target:
+
+balance Sample object
+10000 observations x 3 variables: gender,age_group,income
+id_column: id, weight_column: weight,
+outcome_columns: happiness
+
+3 common variables: gender,age_group,income
 
-3 common variables: income,age_group,gender
 ```
 
 
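The three weight diagnostics added to the `print(adjusted)` output above are tied together. Assuming `design effect (Deff)` is Kish's design effect, n·Σw²/(Σw)², the effective sample size is ESS = n / Deff and its proportion is ESSP = 1 / Deff; with n = 1000 and Deff = 1.880 that gives exactly the printed 0.532 and 531.9. A minimal plain-Python sketch; `kish_deff` and `effective_sample_size` are hypothetical helpers, not the balance API.

```python
# Hypothetical helpers (not part of balance) illustrating Kish's design
# effect and the effective-sample-size figures printed by print(adjusted).


def kish_deff(weights):
    """Kish design effect: n * sum(w**2) / sum(w)**2 (always >= 1)."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Effective sample size n / Deff, i.e. sum(w)**2 / sum(w**2)."""
    return len(weights) / kish_deff(weights)


# Equal weights lose no precision: Deff = 1 and ESS = n.
equal = [1.0, 1.0, 1.0, 1.0]
print(kish_deff(equal), effective_sample_size(equal))  # 1.0 4.0

# Unequal weights inflate variance: Deff > 1 and ESS < n.
unequal = [1.0, 1.0, 1.0, 3.0]
print(round(kish_deff(unequal), 3), round(effective_sample_size(unequal), 3))  # 1.333 3.0

# Consistency check with the printed values (n = 1000, Deff = 1.880).
n, deff = 1000, 1.880
print(round(1 / deff, 3), round(n / deff, 1))  # 0.532 531.9
```

The check only confirms that the printed ESSP and ESS are consistent with Deff under Kish's definition; it does not recompute Deff from the actual tutorial weights.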
@@ -47,17 +56,34 @@ print(adjusted.summary())
 ```
 
 This will return several results:
-- Covariate mean ASMD improvement: ASMD is "Absolute Standardized Mean Difference". For continuous variables, this measure is the same as taking the absolute value of [Cohen's d statistic](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d) (also related to [SSMD](https://en.wikipedia.org/wiki/Strictly_standardized_mean_difference)), when using the (weighted) standard deviation of the target population. For categorical variables it uses [one-hot encoding](https://en.wikipedia.org/wiki/One-hot).
-- Covariate mean Adjusted Standardized Mean Deviation (ASMD) versus Unadjusted covariate mean ASMD
-- Model proportion deviance explained (if inverse propensity weighting method was used)
+- Adjustment details: method used and weight trimming parameters
+- Covariate diagnostics: ASMD is "Absolute Standardized Mean Difference". For continuous variables, this measure is the same as taking the absolute value of [Cohen's d statistic](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d) (also related to [SSMD](https://en.wikipedia.org/wiki/Strictly_standardized_mean_difference)), when using the (weighted) standard deviation of the target population. For categorical variables it uses [one-hot encoding](https://en.wikipedia.org/wiki/One-hot). Also includes KLD (Kullback-Leibler divergence) metrics.
+- Outcome weighted means: means for each outcome variable across self (adjusted), target, and unadjusted samples
+- Model performance: Model proportion deviance explained (if inverse propensity weighting method was used)
 
 Output:
 
 ```
-Covar ASMD reduction: 62.3%, design effect: 2.249
-Covar ASMD (7 variables): 0.335 -> 0.126
-Model performance: Model proportion deviance explained: 0.174
+Adjustment details:
+method: ipw
+weight trimming mean ratio: 20
+Covariate diagnostics:
+Covar ASMD reduction: 63.4%
+Covar ASMD (7 variables): 0.327 -> 0.120
+Covar mean KLD reduction: 95.3%
+Covar mean KLD (3 variables): 0.071 -> 0.003
+Weight diagnostics:
+design effect (Deff): 1.880
+effective sample size proportion (ESSP): 0.532
+effective sample size (ESS): 531.9
+Outcome weighted means:
+            happiness
+source
+self           53.295
+target         56.278
+unadjusted     48.559
+Model performance: Model proportion deviance explained: 0.173
 ```
 
 Note that although we had 3 variables in our original data (age_group, gender, income), the asmd counts each level of the categorical variables as separate variable, and thus it considered 7 variables for the covar ASMD improvement.
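To make the ASMD definition in the summary bullets concrete: for a continuous covariate, ASMD is the absolute difference between the (weighted) sample mean and the target mean, divided by the target's standard deviation. The sketch below is plain Python with hypothetical helper names and toy data, and it assumes a population-style weighted standard deviation (dividing by Σw), which may differ from balance's exact variance convention. The last line checks that `Covar ASMD reduction: 63.4%` follows from the unrounded `mean(asmd)` values in the updated `.asmd()` output (0.326799 -> 0.119597); the rounded 0.327 -> 0.120 would give 63.3%.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean sum(w * v) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_std(values, weights):
    # Population-style weighted SD (divides by sum of weights); an assumption.
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(var)


def asmd(sample, sample_w, target, target_w):
    """|weighted sample mean - weighted target mean| / weighted target SD."""
    diff = weighted_mean(sample, sample_w) - weighted_mean(target, target_w)
    return abs(diff) / weighted_std(target, target_w)


# Hypothetical toy data: target mean 3, target SD sqrt(2).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
target_w = [1.0] * len(target)
sample = [1.0, 1.0, 2.0, 5.0]

before = asmd(sample, [1.0] * 4, target, target_w)  # sample mean 2.25
after = asmd(sample, [1.0, 1.0, 1.0, 3.0], target, target_w)  # weighted mean ~3.17
print(round(before, 3), round(after, 3))  # 0.53 0.118

# Reduction as reported by summary(), from the unrounded mean(asmd) values.
print(f"{1 - 0.119597 / 0.326799:.1%}")  # 63.4%
```

Upweighting the one high-valued sample row pulls the weighted mean toward the target's, which is what shrinks the ASMD from 0.53 to 0.118 in the toy example.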
@@ -74,18 +100,18 @@ adjusted.covars().mean().T
 To get:
 
 ```
-source                     self     target  unadjusted
-_is_na_gender[T.True]  0.103449   0.089800     0.08800
-age_group[T.25-34]     0.279072   0.297400     0.30900
-age_group[T.35-44]     0.290137   0.299200     0.17200
-age_group[T.45+]       0.150714   0.206300     0.04600
-gender[Female]         0.410664   0.455100     0.26800
-gender[Male]           0.485887   0.455100     0.64400
-gender[_NA]            0.103449   0.089800     0.08800
-income                 9.519935  12.737608     5.99102
+source                      self     target  unadjusted
+_is_na_gender[T.True]   0.086776   0.089800    0.088000
+age_group[T.25-34]      0.307355   0.297400    0.300000
+age_group[T.35-44]      0.273609   0.299200    0.156000
+age_group[T.45+]        0.137581   0.206300    0.053000
+gender[Female]          0.406337   0.455100    0.268000
+gender[Male]            0.506887   0.455100    0.644000
+gender[_NA]             0.086776   0.089800    0.088000
+income                 10.060068  12.737608    6.297302
 ```
 
-The `self` is the adjusted ASMD, while `unadjusted` is the unadjusted ASMD.
+Here, `self` is the adjusted (weighted) covariate mean, `target` is the target mean, and `unadjusted` is the unadjusted sample mean.
 
 
 And `.asmd()` to get ASMD:
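For the categorical rows of the `.mean()` table above, each column is a one-hot indicator, so `self` is the weighted share of sample rows in that level and `unadjusted` is the plain share. The toy sketch below uses hypothetical data and helper names and does not reproduce balance's own model-matrix construction. It also shows why a missing-value indicator and the `_NA` level column carry identical values, as `_is_na_gender[T.True]` and `gender[_NA]` do in every column of the table.

```python
def weighted_mean(values, weights):
    """Weighted mean sum(w * v) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def one_hot(values, levels):
    """One 0/1 indicator column per level."""
    return {lvl: [1.0 if v == lvl else 0.0 for v in values] for lvl in levels}


# Hypothetical sample: gender with one missing value ("_NA"), plus weights.
gender = ["Male", "Male", "Female", "_NA"]
weights = [1.0, 1.0, 3.0, 1.0]

columns = one_hot(gender, ["Female", "Male", "_NA"])
# Missing-value indicator, analogous to _is_na_gender[T.True].
columns["_is_na"] = [1.0 if g == "_NA" else 0.0 for g in gender]

for name, col in columns.items():
    unadjusted = weighted_mean(col, [1.0] * len(col))  # plain proportion
    adjusted = weighted_mean(col, weights)  # weighted proportion ("self")
    print(f"{name:8s} unadjusted={unadjusted:.3f} self={adjusted:.3f}")
```

Here the `Female` share moves from 0.250 unweighted to 0.500 weighted, because the single Female row carries 3 of the 6 units of total weight.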
@@ -98,18 +124,18 @@ To get:
 
 ```
 source                  self  unadjusted  unadjusted - self
-age_group[T.25-34]  0.040094    0.025375          -0.014719
-age_group[T.35-44]  0.019792    0.277771           0.257980
-age_group[T.45+]    0.137361    0.396127           0.258765
-gender[Female]      0.089228    0.375699           0.286472
-gender[Male]        0.061820    0.379314           0.317494
-gender[_NA]         0.047739    0.006296          -0.041444
-income              0.246918    0.517721           0.270802
-mean(asmd)          0.126310    0.334860           0.208551
+age_group[T.25-34]  0.021777    0.005688          -0.016090
+age_group[T.35-44]  0.055884    0.312711           0.256827
+age_group[T.45+]    0.169816    0.378828           0.209013
+gender[Female]      0.097916    0.375699           0.277783
+gender[Male]        0.103989    0.379314           0.275324
+gender[_NA]         0.010578    0.006296          -0.004282
+income              0.205469    0.494217           0.288748
+mean(asmd)          0.119597    0.326799           0.207202
 ```
 
 We can see that on average the ASMD improved from 0.33 to 0.12 thanks to the weights. We got improvements in income, gender, and age_group.
-Although we can see that `age_group[T.25-34]` didn't get improved.
+Although we can see that `age_group[T.25-34]` and `gender[_NA]` didn't improve.
 
 
 ## Understanding the model
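A note on the `.asmd()` table in the hunk above: the `mean(asmd)` row is not a plain average of the seven rows. The reported 0.119597 and 0.326799 are reproduced, up to rounding of the printed per-level values, by first averaging the levels of each categorical covariate and then averaging across the three original covariates. The sketch assumes that aggregation rule, inferred from the numbers rather than from balance's source, and uses hypothetical helper names. It also lists the rows where `unadjusted - self` is negative, i.e. the covariates that did not improve.

```python
from statistics import mean

# Per-level ASMD values copied from the updated `.asmd()` output.
asmd_self = {
    "age_group[T.25-34]": 0.021777,
    "age_group[T.35-44]": 0.055884,
    "age_group[T.45+]": 0.169816,
    "gender[Female]": 0.097916,
    "gender[Male]": 0.103989,
    "gender[_NA]": 0.010578,
    "income": 0.205469,
}
asmd_unadjusted = {
    "age_group[T.25-34]": 0.005688,
    "age_group[T.35-44]": 0.312711,
    "age_group[T.45+]": 0.378828,
    "gender[Female]": 0.375699,
    "gender[Male]": 0.379314,
    "gender[_NA]": 0.006296,
    "income": 0.494217,
}


def main_covariate(column):
    # "age_group[T.25-34]" -> "age_group"; "income" -> "income"
    return column.split("[", 1)[0]


def mean_asmd_by_covariate(per_level):
    """Average levels within each covariate, then average across covariates."""
    groups = {}
    for column, value in per_level.items():
        groups.setdefault(main_covariate(column), []).append(value)
    return mean(mean(values) for values in groups.values())


print(round(mean_asmd_by_covariate(asmd_self), 4))  # 0.1196
print(round(mean_asmd_by_covariate(asmd_unadjusted), 4))  # 0.3268
print(round(mean(asmd_self.values()), 4))  # 0.0951 (plain mean of 7 rows)

not_improved = [c for c in asmd_self if asmd_unadjusted[c] - asmd_self[c] < 0]
print(not_improved)  # ['age_group[T.25-34]', 'gender[_NA]']
```

The plain mean of the seven rows (0.0951) does not match the reported `mean(asmd)`, which is why the per-covariate aggregation is the more plausible reading.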
@@ -166,7 +192,7 @@ Or calculate the design effect using:
+Note: The `target` column and target-based response rates appear only when the target `Sample` has outcome data. If your target has no outcomes, you will only see `self` and `unadjusted` columns.
+
+```
+
+Weights impact on outcomes (t_test):
+mean_yw0 mean_yw1 mean_diff diff_ci_lower diff_ci_upper t_stat p_value n
 Response rates (relative to number of respondents in sample):
    happiness
 n     1000.0
 %      100.0
+Response rates (relative to notnull rows in the target):
+   happiness
+n     1000.0
+%       10.0
+Response rates (in the target):
+   happiness
+n    10000.0
+%      100.0
+
 ```
 
-For example, we see that the estimated mean happiness according to our sample is 48 without any adjustment and 54 with adjustment. The following shows the distribution of happiness before and after applying the weights:
+For example, we see that the estimated mean happiness according to our sample is 48.6 without any adjustment and 53.3 with adjustment (compared to the target mean of 56.3). The following shows the distribution of happiness before and after applying the weights:
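Both happiness estimates in the updated sentence are weighted means of the outcome, Σwᵢyᵢ / Σwᵢ: `self` uses the adjustment weights and `unadjusted` uses the sample's original weights (equal weights in the toy example below). The response-rate percentages in the new output are ratios of counts: 1000 respondents out of 1000 sample rows is 100%, and 1000 out of the 10000 target rows is 10%. A minimal sketch with hypothetical helper names and toy outcome data; the real happiness values are not reproduced.

```python
def weighted_mean(values, weights):
    """Weighted mean sum(w * y) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def response_rate_pct(n_respondents, n_reference):
    """Respondents as a percentage of a reference row count."""
    return 100.0 * n_respondents / n_reference


# Counts taken from the response-rate tables in the updated output.
print(response_rate_pct(1000, 1000))  # 100.0: relative to respondents in sample
print(response_rate_pct(1000, 10000))  # 10.0: relative to notnull rows in the target

# Hypothetical toy outcome and adjustment weights.
happiness = [40.0, 50.0, 60.0]
adjustment_weights = [1.0, 1.0, 2.0]
print(weighted_mean(happiness, [1.0, 1.0, 1.0]))  # 50.0 (unadjusted)
print(weighted_mean(happiness, adjustment_weights))  # 52.5 (adjusted, "self")
```

As in the tutorial output, upweighting respondents with higher happiness shifts the adjusted mean upward relative to the unadjusted one.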