Closed
58 changes: 8 additions & 50 deletions tutorials/balance_quickstart.ipynb
@@ -419,19 +419,23 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Distribution diagnostics (EMD/CVMD/KS)\n",
+"## Distribution diagnostics (KLD/EMD/CVMD/KS)\n",
 "\n",
-"Balance also exposes distributional distance diagnostics for covariates.\n",
+"Balance also exposes distribution diagnostics for covariates.\n",
 "These look beyond mean differences and compare the **full distributions** of covariates in the weighted sample vs. the target.\n",
 "\n",
+"- **KLD (Kullback-Leibler divergence)** measures the relative entropy between two probability distributions (note: this is a divergence measure, not a symmetric distance).\n",
+" (See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)\n",
 "- **EMD (Earth Mover's Distance)** measures the minimum \"cost\" to transform one distribution into another.\n",
 " (See: https://en.wikipedia.org/wiki/Earth_mover%27s_distance)\n",
 "- **CVMD (Cram\u00e9r\u2013von Mises distance)** measures the integrated squared difference between the empirical CDFs.\n",
 " (See: https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93von_Mises_criterion)\n",
 "- **KS (Kolmogorov\u2013Smirnov distance)** measures the maximum absolute difference between the empirical CDFs.\n",
 " (See: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)\n",
 "\n",
-"These diagnostics complement **ASMD**, which only compares means. Use EMD/CVMD/KS when you want to check whether weighting aligns the *shape* of covariate distributions (not just their means).\n"
+"Note: Distribution diagnostics operate on the raw covariates (with NA indicators), rather than the model matrix, so categorical variables stay intact.\n",
+"\n",
+"These diagnostics complement **ASMD**, which only compares means. Use these metrics when you want to check whether weighting aligns the *shape* of covariate distributions (not just their means).\n"
 ]
 },
 {
@@ -440,30 +444,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"print(sample_with_target.covars().kld().T)\n",
 "print(sample_with_target.covars().emd().T)\n",
 "print(sample_with_target.covars().cvmd().T)\n",
 "print(sample_with_target.covars().ks().T)\n"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Diagnostics for comparing distributions\n",
-"\n",
-"Distribution diagnostics operate on the raw covariates (with NA indicators), rather than the model matrix, so categorical variables stay intact.\n",
-"We can use KLD (Kullback-Leibler divergence) to compare distributions:\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"print(sample_with_target.covars().kld().T)"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -947,34 +933,6 @@
 "print(adjusted_hgb_mm.summary())"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"## Compare ASMD (Absolute Standardized Mean Difference)\n",
-"\n",
-"Let's compare the covariate balance achieved by each method using ASMD tables:"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"print(\"\\n=== Default Method (Logistic Regression) ASMD ===\")\n",
-"print(adjusted_default.covars().asmd().T)\n",
-"\n",
-"print(\"\\n=== HistGradientBoosting + Model Matrix ASMD ===\")\n",
-"print(adjusted_hgb_mm.covars().asmd().T)\n",
-"\n",
-"if _sklearn_version >= (1, 4):\n",
-" print(\"\\n=== HistGradientBoosting + Raw Covariates ASMD ===\")\n",
-" print(adjusted_hgb.covars().asmd().T)\n",
-"else:\n",
-" print(\"\\nHistGradientBoosting + Raw Covariates ASMD skipped (requires scikit-learn >= 1.4)\")"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
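
For readers of this diff, here is a rough standalone sketch of what the four diagnostics named in the markdown cell (KLD, EMD, CVMD, KS) measure on weighted data. It uses only NumPy and SciPy and does not call balance. The helper names (`weighted_ecdf`, `ks_distance`, `cvm_distance`, `kld_categorical`) are hypothetical and are not balance's API. The choice to integrate the Cramér–von Mises difference against the pooled distribution is an assumption about the convention; balance's own implementation may differ.

```python
import numpy as np
from scipy.stats import wasserstein_distance  # 1-D Earth Mover's Distance


def weighted_ecdf(values, weights, grid):
    """Weighted empirical CDF of `values`, evaluated at each point of `grid`."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cum = np.concatenate([[0.0], np.cumsum(weights[order]) / weights.sum()])
    # side="right" counts how many sorted values are <= each grid point
    return cum[np.searchsorted(values[order], grid, side="right")]


def ks_distance(x, y, wx, wy):
    """KS: maximum absolute gap between the two weighted ECDFs."""
    grid = np.union1d(x, y)
    return float(np.max(np.abs(weighted_ecdf(x, wx, grid) - weighted_ecdf(y, wy, grid))))


def cvm_distance(x, y, wx, wy):
    """CvM: squared ECDF gap, integrated against the pooled distribution (assumed convention)."""
    grid = np.union1d(x, y)
    fx, fy = weighted_ecdf(x, wx, grid), weighted_ecdf(y, wy, grid)
    pooled_mass = np.diff(0.5 * (fx + fy), prepend=0.0)  # pooled probability at each grid point
    return float(np.sum((fx - fy) ** 2 * pooled_mass))


def kld_categorical(x, y, wx, wy):
    """KL(P || Q) over weighted category shares; infinite if Q lacks a category P has."""
    x, y = np.asarray(x), np.asarray(y)
    wx, wy = np.asarray(wx, dtype=float), np.asarray(wy, dtype=float)
    cats = np.union1d(x, y)
    p = np.array([wx[x == c].sum() for c in cats]) / wx.sum()
    q = np.array([wy[y == c].sum() for c in cats]) / wy.sum()
    keep = p > 0  # terms with p == 0 contribute nothing
    return float(np.sum(p[keep] * np.log(p[keep] / q[keep])))


# Synthetic data for illustration only: a shifted sample vs. a target.
rng = np.random.default_rng(0)
sample, target = rng.normal(0.3, 1.0, size=500), rng.normal(0.0, 1.0, size=800)
w_s, w_t = np.ones(500), np.ones(800)

print("KS :", ks_distance(sample, target, w_s, w_t))
print("CvM:", cvm_distance(sample, target, w_s, w_t))
print("EMD:", wasserstein_distance(sample, target, u_weights=w_s, v_weights=w_t))
print("KLD:", kld_categorical(["a", "a", "b"], ["a", "b", "b"], np.ones(3), np.ones(3)))
```

All four values are zero when the weighted sample and the target have identical distributions. KLD is the only one that changes when its arguments are swapped, which matches the "not a symmetric distance" note in the added markdown.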