
Commit 1e4e9f5

Copilot authored and meta-codesync[bot] committed
Simplify quickstart tutorial by removing redundant diagnostic sections (#328)
Summary:

## Simplify quickstart documentation

- [x] Remove "Diagnostics for comparing distributions" section (cells 29-30)
- [x] Update "Distribution diagnostics (EMD/CVMD/KS)" section title to include "KLD" (cell 27)
- [x] Remove "Compare ASMD (Absolute Standardized Mean Difference)" section (cells 67-68) from "Comparing Adjustment Methods"
- [x] Fix capitalization of KLD to match other diagnostic abbreviations (EMD, CVMD, KS)
- [x] Add KLD description to distribution diagnostics section
- [x] Add kld() code example to distribution diagnostics section
- [x] Include note about raw covariates applying to all diagnostics
- [x] Clarify KLD is a divergence measure (not symmetric distance)
- [x] Remove trailing empty strings from cells
- [x] Run code review and address all feedback
- [x] Run security checks
- [x] Reply to all review comments

Successfully simplified the quickstart tutorial by consolidating distribution diagnostics and addressing all review feedback:

### Changes Made:

1. **Removed redundant sections** (4 cells removed):
   - "Diagnostics for comparing distributions" section (cells 29-30)
   - "Compare ASMD" section from "Comparing Adjustment Methods" (cells 67-68)
2. **Enhanced distribution diagnostics section**:
   - Added KLD to section header: "Distribution diagnostics (KLD/EMD/CVMD/KS)"
   - Added KLD description with clarification that it's a divergence measure (non-symmetric)
   - Added `print(sample_with_target.covars().kld().T)` code example
   - Moved note about raw covariates to a separate paragraph applying to all diagnostics
   - Removed trailing empty strings for cleaner notebook formatting
3. **Code review feedback addressed**:
   - Separated note about raw covariates from the KLD bullet so it applies to all metrics
   - Clarified KLD is a divergence measure vs. a distance metric
   - Cleaned up formatting issues

**Result**: Cleaner tutorial flow with 4 cells removed (75 → 71), all essential information preserved and better organized.
<details> <summary>Original prompt</summary>

> ----
>
> *This section details the original issue you should resolve*
>
> <issue_title>[BUG] Simplify quick start further</issue_title>
> <issue_description>Fix this:
> https://import-balance.org/docs/tutorials/quickstart/
>
> Remove the section
> 'Diagnostics for comparing distributions'
> But add to the previous section kld:
> "Distribution diagnostics (EMD/CVMD/KS)"
>
> And remove the section
> "Compare ASMD (Absolute Standardized Mean Difference)"
> From the section "Comparing Adjustment Methods"
> </issue_description>
>
> ## Comments on the Issue (you are copilot in this section)
>
> <comments>
> </comments>

</details>

- Fixes #327

---

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment, and configuring Model Context Protocol (MCP) servers. Learn more in the [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) docs.

Pull Request resolved: #328

Differential Revision: D92658641

Pulled By: talgalili

fbshipit-source-id: 60e8e34358d1cbb940a7cb3af8d6b225a2e36226
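The four diagnostics this commit consolidates (KLD, EMD, CVMD, KS) can be illustrated with a small numpy sketch. These stand-in functions are illustrative only — they are not the balance library's implementation, and the function names here are invented for the example:

```python
import numpy as np

def _ecdf(values, grid):
    """Empirical CDF of `values` evaluated at each point of `grid`."""
    return np.searchsorted(np.sort(values), grid, side="right") / len(values)

def ks_distance(x, y):
    """KS: maximum absolute difference between the two empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))
    return np.max(np.abs(_ecdf(x, grid) - _ecdf(y, grid)))

def emd_distance(x, y):
    """1-D EMD (Wasserstein-1): area between the two empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))
    deltas = np.diff(grid)
    diff = np.abs(_ecdf(x, grid[:-1]) - _ecdf(y, grid[:-1]))
    return np.sum(diff * deltas)

def cvmd_distance(x, y):
    """CVMD: integrated squared difference between the empirical CDFs."""
    grid = np.sort(np.concatenate([x, y]))
    deltas = np.diff(grid)
    diff = _ecdf(x, grid[:-1]) - _ecdf(y, grid[:-1])
    return np.sum(diff ** 2 * deltas)

def kld(p, q):
    """KL divergence between two histograms; note kld(p, q) != kld(q, p)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    mask = p > 0  # 0 * log 0 is taken as 0; assumes q > 0 wherever p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))
```

On identical samples all four evaluate to zero; KLD is the only one that is not symmetric in its arguments, which is why the tutorial text now calls it a divergence rather than a distance.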
1 parent 9810a9f commit 1e4e9f5

File tree

1 file changed: +8 −50 lines changed


tutorials/balance_quickstart.ipynb

Lines changed: 8 additions & 50 deletions
@@ -419,19 +419,23 @@
     "cell_type": "markdown",
     "metadata": {},
     "source": [
-     "## Distribution diagnostics (EMD/CVMD/KS)\n",
+     "## Distribution diagnostics (KLD/EMD/CVMD/KS)\n",
      "\n",
-     "Balance also exposes distributional distance diagnostics for covariates.\n",
+     "Balance also exposes distribution diagnostics for covariates.\n",
      "These look beyond mean differences and compare the **full distributions** of covariates in the weighted sample vs. the target.\n",
      "\n",
+     "- **KLD (Kullback-Leibler divergence)** measures the relative entropy between two probability distributions (note: this is a divergence measure, not a symmetric distance).\n",
+     "  (See: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)\n",
      "- **EMD (Earth Mover's Distance)** measures the minimum \"cost\" to transform one distribution into another.\n",
      "  (See: https://en.wikipedia.org/wiki/Earth_mover%27s_distance)\n",
      "- **CVMD (Cram\u00e9r\u2013von Mises distance)** measures the integrated squared difference between the empirical CDFs.\n",
      "  (See: https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93von_Mises_criterion)\n",
      "- **KS (Kolmogorov\u2013Smirnov distance)** measures the maximum absolute difference between the empirical CDFs.\n",
      "  (See: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)\n",
      "\n",
-     "These diagnostics complement **ASMD**, which only compares means. Use EMD/CVMD/KS when you want to check whether weighting aligns the *shape* of covariate distributions (not just their means).\n"
+     "Note: Distribution diagnostics operate on the raw covariates (with NA indicators), rather than the model matrix, so categorical variables stay intact.\n",
+     "\n",
+     "These diagnostics complement **ASMD**, which only compares means. Use these metrics when you want to check whether weighting aligns the *shape* of covariate distributions (not just their means).\n"
     ]
    },
    {
@@ -440,30 +444,12 @@
     "metadata": {},
     "outputs": [],
     "source": [
+     "print(sample_with_target.covars().kld().T)\n",
      "print(sample_with_target.covars().emd().T)\n",
      "print(sample_with_target.covars().cvmd().T)\n",
      "print(sample_with_target.covars().ks().T)\n"
     ]
    },
-   {
-    "cell_type": "markdown",
-    "metadata": {},
-    "source": [
-     "## Diagnostics for comparing distributions\n",
-     "\n",
-     "Distribution diagnostics operate on the raw covariates (with NA indicators), rather than the model matrix, so categorical variables stay intact.\n",
-     "We can use KLD (Kullback-Leibler divergence) to compare distributions:\n"
-    ]
-   },
-   {
-    "cell_type": "code",
-    "execution_count": null,
-    "metadata": {},
-    "outputs": [],
-    "source": [
-     "print(sample_with_target.covars().kld().T)"
-    ]
-   },
    {
     "cell_type": "markdown",
     "metadata": {},
@@ -947,34 +933,6 @@
     "print(adjusted_hgb_mm.summary())"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Compare ASMD (Absolute Standardized Mean Difference)\n",
-    "\n",
-    "Let's compare the covariate balance achieved by each method using ASMD tables:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(\"\\n=== Default Method (Logistic Regression) ASMD ===\")\n",
-    "print(adjusted_default.covars().asmd().T)\n",
-    "\n",
-    "print(\"\\n=== HistGradientBoosting + Model Matrix ASMD ===\")\n",
-    "print(adjusted_hgb_mm.covars().asmd().T)\n",
-    "\n",
-    "if _sklearn_version >= (1, 4):\n",
-    "    print(\"\\n=== HistGradientBoosting + Raw Covariates ASMD ===\")\n",
-    "    print(adjusted_hgb.covars().asmd().T)\n",
-    "else:\n",
-    "    print(\"\\nHistGradientBoosting + Raw Covariates ASMD skipped (requires scikit-learn >= 1.4)\")"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},

0 commit comments
