Add auto-detection for n_bins and bar_width in ASCII plots#346
Add auto-detection for n_bins and bar_width in ASCII plots#346talgalili wants to merge 1 commit intofacebookresearch:mainfrom
Conversation
Summary: ASCII histogram/barplot functions previously required explicit `n_bins` and `bar_width` parameters, making it cumbersome to get reasonable defaults in different terminal environments. This diff adds automatic detection: - `_auto_n_bins()`: Uses Sturges' rule (ceil(log2(n)+1)) capped at unique values and 50 max - `_auto_bar_width()`: Computes available width from terminal size minus label/percentage columns Additional improvements: - New `separate_categories` parameter for barplots inserts blank lines between categories for readability - Non-zero proportions too small to render a full bar character now show a dot (`.`) to distinguish from truly zero - Simplified the quickstart tutorial by moving detailed ASCII plot examples to a dedicated tutorial Differential Revision: D94231216
|
@talgalili has exported this pull request. If you are a Meta employee, you can view the originating Diff in D94231216. |
There was a problem hiding this comment.
Pull request overview
This pull request enhances the ASCII plotting functionality in the balance package by adding automatic parameter detection and improving readability. The changes simplify the user experience by eliminating the need to manually specify n_bins and bar_width parameters while providing sensible defaults based on data characteristics and terminal dimensions.
Changes:
- Added auto-detection for
n_binsusing Sturges' rule (capped at unique values and max 50 bins) andbar_widthbased on terminal width - Introduced
separate_categoriesparameter (default True) to insert blank lines between categories in barplots for improved readability - Enhanced tiny proportion rendering with a dot (
.) indicator to distinguish non-zero but very small values from truly zero values - Reorganized tutorials by moving detailed ASCII plot examples from quickstart to a new dedicated
balance_ascii_plots.ipynbtutorial
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| balance/stats_and_plots/ascii_plots.py | Added _auto_n_bins() and _auto_bar_width() helpers; updated function signatures to accept Optional[int] for auto-detection; added dot indicator logic; implemented separate_categories spacing |
| tests/test_ascii_plots.py | Added comprehensive test coverage for new helper functions and auto-detection; updated test expectations to include blank lines between categories; added test for dot indicator |
| tutorials/balance_quickstart.ipynb | Simplified by removing detailed ASCII plot examples, added reference to new dedicated tutorial |
| tutorials/balance_ascii_plots.ipynb | New dedicated tutorial covering all ASCII plot functions with examples of auto-detection and new features |
Comments suppressed due to low confidence (1)
balance/stats_and_plots/ascii_plots.py:49
- The
n_datasetsparameter is unused in this function. Consider removing it or documenting why it's present (perhaps for future use or consistency with other functions). Currently it's passed but never referenced in the calculation.
def _auto_bar_width(label_width: int, n_datasets: int) -> int:
"""Pick bar_width to fit within terminal width."""
import shutil
term_width = shutil.get_terminal_size((80, 24)).columns
# Each line: label_width + " | " (3) + bar + " (XX.X%)" (9)
available = term_width - label_width - 3 - 9
return max(10, available)
| if bar_width is None: | ||
| import shutil | ||
|
|
||
| term_width = shutil.get_terminal_size((80, 24)).columns | ||
| n_cols = len(legend_names) | ||
| # Each column needs: bar_width + pct string (~6) + spacing (3) | ||
| available = term_width - range_width - 4 # " | " separator | ||
| per_col = max(10, (available - (n_cols - 1) * 3) // n_cols - 6) | ||
| bar_width = per_col |
There was a problem hiding this comment.
The auto-detection logic for bar_width in ascii_comparative_hist is duplicated inline instead of using the _auto_bar_width helper. This creates maintenance burden and inconsistency. Consider refactoring to use a shared helper, or if the comparative histogram needs different logic due to its columnar layout, document why in a comment.
| @@ -147,6 +177,8 @@ def ascii_plot_bar( | |||
| bar_width: Maximum character width for bars. Defaults to 40. | |||
There was a problem hiding this comment.
The docstring incorrectly states that bar_width defaults to 40. The actual default is now None, which triggers automatic detection based on terminal width. Update the docstring to reflect this change, e.g., "Defaults to None, which auto-detects based on terminal width."
| bar_width: Maximum character width for bars. Defaults to 40. | |
| bar_width: Maximum character width for bars. Defaults to None, which | |
| auto-detects an appropriate width based on the terminal size. |
|
This pull request has been merged in bb86743. |
Summary:
ASCII histogram/barplot functions previously required explicit
n_binsandbar_widthparameters,making it cumbersome to get reasonable defaults in different terminal environments.
This diff adds automatic detection:
_auto_n_bins(): Uses Sturges' rule (ceil(log2(n)+1)) capped at unique values and 50 max_auto_bar_width(): Computes available width from terminal size minus label/percentage columnsAdditional improvements:
separate_categoriesparameter for barplots inserts blank lines between categories for readability.) to distinguish from truly zeroDifferential Revision: D94231216