-
Notifications
You must be signed in to change notification settings - Fork 39
Add blog post on histograms #261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
koren007
wants to merge
1
commit into
scientific-python:main
Choose a base branch
from
koren007:histogram-blog
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
159 changes: 159 additions & 0 deletions
159
content/posts/matplotlib/matplotlib-histograms/index.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,159 @@ | ||
| --- | ||
| title: "Improving Matplotlib histogram readability" | ||
| description: "Suggestions for improving Matplob histograms by automatically aligning bins and tick positions to well-readable numbers." | ||
| date: 2026-01-04 | ||
| tags: ["matplotlib", "histogram", "readability"] | ||
| displayInList: true | ||
| author: ["Gerbrand Koren"] | ||
| draft: false | ||
| resources: | ||
| - name: featuredImage | ||
| src: "header.png" | ||
| params: | ||
| description: "Matplotlib histograms" | ||
| showOnTop: false | ||
| --- | ||
|
|
||
| ## Back to the basics: the Matplotlib histogram | ||
|
|
||
| We are all excited about exploring and developing new fancy plotting tools, but I want to take some time to revisit one of the basic plotting functions: Matplotlib's histogram [`hist()`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html), which relies on the NumPy [`histogram()` function](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html). Histograms are convenient to quickly inspect a dataset and get a feeling for the distribution of values that they contain. | ||
|
|
||
| I am often not satisfied with the default histograms produced by Matplotlib, requiring me to do repetitive manual adjustments. Here I will show how the histograms can be enhanced while avoiding time-consuming manual tweaking. I hope that these suggestions will be useful to others and ultimately that these will be adopted as the default plotting style for histograms in newer versions of Matplotlib. | ||
|
|
||
| ## Default histogram plotting | ||
|
|
||
| To illustrate the Matplotlib histogram I will be using a dataset from an [online tutorial](https://www.geeksforgeeks.org/python/matplotlib-pyplot-hist-in-python/). Here are the lines I used for importing Matplotlib’s plotting library, defining a sample dataset, and plotting a histogram with the default settings. | ||
|
|
||
| ```python | ||
| # Import Matplotlib plotting module | ||
| import matplotlib.pyplot as plt | ||
|
|
||
| # Define dataset (taken from online tutorial) | ||
| data = [32, 96, 45, 67, 76, 28, 79, 62, 43, | ||
| 81, 70, 61, 95, 44, 60, 69, 71, 23, | ||
| 69, 54, 76, 67, 82, 97, 26, 34, 18, | ||
| 16, 59, 88, 29, 30, 66, 23, 65, 72, | ||
| 20, 78, 49, 73, 62, 87, 37, 68, 81, | ||
| 80, 77, 92, 81, 52, 43, 68, 71, 86] | ||
|
|
||
| # Plot default histogram | ||
| plt.figure() | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The OOP approach is recommended from Matplotlib's own documentation. We should promote that here too. |
||
| plt.hist(data) | ||
| plt.xlabel('Data sample') | ||
| plt.ylabel('Counts') | ||
| plt.show() | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| This figure gives an indication on the range of values that are present in the dataset (ranging from somewhere below 20 to somewhere above 90, and with a higher concentration around 70). I will now make some small adjustments to improve the readability of the figure. | ||
|
|
||
| ## Including black edge lines | ||
|
|
||
| I think readability can be improved already by simply including a black edge color (which should be the default in my opinion). | ||
|
|
||
| ```python | ||
| # Plot histogram with black edge lines | ||
| plt.figure() | ||
| plt.hist(data, edgecolor='k') | ||
| plt.xlabel('Data sample') | ||
| plt.ylabel('Counts') | ||
| plt.show() | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| This helps me to more easily see where the edges of the bins are and thus to interpret the data. | ||
|
|
||
| ## Aligning tick marks and histogram bins | ||
|
|
||
| The need to visually interpolate the bins relative to the tick marks can be avoided by setting the tick marks to the locations of the bins. | ||
|
|
||
| ```python | ||
| # Plot histogram with aligned bins and xticks | ||
| plt.figure() | ||
| n, bins, patches = plt.hist(data, edgecolor='k') | ||
| plt.xticks(bins) | ||
| plt.xlabel('Data sample') | ||
| plt.ylabel('Counts') | ||
| plt.show() | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| This prevents the need to interpolate the values at the bin edges, but the numbers on the axis are not very pleasant to read. It is possible to set the bin edges (and subsequently the ticks on the x-axis) manually to round numbers by using the `bins` argument in `hist()`, but I would like to avoid manual tweaking as much as possible. | ||
|
|
||
| ## Automatically aligning bins with well-readable ticks | ||
|
|
||
| The histogram can be further improved by first selecting bins with edges placed at well-readable numbers. Here we can benefit from the [`ticker` module](https://matplotlib.org/stable/api/ticker_api.html) in Matplotlib. Since this function is also used to determine the ticks, the ticks and bins will always be aligned if we do not further interfere with the ticks on the x-axis. | ||
|
|
||
| ```python | ||
| # Import Matplotlib’s AutoLocator | ||
| from matplotlib.ticker import AutoLocator | ||
|
|
||
| # Determining appropriate bins based on the range of values | ||
| locator = AutoLocator() | ||
| bins = locator.tick_values(min(data), max(data)) | ||
|
|
||
| # Plotting histogram with well-readable xticks aligned to bins | ||
| plt.figure() | ||
| plt.hist(data, bins=bins, edgecolor='k') | ||
| plt.xlabel('Data sample') | ||
| plt.ylabel('Counts') | ||
| plt.show() | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| Here the labels on the x-axis are not present at each bin edge, but because the values are so simple, it is still easy to read. The ticks on the x-axis are automatically set based on Matplotlib’s `locator` module but some of the labels are omitted during the rendering of the figure. Of course, it is possible to force Matplotlib to include all the labels. | ||
|
|
||
| ```python | ||
| # Plotting histogram with well-readable xticks aligned to each bin | ||
| plt.figure() | ||
| plt.hist(data, bins=bins, edgecolor='k') | ||
| plt.xticks(bins) | ||
| plt.xlabel('Data sample') | ||
| plt.ylabel('Counts') | ||
| plt.show() | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| In this case, I think that this looks a bit better and that the values on the x-axis are not too cluttered. | ||
|
|
||
| ## Changing the histogram orientation | ||
|
|
||
| Most of the histograms I have seen have a data range on the x-axis and counts on the y-axis, but of course it is also possible to create a histogram where this is reversed. The aforementioned tweaks are easily done. | ||
|
|
||
| ```python | ||
| # Plotting horizontal histogram with well-readable yticks aligned to bins | ||
| plt.figure() | ||
| plt.hist(data, bins=bins, orientation='horizontal', edgecolor='k') | ||
| plt.xlabel('Counts') | ||
| plt.ylabel('Data sample') | ||
| plt.show() | ||
| ``` | ||
|
|
||
|  | ||
|
|
||
| Again, some of the ticks on the data-axis have been omitted and we can include those if we want, but this time using `plt.yticks(bins)`. | ||
|
|
||
| ## How could this be implemented? | ||
|
|
||
| It is easy to use the few lines of code presented here, but I also wondered how this could be implemented in Matplotlib. To answer this, it is helpful to know how binning currently works. The `bins` argument of the `hist()` function controls the binning behavior, which can be a string, a set of bin edges, or the desired number of bins. Whatever the user selects is passed on to the NumPy `histogram()` function. | ||
|
|
||
| We could introduce an additional keyword (e.g. `'aligned'`) for the `bins` argument that would trigger a call to Matplotlib’s `AutoLocator()` behind the scenes. This could be done e.g. by adding the following lines in the [`_axes.py` file](https://github.com/matplotlib/matplotlib/blob/main/lib/matplotlib/axes/_axes.py) (before the call to NumPy’s `histogram()` function!) | ||
|
|
||
| ```python | ||
| if bins == 'aligned': | ||
| locator = mticker.AutoLocator() | ||
| bins = locator.tick_values(np.nanmin(x), np.nanmax(x)) | ||
| ``` | ||
|
|
||
| Having this keyword to simply align the bins and ticks at well-readable values would simplify things for users, avoiding the need to import `AutoLocator()` that is probably unfamiliar to many Matplotlib users. Setting this as the default behavior (and the black edge color) would be a next step :). | ||
|
|
||
| ## In conclusion | ||
|
|
||
| Histograms may not be the most exciting plots, but are nevertheless very useful. Also my proposed changes are perhaps a bit ‘boring’, but I believe that having good basic plotting can help us save time, which we then can use for more exciting visualizations :). | ||
|
|
||
| During my PhD research I have been using various Python libraries, including Matplotlib for all my visualizations. I continue using Python in my research and have also integrated it in many of the courses that I am now teaching. I have greatly benefited from this open source software and hope to contribute a little bit to future improvements of Matplotlib. | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the licensing on this website. Can you confirm?