JFYI: matplotlib image differ tests #1

@jankatins

Description

This is mainly JFYI because it came up on twitter: matplotlib has a similar system in place to do unit testing on its images. It is also used in downstream packages like seaborn. The system is based on comparing raster images: it compares the rasterized output of the svg and ps backends to a baseline png which is included in the repo. Rasterization is done with Ghostscript. I suspect the rasterize step is there because svgs can produce the same visual result but have different internal representations (e.g. when plotting a point and a line, AFAIK the xml can contain point -> line or line -> point).

The workflow is:

  • write a test case with a name in a test file
  • run once -> fails due to missing baseline images and produces a png image "result_images/testfile/name.png"
  • compare that image with what you expect
  • if it looks fine: copy the output to the baseline directory (see the sketch after this list)
  • run again -> the baseline image is found and the plot is compared by drawing it on all three backends, saving the results (png+ps+svg), rasterizing svg+ps and comparing the rasterized images to the baseline image.
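
A minimal sketch of the "copy to the baseline directory" step, using the paths from above (the exact baseline directory layout depends on the project, so treat the paths as placeholders):

import shutil

# Promote the inspected output image to the new baseline.
shutil.copy('result_images/testfile/name.png',
            'baseline_images/testfile/name.png')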

From my experience with this:

  • The tests should try very hard to make the installed fonts the same on all test systems (e.g. Bitstream Vera or something similar which can be expected to be available on dev machines and on travis/...; remove any fallbacks in the config; matplotlib actually has a font embedded in the package to provide a default; see the sketch after this list)
  • The outputs are not always exactly the same on different systems (e.g. different antialiasing strategies on linux/windows) -> matplotlib has a tolerance parameter for the comparison, but recently tried very hard to remove all non-zero values and was almost successful (it got worse again when automatic windows tests were introduced).
  • mpl usually removes any text from a plot before it is drawn (via a parameter to the comparison function), so different text rendering of axis labels on different systems doesn't become the failing part...
  • If the tolerance is not zero, it's probably best to build plots which look ugly, e.g. by increasing the size of plotted dots and such things, because small dots can end up at totally different positions than expected without the comparison registering it, due to the tolerance...
  • To reproduce errors on travis/appveyor it's nice if the code spits out a directory which contains the images (+ baseline + diff + an html page with side-by-side placement of the images for visual inspection), so it can be uploaded (travis) or saved as an artifact (appveyor)
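
A minimal sketch of pinning the fonts, assuming you standardize on DejaVu Sans (the font that ships with matplotlib); the rcParams keys are regular matplotlib settings, the exact font is a project choice:

import matplotlib

# Render all text with the font bundled with matplotlib so every test
# machine uses the same glyphs and metrics.
matplotlib.rcParams['font.family'] = 'sans-serif'
# Reduce the OS-dependent fallback list to a single entry.
matplotlib.rcParams['font.sans-serif'] = ['DejaVu Sans']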

A test looks like this:

import matplotlib.pyplot as plt
from matplotlib.testing.decorators import image_comparison


@image_comparison(baseline_images=['log_scales'], remove_text=True)
def test_log_scales():
    # Draw on the current figure; the decorator saves it with all
    # configured backends and compares against the baseline image.
    ax = plt.subplot(122, yscale='log', xscale='symlog')

    ax.axvline(24.1)
    ax.axhline(24.1)

-> tests all three image formats (no extensions=['png']), with a tolerance of 0 (no tol=x), and removes the text. baseline_images is a list because you can have multiple plots in a test (which is IMO not a nice feature...).
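
As a hedged illustration of those parameters (the function name and tol value below are made up for the example), the same test restricted to the png backend with a small non-zero tolerance would look roughly like this:

# Only the png output is checked, a small RMS difference is tolerated,
# and text is stripped before the comparison.
@image_comparison(baseline_images=['log_scales'], extensions=['png'],
                  tol=0.1, remove_text=True)
def test_log_scales_png_only():
    ax = plt.subplot(122, yscale='log', xscale='symlog')
    ax.axvline(24.1)
    ax.axhline(24.1)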

The main part is here: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/testing/compare.py#L268 (mpl is license="BSD")
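
For reference, the comparison routine in that file can also be called directly. A rough sketch, assuming the file layout from above (compare_images returns None when the images match within tol, otherwise a message describing the difference):

from matplotlib.testing.compare import compare_images

# Compare the freshly produced png against the stored baseline.
result = compare_images('baseline_images/testfile/name.png',
                        'result_images/testfile/name.png',
                        tol=0)
if result is not None:
    print(result)  # RMS difference and pointer to the diff image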

CC: @hrbrmstr because twitter... :-)
