JFYI: matplotlib image differ tests #1

@jankatins

Description

This is mainly JFYI because it came up on twitter: matplotlib has a similar system in place to do unit testing on its images. It is also used in downstream packages like seaborn. The system is based on comparing raster images: it compares the rasterized output of the svg and ps backends to a baseline png which is included in the repo. Rasterization is done with Ghostscript. I suspect the rasterize step is there because svgs can produce the same visual result but have different internal representations (e.g. when plotting a point and a line, AFAIK the xml can contain point -> line or line -> point).

The workflow is:

  • write a test case with a name in a test file
  • run once -> fails due to missing baseline images and produces a png image "result_images/testfile/name.png"
  • compare that image with what you expect
  • if it looks fine: copy the output to the baseline directory (see the sketch after this list)
  • run again -> the baseline image is found and the plot is compared by drawing it on all three backends, saving the results (png+ps+svg), rasterizing svg+ps and comparing the rasterized images to the baseline image.
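
A minimal sketch of the "copy to the baseline directory" step, using the paths from above (the exact baseline directory layout depends on the project, so treat the paths as placeholders):

import shutil

# Promote the inspected output image to the new baseline.
shutil.copy('result_images/testfile/name.png',
            'baseline_images/testfile/name.png')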

From my experience with this:

  • The tests should try very hard to make the installed fonts the same on all test systems (e.g. Bitstream Vera or something similar which can be expected to be available on dev machines and on travis/...; remove any fallbacks in the config; matplotlib actually has a font embedded in the package to provide a default; see the sketch after this list)
  • The outputs are not always exactly the same on different systems (e.g. different antialiasing strategies on linux/windows) -> matplotlib has a tolerance parameter for the comparison, but recently tried very hard to remove all non-zero values and was almost successful (it got worse again when automatic windows tests were introduced).
  • mpl usually removes any text from a plot before it is drawn (via a parameter to the comparison function), so different text rendering of axis labels on different systems doesn't become the failing part...
  • If the tolerance is not zero, it's probably best to build plots which look ugly, e.g. by increasing the size of plotted dots and such things, because small dots can end up at totally different positions than expected without the comparison registering it, due to the tolerance...
  • To reproduce errors on travis/appveyor it's nice if the code spits out a directory which contains the images (+ baseline + diff + an html page with side-by-side placement of the images for visual inspection), so it can be uploaded (travis) or saved as an artifact (appveyor)
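
A minimal sketch of pinning the fonts, assuming you standardize on DejaVu Sans (the font that ships with matplotlib); the rcParams keys are regular matplotlib settings, the exact font is a project choice:

import matplotlib

# Render all text with the font bundled with matplotlib so every test
# machine uses the same glyphs and metrics.
matplotlib.rcParams['font.family'] = 'sans-serif'
# Reduce the OS-dependent fallback list to a single entry.
matplotlib.rcParams['font.sans-serif'] = ['DejaVu Sans']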

A test looks like this:

import matplotlib.pyplot as plt
from matplotlib.testing.decorators import image_comparison


@image_comparison(baseline_images=['log_scales'], remove_text=True)
def test_log_scales():
    # Draw on the current figure; the decorator saves it with all
    # configured backends and compares against the baseline image.
    ax = plt.subplot(122, yscale='log', xscale='symlog')

    ax.axvline(24.1)
    ax.axhline(24.1)

-> tests all three image formats (no extensions=['png']), with a tolerance of 0 (no tol=x), and removes the text. baseline_images is a list because you can have multiple plots in a test (which is IMO not a nice feature...).
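
As a hedged illustration of those parameters (the function name and tol value below are made up for the example), the same test restricted to the png backend with a small non-zero tolerance would look roughly like this:

# Only the png output is checked, a small RMS difference is tolerated,
# and text is stripped before the comparison.
@image_comparison(baseline_images=['log_scales'], extensions=['png'],
                  tol=0.1, remove_text=True)
def test_log_scales_png_only():
    ax = plt.subplot(122, yscale='log', xscale='symlog')
    ax.axvline(24.1)
    ax.axhline(24.1)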

The main part is here: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/testing/compare.py#L268 (mpl is license="BSD")
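
For reference, the comparison routine in that file can also be called directly. A rough sketch, assuming the file layout from above (compare_images returns None when the images match within tol, otherwise a message describing the difference):

from matplotlib.testing.compare import compare_images

# Compare the freshly produced png against the stored baseline.
result = compare_images('baseline_images/testfile/name.png',
                        'result_images/testfile/name.png',
                        tol=0)
if result is not None:
    print(result)  # RMS difference and pointer to the diff image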

CC: @hrbrmstr because twitter... :-)
