Commit 2930cbd

Merge pull request #56 from lionel-/add-regression
Skip tests on CRAN
2 parents fb7bd6d + 8096e54 commit 2930cbd

8 files changed, +212 -24 lines changed

NEWS.md

Lines changed: 34 additions & 1 deletion
@@ -2,7 +2,12 @@
 # vdiffr 0.2.99.9000
 
 This release of vdiffr features a major overhaul of the internals to
-make the package more robust and reliable across platforms:
+make the package more robust.
+
+
+## Cross-platform reliability
+
+vdiffr now works reliably across platforms:
 
 * svglite is now embedded in vdiffr to protect against updates of the
 SVG generation engine.
@@ -20,6 +25,34 @@ Now that vdiffr has a stable engine, the next release will focus on
 improving the Shiny UI.
 
 
+## Regression testing versus Unit testing
+
+Another important change is that figure mismatches are no longer
+reported as failures, except when the tests are run locally, on
+Travis, Appveyor, or any environment where the `Sys.getenv("CI")` or
+`Sys.getenv("NOT_CRAN")` variables are set. Because vdiffr is more of
+a monitoring than a unit testing tool, it shouldn't cause R CMD check
+failures on the CRAN machines.
+
+Despite our efforts to make vdiffr robust and reliable across
+platforms, checking the appearance of a figure is still inherently
+fragile. It is similar to testing for errors by matching exact error
+messages: these messages are susceptible to change at any
+time. Similarly, the appearance of plots depends on a lot of upstream
+code, such as the way margins and spacing are computed. vdiffr uses a
+special ggplot2 theme that should change very rarely, but there are
+just too many upstream factors that could cause breakages. For this
+reason, figure mismatches are not necessarily representative of actual
+failures.
+
+Visual testing is not an alternative to writing unit tests for the
+internal data transformations performed during the creation of your
+figure. It is more of a monitoring tool that allows you to quickly
+check how the appearance of your figures changes over time, and to
+manually assess whether changes reflect actual problems in your
+package.
+
+
 ## Features
 
 * vdiffr now advises user to run `manage_cases()` when a figure was

R/testthat-ui.R

Lines changed: 48 additions & 9 deletions
@@ -1,13 +1,19 @@
 #' Does a figure look like its expected output?
 #'
-#' If the test has never been validated yet, the test is skipped. If
-#' the test has previously been validated but \code{fig} does not look
-#' like its expected output, an error is issued. Use
-#' [validate_cases()] or [manage_cases()] to (re)validate
-#' the test.
+#' @description
 #'
-#' `fig` can be a ggplot object, a recordedplot, a function to be
-#' called, or more generally any object with a `print` method.
+#' `expect_doppelganger()` takes a figure to check visually.
+#'
+#' * If the figure has yet to be validated, the test is skipped. Call
+#' [manage_cases()] to validate the new figure, so vdiffr knows what
+#' to compare against.
+#'
+#' * If the test has been validated, `fig` is compared to the
+#' validated figure. If the plot differs, a failure is issued
+#' (except on CRAN, see section on regression testing below).
+#'
+#' Either fix the problem, or call [manage_cases()] to validate the
+#' new figure appearance.
 #'
 #' @param title A brief description of what is being tested in the
 #' figure. For instance: "Points and lines overlap".
@@ -17,7 +23,12 @@
 #'
 #' The title is also used as file name for storing SVG (in a
 #' sanitized form, with special characters converted to `"-"`).
-#' @param fig A figure to test.
+#' @param fig A figure to test. This can be a ggplot object, a
+#' recordedplot, or more generally any object with a `print` method.
+#'
+#' For plots that can't be represented as printable objects, you can
+#' pass a function. This function must construct the plot and print
+#' it.
 #' @param path The path where the test case should be stored, relative
 #' to the `tests/figs/` folder. If `NULL` (the default), the current
 #' testthat context is used to create a subfolder. Supply an empty
@@ -31,6 +42,31 @@
 #' in a deterministic way and write it to the target file. See
 #' [write_svg()] (the default) for an example.
 #'
+#' @section Regression testing versus Unit testing:
+#'
+#' Failures to match a validated appearance are only reported when the
+#' tests are run locally, on Travis, Appveyor, or any environment
+#' where the `Sys.getenv("CI")` or `Sys.getenv("NOT_CRAN")` variables
+#' are set. Because vdiffr is more of a monitoring than a unit testing
+#' tool, it shouldn't cause R CMD check failures on the CRAN machines.
+#'
+#' Checking the appearance of a figure is inherently fragile. It is
+#' similar to testing for errors by matching exact error messages:
+#' these messages are susceptible to change at any time. Similarly,
+#' the appearance of plots depends on a lot of upstream code, such as
+#' the way margins and spacing are computed. vdiffr uses a special
+#' ggplot2 theme that should change very rarely, but there are just
+#' too many upstream factors that could cause breakages. For this
+#' reason, figure mismatches are not necessarily representative of
+#' actual failures.
+#'
+#' Visual testing is not an alternative to writing unit tests for the
+#' internal data transformations performed during the creation of your
+#' figure. It is more of a monitoring tool that allows you to quickly
+#' check how the appearance of your figures changes over time, and to
+#' manually assess whether changes reflect actual problems in your
+#' package.
+#'
 #' @section Debugging:
 #'
 #' It is sometimes difficult to understand the cause of a failure.
@@ -167,6 +203,7 @@ new_expectation <- function(msg, case, type, vdiffr_type) {
   classes <- c(class(exp), vdiffr_type)
   structure(exp, class = classes, vdiffr_case = case)
 }
+
 new_exp <- function(msg, case) {
   new_expectation(msg, case, "skip", "vdiffr_new")
 }
@@ -177,8 +214,10 @@ mismatch_exp <- function(msg, case) {
   if (is_vdiffr_stale()) {
     msg <- "The vdiffr engine is too old. Please update vdiffr and revalidate the figures."
     new_expectation(msg, case, "skip", "vdiffr_mismatch")
-  } else {
+  } else if (is_ci()) {
     new_expectation(msg, case, "failure", "vdiffr_mismatch")
+  } else {
+    new_expectation(msg, case, "skip", "vdiffr_mismatch")
   }
 }

R/utils.R

Lines changed: 4 additions & 0 deletions
@@ -170,3 +170,7 @@ is_vdiffr_stale <- function() {
 hash_encode_url <- function(url){
   gsub("#", "%23", url)
 }
+
+is_ci <- function() {
+  nzchar(Sys.getenv("CI")) || nzchar(Sys.getenv("NOT_CRAN"))
+}
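
The behaviour of the new `is_ci()` helper can be checked in isolation. The following is a minimal sketch, not part of the commit: `is_ci()` is copied from the hunk above, while `with_env()` is a hypothetical helper introduced here only to set and restore environment variables without extra dependencies.

```r
# Copied from the R/utils.R hunk above.
is_ci <- function() {
  nzchar(Sys.getenv("CI")) || nzchar(Sys.getenv("NOT_CRAN"))
}

# Hypothetical helper (not part of vdiffr): temporarily set the environment
# variables named in `vars`, evaluate `expr`, then restore the old state.
with_env <- function(vars, expr) {
  old <- Sys.getenv(names(vars), unset = NA, names = TRUE)
  on.exit({
    was_unset <- is.na(old)
    if (any(!was_unset)) do.call(Sys.setenv, as.list(old[!was_unset]))
    if (any(was_unset)) Sys.unsetenv(names(old)[was_unset])
  })
  do.call(Sys.setenv, as.list(vars))
  force(expr)  # `expr` is lazy, so it is evaluated only after the variables are set
}

with_env(c(CI = "", NOT_CRAN = ""), is_ci())       # FALSE: mismatches are skipped
with_env(c(CI = "true", NOT_CRAN = ""), is_ci())   # TRUE: mismatches are failures
with_env(c(CI = "", NOT_CRAN = "true"), is_ci())   # TRUE: mismatches are failures
with_env(c(CI = "false", NOT_CRAN = ""), is_ci())  # TRUE: any non-empty value counts
```

Because the check relies on `nzchar()`, it only distinguishes empty from non-empty values; a setting such as `CI=false` still counts as "set".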

README.md

Lines changed: 19 additions & 0 deletions
@@ -41,6 +41,8 @@ that each plot is correct.
 
 1) Run `devtools::test()` to execute the tests as normal.
 
+When a figure doesn't match the saved version, vdiffr signals a failure when it is run interactively, or when it is run on Travis or Appveyor. Mismatches do not cause R CMD check to fail on CRAN machines. See the testing versus monitoring section below.
+
 
 ### Adding expectations
 
@@ -123,6 +125,23 @@ You can run the tests the usual way, for example with
 will be skipped. Failed tests will show as an error.
 
 
+### Testing versus Monitoring
+
+When a figure doesn't match its saved version, it is only reported as a failure under these circumstances:
+
+- When the `NOT_CRAN` environment variable is set. In particular, devtools sets this when running the tests interactively.
+
+- On Travis, Appveyor, or any environment where `Sys.getenv("CI")` is set.
+
+Otherwise, the failure is ignored. The motivation for this is that vdiffr is a monitoring tool and shouldn't cause R CMD check failures on the CRAN machines.
+
+Checking the appearance of a figure is inherently fragile. It is a bit like testing for errors by matching exact error messages. These messages are susceptible to change at any time. Similarly, the appearance of plots depends on a lot of upstream code, such as the way margins and spacing are computed. vdiffr uses a special ggplot2 theme that should change very rarely, but there are just too many upstream factors that could cause breakages. For this reason, figure mismatches are not necessarily representative of actual failures.
+
+Visual testing is not an alternative to writing unit tests for the internal data transformations performed during the creation of your figure. It is more of a monitoring tool that allows you to quickly check how the appearance of your figures changes over time, and to manually assess whether changes reflect actual problems in your packages.
+
+If you want vdiffr to fail on CRAN machines as well, just set the environment variable `CI` to `"true"` in a `setup-vdiffr.R` file in your testthat folder.
+
+
 ### RStudio integration
 
 An addin to launch `manage_cases()` is provided with vdiffr. Use the
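
The README change above describes opting in to hard failures everywhere, including on CRAN. A minimal sketch of such a setup file might look as follows. The path `tests/testthat/setup-vdiffr.R` and the value `"true"` follow the README's wording; the sketch assumes a testthat version that sources setup files before running tests.

```r
# tests/testthat/setup-vdiffr.R
# testthat sources files whose names start with "setup" before running the
# tests in this folder. Setting CI makes is_ci() return TRUE, so vdiffr
# reports figure mismatches as failures instead of skipping them.
Sys.setenv(CI = "true")
```

Since `is_ci()` only tests for a non-empty value, any non-empty string would have the same effect as `"true"`.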

man/expect_doppelganger.Rd

Lines changed: 44 additions & 9 deletions
Some generated files are not rendered by default.

tests/mock.Rout.fail

Lines changed: 40 additions & 0 deletions
@@ -43,3 +43,43 @@ Failed doppelganger: myplot (../figs//myplot.svg)
 Mi41Mg==)' />
 <defs>
 
+
+Failed doppelganger: myplot (../figs//myplot.svg)
+
+< before
+> after
+@@ 50,4 / 50,5 @@
+<rect x='641.72' y='401.98' width='20.80' height='118.58' style='stroke-wi
+dth: 1.07; stroke: none; stroke-linecap: butt; fill: #595959;' clip-path='
+url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3wyMi41Mg==)' />
+<rect x='662.52' y='401.98' width='20.80' height='118.58' style='stroke-wi
+dth: 1.07; stroke: none; stroke-linecap: butt; fill: #595959;' clip-path='
+url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3wyMi41Mg==)' />
+> <line x1='417.09' y1='544.27' x2='417.09' y2='22.52' style='stroke-width:
+: 1.07; stroke-linecap: butt;' clip-path='url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3w
+: yMi41Mg==)' />
+<rect x='28.09' y='22.52' width='686.43' height='521.75' style='stroke-wid
+th: 1.07; stroke: #333333;' clip-path='url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3wy
+Mi41Mg==)' />
+<defs>
+
+
+Failed doppelganger: myplot (../figs//myplot.svg)
+
+< before
+> after
+@@ 50,4 / 50,5 @@
+<rect x='641.72' y='401.98' width='20.80' height='118.58' style='stroke-wi
+dth: 1.07; stroke: none; stroke-linecap: butt; fill: #595959;' clip-path='
+url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3wyMi41Mg==)' />
+<rect x='662.52' y='401.98' width='20.80' height='118.58' style='stroke-wi
+dth: 1.07; stroke: none; stroke-linecap: butt; fill: #595959;' clip-path='
+url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3wyMi41Mg==)' />
+> <line x1='417.09' y1='544.27' x2='417.09' y2='22.52' style='stroke-width:
+: 1.07; stroke-linecap: butt;' clip-path='url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3w
+: yMi41Mg==)' />
+<rect x='28.09' y='22.52' width='686.43' height='521.75' style='stroke-wid
+th: 1.07; stroke: #333333;' clip-path='url(#cpMjguMDl8NzE0LjUyfDU0NC4yN3wy
+Mi41Mg==)' />
+<defs>
+

tests/testthat/mock-pkg/tests/testthat/test-failed.R

Lines changed: 13 additions & 1 deletion
@@ -12,7 +12,7 @@ skip_if_maintenance <- function() {
   }
 }
 
-test_that("New plots work are collected", {
+test_that("mismatches are hard failures when NOT_CRAN is set", {
   skip_if_maintenance()
   expect_doppelganger("myplot", p1_fail, "")
 })
@@ -22,6 +22,18 @@ test_that("Duplicated expectations issue a warning", {
   expect_doppelganger("myplot", p1_fail, "")
 })
 
+test_that("mismatches are hard failures when CI is set", {
+  skip_if_maintenance()
+  withr::local_envvar(c(NOT_CRAN = "", CI = "true"))
+  expect_doppelganger("myplot", p1_fail, "")
+})
+
+test_that("mismatches are skipped when NOT_CRAN is unset", {
+  skip_if_maintenance()
+  withr::local_envvar(c(NOT_CRAN = "", CI = ""))
+  expect_doppelganger("myplot", p1_fail, "")
+})
+
 
 # Maintenance --------------------------------------------------------

tests/testthat/test-expectations.R

Lines changed: 10 additions & 4 deletions
@@ -1,12 +1,18 @@
 
 context("Expectations")
 
-test_that("Mismatches fail", {
-  failed_result <- subset_results(test_results, "test-failed.R", "New plots work are collected")[[1]]
+test_that("Mismatches are skipped except on CI and interactively", {
+  notcran_result <- subset_results(test_results, "test-failed.R", "mismatches are hard failures when NOT_CRAN is set")[[1]]
+  expect_match(notcran_result$message, "Figures don't match: myplot.svg\n")
+  expect_is(notcran_result, "expectation_failure")
+
+  failed_result <- subset_results(test_results, "test-failed.R", "mismatches are hard failures when CI is set")[[1]]
   expect_match(failed_result$message, "Figures don't match: myplot.svg\n")
+  expect_is(failed_result, "expectation_failure")
 
-  class <- class(failed_result)[[1]]
-  expect_equal(class, "expectation_failure")
+  skipped_result <- subset_results(test_results, "test-failed.R", "mismatches are skipped when NOT_CRAN is unset")[[1]]
+  expect_match(skipped_result$message, "Figures don't match: myplot.svg\n")
+  expect_is(skipped_result, "expectation_skip")
 })
 
 test_that("Duplicated expectations issue warning", {
