diff --git a/.Rbuildignore b/.Rbuildignore index 2953c5409..6472d287a 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -25,3 +25,4 @@ ^\.git-blame-ignore-rev$ ^CLAUDE\.md$ ^\.claude$ +^vignettes/articles$ diff --git a/CLAUDE.md b/CLAUDE.md index 0739e34b8..32006a401 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -6,7 +6,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co testthat is R's most popular unit testing framework, used by thousands of CRAN packages. It provides functions to make testing R code as fun and addictive as possible, with clear expectations, visual progress indicators, and seamless integration with R package development workflows. -## Key Development Commands +## Key development commands General advice: * When running R from the console, always run it with `--quiet --vanilla` @@ -25,10 +25,11 @@ General advice: ### Documentation -- Always run `devtools::document()` after changing any roxygen2 docs. +- Run `devtools::document()` after changing any roxygen2 docs. - Every user facing function should be exported and have roxygen2 documentation. - Whenever you add a new documentation file, make sure to also add the topic name to `_pkgdown.yml`. - Run `pkgdown::check_pkgdown()` to check that all topics are included in the reference index. +- Use sentence case for all headings ## Core Architecture diff --git a/NEWS.md b/NEWS.md index e0b284f00..a9ddc93db 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,7 @@ # testthat (development version) +* New `vignette("mocking")` explains mocking in detail (#1265). +* New `vignette("challenging-functions")` provides an index to other documentation organised by testing challenges (#1265). * When running a test interactively, testthat now reports the number of succeses. The results should also be more useful if you are using nested tests. * The hints generated by `expect_snapshot()` and `expect_snapshot_file()` now include the path to the package, if its not in the current working directory (#1577). * `expect_snapshot_file()` now clearly errors if the `path` doesnt exist (#2191). 
diff --git a/R/expect-named.R b/R/expect-named.R index 4f8cb10c6..4a71ca121 100644 --- a/R/expect-named.R +++ b/R/expect-named.R @@ -35,7 +35,6 @@ expect_named <- function( check_bool(ignore.order) check_bool(ignore.case) - act <- quasi_label(enquo(object), label) if (missing(expected)) { diff --git a/_pkgdown.yml b/_pkgdown.yml index b301e5eb1..bb17b1520 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -83,6 +83,24 @@ reference: - ends_with("Reporter") - -Reporter +articles: +- title: Setup and configuration + navbar: Setup + contents: + - third-edition + - parallel + - special-files + - custom-expectation + +- title: Testing techniques + navbar: Techniques + contents: + - challenging-tests + - mocking + - skipping + - snapshotting + - test-fixtures + news: releases: - text: "Version 3.2.0" diff --git a/vignettes/challenging-tests.Rmd b/vignettes/challenging-tests.Rmd new file mode 100644 index 000000000..dd3e7c517 --- /dev/null +++ b/vignettes/challenging-tests.Rmd @@ -0,0 +1,169 @@ +--- +title: "Testing challenging functions" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Testing challenging functions} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r} +#| include: false +library(testthat) +knitr::opts_chunk$set(collapse = TRUE, comment = "#>") + +# Pretend we're snapshotting +snapper <- local_snapshotter(fail_on_new = FALSE) +snapper$start_file("snapshotting.Rmd", "test") + +# Pretend we're testing testthat so we can use mocking +Sys.setenv(TESTTHAT_PKG = "testthat") +``` + +This vignette is a quick reference guide for testing challenging functions. It's organized by problem type rather than technique, so you can quickly skim the whole vignette, spot the problem you're facing, and then learn more about useful tools for solving it. In it, you'll learn how to overcome the following challenges: + +* Functions with implicit inputs, like options and environment variables. +* Random number generators. +* Tests that can't be run in some environments. +* Testing web APIs. +* Testing graphical output. +* User interaction. +* User-facing text. +* Repeated code. + +## Options and environment variables + +If your function depends on options or environment variables, first try refactoring the function to make the [inputs explicit](https://design.tidyverse.org/inputs-explicit.html). If that's not possible, use functions like `withr::local_options()` or `withr::local_envvar()` to temporarily change options and environment values within a test. Learn more in `vignette("test-fixtures")`. + + + +## Random numbers + +What happens if you want to test a function that relies on randomness in some way? If you're writing a random number generator, you probably want to generate a large quantity of random numbers and then apply some statistical test. But what if your function just happens to use a little bit of pre-existing randomness? How do you make your tests repeatable and reproducible? Under the hood, random number generators generate different numbers because they update a special `.Random.seed` variable stored in the global environment. You can temporarily set this seed to a known value to make your random numbers reproducible with `withr::local_seed()`, making random numbers a special case of test fixtures (`vignette("test-fixtures")`). 
+ +Here's a simple example showing how you might test the basic operation of a function that rolls a die: + +```{r} +#| label: random-local-seed +dice <- function() { + sample(6, 1) +} + +test_that("dice returns different numbers", { + withr::local_seed(1234) + + expect_equal(dice(), 4) + expect_equal(dice(), 2) + expect_equal(dice(), 6) +}) +``` + +Alternatively, you might want to mock (`vignette("mocking")`) the function to eliminate randomness. + +```{r} +#| label: random-mock + +roll_three <- function() { + sum(dice(), dice(), dice()) +} + +test_that("three dice adds values of individual calls", { + local_mocked_bindings(dice = mock_output_sequence(1, 2, 3)) + expect_equal(roll_three(), 6) +}) +``` + +When should you set the seed and when should you use mocking? As a general rule of thumb, set the seed when you want to test the actual random behavior, and use mocking when you want to test the logic that uses the random results. + +## Some tests can't be run in some circumstances + +You can skip a test without it passing or failing if you can't or don't want to run it (e.g., it's OS dependent, it only works interactively, or it shouldn't be tested on CRAN). Learn more in `vignette("skipping")`. + +## HTTP requests + +If you're trying to test functions that rely on HTTP requests, we recommend using {vcr} or {httptest2}. These packages both allow you to interactively record HTTP responses and then later replay them in tests. This is a specialized type of mocking (`vignette("mocking")`) that works with {httr} and {httr2} to isolate your tests from failures in the underlying API. + +If your package is going to CRAN, you **must** either use one of these packages or use `skip_on_cran()` for all internet-facing tests. Otherwise, you are at high risk of failing `R CMD check` if the underlying API is temporarily down. This sort of failure causes extra work for the CRAN maintainers and extra hassle for you. + +## Graphics + +The only type of testing you can use for graphics is snapshot testing (`vignette("snapshotting")`) via `expect_snapshot_file()`. Graphical snapshot testing is surprisingly challenging because you need pixel-perfect rendering across multiple versions of multiple operating systems, and this is hard, mostly due to imperceptible differences in font rendering. Fortunately, we've needed to overcome these challenges in order to test ggplot2, and you can benefit from our experience by using {vdiffr} when testing graphical output. + +## User interaction + +If you're testing a function that relies on user feedback (e.g., from `readline()`, `utils::menu()`, or `utils::askYesNo()`), you can use mocking (`vignette("mocking")`) to return fixed values within the test. For example, imagine that you've written the following function that asks the user if they want to continue: + +```{r} +#| label: continue + +continue <- function(prompt) { + cat(prompt, "\n", sep = "") + + repeat { + val <- readline("Do you want to continue? (y/n) ") + if (val %in% c("y", "n")) { + return(val == "y") + } + cat("!
You must enter y or n\n") + } +} + +readline <- NULL +``` + +You could test its behavior by mocking `readline()` and using a snapshot test: + +```{r} +#| label: mock-readline + +test_that("user must respond y or n", { + mock_readline <- local({ + i <- 0 + function(prompt) { + i <<- i + 1 + cat(prompt) + val <- if (i == 1) "x" else "y" + cat(val, "\n", sep = "") + val + } + }) + + local_mocked_bindings(readline = mock_readline) + expect_snapshot(val <- continue("This is dangerous")) + expect_true(val) +}) +``` + +If you were testing the behavior of some function that uses `continue()`, you might want to mock `continue()` instead of `readline()`. For example, the function below requires user confirmation before overwriting an existing file. In order to focus our tests on the behavior of just this function, we mock `continue()` to return either `TRUE` or `FALSE` without any user messaging. + +```{r} +#| label: mock-continue + +save_file <- function(path, data) { + if (file.exists(path)) { + if (!continue("`path` already exists")) { + stop("Failed to continue") + } + } + writeLines(data, path) +} + +test_that("save_file() requires confirmation to overwrite file", { + path <- withr::local_tempfile(lines = letters) + + local_mocked_bindings(continue = function(...) TRUE) + save_file(path, "a") + expect_equal(readLines(path), "a") + + local_mocked_bindings(continue = function(...) FALSE) + expect_snapshot(save_file(path, "a"), error = TRUE) +}) +``` + +## User-facing text + +Errors, warnings, and other user-facing text should be tested to ensure they're both actionable and consistent across the package. Obviously, it's not possible to test this automatically, but you can use snapshots (`vignette("snapshotting")`) to ensure that user-facing messages are clearly shown in PRs and easily reviewed by another human. + +## Repeated code + +If you find yourself repeating the same set of expectations again and again across your test suite, it may be a sign that you should design your own expectation. Learn how in `vignette("custom-expectations")`. diff --git a/vignettes/custom-expectation.Rmd b/vignettes/custom-expectation.Rmd index 0b50083fe..c6802a0de 100644 --- a/vignettes/custom-expectation.Rmd +++ b/vignettes/custom-expectation.Rmd @@ -7,7 +7,8 @@ vignette: > %\VignetteEncoding{UTF-8} --- -```{r setup, include = FALSE} +```{r setup} +#| include: false library(testthat) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") @@ -16,7 +17,9 @@ snapper <- local_snapshotter(fail_on_new = FALSE) snapper$start_file("snapshotting.Rmd", "test") ``` -This vignette shows you how to write your own expectations. You can use them within your package by putting them in a helper file, or share them with others by exporting them from your package. +This vignette shows you how to write your own expectations. Custom expectations allow you to extend testthat to meet your own specialized testing needs, creating new `expect_*` functions that work exactly the same way as the built-ins. Custom expectations are particularly useful if you want to produce expectations tailored for domain-specific data structures, combine multiple checks into a single expectation, or create more actionable feedback when an expectation fails. You can use them within your package by putting them in a helper file, or share them with others by exporting them from your package. 
+ +In this vignette, you'll learn about the three-part structure of expectations, how to test your custom expectations, see a few examples, and, if you're writing a lot of expectations, learn how to reduce repeated code. ## Expectation basics @@ -32,7 +35,7 @@ expect_length <- function(object, n) { if (act_n != n) { msg <- c( sprintf("Expected %s to have length %i.", act$lab, n), - sprintf("Actual length: %i.", act$n) + sprintf("Actual length: %i.", act_n) ) return(fail(msg)) } @@ -42,30 +45,34 @@ expect_length <- function(object, n) { } ``` -The first step in any expectation is to use `quasi_label()` to capture a "labelled value", i.e. a list that contains both the value (`$val`) for testing and a label (`$lab`) for messaging. This is a pattern that exists for fairly esoteric reasons; you don't need to understand it, just copy and paste it 🙂. +The first step in any expectation is to use `quasi_label()` to capture a "labeled value", i.e., a list that contains both the value (`$val`) for testing and a label (`$lab`) used to make failure messages as informative as possible. This is a pattern that exists for fairly esoteric reasons; you don't need to understand it, just copy and paste it. Next you need to check each way that `object` could violate the expectation. In this case, there's only one check, but in more complicated cases there can be multiple checks. In most cases, it's easier to check for violations one by one, using early returns to `fail()`. This makes it easier to write informative failure messages that first describe what was expected and then what was actually seen. -Also note that you need to use `return(fail())` here. You won't see the problem when interactively testing your function because when run outside of `test_that()`, `fail()` throws an error, causing the function to terminate early. When running inside of `test_that()`, however, `fail()` does not stop execution because we want to collect all failures in a given test. +Note that you need to use `return(fail())` here. If you don't, your expectation might end up failing multiple times or both failing and succeeding. You won't see these problems when interactively testing your expectation, but forgetting to `return()` can lead to incorrect fail and pass counts in typical usage. In the next section, you'll learn how to test your expectation to avoid this issue. -Finally, if the object is as expected, call `pass()` with `act$val`. Returning the input value is good practice since expectation functions are called primarily for their side-effects (triggering a failure). This allows expectations to be chained: +Finally, if the object is as expected, call `pass()` with `act$val`. This is good practice because expectation functions are called primarily for their side-effects (triggering a failure), and returning the value allows expectations to be piped together: ```{r} -mtcars |> - expect_type("list") |> - expect_s3_class("data.frame") |> - expect_length(11) +#| label: piping + +test_that("mtcars is a 13 row data frame", { + mtcars |> + expect_type("list") |> + expect_s3_class("data.frame") |> + expect_length(11) +}) ``` ### Testing your expectations -Once you've written your expectation, you need to test it, and luckily testthat comes with three expectations designed specifically to test expectations: +Once you've written your expectation, you need to test it: expectations are functions that can have bugs, just like any other function, and it's really important that they generate actionable failure messages. 
Luckily testthat comes with three expectations designed specifically to test expectations: * `expect_success()` checks that your expectation emits exactly one success and zero failures. * `expect_failure()` checks that your expectation emits exactly one failure and zero successes. -* `expect_failure_snapshot()` captures the failure message in a snapshot, making it easier to review if it's useful or not. +* `expect_snapshot_failure()` captures the failure message in a snapshot, making it easier to review whether it's useful. -The first two expectations are particularly important because they ensure that your expectation reports the correct number of successes and failures to the user. +The first two expectations are particularly important because they ensure that your expectation always reports either a single success or a single failure. If it doesn't, the end user is going to get confusing results in their test suite reports. ```{r} test_that("expect_length works as expected", { @@ -82,11 +89,11 @@ test_that("expect_length gives useful feedback", { ## Examples -The following sections show you a few more variations, loosely based on existing testthat expectations. +The following sections show you a few more variations, loosely based on existing testthat expectations. These expectations were picked to show how you can generate actionable failures in slightly more complex situations. ### `expect_vector_length()` -Let's make `expect_length()` a bit more strict by also checking that the input is a vector. R is a bit weird in that it gives a length to pretty much every object, and you can imagine not wanting this code to succeed: +Let's make `expect_length()` a bit more strict by also checking that the input is a vector. R is a bit unusual in that it gives a length to pretty much every object, and you can imagine not wanting code like the following to succeed, because it's likely that the user passed the wrong object to the test. ```{r} expect_length(mean, 1) @@ -129,7 +136,7 @@ expect_vector_length(mtcars, 15) ### `expect_s3_class()` -Or imagine if you're checking to see if an object inherits from an S3 class. In R, there's no direct way to tell if an object is an S3 object: you can confirm that it's an object, then that it's not an S4 object. So you might organize your expectation this way: +Or imagine you're checking to see if an object inherits from an S3 class. R has a lot of different OO systems, and you want your failure messages to be as informative as possible, so before checking that the class matches, you probably want to check that the object is from the correct OO family. ```{r} expect_s3_class <- function(object, class) { @@ -177,24 +184,63 @@ expect_s3_class(x3, "integer") expect_s3_class(x3, "factor") ``` -Note the variety of messages: +Note the variety of error messages. We always print what was expected, and where possible, also display what was actually received: + +* When `object` isn't an object, we can only say what we expected. +* When `object` is an S4 object, we can report that. +* When `inherits()` is `FALSE`, we provide the actual class, since that's most informative. -* When `object` isn't an object, we only need to say what we expect. -* When `object` isn't an S3 object, we know it's an S4 object. -* When `inherits()` is `FALSE`, we provide the actual _class_, since that's most informative. +The general principle is to tailor error messages to what the user can act on based on what you know about the input. -I also check that the `class` argument must be a string. 
This is an error, not a failure, because it suggests you're using the function incorrectly. +Also note that I check that the `class` argument is a string. If it's not a string, I throw an error. This is not a test failure; the user is calling the function incorrectly. In general, you should check the type of all arguments that affect the operation and error if they're not what you expect. ```{r} #| error: true expect_s3_class(x1, 1) ``` +### Optional `class` + +A common pattern in testthat's own expectations is to use arguments to control the level of detail in the test. Here it would be nice if we could check that an object is an S3 object without checking for a specific class. I think we could do that by renaming `expect_s3_class()` to `expect_s3_object()`. Now `expect_s3_object(x)` would verify that `x` is an S3 object, and `expect_s3_object(x, class = "foo")` would verify that `x` is an S3 object with the given class. The implementation of this is straightforward: we also allow `class` to be `NULL` and then only verify inheritance when non-`NULL`. + +```{r} +expect_s3_object <- function(object, class = NULL) { + if (!is.null(class) && !rlang::is_string(class)) { + rlang::abort("`class` must be a string or NULL.") + } + + act <- quasi_label(rlang::enquo(object)) + + if (!is.object(act$val)) { + msg <- sprintf("Expected %s to be an object.", act$lab) + return(fail(msg)) + } + + if (isS4(act$val)) { + msg <- c( + sprintf("Expected %s to be an S3 object.", act$lab), + "Actual OO type: S4" + ) + return(fail(msg)) + } + + if (!is.null(class) && !inherits(act$val, class)) { + msg <- c( + sprintf("Expected %s to inherit from %s.", act$lab, class), + sprintf("Actual class: %s", class(act$val)) + ) + return(fail(msg)) + } + + pass(act$val) +} +``` + ## Repeated code -As you write more expectations, you might discover repeated code that you want to extract out into a helper. Unfortunately, creating helper functions is not straightforward in testthat because every `fail()` captures the calling environment in order to give maximally useful tracebacks. Because getting this right is not critical (you'll just get a slightly suboptimal traceback in the case of failure), we don't recommend bothering. However, we document it here because it's important to get it right in testthat itself. +As you write more expectations, you might discover repeated code that you want to extract into a helper. Unfortunately, creating 100% correct helper functions is not straightforward in testthat because `fail()` captures the calling environment in order to give useful tracebacks, and testthat's own expectations don't expose this as an argument. Fortunately, getting this right is not critical (you'll just get a slightly suboptimal traceback in the case of failure), so we don't recommend bothering in most cases. We document it here, however, because it's important to get it right in testthat itself. -The key challenge is that `fail()` captures a `trace_env` which should be the execution environment of the expectation. This usually works, because the default value of `trace_env` is `caller_env()`. But when you introduce a helper, you'll need to explicitly pass it along: +The key challenge is that `fail()` captures a `trace_env`, which should be the execution environment of the expectation. This usually works because the default value of `trace_env` is `caller_env()`.
But when you introduce a helper, you'll need to explicitly pass it along: ```{r} expect_length_ <- function(act, n, trace_env = caller_env()) { @@ -215,6 +261,8 @@ expect_length <- function(object, n) { A few recommendations: -* The helper shouldn't be user facing, so we give it a `_` suffix to make that clear. -* It's typically easiest for a helper to take the labelled value produced by `quasi_label()`. +* The helper shouldn't be user-facing, so we give it a `_` suffix to make that clear. +* It's typically easiest for a helper to take the labeled value produced by `quasi_label()`. * Your helper should usually call both `fail()` and `pass()` and be returned from the wrapping expectation. + +Again, you're probably not writing so many expectations that it makes sense for you to go to this effort, but it is important for testthat to get it right. diff --git a/vignettes/mocking.Rmd b/vignettes/mocking.Rmd new file mode 100644 index 000000000..0afc53f9f --- /dev/null +++ b/vignettes/mocking.Rmd @@ -0,0 +1,235 @@ +--- +title: "Mocking" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Mocking} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r} +#| include: false +library(testthat) +knitr::opts_chunk$set(collapse = TRUE, comment = "#>") + +# Pretend we're snapshotting +snapper <- local_snapshotter(fail_on_new = FALSE) +snapper$start_file("snapshotting.Rmd", "test") + +# Pretend we're testing testthat so we can use mocking +Sys.setenv(TESTTHAT_PKG = "testthat") +``` + +Mocking allows you to temporarily replace the implementation of a function with something that makes it easier to test. It's useful when testing failure scenarios that are hard to generate organically (e.g., what happens if dependency X isn't installed?), making tests more reliable, and making tests faster. It's also a general escape hatch to resolve almost any challenging testing problem. That said, mocking comes with downsides too: it's an advanced technique that can lead to brittle tests or tests that silently conceal problems. You should only use it when all other approaches fail. + +(If, like me, you're confused as to why you'd want to cruelly make fun of your tests, mocking here is used in the sense of making a fake or simulated version of something, i.e., a mock-up.) + +testthat's primary mocking tool is `local_mocked_bindings()` which is used to mock functions and is the focus of this vignette. But it also provides other tools for specialized cases: you can use `local_mocked_s3_method()` to mock an S3 method, `local_mocked_s4_method()` to mock an S4 method, and `local_mocked_r6_class()` to mock an R6 class. Once you understand the basic idea of mocking, it should be straightforward to apply these other tools where needed. + +In this vignette, we'll start by illustrating the basics of mocking with a few examples, continue to some real-world case studies from throughout the tidyverse, then finish up with the technical details so you can understand the tradeoffs of the current implementation. + +## Getting started with mocking + +Let's begin by motivating mocking with a simple example. Imagine you're writing a function like `rlang::check_installed()`. The goal of this function is to check if a package is installed, and if not, give a nice error message. It also takes an optional `min_version` argument that you can use to enforce a version constraint. 
A simple base R implementation might look something like this: + +```{r} +check_installed <- function(pkg, min_version = NULL) { + if (!requireNamespace(pkg, quietly = TRUE)) { + stop(sprintf("{%s} is not installed.", pkg)) + } + if (!is.null(min_version)) { + pkg_version <- packageVersion(pkg) + if (pkg_version < min_version) { + stop(sprintf( + "{%s} version %s is installed, but %s is required.", + pkg, + pkg_version, + min_version + )) + } + } + + invisible() +} +``` + +Now that we've written this function, we want to test it. There are many ways we might tackle this, but it's reasonable to start by testing the case where we don't specify a minimum version. To do this, we need to come up with a package we know is installed and a package we know isn't installed: + +```{r} +test_that("check_installed() checks package is installed", { + expect_no_error(check_installed("testthat")) + expect_snapshot(check_installed("doesntexist"), error = TRUE) +}) +``` + +This is probably fine, as we certainly know that testthat must be installed, but it feels a little fragile because it depends on external state that we don't control. While it's pretty unlikely, if someone does create a `doesntexist` package, this test will no longer work. As a general principle, the less your tests rely on state outside of your control, the more robust and reliable they'll be. + +Next we want to check the case where we specify a minimum version, and again we need to make up some inputs: + +```{r} +test_that("check_installed() checks minimum version", { + expect_no_error(check_installed("testthat", "1.0.0")) + expect_snapshot(check_installed("testthat", "99.99.999"), error = TRUE) +}) +``` + +Again, this is probably safe (since I'm unlikely to release 90+ new versions of testthat), but if you look at the snapshot message carefully, you'll notice that it includes the current version of testthat. That means every time a new version of testthat is released, we'll have to update the snapshot. We could use the `transform` argument to fix this: + +```{r} +test_that("check_installed() checks minimum version", { + expect_no_error(check_installed("testthat", "1.0.0")) + expect_snapshot( + check_installed("testthat", "99.99.999"), + error = TRUE, + transform = function(lines) gsub(packageVersion("testthat"), "", lines) + ) +}) +``` + +But it's starting to feel like we're accumulating more and more hacks. So let's take a fresh look and see how mocking might help us. The basic idea of mocking is to temporarily replace the implementation of functions being used by the function we're testing. Here we're testing `check_installed()` and want to mock `requireNamespace()` and `packageVersion()` so we can control what they return. There's a small wrinkle here in that `requireNamespace` and `packageVersion` are base functions, not our functions, so we need to make bindings in our package namespace so we can mock them (we'll come back to why later). + +```{r} +requireNamespace <- NULL +packageVersion <- NULL +``` + +For the first test, we mock `requireNamespace()` twice: first to always return `TRUE` (pretending every package is installed), and then to always return `FALSE` (pretending that no packages are installed). Now the test is completely self-contained and doesn't depend on what packages happen to be installed. + +```{r} +test_that("check_installed() checks package is installed", { + local_mocked_bindings(requireNamespace = function(...)
TRUE) + expect_no_error(check_installed("package-name")) + + local_mocked_bindings(requireNamespace = function(...) FALSE) + expect_snapshot(check_installed("package-name"), error = TRUE) +}) +``` + +For the second test, we mock `requireNamespace()` to return `TRUE`, and then `packageVersion()` to always return version 2.0.0. This again ensures our test is independent of system state. + +```{r} +test_that("check_installed() checks minimum version", { + local_mocked_bindings( + requireNamespace = function(...) TRUE, + packageVersion = function(...) numeric_version("2.0.0") + ) + + expect_no_error(check_installed("package-name", "1.0.0")) + expect_snapshot(check_installed("package-name", "3.4.5"), error = TRUE) +}) +``` + +## Case studies + +To give you more experience with mocking, this section looks at a few places where we use mocking in the tidyverse: + +* Testing `testthat::skip_on_os()` regardless of what operating system is running the test. +* Speeding up `usethis::use_release_issue()`. +* Testing the passage of time in `httr2::req_throttle()`. + +These situations are all a little complex, as this is the nature of mocking: if you can use a simpler technique, you should. Mocking is only needed for otherwise intractable problems. + +### Pretending we're on a different platform + +```{r} +#| include: false +system_os <- NULL +``` + +`testthat::skip_on_os()` allows you to skip tests on specific operating systems, using the internal `system_os()` function, which is a thin wrapper around `Sys.info()[["sysname"]]`. To test that this skip works correctly, we have to use mocking because there's no other way to pretend we're running on a different operating system. This yields the following test, where we use mocking to pretend that we're always on Windows: + +```{r} +#| eval: false +test_that("can skip on multiple oses", { + local_mocked_bindings(system_os = function() "windows") + + expect_skip(skip_on_os("windows")) + expect_skip(skip_on_os(c("windows", "linux"))) + expect_no_skip(skip_on_os("linux")) +}) +``` + +(The logic of `skip_on_os()` is simple enough that I feel confident we only need to simulate one platform.) + +### Speeding up tests + +`usethis::use_release_issue()` creates a GitHub issue with a bulleted list of actions to follow when releasing a package. But some of the bullets depend on complex conditions that can take a while to compute. So the [tests for this function](https://github.com/r-lib/usethis/blob/main/tests/testthat/test-release.R) use mocks like this: + +```{r} +#| eval: false +local_mocked_bindings( + get_revdeps = function() character(), + gh_milestone_number = function(...) NA +) +``` + +Here we pretend that there are no reverse dependencies (revdeps) for the package, which is both slow to compute and liable to vary over time if we use a real package. We also pretend that there are no related GitHub milestones, which otherwise requires a GitHub API call, which is again slow and might vary over time. Together, these mocks keep the tests fast and self-contained, free from any state outside of our direct control. + +### Managing time + +`httr2::req_throttle()` prevents multiple requests from being made too quickly, using a technique called a leaky token bucket. This technique is inextricably tied to real time because you want to allow more requests as time elapses. So how do you test this? I started by using `Sys.sleep()`, but this made my tests both slow (because I'd sleep for a second or two) and unreliable (because sometimes more time elapsed than I expected).
Eventually I figured out that I could "manually control" time by using a [mocked function](https://github.com/r-lib/httr2/blob/main/tests/testthat/test-req-throttle.R) that returns the value of a variable I control. This allows me to manually advance time and carefully test the implications. + +You can see the basic idea with a simpler example. Let's begin with a function that returns the "unix time", the number of seconds elapsed since midnight on Jan 1, 1970. This is easy to compute, but will make some computations simpler later as well as providing a convenient function to mock. + +```{r} +unix_time <- function() unclass(Sys.time()) +unix_time() +``` + +Now I'm going to create a function factory that makes it easy to compute how much time has elapsed since some fixed starting point: + +```{r} +elapsed <- function() { + start <- unix_time() + function() { + unix_time() - start + } +} + +timer <- elapsed() +Sys.sleep(0.5) +timer() +``` + +Imagine trying to test this function without mocking! You'd probably think it's not worth it. In fact, that's what I thought originally, but I soon learned my lesson when I introduced a bug because I'd forgotten the complexities of computing the difference between two POSIXct values. + +With mocking, however, I can "manipulate time" by mocking `unix_time()` so that it returns the value of a variable I control. Now I can write a reliable test: + +```{r} +test_that("elapsed() measures elapsed time", { + time <- 1 + local_mocked_bindings(unix_time = function() time) + + timer <- elapsed() + expect_equal(timer(), 0) + + time <- 2 + expect_equal(timer(), 1) +}) +``` + +## How does mocking work? + +To finish up, it's worth discussing how mocking works. The fundamental challenge of mocking is that you want it to be "hygienic", i.e., it should only affect the operation of your package code, not all running code. You can see why this might be problematic if you imagine mocking a function that testthat itself uses: you don't want to accidentally break testthat while trying to test your code! To achieve this goal, `local_mocked_bindings()` works by modifying your package's [namespace environment](https://adv-r.hadley.nz/environments.html#special-environments). + +You can implement the basic idea using base R code like this: + +```{r} +#| eval: false + +old <- getFromNamespace("my_function", "mypackage") +assignInNamespace("my_function", new, "mypackage") + +# run the test... + +# restore the previous value +assignInNamespace("my_function", old, "mypackage") +``` + +This implementation leads to two limitations of `local_mocked_bindings()`: + +1. The package namespace is locked, which means that you can't add new bindings to it. That means if you want to mock base functions, you have to provide some binding that can be overridden. The easiest way to do this is with something like `mean <- NULL`. This creates a binding that `local_mocked_bindings()` can modify but that, because of R's [lexical scoping rules](https://adv-r.hadley.nz/functions.html#functions-versus-variables), doesn't affect ordinary calls. + +2. `::` doesn't use the package namespace, so if you want to mock an explicitly namespaced function, you either have to import `fun` into your `NAMESPACE` (e.g., with `@importFrom pkg fun`) or create your own wrapper function that you can mock. Typically, one of these options will feel fairly natural. The sketch after this list illustrates both workarounds.
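+
+To make this concrete, here's a hypothetical sketch (the names `Sys.time` and `inform()` are purely illustrative) of the shims a package might define in its own R code to work around both limitations:
+
+```{r}
+#| eval: false
+# A NULL binding that local_mocked_bindings() can override in tests.
+# Ordinary calls still find base::Sys.time(), because R skips non-function
+# bindings when looking up a function to call.
+Sys.time <- NULL
+
+# A wrapper the package owns, so tests can mock `inform()` instead of the
+# explicitly namespaced cli::cli_inform().
+inform <- function(...) cli::cli_inform(...)
+```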
+ +Overall, these limitations feel correct to me: `local_mocked_bindings()` makes it easy to temporarily change the implementation of functions that you have written, while offering workarounds to override the implementations of functions that others have written in the scope of your package. diff --git a/vignettes/skipping.Rmd b/vignettes/skipping.Rmd index 93556421e..ba509f7b1 100644 --- a/vignettes/skipping.Rmd +++ b/vignettes/skipping.Rmd @@ -14,20 +14,19 @@ knitr::opts_chunk$set( ) ``` -Some times you have tests that you don't want to run in certain circumstances. +Sometimes you have tests that you can't or don't want to run in certain circumstances. This vignette describes how to **skip** tests to avoid execution in undesired environments. -Skipping is a relatively advanced topic because in most cases you want all your tests to run everywhere. -The most common exceptions are: +The most common scenarios are: - You're testing a web service that occasionally fails, and you don't want to run the tests on CRAN. - Or maybe the API requires authentication, and you can only run the tests when you've [securely distributed](https://gargle.r-lib.org/articles/articles/managing-tokens-securely.html) some secrets. + Or the API requires authentication, and you can only run the tests when you've [securely distributed](https://gargle.r-lib.org/articles/articles/managing-tokens-securely.html) secrets. -- You're relying on features that not all operating systems possess, and want to make sure your code doesn't run on a platform where it doesn't work. - This platform tends to be Windows, since amongst other things, it lacks full utf8 support. +- You're relying on features that not all operating systems possess, and want to make sure your code doesn't run on platforms where it doesn't work. + The most common platform with limitations is Windows, which among other things lacks full UTF-8 support. -- You're writing your tests for multiple versions of R or multiple versions of a dependency and you want to skip when a feature isn't available. +- You're writing tests for multiple versions of R or multiple versions of a dependency, and you want to skip when a feature isn't available. You generally don't need to skip tests if a suggested package is not installed. - This is only needed in exceptional circumstances, e.g. when a package is not available on some operating system. + This is only needed in exceptional circumstances, e.g., when a package is not available on some operating systems. ```{r setup} library(testthat) @@ -37,18 +36,38 @@ library(testthat) testthat comes with a variety of helpers for the most common situations: -- `skip_on_cran()` skips tests on CRAN. +- `skip_if_not_installed()` skips if a required package is not installed. You can optionally supply a minimal version too. + +- `skip_on_cran()` skips tests on CRAN. `skip_on_bioc()` skips tests on Bioconductor. This is useful for slow tests and tests that occasionally fail for reasons outside of your control. - `skip_on_os()` allows you to skip tests on a specific operating system. Generally, you should strive to avoid this as much as possible (so your code works the same on all platforms), but sometimes it's just not possible. -- `skip_on_ci()` skips tests on most continuous integration platforms (e.g. GitHub Actions, Travis, Appveyor). +- `skip_on_ci()` skips tests on most CI platforms (e.g., GitHub Actions). + +- `skip_on_covr()` skips tests during code coverage. + +- `skip_unless_r(">= 4.2")` only runs tests for newer R versions. 
+ `skip_unless_r("< 4.2")` only runs tests for older R versions. + +You can implement your own using skips `skip_if()` or `skip_if_not()`: + +```{r} +#| eval: false -You can also easily implement your own using either `skip_if()` or `skip_if_not()`, which both take an expression that should yield a single `TRUE` or `FALSE`. +# Only run test if a token file is available +skip_if_not(file.exists("secure-token.json")) -All reporters show which tests as skipped. -As of testthat 3.0.0, ProgressReporter (used interactively) and CheckReporter (used inside of `R CMD check`) also display a summary of skips across all tests. +# Only run test if R has memory profiling capabilities +skip_if_not(capabilities("profmem")) + +# Only run if we've opted-in to slow tests with an env var +skip_if(Sys.getenv("RUN_SLOW_TESTS") == "true") +``` + +All reporters show which tests are skipped. +As of testthat 3.0.0, ProgressReporter (used interactively) and CheckReporter (used inside `R CMD check`) also display a summary of skips across all tests. It looks something like this: ``` @@ -57,12 +76,12 @@ It looks something like this: ● On CRAN (1) ``` -You should keep an on eye this when developing interactively to make sure that you're not accidentally skipping the wrong things. +This display is really important, and you should keep an eye on it when working on your test suite. If you accidentally skip too many tests, you can trick yourself into believing your code is working correctly, when actually you're just not testing it. ## Helpers -If you find yourself using the same `skip_if()`/`skip_if_not()` expression across multiple tests, it's a good idea to create a helper function. -This function should start with `skip_` and live in a `test/helper-{something}.R` file: +If you find yourself using the same `skip_if()` or `skip_if_not()` expression across multiple tests, it's a good idea to create a helper function. +This function should start with `skip_` and live in a `tests/testthat/helper-{something}.R` file: ```{r} skip_if_dangerous <- function() { @@ -76,8 +95,8 @@ skip_if_dangerous <- function() { ## Embedding `skip()` in package functions -Another useful technique that can sometimes be useful is to build a `skip()` directly into a package function. -For example take a look at [`pkgdown:::convert_markdown_to_html()`](https://github.com/r-lib/pkgdown/blob/v2.0.7/R/markdown.R#L95-L106), which absolutely, positively cannot work if the Pandoc tool is unavailable: +Another useful technique is to embed a `skip()` directly into a package function. +For example, take a look at [`pkgdown:::convert_markdown_to_html()`](https://github.com/r-lib/pkgdown/blob/v2.0.7/R/markdown.R#L95-L106), which absolutely cannot work if the Pandoc tool is unavailable: ```{r eval = FALSE} convert_markdown_to_html <- function(in_path, out_path, ...) { @@ -98,18 +117,18 @@ convert_markdown_to_html <- function(in_path, out_path, ...) { ``` If Pandoc is not available when `convert_markdown_to_html()` executes, it throws an error *unless* it appears to be part of a test run, in which case the test is skipped. -This is an alternative to implementing a custom skipper, e.g. `skip_if_no_pandoc()`, and inserting it into many of pkgdown's tests. +This is an alternative to implementing a custom skipper, e.g., `skip_if_no_pandoc()`, and inserting it into many of pkgdown's tests. 
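+
+For comparison, such a skipper is only a few lines. A hypothetical version (not pkgdown's actual code) might look like this:
+
+```{r}
+#| eval: false
+skip_if_no_pandoc <- function() {
+  # rmarkdown::pandoc_available() reports whether a Pandoc installation was found
+  if (!rmarkdown::pandoc_available()) {
+    skip("Pandoc is not available")
+  }
+}
+```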
We don't want pkgdown to have a runtime dependency on testthat, so pkgdown includes a copy of `testthat::is_testing()`: -```{r eval = FALSE} +```{r} is_testing <- function() { identical(Sys.getenv("TESTTHAT"), "true") } ``` -It might look like the code appears to still have a runtime dependency on testthat, because of the call to `testthat::skip()`. -But `testthat::skip()` is only executed during a test run, which implies that testthat is installed. +It might look like the code still has a runtime dependency on testthat, because of the call to `testthat::skip()`. +But `testthat::skip()` is only executed during a test run, which means that testthat is installed. We have mixed feelings about this approach. On the one hand, it feels elegant and concise, and it absolutely guarantees that you'll never miss a needed skip in one of your tests. diff --git a/vignettes/snapshotting.Rmd b/vignettes/snapshotting.Rmd index d0f3019ba..2dba8668a 100644 --- a/vignettes/snapshotting.Rmd +++ b/vignettes/snapshotting.Rmd @@ -16,19 +16,19 @@ set.seed(1014) ``` The goal of a unit test is to record the expected output of a function using code. -This is a powerful technique because not only does it ensure that code doesn't change unexpectedly, it also expresses the desired behaviour in a way that a human can understand. +This is a powerful technique because it not only ensures that code doesn't change unexpectedly, but it also expresses the desired behavior in a way that a human can understand. -However, it's not always convenient to record the expected behaviour with code. +However, it's not always convenient to record the expected behavior with code. Some challenges include: - Text output that includes many characters like quotes and newlines that require special handling in a string. -- Output that is large, making it painful to define the reference output, and bloating the size of the test file and making it hard to navigate. +- Output that is large, making it painful to define the reference output and bloating the size of the test file. -- Binary formats like plots or images, which are very difficult to describe in code: i.e. the plot looks right, the error message is useful to a human, the print method uses colour effectively. +- Binary formats like plots or images, which are very difficult to describe in code: e.g., the plot looks right, the error message is actionable, or the print method uses color effectively. For these situations, testthat provides an alternative mechanism: snapshot tests. -Instead of using code to describe expected output, snapshot tests (also known as [golden tests](https://ro-che.info/articles/2017-12-04-golden-tests)) record results in a separate human readable file. +Instead of using code to describe expected output, snapshot tests (also known as [golden tests](https://ro-che.info/articles/2017-12-04-golden-tests)) record results in a separate human-readable file. Snapshot tests in testthat are inspired primarily by [Jest](https://jestjs.io/docs/en/snapshot-testing), thanks to a number of very useful discussions with Joe Cheng. ```{r setup} @@ -42,8 +42,8 @@ snapper$start_file("snapshotting.Rmd", "test") ## Basic workflow -We'll illustrate the basic workflow with a simple function that generates an HTML heading. -It can optionally include an `id` attribute, which allows you to construct a link directly to that heading. +We'll illustrate the basic workflow with a simple function that generates HTML bullets. 
+It can optionally include an `id` attribute, which allows you to construct a link directly to that list. ```{r} bullets <- function(text, id = NULL) { @@ -72,7 +72,7 @@ To do this we make two changes to our code: - We use `expect_snapshot()` instead of `expect_equal()` -- We wrap the call in `cat()` (to avoid `[1]` in the output, like in my first interactive example). +- We wrap the call in `cat()` (to avoid `[1]` in the output, like in the first interactive example above). This yields the following test: @@ -89,9 +89,9 @@ snapper$end_file() snapper$start_file("snapshotting.Rmd", "test") ``` -When we run the test for the first time, it automatically generates reference output, and prints it, so that you can visually confirm that it's correct. +When we run the test for the first time, it automatically generates reference output and prints it, so that you can visually confirm that it's correct. The output is automatically saved in `_snaps/{name}.md`. -The name of the snapshot matches your test file name --- e.g. if your test is `test-pizza.R` then your snapshot will be saved in `test/testthat/_snaps/pizza.md`. +The name of the snapshot matches your test file name --- e.g. if your test is `test-pizza.R` then your snapshot will be saved in `tests/testthat/_snaps/pizza.md`. As the file name suggests, this is a markdown file, which I'll explain shortly. If you run the test again, it'll succeed: @@ -103,8 +103,9 @@ test_that("bullets", { }) ``` -```{r, include = FALSE} -# Reset snapshot test +```{r} +#| include: false +# finalise snapshot to in order to get an error snapper$end_file() snapper$start_file("snapshotting.Rmd", "test") ``` @@ -125,16 +126,16 @@ test_that("bullets", { }) ``` -If this is a deliberate change, you can follow the advice in the message and update the snapshots for that file by running `snapshot_accept("pizza")`; otherwise you can fix the bug and your tests will pass once more. -(You can also accept snapshot for all files with `snapshot_accept()`). +If this is a deliberate change, you can follow the advice in the message and update the snapshots for that file by running `snapshot_accept("pizza")`; otherwise, you can fix the bug and your tests will pass once more. +(You can also accept snapshots for all files with `snapshot_accept()`.) If you delete the test, the corresponding snapshot will be removed the next time you run the tests. If you delete all snapshots in the file, the entire snapshot file will be deleted the next time you run all the tests. ### Snapshot format Snapshots are recorded using a subset of markdown. -You might wonder why we use markdown? -It's important that snapshots be readable by humans, because humans have to look at it during code reviews. +You might wonder why we use markdown. +We use it because it's important that snapshots be human-readable because humans have to read them during code reviews. Reviewers often don't run your code but still want to understand the changes. Here's the snapshot file generated by the test above: @@ -154,41 +155,21 @@ Here's the snapshot file generated by the test above: ``` Each test starts with `# {test name}`, a level 1 heading. -Within a test, each snapshot expectation is indented by four spaces, i.e. as code, and are separated by `---`, a horizontal rule. +Within a test, each snapshot expectation is indented by four spaces, i.e., as code, and they are separated by `---`, a horizontal rule. 
### Interactive usage Because the snapshot output uses the name of the current test file and the current test, snapshot expectations don't really work when run interactively at the console. Since they can't automatically find the reference output, they instead just print the current value for manual inspection. -## Other types of output +## Testing errors -So far we've focussed on snapshot tests for output printed to the console. +So far we've focused on snapshot tests for output printed to the console. But `expect_snapshot()` also captures messages, errors, and warnings[^1]. -The following function generates a some output, a message, and a warning: - -[^1]: We no longer recommend `expect_snapshot_output()`, `expect_snapshot_warning()`, or `expect_snapshot_error()`. - Just use `expect_snapshot()`. - -```{r} -f <- function() { - print("Hello") - message("Hi!") - warning("How are you?") -} -``` - -And `expect_snapshot()` captures them all: +Messages and warnings are straightforward, but capturing errors is *slightly* more difficult because `expect_snapshot()` will fail if there's an error: ```{r} -test_that("f() makes lots of noise", { - expect_snapshot(f()) -}) -``` - -Capturing errors is *slightly* more difficult because `expect_snapshot()` will fail when there's an error: - -```{r, error = TRUE} +#| error: true test_that("you can't add a number and a letter", { expect_snapshot(1 + "a") }) @@ -210,16 +191,137 @@ test_that("you can't add weird things", { expect_snapshot(error = TRUE, { 1 + "a" mtcars + iris - mean + sum + Sys.Date() + factor() }) }) ``` -## Snapshotting values +Just be careful: when you set `error = TRUE`, `expect_snapshot()` checks that at least one expression throws an error, not that every expression throws an error. For example, look above and notice that adding a date and a factor generated a warning, not an error. + +Snapshot tests are particularly important when testing complex error messages, such as those that you might generate with cli. Here's a more realistic example illustrating how you might test `check_unnamed()`, a function that ensures all arguments in `...` are unnamed. + +```{r} +check_unnamed <- function(..., call = parent.frame()) { + names <- ...names() + has_name <- names != "" + if (!any(has_name)) { + return(invisible()) + } + + named <- names[has_name] + cli::cli_abort( + c( + "All elements of {.arg ...} must be unnamed.", + i = "You supplied argument{?s} {.arg {named}}." + ), + call = call + ) +} + +test_that("no errors if all arguments unnamed", { + expect_no_error(check_unnamed()) + expect_no_error(check_unnamed(1, 2, 3)) +}) + +test_that("actionable feedback if some or all arguments named", { + expect_snapshot(error = TRUE, { + check_unnamed(x = 1, 2) + check_unnamed(x = 1, y = 2) + }) +}) +``` + +## Other challenges + +### Varying outputs + +Sometimes part of the output varies in ways that you can't easily control. In many cases, it's convenient to use mocking (`vignette("mocking")`) to ensure that every run of the function always produces the same output. In other cases, it's easier to manipulate the text output with a regular expression or similar. That's the job of the `transform` argument, which should be passed a function that takes a character vector of lines and returns a modified vector. + +This type of problem often crops up when you are testing a function that gives feedback about a path. 
In your tests, you'll typically use a temporary path (e.g., from `withr::local_tempfile()`), so if you display the path in a snapshot, it will be different every time. +For example, consider this "safe" version of `writeLines()` that requires you to explicitly opt in to overwriting an existing file: + +```{r} +safe_write_lines <- function(lines, path, overwrite = FALSE) { + if (file.exists(path) && !overwrite) { + cli::cli_abort(c( + "{.path {path}} already exists.", + i = "Set {.code overwrite = TRUE} to overwrite" + )) + } + + writeLines(lines, path) +} +``` + +If you use a snapshot test to confirm that the error message is useful, the snapshot will be different every time the test is run: + +```{r} +#| include: false +snapper$end_file() +snapper$start_file("snapshotting.Rmd", "safe-write-lines") +``` + +```{r} +test_that("generates actionable error message", { + path <- withr::local_tempfile(lines = "") + expect_snapshot(safe_write_lines(letters, path), error = TRUE) +}) +``` + +```{r} +#| include: false +snapper$end_file() +snapper$start_file("snapshotting.Rmd", "safe-write-lines") +``` + +```{r} +#| error: true +test_that("generates actionable error message", { + path <- withr::local_tempfile(lines = "") + expect_snapshot(safe_write_lines(letters, path), error = TRUE) +}) +``` + +```{r} +#| include: false +snapper$end_file() +snapper$start_file("snapshotting.Rmd", "test-2") +``` + +One way to fix this problem is to use the `transform` argument to replace the temporary path with a fixed value: + +```{r} +test_that("generates actionable error message", { + path <- withr::local_tempfile(lines = "") + expect_snapshot( + safe_write_lines(letters, path), + error = TRUE, + transform = \(lines) gsub(path, "", lines, fixed = TRUE) + ) +}) +``` + +Now even though the path varies, the snapshot does not. + +### `local_reproducible_output()` + +By default, testthat sets a number of options that simplify and standardize output: + +* The console width is set to 80. +* {cli} ANSI coloring and hyperlinks are suppressed. +* Unicode characters are suppressed. + +These are sound defaults that we have found useful to minimize spurious differences between tests run in different environments. However, there are times when you want to deliberately test different widths, ANSI escapes, or Unicode characters, so you can override the defaults with `local_reproducible_output()`. + +### Snapshotting graphics + +If you need to test graphical output, use {vdiffr}. vdiffr is used to test ggplot2 and incorporates everything we know about high-quality graphics tests that minimize false positives. Graphics testing is still often fragile, but using vdiffr means you will avoid all the problems we know how to avoid. + +### Snapshotting values `expect_snapshot()` is the most used snapshot function because it records everything: the code you run, printed output, messages, warnings, and errors. -If you care about the return value rather than any side-effects, you may might to use `expect_snapshot_value()` instead. -It offers a number of serialisation approaches that provide a tradeoff between accuracy and human readability. +If you care about the return value rather than any side effects, you might want to use `expect_snapshot_value()` instead. +It offers a number of serialization approaches that provide a tradeoff between accuracy and human readability. 
```{r} test_that("can snapshot a simple list", { @@ -231,13 +333,13 @@ test_that("can snapshot a simple list", { ## Whole file snapshotting `expect_snapshot()`, `expect_snapshot_output()`, `expect_snapshot_error()`, and `expect_snapshot_value()` use one snapshot file per test file. -But that doesn't work for all file types --- for example, what happens if you want to snapshot an image? +But that doesn't work for all file types—for example, what happens if you want to snapshot an image? `expect_snapshot_file()` provides an alternative workflow that generates one snapshot per expectation, rather than one file per test. -Assuming you're in `test-burger.R` then the snapshot created by `expect_snapshot_file(code_that_returns_path_to_file(), "toppings.png")` would be saved in `tests/testthat/_snaps/burger/toppings.png`. -If a future change in the code creates a different file it will be saved in `tests/testthat/_snaps/burger/toppings.new.png`. +Assuming you're in `test-burger.R`, then the snapshot created by `expect_snapshot_file(code_that_returns_path_to_file(), "toppings.png")` would be saved in `tests/testthat/_snaps/burger/toppings.png`. +If a future change in the code creates a different file, it will be saved in `tests/testthat/_snaps/burger/toppings.new.png`. Unlike `expect_snapshot()` and friends, `expect_snapshot_file()` can't provide an automatic diff when the test fails. -Instead you'll need to call `snapshot_review()`. +Instead, you'll need to call `snapshot_review()`. This launches a Shiny app that allows you to visually review each change and approve it if it's deliberate: ```{r} @@ -260,12 +362,13 @@ knitr::include_graphics("review-text.png") The display varies based on the file type (currently text files, common image files, and csv files are supported). -Sometimes the failure occurs in a non-interactive environment where you can't run `snapshot_review()`, e.g. in `R CMD check`. -In this case, the easiest fix is to retrieve the `.new` file, copy it into the appropriate directory, then run `snapshot_review()` locally. -If your code was run on a CI platform, you'll need to start by downloading the run "artifact", which contains the check folder. +Sometimes the failure occurs in a non-interactive environment where you can't run `snapshot_review()`, e.g., in `R CMD check`. +In this case, the easiest fix is to retrieve the `.new` file, copy it into the appropriate directory, and then run `snapshot_review()` locally. +If this happens on GitHub, testthat provides some tools to help you in the form of `gh_download_artifact()`. In most cases, we don't expect you to use `expect_snapshot_file()` directly. Instead, you'll use it via a wrapper that does its best to gracefully skip tests when differences in platform or package versions make it unlikely to generate perfectly reproducible output. +That wrapper should also typically call `announce_snapshot_file()` to avoid snapshots being incorrectly cleaned up—see the documentation for more details. ## Previous work @@ -277,12 +380,12 @@ This section describes some of the previous attempts and why we believe the new - You have to supply a path where the output will be saved. This seems like a small issue, but thinking of a good name, and managing the difference between interactive and test-time paths introduces a surprising amount of friction. - - It always overwrites the previous result; automatically assuming that the changes are correct. - That means you have to use it with git and it's easy to accidentally accept unwanted changes. 
+ - It always overwrites the previous result, automatically assuming that the changes are correct. + That means you have to use it with git, and it's easy to accidentally accept unwanted changes. - It's relatively coarse grained, which means tests that use it tend to keep growing and growing. -- `expect_known_output()` is finer grained version of `verify_output()` that captures output from a single function. +- `expect_known_output()` is a finer-grained version of `verify_output()` that captures output from a single function. The requirement to produce a path for each individual expectation makes it even more painful to use. -- `expect_known_value()` and `expect_known_hash()` have all the disadvantages of `expect_known_output()`, but also produce binary output meaning that you can't easily review test differences in pull requests. +- `expect_known_value()` and `expect_known_hash()` have all the disadvantages of `expect_known_output()`, but also produce binary output, meaning that you can't easily review test differences in pull requests. diff --git a/vignettes/test-fixtures.Rmd b/vignettes/test-fixtures.Rmd index e218a85a9..d7cf541f8 100644 --- a/vignettes/test-fixtures.Rmd +++ b/vignettes/test-fixtures.Rmd @@ -20,7 +20,7 @@ knitr::opts_chunk$set( > > ― Chief Si'ahl -Ideally, a test should leave the world exactly as it found it. But you often need to make some changes in order to exercise every part of your code: +Ideally, a test should leave the world exactly as it found it. But you often need to make changes to exercise every part of your code: - Create a file or directory - Create a resource on an external system @@ -29,23 +29,60 @@ Ideally, a test should leave the world exactly as it found it. But you often nee - Change working directory - Change an aspect of the tested package's state -How can you clean up these changes to get back to a clean slate? Scrupulous attention to cleanup is more than just courtesy or being fastidious. It is also self-serving. The state of the world after test `i` is the starting state for test `i + 1`. Tests that change state willy-nilly eventually end up interfering with each other in ways that can be very difficult to debug. +How can you clean up these changes to get back to a clean slate? Scrupulous attention to cleanup is more than just courtesy or being fastidious. It's also self-serving. The state of the world after test `i` is the starting state for test `i + 1`. Tests that change state willy-nilly eventually end up interfering with each other in ways that can be very difficult to debug. -Most tests are written with an implicit assumption about the starting state, usually whatever *tabula rasa* means for the target domain of your package. If you accumulate enough sloppy tests, you will eventually find yourself asking the programming equivalent of questions like "Who forgot to turn off the oven?" and "Who didn't clean up after the dog?". +Most tests are written with an implicit assumption about the starting state, usually whatever *tabula rasa* means for the target domain of your package. If you accumulate enough sloppy tests, you will eventually find yourself asking the programming equivalent of questions like "Who forgot to turn off the oven?" and "Who didn't clean up after the dog?" (If you've got yourself into this state, testthat provides another tool to help you figure out exactly which test is to blame: `set_state_inspector()`.) -It's also important that your setup and cleanup is easy to use when working interactively. 
When a test fails, you want to be able to quickly recreate the exact environment in which the test is run so you can interactively experiment to figure out what went wrong.
+It's also important that your setup and cleanup are easy to use when working interactively. When a test fails, you want to be able to quickly recreate the exact environment in which the test is run so you can interactively experiment to figure out what went wrong.
-This article introduces a powerful technique that allows you to solve both problems: **test fixtures**. We'll begin with an introduction to the tools that make fixtures possible, then talk about exactly what a test fixture is, and show a few examples.
-
-Much of this vignette is derived from ; if this is your first encounter with `on.exit()` or `withr::defer()`, I'd recommend starting with that blog as it gives a gentler introduction. This vignette moves a little faster since it's designed as more of a reference doc.
+This article introduces a powerful technique that allows you to solve both problems: **test fixtures**. We'll begin by discussing some canned tools, then learn about the underlying theory, discuss exactly what a test fixture is, and finish with a few examples.

```{r}
library(testthat)
```

+## `local_` helpers
+
+We'll begin by giving you the minimal knowledge needed to change global state *just* within your test. The withr package provides a number of functions that temporarily change the state of the world, carefully undoing the changes when the current function or test finishes:
+
+| Do / undo this              | withr function     |
+|-----------------------------|--------------------|
+| Create a file               | `local_tempfile()` |
+| Create a directory          | `local_tempdir()`  |
+| Set an R option             | `local_options()`  |
+| Set an environment variable | `local_envvar()`   |
+| Change working directory    | `local_dir()`      |
+
+(You can see a full list at [withr.r-lib.org](https://withr.r-lib.org), but these five are by far the most commonly used.)
+
+These helpers let you control state that would otherwise be painful to manage. For example, imagine you're testing code whose printed output depends on the `digits` option, which controls how many significant digits R prints. You could write code like this:
+
+```{r}
+test_that("print() respects digits option", {
+  x <- 1.23456789
+
+  withr::local_options(digits = 1)
+  expect_equal(capture.output(x), "[1] 1")
+
+  withr::local_options(digits = 5)
+  expect_equal(capture.output(x), "[1] 1.2346")
+})
+```
+
+If you write a lot of code like this in your tests, you might decide you want a helper function or **test fixture** that reduces the duplication. Fortunately, withr's local functions allow us to solve this problem by providing a `.local_envir` or `envir` argument that controls when cleanup occurs. The exact details of how this works are rather complicated, but fortunately there's a common pattern you can use without understanding all the details. Your helper function should always have an `env` argument that defaults to `parent.frame()`, which you pass to the `.local_envir` argument of the withr `local_*()` function:
+
+```{r}
+local_digits <- function(sig_digits, env = parent.frame()) {
+  withr::local_options(digits = sig_digits, .local_envir = env)
+
+  # mark that this function is called for its side effects, not its return value
+  invisible()
+}
+```
+
 ## Foundations
-Before we can talk about test fixtures, we need to lay some foundations to help you understand how they work.
We'll motivate the discussion with a `sloppy()` function that prints a number with a specific number of significant digits by adjusting an R option: +Before we go further, let's lay some foundations to help you understand how `local_` functions work. We'll motivate the discussion with a `sloppy()` function that prints a number with a specific number of significant digits by adjusting an R option: ```{r include = FALSE} op <- options() @@ -66,13 +103,13 @@ pi options(op) ``` -Notice how `pi` prints differently before and after the call to `sloppy()`. Calling `sloppy()` has a side effect: it changes the `digits` option globally, not just within its own scope of operations. This is what we want to avoid[^1]. +Notice how `pi` prints differently before and after the call to `sloppy()`. Calling `sloppy()` has a side effect: it changes the `digits` option globally, not just within its own scope. This is what we want to avoid[^1]. [^1]: Don't worry, I'm restoring global state (specifically, the `digits` option) behind the scenes here. ### `on.exit()` -The first function you need to know about is base R's `on.exit()`. `on.exit()` calls the code to supplied to its first argument when the current function exits, regardless of whether it returns a value or errors. You can use `on.exit()` to clean up after yourself by ensuring that every mess-making function call is paired with an `on.exit()` call that cleans up. +The first function you need to know about is base R's `on.exit()`. `on.exit()` calls the code supplied to its first argument when the current function exits, regardless of whether it returns a value or throws an error. You can use `on.exit()` to clean up after yourself by ensuring that every mess-making function call is paired with an `on.exit()` call that cleans up. We can use this idea to turn `sloppy()` into `neat()`: @@ -88,7 +125,7 @@ neat(pi, 2) pi ``` -Here we make use of a useful pattern `options()` implements: when you call `options(digits = sig_digits)` it both sets the `digits` option *and* (invisibly) returns the previous value of digits. We can then use that value to restore the previous options. +Here we make use of a useful pattern that `options()` implements: when you call `options(digits = sig_digits)`, it both sets the `digits` option *and* (invisibly) returns the previous value of digits. We can then use that value to restore the previous options. `on.exit()` also works in tests: @@ -104,7 +141,7 @@ pi There are three main drawbacks to `on.exit()`: -- You should always call it with `add = TRUE` and `after = FALSE`. These ensure that the call is **added** to the list of deferred tasks (instead of replaces) and is added to the **front** of the stack (not the back, so that cleanup occurs in reverse order to setup). These arguments only matter if you're using multiple `on.exit()` calls, but it's a good habit to always use them to avoid potential problems down the road. +- You should always call it with `add = TRUE` and `after = FALSE`. These ensure that the call is **added** to the list of deferred tasks (instead of replacing them) and is added to the **front** of the stack (not the back), so that cleanup occurs in reverse order to setup. These arguments only matter if you're using multiple `on.exit()` calls, but it's a good habit to always use them to avoid potential problems down the road. - It doesn't work outside a function or test. 
If you run the following code in the global environment, you won't get an error, but the cleanup code will never be run: @@ -115,13 +152,13 @@ There are three main drawbacks to `on.exit()`: This is annoying when you are running tests interactively. -- You can't program with it; `on.exit()` always works inside the *current* function so you can't wrap up repeated `on.exit()` code in a helper function. +- You can't program with it; `on.exit()` always works inside the *current* function, so you can't wrap up repeated `on.exit()` code in a helper function. To resolve these drawbacks, we use `withr::defer()`. ### `withr::defer()` -`withr::defer()` resolves the main drawbacks of `on.exit()`. First, it has the behaviour we want by default; no extra arguments needed: +`withr::defer()` resolves the main drawbacks of `on.exit()`. First, it has the behavior we want by default; no extra arguments needed: ```{r} neat <- function(x, sig_digits) { @@ -143,11 +180,11 @@ withr::deferred_run() #> [1] "hi" ``` -Finally, `withr::defer()` lets you pick which function to bind the clean up behaviour too. This makes it possible to create helper functions. +Finally, `withr::defer()` lets you pick which function to bind the cleanup behavior to. This makes it possible to create helper functions. ### "Local" helpers -Imagine we have many functions where we want to temporarily set the digits option. Wouldn't it be nice if we could write a helper function to automate? Unfortunately we can't write a helper with `on.exit()`: +Imagine we have many functions where we want to temporarily set the digits option. Wouldn't it be nice if we could write a helper function to automate this? Unfortunately, we can't write a helper with `on.exit()`: ```{r} local_digits <- function(sig_digits) { @@ -161,7 +198,7 @@ neater <- function(x, sig_digits) { neater(pi) ``` -This code doesn't work because the cleanup happens too soon, when `local_digits()` exits, not when `neat()` finishes. +This code doesn't work because the cleanup happens too soon, when `local_digits()` exits, not when `neater()` finishes. Fortunately, `withr::defer()` allows us to solve this problem by providing an `envir` argument that allows you to control when cleanup occurs. The exact details of how this works are rather complicated, but fortunately there's a common pattern you can use without understanding all the details. Your helper function should always have an `env` argument that defaults to `parent.frame()`, which you pass to the second argument of `defer()`: @@ -188,34 +225,27 @@ test_that("withr lets us write custom helpers for local state manipulation", { print(exp(1)) ``` -We always call these helper functions `local_`; "local" here refers to the fact that the state change persists only locally, for the lifetime of the associated function or test. - -### Pre-existing helpers +We always call these helper functions `local_*`; "local" here refers to the fact that the state change persists only locally, for the lifetime of the associated function or test. 
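+The same pattern works for any paired setup and cleanup, not just options. As a sketch (the helper name and file contents are invented for illustration), you could write a `local_` helper that creates a file at a specific path and removes it again when the calling test or function exits:
+
+```{r}
+local_file_with_lines <- function(path, lines, env = parent.frame()) {
+  writeLines(lines, path)
+  # schedule cleanup for when the *caller* exits, not when this helper returns
+  withr::defer(unlink(path), envir = env)
+  invisible(path)
+}
+
+test_that("local helpers clean up after themselves", {
+  path <- file.path(tempdir(), "settings.txt")
+  local_file_with_lines(path, c("a = 1", "b = 2"))
+  expect_true(file.exists(path))
+})
+# once the test has finished, the file is gone again
+```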
Another reason we call them "local" is that you can also use the `local()` function if you want to scope their effect to a smaller part of the test: -But before you write your own helper function, make sure to check out the wide range of local functions already provided by withr: +```{r} +test_that("local_options() only affects a minimal amount of code", { + withr::local_options(x = 1) + expect_equal(getOption("x"), 1) -| Do / undo this | withr function | -|-----------------------------|-------------------| -| Create a file | `local_file()` | -| Set an R option | `local_options()` | -| Set an environment variable | `local_envvar()` | -| Change working directory | `local_dir()` | + local({ + withr::local_options(x = 2) + expect_equal(getOption("x"), 2) + }) -We can use `withr::local_options()` to write yet another version of `neater()`: + expect_equal(getOption("x"), 1) +}) -```{r} -neatest <- function(x, sig_digits) { - withr::local_options(list(digits = sig_digits)) - print(x) -} -neatest(pi, 3) +getOption("x") ``` -Each `local_*()` function has a companion `with_()` function, which is a nod to `with()`, and the inspiration for withr's name. We won't use the `with_*()` functions here, but you can learn more about them at [withr.r-lib.org](https://withr.r-lib.org). - ## Test fixtures -Testing is often demonstrated with cute little tests and functions where all the inputs and expected results can be inlined. But in real packages, things aren't always so simple and functions often depend on other global state. For example, take this variant on `message()` that only shows a message if the `verbose` option is `TRUE`. How would you test that setting the option does indeed silence the message? +Testing is often demonstrated with cute little tests and functions where all the inputs and expected results can be inlined. But in real packages, things aren't always so simple, and functions often depend on global state. For example, take this variant on `message()` that only shows a message if the `verbose` option is `TRUE`. How would you test that setting the option does indeed silence the message? ```{r} message2 <- function(...) { @@ -237,13 +267,13 @@ message3 <- function(..., verbose = getOption("verbose")) { } ``` -Making external state explicit is often worthwhile, because it makes it more clear exactly what inputs determine the outputs of your function. But it's simply not possible in many cases. That's where test fixtures come in: they allow you to temporarily change global state in order to test your function. Test fixture is a pre-existing term in the software engineering world (and beyond): +Making external state explicit is often worthwhile because it makes clearer exactly what inputs determine the outputs of your function. But it's simply not possible in many cases. That's where test fixtures come in: they allow you to temporarily change global state to test your function. Test fixture is a pre-existing term in the software engineering world (and beyond): > A test fixture is something used to consistently test some item, device, or piece of software. > > --- [Wikipedia](https://en.wikipedia.org/wiki/Test_fixture) -A **test fixture** is just a `local_` function that you use to change state in such a way that you can reach inside and test parts of your code that would otherwise be challenging. 
For example, here's how you could use `withr::local_options()` as a test fixture to test `message2()`: +A **test fixture** is just a `local_*` function that you use to change state in such a way that you can reach inside and test parts of your code that would otherwise be challenging. For example, here's how you could use `withr::local_options()` as a test fixture to test `message2()`: ```{r} test_that("message2() output depends on verbose option", { @@ -259,7 +289,7 @@ test_that("message2() output depends on verbose option", { One place that we use test fixtures extensively is in the usethis package ([usethis.r-lib.org](https://usethis.r-lib.org)), which provides functions for looking after the files and folders in R projects, especially packages. Many of these functions only make sense in the context of a package, which means to test them, we also have to be working inside an R package. We need a way to quickly spin up a minimal package in a temporary directory, then test some functions against it, then destroy it. -To solve this problem we create a test fixture, which we place in `R/test-helpers.R` so that's it's available for both testing and interactive experimentation: +To solve this problem we create a test fixture, which we place in `R/test-helpers.R` so that it's available for both testing and interactive experimentation: ```{r, eval = FALSE} local_create_package <- function(dir = file_temp(), env = parent.frame()) { @@ -270,8 +300,7 @@ local_create_package <- function(dir = file_temp(), env = parent.frame()) { withr::defer(fs::dir_delete(dir), envir = env) # -A # change working directory - setwd(dir) # B - withr::defer(setwd(old_project), envir = env) # -B + withr::local_dir(dir, .local_envir = env) # B + -B # switch to new usethis project proj_set(dir) # C @@ -281,7 +310,7 @@ local_create_package <- function(dir = file_temp(), env = parent.frame()) { } ``` -Note that the cleanup automatically unfolds in the opposite order from the setup. Setup is `A`, then `B`, then `C`; cleanup is `-C`, then `-B`, then `-A`. This is important because we must create directory `dir` before we can make it the working directory; and we must restore the original working directory before we can delete `dir`; we can't delete `dir` while it's still the working directory! +Note that the cleanup automatically unfolds in the opposite order from the setup. Setup is `A`, then `B`, then `C`; cleanup is `-C`, then `-B`, then `-A`. This is important because we must create directory `dir` before we can make it the working directory, and we must restore the original working directory before we can delete `dir`—we can't delete `dir` while it's still the working directory! `local_create_package()` is used in over 170 tests. Here's one example that checks that `usethis::use_roxygen_md()` does the setup necessary to use roxygen2 in a package, with markdown support turned on. All 3 expectations consult the DESCRIPTION file, directly or indirectly. So it's very convenient that `local_create_package()` creates a minimal package, with a valid `DESCRIPTION` file, for us to test against. And when the test is done --- poof! --- the package is gone. @@ -302,7 +331,7 @@ So far we have applied our test fixture to individual tests, but it's also possi ### File -If you move the `local_()` call outside of a `test_that()` block, it will affect all tests that come after it. This means that by calling the test fixture at the top of the file you can change the behaviour for all tests. 
This has both advantages and disadvantages:
+If you move the `local_*()` call outside of a `test_that()` block, it will affect all tests that come after it. This means that by calling the test fixture at the top of the file, you can change the behavior for all tests. This has both advantages and disadvantages:
- If you would otherwise have called the fixture in every test, you've saved yourself a bunch of work and duplicate code.
@@ -326,8 +355,8 @@ Setup code is typically best used to create external resources that are needed b
 ## Other challenges
-A collection of miscellaneous problems that I don't know where else to describe:
+A collection of miscellaneous problems that don't fit elsewhere:
-- There are a few base functions that are hard to test because they depend on state that you can't control. One such example is `interactive()`: there's no way to write a test fixture that allows you to pretend that interactive is either `TRUE` or `FALSE`. So we now usually use `rlang::is_interactive()` which can be controlled by the `rlang_interactive` option.
+- There are a few base functions that are hard to test because they depend on state that you can't control. One such example is `interactive()`: there's no way to write a test fixture that allows you to pretend that interactive is either `TRUE` or `FALSE`. So we now usually use `rlang::is_interactive()`, which can be controlled by the `rlang_interactive` option.
-- If you're using a test fixture in a function, be careful about what you return. For example, if you write a function that does `dir <- create_local_package()` you shouldn't return `dir`, because after the function returns the directory will no longer exist.
+- If you're using a test fixture in a function, be careful about what you return. For example, if you write a function that does `dir <- local_create_package()`, you shouldn't return `dir`, because after the function returns, the directory will no longer exist (see the sketch below).
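+For example, here's a minimal sketch of that trap (it assumes the `local_create_package()` fixture defined above; the wrapper name is invented for illustration):
+
+```{r, eval = FALSE}
+make_package_dir <- function() {
+  dir <- local_create_package()
+  # cleanup is bound to make_package_dir(), so the package directory is
+  # deleted as soon as this function returns...
+  dir
+}
+
+path <- make_package_dir()
+fs::dir_exists(path) # FALSE: cleanup already ran when make_package_dir() exited
+```
+
+The caller receives a path that has already been cleaned up, which is rarely what you want.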