The mtest library makes it easy to write and run tests for C++ code. There is no macro magic, just regular functions for you to call as you see fit. The syntax for creating tests is directly adopted from the JavaScript test framework "Mocha".
mtest has been developed by Mario Laux.
mtest is a single-file, header-only library for C++11 or higher with no external dependencies. You can simply #include "mtest.h" and you are good to go. The standard library header <thread> is required. Everything is defined in the namespace mtest, so you can either use using namespace mtest; or specify the namespace explicitly where needed.
The header file contains the documentation of the public API, but the following quickstart guide should give you a good enough overview.
The header file, this tutorial, and all other assets are published under the MIT license. Internal benchmarks and tests are not part of this public repository.
describe("arithmetic tests", [] {
it("should know when a number is even", [] {
assert_ok(42 % 2 == 0);
});
it("should multiply correctly", [] {
assert_eq(6 * 7, 42);
assert_eq(-3 * -2, 6);
});
it("should be able to take square roots", [] {
assert_near(std::sqrt(4.), 2., 0.0001);
});
});
std::cout << run_all_tests();
In mtest, a test is created from a description and any callable f by calling it. A lambda [] { ... } will likely be the most convenient way to create such a callable. The test fails if f() throws an exception and passes otherwise. Assertions are a convenient way to throw an exception when a requirement is not met (assert_ok, assert_eq, assert_near, assert_throws).
Tests with a common theme can optionally be grouped into test suites by creating them from inside a call to describe.
The generated test report will look like this (timing will vary, coloring depends on the console):
╒══════ 3 tests (1.13µs total)
arithmetic tests
✔ should know when a number is even (84ns)
✔ should multiply correctly (41ns)
✔ should be able to take square roots (42ns)
╘══════ all 3 tests passed.
If you have many computationally intensive tests, you can run them in parallel to save some time: just pass the desired number of threads to run_all_tests.
mtest doesn't impose structure. Its tools are just regular C++ functions, not declarations. Want conditional tests? Call it from inside an if. Need to generate assertions dynamically? Use a loop. It's your code.
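For example (a sketch; the flag include_extra_tests is just a placeholder and not part of mtest):
bool include_extra_tests = true; // placeholder flag, e.g. derived from a config
if (include_extra_tests) {
    // this test is only registered when the condition holds
    it("should know that zero is even", [] {
        assert_ok(0 % 2 == 0);
    });
}

// assertions generated in a loop
it("should multiply by one without changing the value", [] {
    for (int n = -10; n <= 10; ++n) {
        assert_eq(n * 1, n);
    }
});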
Automated tests should be written after the initial experimentation and development phase, but before the code is used as an important building block elsewhere. Try to keep your tests simple. Some thoughts to consider:
Making changes to an implementation is easier with automated tests in place: Simply re-run the tests to ensure that the modifications did not break existing functionality. However, existing tests make it harder to change interfaces: Every change in how a function is called or how an object is used will incur the need to refactor the corresponding tests as well.
When writing brand-new code, interfaces and even the exact requirements change frequently. Printing values to the console or inspecting them at a debugger breakpoint can then be the preferred method to gauge the behavior of the new code.
At some point, we have to make the decision that the current interfaces are good enough (at least for now) and that we want to move on. If we intend to rely on the current code in the future, we should consider writing automated tests:
- If it turns out that the code is hard to test, it might very well be hard to use. If tests keep using the same pattern to work around an inconvenient interface, some adaptations might be in order.
- Writing documentation can go hand in hand with writing tests. Every feature or behavior that is described in the documentation is a good candidate for being tested.
- As mentioned earlier, future changes to the implementation will become easier as their correctness can easily be checked.
For examples to make sense, we need a certain amount of complexity: more than "accidentally" implementing add(int a, int b) with a minus sign.
As a source of errors, we use the following "prime generator":
// turn n into a prime number
int make_prime(int n) {
return 41 + n * (n + 1);
}
This function does not really return a prime number for all inputs, but it performs surprisingly well: For inputs between -40 and 39 the output is in fact prime (and for many other inputs as well). This is our simulation of a subtle bug.
Further, we use three mathematical utility functions with intentionally cryptic implementations (no need to understand them):
// test whether n is prime
bool is_prime(int n) {
if (n == 2 || n == 3) return true;
if (n <= 1 || n % 2 == 0 || n % 3 == 0) return false;
for (int i = 5; i * i <= n; i += 6) {
if (n % i == 0 || n % (i + 2) == 0) return false;
}
return true;
}
// compute the greatest common divisor of a and b
int gcd(int a, int b) {
while (b != 0) { int h = a % b; a = b; b = h; }
return (a < 0) ? -a : a;
}
// for n >= 0 approximate sqrt(n), for n < 0 throw
float sqrt(int n) {
if (n < 0) throw std::domain_error("n < 0");
if (n == 0) return 0.;
int i = n, m = 1;
do m *= 2; while ((i /= 4) > 0);
auto x = static_cast<float>(n) / (m * m), y = x;
for (i = 0; i < 4; ++i) y = .5f * (y + x / y);
return m * y;
}
is_prime is a brute-force primality test with some mild optimization. gcd is an implementation of Euclid's algorithm to compute the greatest common divisor. sqrt uses Heron's method with a fixed number of iterations to approximate the square root of a given integer.
An assertion checks whether a certain statement is correct. If it is, the assertion does nothing (it passes). If it is not, it throws an exception containing information about the statement in question (it fails).
The assertions in mtest always accept all of their arguments by read-only reference. Failing assertions always throw an instance of std::runtime_error. There are four kinds of assertions provided:
- assert_ok(value) checks whether value is true.
- assert_eq(actual, expected) checks whether actual and expected compare equal, i.e. whether actual == expected is true.
- assert_near(actual, expected, max_distance) checks whether actual and expected are at most max_distance apart. Due to their finite precision, this is most useful for comparing floating-point numbers.
- assert_throws(functor) checks whether functor() throws, while assert_throws<ExceptionType>(functor) additionally checks whether the thrown exception can be caught as ExceptionType const &.
Here are some examples for passing assertions:
assert_ok(is_prime(17));
assert_eq(gcd(6, 15), 3);
assert_near(sqrt(2), 1.4142f, 0.0001f);
assert_throws([] { sqrt(-1); });
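Since sqrt throws a std::domain_error for negative inputs, the last assertion could also be made stricter by checking the exception type:
assert_throws<std::domain_error>([] { sqrt(-1); });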
As a first source of inspiration for what to test, one can try to answer the following questions:
- What are the simplest yet usual cases?
- What might be odd, but still simple edge cases that are not explicitly forbidden by the documentation?
- What would be a real-world use case?
// simple cases
it("should classify small numbers correctly", [] {
assert_ok(!is_prime(0));
assert_ok(!is_prime(1));
assert_ok(is_prime(2));
assert_ok(is_prime(3));
assert_ok(!is_prime(4));
assert_ok(is_prime(5));
assert_ok(!is_prime(6));
});
// slightly unusual edge case
it("should not classify negative numbers as prime", [] {
assert_ok(!is_prime(-2));
assert_ok(!is_prime(-7));
assert_ok(!is_prime(-100));
});
// real-world use case
it("should recognize big primes", [] {
assert_ok(is_prime(7919));
assert_ok(is_prime(48611));
assert_ok(is_prime(104729));
assert_ok(is_prime(1299709));
assert_ok(is_prime(15485863));
assert_ok(is_prime(179424673));
assert_ok(!is_prime(7919 * 104729));
assert_ok(!is_prime(27449 * 27457));
});
First, we try to come up with the simplest inputs, for which we can easily predict the desired output. Second, we actively try to break the code by thinking about unusual but technically allowed inputs. This can often lead to the discovery of unhandled cases or missing documentation. In this case:
- If we strongly believe that "prime" unambiguously implies "positive", then the documentation and the test are fine.
- If the above behavior is desired, but there is doubt about whether it is implied by the documentation, the documentation should be updated.
- If negative inputs should not be allowed, then the test needs to be removed and the documentation has to be updated in one of two ways (a sketch of the first option follows after this list):
  - is_prime could explicitly produce an error for negative inputs, e.g. by throwing an exception (then the implementation would have to be updated as well).
  - If we cannot afford an extra check or throwing an exception, negative inputs could be declared to cause undefined behavior. Then it would be the caller's duty to make sure this never happens.
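As a sketch of the first option (the name is_prime_checked and the exception type are hypothetical choices, not part of the code above; std::invalid_argument requires the <stdexcept> header), the implementation and the corresponding test could look like this:
// hypothetical variant: explicitly reject negative inputs
bool is_prime_checked(int n) {
    if (n < 0) throw std::invalid_argument("negative input");
    return is_prime(n);
}

it("should reject negative inputs", [] {
    assert_throws<std::invalid_argument>([] { is_prime_checked(-7); });
});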
Lastly, we try to work out some explicit real-world cases. For is_prime, we could simply look up some big primes online. Finding realistic test cases can be a lot of work. Some strategies are discussed later in the advanced topics section.
Here are analogous examples for gcd and sqrt:
it("must compute the gcds of small numbers correctly", [] {
assert_eq(gcd(21, 14), 7);
assert_eq(gcd(8, 16), 8);
assert_eq(gcd(-9, 12), 3);
assert_eq(gcd(8, -12), 4);
assert_eq(gcd(-20, -35), 5);
});
it("should handle the edge cases of gcd correctly", [] {
assert_eq(gcd(0, 0), 0);
assert_eq(gcd(0, 1), 1);
assert_eq(gcd(1, 0), 1);
assert_eq(gcd(1, 1), 1);
});
it("should find the gcd of bigger numbers", [] {
// first argument prime
assert_eq(gcd(179424673, 129832322), 1);
// all factors prime
assert_eq(gcd(13 * 17 * 19, 17 * 19 * 23), 17 * 19);
assert_eq(gcd(7919 * 104729, 7919 * 27449), 7919);
});
it("should approximate reasonably for small inputs", [] {
constexpr float max_error = 0.001;
assert_near(sqrt(0), 0.f, max_error);
assert_near(sqrt(1), 1.f, max_error);
assert_near(sqrt(2), 1.4142136f, max_error);
assert_near(sqrt(3), 1.7320508f, max_error);
assert_near(sqrt(4), 2.f, max_error);
});
it("should throw for negative inputs", [] {
assert_throws([] { sqrt(-1); });
assert_throws([] { sqrt(-25514); });
});
it("must approximate reasonably for larger inputs", [] {
assert_near(sqrt(100), 10.f, 0.01f);
assert_near(sqrt(169), 13.f, 0.01f);
assert_near(sqrt(213 * 213), 213.f, 0.1f);
assert_near(sqrt(8933 * 8933), 8933.f, 1.f);
auto square = [] (float f) { return f * f; };
assert_near(square(sqrt(1234)), 1234.f, 1.f);
assert_near(square(sqrt(27412)), 27412.f, 10.f);
});
While the behavior of gcd is pretty clearly defined, the documentation of sqrt does not say anything about the precision of the approximation. This is not ideal, but hard to fix. In this case, we use the tests to define some minimal requirements for the output to make sense. We want to test for conceptual errors, not for precision. As such, we accept sqrt(100) to be 10.008, but a result of 9.9 would indicate a fundamental problem with the algorithm (9.9 would be a better approximation for sqrt(98), after all).
Our prime generator make_prime has very little specified behavior, and so we cannot craft many tests based on its documentation. Here's what we might do (remember, this is our simulation of a subtle bug that is hard to find):
it("should only make primes", [] {
assert_ok(is_prime(make_prime(0)));
assert_ok(is_prime(make_prime(1)));
assert_ok(is_prime(make_prime(2)));
assert_ok(is_prime(make_prime(5)));
assert_ok(is_prime(make_prime(20)));
assert_ok(is_prime(make_prime(80)));
assert_ok(is_prime(make_prime(100)));
assert_ok(is_prime(make_prime(-100)));
});
As the number of tests grows, it can become helpful to wrap related tests in a call to describe to structure the code (and the test report). In this case, a natural choice for the name of a test suite is the name of the function being tested:
describe("is_prime", [] {
// ...
});
describe("gcd", [] {
// ...
});
describe("sqrt", [] {
// ...
});
describe("make_prime", [] {
// ...
});
std::cout << run_all_tests();
Test suites (i.e. calls to describe) can be nested arbitrarily. As a general guarantee, all functions provided by mtest can be called from anywhere.
Calling run_all_tests produces a test report, which can be inserted into any output stream (std::ostream, std::stringstream, std::ofstream, ...) using operator<<.
Under the hood, mtest maintains a global test registry. Calls to it and describe both communicate with that registry, which is how a test knows which suite it is being created in. Note that describe executes its second argument immediately, while it does not: tests are executed once run_all_tests is called. Each test will be executed only once, and the produced test report takes ownership of the result. Subsequent calls to run_all_tests will not pick up the same tests again.
For the call above, the output could look like this:
╒══════ 10 tests (82.3µs total)
is_prime
✔ should classify small numbers correctly (39ns)
✔ should not classify negative numbers as prime (43ns)
✔ should recognize big primes (11.5µs)
gcd
✔ must compute the gcds of small numbers correctly (125ns)
✔ should handle the edge cases of gcd correctly (41ns)
✔ should find the gcd of bigger numbers (209ns)
sqrt
✔ should approximate reasonably for small inputs (41ns)
✔ should throw for negative inputs (69.9µs)
✔ must approximate reasonably for larger inputs (42ns)
make_prime
✔ should only make primes (40ns)
╘══════ all 10 tests passed.
The duration in the first line is the total runtime of run_all_tests. The runtime of each individual passing test is also shown. Note that this is not a proper benchmark and that the measurements will vary between runs. Nevertheless, the figures can be interesting as a rough guide: We can clearly see that is_prime takes significantly longer for large inputs.
The formatting of a test report can be controlled by five properties:
- .indent: the number of spaces per level of indentation in the tree-like test overview (2 by default).
- .use_color: whether to color the output (true by default). This is usually desired when printing to the console, but should probably be switched off when printing to a file.
- .show_overview: whether the tree-like summary of the executed tests is shown (true by default). If disabled, only the reports for the failed tests are displayed.
- .show_passing_tests: whether passing tests are listed (true by default).
- .show_empty_suites: whether test suites that would appear empty are shown in the tree-like overview (false by default). A test suite may appear empty if all tests in it passed and passing tests are not shown.
For example, we could have done the following to reduce the amount of output:
auto report = run_all_tests();
report.show_passing_tests = false;
std::cout << report;
Since test suites that appear empty are not shown by default, the output would have then looked like this:
╒══════ 10 tests (82.3µs total)
╘══════ all 10 tests passed.
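Similarly, a report can be written to a file; in that case the coloring should typically be switched off (a sketch; std::ofstream requires the <fstream> header):
auto report = run_all_tests();
report.use_color = false; // no color escape codes in the file
std::ofstream file("test_report.txt");
file << report;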
Tests can optionally be run in parallel, which matters if there are many computationally intensive tests. To this end, run_all_tests accepts the desired number of threads as an optional argument (which defaults to 1). Executing tests in parallel does not affect the order in which they are listed in the report.
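For example, to run the registered tests on four threads:
std::cout << run_all_tests(4);
If you prefer not to hard-code the number, std::thread::hardware_concurrency() can provide a sensible value (it may return 0, so fall back to 1 in that case).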
So far, we have not yet detected that make_prime is flawed. Explicitly chosen examples are not likely to catch rare bugs. Luckily, we can generate more exhaustive tests programmatically:
it("should only make primes for inputs 0-99", [] {
for (int i = 0; i < 100; ++i) {
assert_ok(is_prime(make_prime(i)));
}
});
This test finally fails, but the error message is not very helpful:
╒══════ 1 test (15.6µs total)
✘ should only make primes for inputs 0-99 [1]
───────
[1] should only make primes for inputs 0-99
not ok
╘══════ the test failed.
This is because there is no way for assert_ok to know how its first argument (which is a bool) was made. We can add the missing information to the assertions in question:
it("should only make primes for inputs 0-99", [] {
for (int i = 0; i < 100; ++i) {
assert_ok(
is_prime(make_prime(i)),
"not prime for i=" + std::to_string(i)
);
}
});
Now, instead of "not ok", the error message will read "not prime for i=40". This also represents the typical workflow: We use plain assertions first, and only once a test fails do we care about creating diagnostic error messages.
In mtest, every assertion accepts an additional std::string const & argument to be used as the error message in case of failure.
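By analogy with the assert_ok example above (assuming the other assertions likewise take the message as their last argument), this could look as follows:
assert_eq(gcd(6, 15), 3, "gcd(6, 15) should be 3");
assert_near(sqrt(2), 1.4142f, 0.0001f, "sqrt(2) is not accurate enough");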
It would be nice if assertions could generate more helpful error messages automatically, something along the lines of "is_prime(make_prime(40)) evaluated to false". At its core, this would require access to the test's syntax tree, which many test frameworks approximate using macros or user-defined literals. However, such band-aids often complicate usage and syntax to a degree where an explicitly built std::string like in mtest's assertions is just simpler.
For pure functions handling a well-defined task, explicit tests will detect the majority of errors. But there are (many) cases where a higher confidence in the correctness of an algorithm is required or where explicit or simple tests are hard to find.
- If we need library-quality code or heavily rely on gcd as a building block of other algorithms, we would like to increase our confidence in its correctness: Could computations over- or underflow? Does the algorithm always terminate? Does it take unacceptably long to execute for certain inputs? Does it handle all combinations of positive and negative numbers correctly?
- For an edge-detection algorithm operating on images, it might be hard to define what the acceptable outputs are. Similarly for sqrt, the line between acceptable and unacceptable precision is hard to draw.
- Consider an event system where different threads can concurrently send and receive messages. Such a system is not a single function, but rather reacts to indeterminately sequenced inputs, constantly interacting with its users. How do we test this?
A possible strategy is to look for symmetries and algebraic properties:
- For example, gcd must be symmetric and associative, i.e. gcd(a, b) = gcd(b, a) and gcd(a, gcd(b, c)) = gcd(gcd(a, b), c). These properties must be satisfied for all inputs, and we don't have to know any of the outputs explicitly to state them as assertions.
- Assume the edge detector ed consumes an image and returns a binary image with only the edge pixels marked. Then we might expect ed(flip(image)) to match flip(ed(image)), where flip simply turns an image upside down. Again, we could state this as an assertion without having to know any of the outputs explicitly.
- An event system like the one described above would first have to run for a while, and each user would need to keep their own log of sent and received messages. What would have to be true for the data collected this way? Sequential consistency could be a reasonable requirement: If a user has sent message A before message B, then B must never have been received before A by any single user.
Once we have discovered a meaningful property that must be satisfied for all inputs, we can programmatically draw samples from the input space and verify the property each time.
Generating example inputs usually requires access to random numbers. In mtest, the templates int_rng and real_rng are designed to do just that: They provide a deterministic sequence of pseudo-random numbers in a user-defined range. Should different sequences in the same range be required, an initial seed value can be specified. This deterministic behavior is important, because otherwise we could never be sure whether a previously observed error has been fixed or simply did not occur this time.
Now let's have a look at some examples:
it("must be associative", [] {
// generate random integers in [-MAX, MAX]
constexpr int MAX = 1000000000;
int_rng<int> rng(-MAX, MAX);
for (int i = 0; i < 1000; ++i) {
int a = rng(), b = rng(), c = rng();
assert_eq(gcd(a, gcd(b, c)), gcd(gcd(a, b), c));
}
});
it("must be consistently accurate", [] {
// generate random integers in [1, 10^9]
// (output type "int")
int_rng<int> rng(1, 1000000000);
// test whether |a - √n| < |b - √n| for a, b >= 0
auto is_better = [](int n, float a, float b) {
if (a == b) return false;
auto m = .5 * (a + b);
m *= m;
return (a < b) ? n < m : m < n;
};
for (int i = 1; i < 1000; ++i) {
// sqrt(n) must be closer to real value √n
// than sqrt(n-d) and sqrt(n+d)
int n = rng();
int d = n / 10000 + 1;
auto f = sqrt(n);
assert_ok(is_better(n, f, sqrt(n - d)));
assert_ok(is_better(n, f, sqrt(n + d)));
}
});
Thanks to class template argument deduction (CTAD), you can omit the template arguments starting from C++17 and simply write int_rng rng(-MAX, MAX), etc.
The check for associativity simply verifies the property for random triples of inputs. Note that int_rng generates uniformly distributed integers: In the case above, this will almost always result in big numbers.
The check of the square root algorithm is more involved, and it showcases how testing can become difficult. In this case, we had to come up with a notion of what it means for the results to be consistent, taking into account the limited floating-point precision. Neither the implementation of is_better nor the choice of an appropriate d is obvious. This kind of testing will often require additional non-trivial code.
Tests often share a specific setup, a common set of example inputs or some locally defined helper functions, especially if they belong to the same test suite. Being created from a function object, tests in mtest can naturally have state:
describe("gcd algebraic properties", [] {
// "interesting" example inputs
std::vector<int> const samples {
0, 1, 2, 3, 4, 5, 6, 7,
8, 16, 32, 64, 128, 1024,
9, 25, 36, 49, 81, 144, 9801,
7919, 27449, 104729
};
it("should produce a common divisor", [=] {
for (int a : samples) {
for (int b : samples) {
int d = gcd(a, b);
if (d == 0) continue;
assert_eq(a % d, 0);
assert_eq(b % d, 0);
}
}
});
it("must be symmetric", [=] {
// gcd(a, b) = gcd(b, a)
for (int a : samples)
for (int b : samples)
assert_eq(gcd(a, b), gcd(b, a));
});
it("must be homogeneous", [=] {
// gcd(m * a, m * b) = |m| * gcd(a, b)
for (int a : samples)
for (int b : samples)
for (int m = -5; m <= 5; ++m)
assert_eq(
gcd(m * a, m * b),
std::abs(m) * gcd(a, b)
);
});
});
Note that each test captures a copy of samples, so that they are all independent and self-contained. Remember that the tests are created immediately, but executed later (describe immediately executes its second argument, while it just schedules the test for later).
If you need to process the test results in a custom way, you can use the test report's .visit member function. Let's say you have created the following tree of tests:
[Figure: an example tree of nested test suites and individual test results, with each item carrying a number]
The numbers above indicate in what order the individual items are visited (depth-first order). Every test suite is visited twice, once when it is entered and once when it is exited. Individual test results are either a pass or a fail. The visitor needs to handle those four cases by overloading operator()
to accept the corresponding information. Let's say we just want to count the total number of test suites and the total number of failed tests:
struct CustomVisitor {
std::size_t nr_of_suites { 0 };
std::size_t nr_of_fails { 0 };
// called when entering a suite
void operator()(std::string const & suite_description) {
++nr_of_suites;
}
// called when exiting a suite
void operator()() {
}
// called for passed tests
void operator()(
std::string const & test_description,
std::chrono::nanoseconds const & duration
) {
}
// called for failed tests
void operator()(
std::string const & test_description,
std::exception_ptr const & exc
) {
++nr_of_fails;
}
};
The custom type CustomVisitor provides four overloads of operator():
- If a suite is entered, the visitor is called with just the description of the suite.
- If a suite is exited (once all its child elements have been visited), the visitor is called without any arguments.
- If a passed test is encountered, the visitor is called with the description of the test and how long that test took to execute.
- If a failed test is encountered, the visitor is called with the description of the test and an exception pointer holding the information about what went wrong.
The .visit member function returns the visitor after it has visited the whole tree:
auto report = run_all_tests();
auto cv = report.visit(CustomVisitor());
std::cout << cv.nr_of_suites << " suites, "
<< cv.nr_of_fails << " fails";
If the tests are set up like in the tree shown above, this example will print "3 suites, 2 fails".
The visitation feature exposes all the information contained in a test report, allowing for arbitrary post-processing of the results. A visitor could, for example, convert a test report to an XML or JSON representation. For convenience, every test report directly exposes some statistics via the getter methods .pass_count(), .fail_count(), .total_duration(), .nr_of_threads(), and .max_nr_of_threads().
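For example, these statistics can be used to derive a process exit code (a sketch):
int main() {
    // ... create tests ...
    auto report = run_all_tests();
    std::cout << report;
    // signal failure to the build system if any test failed
    return (report.fail_count() == 0) ? 0 : 1;
}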
mtest has been designed with thread safety in mind, so there are no restrictions as to when and where tests may be created or run. Creating a test or a test suite is always lock-free and does not require any synchronization, even when run_all_tests is called concurrently. If you really wanted to, you could run tests and schedule new tests from inside a test.
Each thread accumulates test suites separately and only commits them to the global test registry once the outermost test suite is complete. This ensures that test suites only ever appear in test reports as a whole.
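As a sketch of what these guarantees allow, test suites could even be registered from several worker threads before everything is run (std::thread and std::vector are used purely for illustration):
std::vector<std::thread> workers;
for (int i = 0; i < 4; ++i) {
    workers.emplace_back([] {
        describe("registered from a worker thread", [] {
            it("should pass", [] { assert_ok(1 + 1 == 2); });
        });
    });
}
for (auto & worker : workers) worker.join();

// each suite was committed to the registry as a whole
std::cout << run_all_tests();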