Mirror of https://github.com/catchorg/Catch2.git
Integrate Nonius benchmark into Catch2
Changes done to Nonius:

* Moved things into the `Catch::Benchmark` namespace
* Benchmarks were integrated with the `TEST_CASE`/`SECTION`/`GENERATE` macros
* Removed Nonius's parameters for benchmarks; Generators should be used instead
* Added relevant methods to the reporter interface (default-implemented, to avoid breaking existing 3rd party reporters)
* Async processing is guarded with the `_REENTRANT` macro for GCC/Clang, and used by default on MSVC
* Added a macro `CATCH_CONFIG_DISABLE_BENCHMARKING` that removes all traces of benchmarking from Catch
Committed by Martin Hořeňovský
parent 00347f1e79
commit ce2560ca95

docs/benchmarks.md | 249 lines (new file)
@@ -0,0 +1,249 @@

# Authoring benchmarks

Writing benchmarks is not easy. Catch simplifies certain aspects of it, but you'll
always need to take care of various details yourself. Understanding a few things
about the way Catch runs your code will be very helpful when writing your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.

- *User code*: user code is the code that the user provides to be measured.
- *Run*: one run is one execution of the user code.
- *Sample*: one sample is one data point obtained by measuring the time it takes
  to perform a certain number of runs. One sample can consist of more than one
  run if the clock available does not have enough resolution to accurately
  measure a single run. All samples for a given benchmark execution are obtained
  with the same number of runs.

## Execution procedure

Now we can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
   resolution is estimated. A few other environmental artifacts are also estimated
   at this point, like the cost of calling the clock function, but they almost
   never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate of
   the number of runs that should be in each sample. This also has the potential
   effect of bringing relevant code and data into the caches before the actual
   measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
   number of runs estimated in the previous step for each sample.

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.
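
For illustration, here is a hypothetical sketch of user code that breaks this rule
(the queue and its contents are made up for the example, and the snippet is assumed
to live inside a `TEST_CASE` with `<queue>` included): the runs consume state, so
once the queue is empty the benchmark no longer measures anything meaningful.

```c++
std::queue<int> q;
for (int i = 0; i < 1000; ++i) {
    q.push(i);
}

// NOT repeatable: every run pops an element, so the queue eventually runs
// empty and q.front() on an empty queue is undefined behaviour.
BENCHMARK("pop from queue") {
    int front = q.front();
    q.pop();
    return front;
};
```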

## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK` macro.

Let's have a look at how a naive Fibonacci implementation could be benchmarked:
```c++
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
Now, the most straightforward way to benchmark this function is to just add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon
  after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running this already runs the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                            samples       iterations    estimated
                                          mean          low mean      high mean
                                          std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                      100        416439    83.2878 ms
                                                 2 ns          2 ns          2 ns
                                                 0 ns          0 ns          0 ns

Fibonacci 25                                      100        400776    80.1552 ms
                                                 3 ns          3 ns          3 ns
                                                 0 ns          0 ns          0 ns

Fibonacci 30                                      100        396873    79.3746 ms
                                                17 ns         17 ns         17 ns
                                                 0 ns          0 ns          0 ns

Fibonacci 35                                      100        145169    87.1014 ms
                                               468 ns        464 ns        473 ns
                                                21 ns         15 ns         34 ns
```

### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, if using the `BENCHMARK_ADVANCED` macro and adding a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features are available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and another time during the execution phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.
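
As a sketch of that idea (the sizes, the seed, and the choice of `std::sort` are made
up for the example, and the snippet assumes `<vector>`, `<random>`, and `<algorithm>`
are included), the setup happens outside `measure` while only the sort itself is timed:

```c++
BENCHMARK_ADVANCED("sort random ints")(Catch::Benchmark::Chronometer meter) {
    // Setup (not measured): build a vector of pseudo-random integers.
    std::vector<int> data(10000);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::generate(data.begin(), data.end(), [&] { return dist(rng); });

    // Only this call is measured. Note that every run after the first sorts
    // already-sorted data; the per-run indexing described below can give each
    // run its own unsorted copy if that matters for your measurement.
    meter.measure([&data] {
        std::sort(data.begin(), data.end());
        return data.front();
    });
};
```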

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.

The callable object passed in to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be
mutated by each run.

```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different runs
and reset it between runs, since that would pollute the measurements with
the resetting code.

It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable to `meter.measure` with an `int` argument:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter)
{
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter)
{
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for(auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T`
objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and
create an object in that storage. So if you want to measure the time it takes
for a certain constructor to run, you can just measure the time it takes to run
this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was
constructed there it will be automatically destroyed, so nothing leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but
they do not destroy anything automatically. Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which is what you
can use to measure the destruction time.

### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
That helps with keeping the code in a natural fashion.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>
@@ -20,7 +20,10 @@

[Specify a seed for the Random Number Generator](#specify-a-seed-for-the-random-number-generator)<br>
[Identify framework and version according to the libIdentify standard](#identify-framework-and-version-according-to-the-libidentify-standard)<br>
[Wait for key before continuing](#wait-for-key-before-continuing)<br>
[Specify multiples of clock resolution to run benchmarks for](#specify-multiples-of-clock-resolution-to-run-benchmarks-for)<br>
[Specify the number of benchmark samples to collect](#specify-the-number-of-benchmark-samples-to-collect)<br>
[Specify the number of benchmark resamples for bootstrapping](#specify-the-number-of-resamples-for-bootstrapping)<br>
[Specify the confidence interval for bootstrapping](#specify-the-confidence-interval-for-bootstrapping)<br>
[Disable statistical analysis of collected benchmark samples](#disable-statistical-analysis-of-collected-benchmark-samples)<br>
[Usage](#usage)<br>
[Specify the section to run](#specify-the-section-to-run)<br>
[Filenames as tags](#filenames-as-tags)<br>
@@ -57,7 +60,10 @@ Click one of the following links to take you straight to that option - or scroll

<a href="#rng-seed"> ` --rng-seed`</a><br />
<a href="#libidentify"> ` --libidentify`</a><br />
<a href="#wait-for-keypress"> ` --wait-for-keypress`</a><br />
<a href="#benchmark-resolution-multiple"> ` --benchmark-resolution-multiple`</a><br />
<a href="#benchmark-samples"> ` --benchmark-samples`</a><br />
<a href="#benchmark-resamples"> ` --benchmark-resamples`</a><br />
<a href="#benchmark-confidence-interval"> ` --benchmark-confidence-interval`</a><br />
<a href="#benchmark-no-analysis"> ` --benchmark-no-analysis`</a><br />
<a href="#use-colour"> ` --use-colour`</a><br />

</br>
@@ -267,13 +273,40 @@ See [The LibIdentify repo for more information and examples](https://github.com/

Will cause the executable to print a message and wait until the return/enter key is pressed before continuing -
either before running any tests, after running all tests - or both, depending on the argument.

<a id="benchmark-resolution-multiple"></a>
## Specify multiples of clock resolution to run benchmarks for
<pre>--benchmark-resolution-multiple <multiplier></pre>

When running benchmarks the clock resolution is estimated. Benchmarks are then run for exponentially increasing
numbers of iterations until some multiple of the estimated resolution is exceeded. By default that multiple is 100,
but it can be overridden here.

<a id="benchmark-samples"></a>
## Specify the number of benchmark samples to collect
<pre>--benchmark-samples <# of samples></pre>

When running benchmarks, a number of "samples" is collected. These are the base data for the later statistical analysis.
For each sample, a clock-resolution-dependent number of iterations of the user code is run; that iteration count is
independent of the number of samples. Defaults to 100 samples.

<a id="benchmark-resamples"></a>
## Specify the number of resamples for bootstrapping
<pre>--benchmark-resamples <# of resamples></pre>

After the measurements are performed, statistical [bootstrapping] is performed
on the samples. The number of resamples for that bootstrapping is configurable,
but defaults to 100000. Thanks to the bootstrapping it is possible to give
estimates for the mean and standard deviation. The estimates come with a lower
bound and an upper bound at the configured confidence interval (which defaults
to 95%).

[bootstrapping]: http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29

<a id="benchmark-confidence-interval"></a>
## Specify the confidence interval for bootstrapping
<pre>--benchmark-confidence-interval <confidence-interval></pre>

The confidence interval is used for statistical bootstrapping on the samples to
calculate the upper and lower bounds of the mean and standard deviation.
Must be between 0 and 1 and defaults to 0.95.

<a id="benchmark-no-analysis"></a>
## Disable statistical analysis of collected benchmark samples
<pre>--benchmark-no-analysis</pre>

When this flag is specified, no bootstrapping or any other statistical analysis is performed.
Instead, the user code is only measured and the plain mean of the samples is reported.
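
For example, a hypothetical invocation combining several of these options could look
like the following (the binary name and test tag are made up for illustration):

<pre>./benchmark-tests "[fibonacci]" --benchmark-samples 200 --benchmark-resamples 20000 --benchmark-confidence-interval 0.99</pre>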

<a id="usage"></a>
## Usage
@@ -149,6 +149,7 @@ by using `_NO_` in the macro, e.g. `CATCH_CONFIG_NO_CPP17_UNCAUGHT_EXCEPTIONS`.

    CATCH_CONFIG_DISABLE                // Disables assertions and test case registration
    CATCH_CONFIG_WCHAR                  // Enables use of wchar_t
    CATCH_CONFIG_EXPERIMENTAL_REDIRECT  // Enables the new (experimental) way of capturing stdout/stderr
    CATCH_CONFIG_DISABLE_BENCHMARKING   // Disables the compile-time heavy benchmarking features

Currently Catch enables `CATCH_CONFIG_WINDOWS_SEH` only when compiled with MSVC, because some versions of MinGW do not have the necessary Win32 API support.
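
As a sketch of how such a compile-time switch is typically applied (illustrative only;
the definition is commonly supplied as `-DCATCH_CONFIG_DISABLE_BENCHMARKING` on the
compiler command line so that it is visible in every translation unit that includes Catch):

```c++
// Illustrative sketch: strip benchmarking support from this build.
#define CATCH_CONFIG_DISABLE_BENCHMARKING
#define CATCH_CONFIG_MAIN
#include "catch.hpp"
```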