Mirror of https://github.com/catchorg/Catch2.git
Integrate Nonius benchmark into Catch2
Changes done to Nonius:

* Moved things into the `Catch::Benchmark` namespace
* Benchmarks were integrated with the `TEST_CASE`/`SECTION`/`GENERATE` macros
* Removed Nonius's parameters for benchmarks; Generators should be used instead
* Added relevant methods to the reporter interface (default-implemented, to avoid breaking existing 3rd party reporters)
* Async processing is guarded with the `_REENTRANT` macro for GCC/Clang, and used by default on MSVC
* Added a macro `CATCH_CONFIG_DISABLE_BENCHMARKING` that removes all traces of benchmarking from Catch
Committed by Martin Hořeňovský
parent 00347f1e79
commit ce2560ca95

docs/benchmarks.md | 249 lines (new file)
@@ -0,0 +1,249 @@

# Authoring benchmarks

Writing benchmarks is not easy. Catch simplifies certain aspects of it, but you'll
always need to take care of various details yourself. Understanding a few things
about the way Catch runs your code will be very helpful when writing your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.

- *User code*: user code is the code that the user provides to be measured.
- *Run*: one run is one execution of the user code.
- *Sample*: one sample is one data point obtained by measuring the time it takes
  to perform a certain number of runs. One sample can consist of more than one
  run if the clock available does not have enough resolution to accurately
  measure a single run. All samples for a given benchmark execution are obtained
  with the same number of runs.

## Execution procedure

Now we can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
   resolution is estimated. A few other environmental artifacts are also estimated
   at this point, like the cost of calling the clock function, but they almost
   never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate of
   the number of runs that should be in each sample. This also has the potential
   effect of bringing relevant code and data into the caches before the actual
   measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
   number of runs estimated in the previous step for each sample.

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.
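
For illustration, here is a hypothetical sketch of user code that breaks this rule
(the queue and its contents are made up for the example, and the snippet is assumed
to live inside a `TEST_CASE` with `<queue>` included): the runs consume state, so
once the queue is empty the benchmark no longer measures anything meaningful.

```c++
std::queue<int> q;
for (int i = 0; i < 1000; ++i) {
    q.push(i);
}

// NOT repeatable: every run pops an element, so the queue eventually runs
// empty and q.front() on an empty queue is undefined behaviour.
BENCHMARK("pop from queue") {
    int front = q.front();
    q.pop();
    return front;
};
```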

## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK` macro.

Let's have a look at how a naive Fibonacci implementation could be benchmarked:
```c++
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
Now, the most straightforward way to benchmark this function is to just add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon
  after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running this already runs the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                            samples       iterations    estimated
                                          mean          low mean      high mean
                                          std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                      100        416439    83.2878 ms
                                                 2 ns          2 ns          2 ns
                                                 0 ns          0 ns          0 ns

Fibonacci 25                                      100        400776    80.1552 ms
                                                 3 ns          3 ns          3 ns
                                                 0 ns          0 ns          0 ns

Fibonacci 30                                      100        396873    79.3746 ms
                                                17 ns         17 ns         17 ns
                                                 0 ns          0 ns          0 ns

Fibonacci 35                                      100        145169    87.1014 ms
                                               468 ns        464 ns        473 ns
                                                21 ns         15 ns         34 ns
```

### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, if using the `BENCHMARK_ADVANCED` macro and adding a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features are available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and another time during the execution phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.
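
As a sketch of that idea (the sizes, the seed, and the choice of `std::sort` are made
up for the example, and the snippet assumes `<vector>`, `<random>`, and `<algorithm>`
are included), the setup happens outside `measure` while only the sort itself is timed:

```c++
BENCHMARK_ADVANCED("sort random ints")(Catch::Benchmark::Chronometer meter) {
    // Setup (not measured): build a vector of pseudo-random integers.
    std::vector<int> data(10000);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::generate(data.begin(), data.end(), [&] { return dist(rng); });

    // Only this call is measured. Note that every run after the first sorts
    // already-sorted data; the per-run indexing described below can give each
    // run its own unsorted copy if that matters for your measurement.
    meter.measure([&data] {
        std::sort(data.begin(), data.end());
        return data.front();
    });
};
```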

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.

The callable object passed in to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be
mutated by each run.

```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different runs
and reset it between runs, since that would pollute the measurements with
the resetting code.

It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable to `meter.measure` with an `int` argument:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects, they get destroyed at the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter)
{
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter)
{
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for(auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

`Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T`
objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and
create an object in that storage. So if you want to measure the time it takes
for a certain constructor to run, you can just measure the time it takes to run
this function.

When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was
constructed there it will be automatically destroyed, so nothing leaks.

If you want to measure a destructor, though, you need to use
`Catch::Benchmark::destructable_object<T>`. These objects are similar to
`Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but
they do not destroy anything automatically. Instead, you are required to call
the `Catch::Benchmark::destructable_object::destruct` member function, which is what you
can use to measure the destruction time.

### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
That helps with keeping the code in a natural fashion.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>
@@ -20,7 +20,10 @@

[Specify a seed for the Random Number Generator](#specify-a-seed-for-the-random-number-generator)<br>
[Identify framework and version according to the libIdentify standard](#identify-framework-and-version-according-to-the-libidentify-standard)<br>
[Wait for key before continuing](#wait-for-key-before-continuing)<br>
[Specify multiples of clock resolution to run benchmarks for](#specify-multiples-of-clock-resolution-to-run-benchmarks-for)<br>
[Specify the number of benchmark samples to collect](#specify-the-number-of-benchmark-samples-to-collect)<br>
[Specify the number of benchmark resamples for bootstrapping](#specify-the-number-of-resamples-for-bootstrapping)<br>
[Specify the confidence interval for bootstrapping](#specify-the-confidence-interval-for-bootstrapping)<br>
[Disable statistical analysis of collected benchmark samples](#disable-statistical-analysis-of-collected-benchmark-samples)<br>
[Usage](#usage)<br>
[Specify the section to run](#specify-the-section-to-run)<br>
[Filenames as tags](#filenames-as-tags)<br>
@@ -57,7 +60,10 @@ Click one of the following links to take you straight to that option - or scroll

<a href="#rng-seed"> ` --rng-seed`</a><br />
<a href="#libidentify"> ` --libidentify`</a><br />
<a href="#wait-for-keypress"> ` --wait-for-keypress`</a><br />
<a href="#benchmark-resolution-multiple"> ` --benchmark-resolution-multiple`</a><br />
<a href="#benchmark-samples"> ` --benchmark-samples`</a><br />
<a href="#benchmark-resamples"> ` --benchmark-resamples`</a><br />
<a href="#benchmark-confidence-interval"> ` --benchmark-confidence-interval`</a><br />
<a href="#benchmark-no-analysis"> ` --benchmark-no-analysis`</a><br />
<a href="#use-colour"> ` --use-colour`</a><br />

</br>
@@ -267,13 +273,40 @@ See [The LibIdentify repo for more information and examples](https://github.com/

Will cause the executable to print a message and wait until the return/enter key is pressed before continuing -
either before running any tests, after running all tests - or both, depending on the argument.

<a id="benchmark-resolution-multiple"></a>
## Specify multiples of clock resolution to run benchmarks for
<pre>--benchmark-resolution-multiple <multiplier></pre>

When running benchmarks the clock resolution is estimated. Benchmarks are then run for exponentially increasing
numbers of iterations until some multiple of the estimated resolution is exceeded. By default that multiple is 100,
but it can be overridden here.

<a id="benchmark-samples"></a>
## Specify the number of benchmark samples to collect
<pre>--benchmark-samples <# of samples></pre>

When running benchmarks, a number of "samples" is collected. These are the base data for the later statistical analysis.
For each sample, a clock-resolution-dependent number of iterations of the user code is run; that iteration count is
independent of the number of samples. Defaults to 100 samples.

<a id="benchmark-resamples"></a>
## Specify the number of resamples for bootstrapping
<pre>--benchmark-resamples <# of resamples></pre>

After the measurements are performed, statistical [bootstrapping] is performed
on the samples. The number of resamples for that bootstrapping is configurable,
but defaults to 100000. Thanks to the bootstrapping it is possible to give
estimates for the mean and standard deviation. The estimates come with a lower
bound and an upper bound at the configured confidence interval (which defaults
to 95%).

[bootstrapping]: http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29

<a id="benchmark-confidence-interval"></a>
## Specify the confidence interval for bootstrapping
<pre>--benchmark-confidence-interval <confidence-interval></pre>

The confidence interval is used for statistical bootstrapping on the samples to
calculate the upper and lower bounds of the mean and standard deviation.
Must be between 0 and 1 and defaults to 0.95.

<a id="benchmark-no-analysis"></a>
## Disable statistical analysis of collected benchmark samples
<pre>--benchmark-no-analysis</pre>

When this flag is specified, no bootstrapping or any other statistical analysis is performed.
Instead, the user code is only measured and the plain mean of the samples is reported.
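
For example, a hypothetical invocation combining several of these options could look
like the following (the binary name and test tag are made up for illustration):

<pre>./benchmark-tests "[fibonacci]" --benchmark-samples 200 --benchmark-resamples 20000 --benchmark-confidence-interval 0.99</pre>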

<a id="usage"></a>
## Usage
@@ -149,6 +149,7 @@ by using `_NO_` in the macro, e.g. `CATCH_CONFIG_NO_CPP17_UNCAUGHT_EXCEPTIONS`.

    CATCH_CONFIG_DISABLE                // Disables assertions and test case registration
    CATCH_CONFIG_WCHAR                  // Enables use of wchar_t
    CATCH_CONFIG_EXPERIMENTAL_REDIRECT  // Enables the new (experimental) way of capturing stdout/stderr
    CATCH_CONFIG_DISABLE_BENCHMARKING   // Disables the compile-time heavy benchmarking features

Currently Catch enables `CATCH_CONFIG_WINDOWS_SEH` only when compiled with MSVC, because some versions of MinGW do not have the necessary Win32 API support.
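
As a sketch of how such a compile-time switch is typically applied (illustrative only;
the definition is commonly supplied as `-DCATCH_CONFIG_DISABLE_BENCHMARKING` on the
compiler command line so that it is visible in every translation unit that includes Catch):

```c++
// Illustrative sketch: strip benchmarking support from this build.
#define CATCH_CONFIG_DISABLE_BENCHMARKING
#define CATCH_CONFIG_MAIN
#include "catch.hpp"
```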