Integrate Nonius benchmark into Catch2

Changes done to Nonius:

* Moved things into the `Catch::Benchmark` namespace
* Benchmarks were integrated with the `TEST_CASE`/`SECTION`/`GENERATE` macros
* Removed Nonius's parameters for benchmarks; Generators should be used instead
* Added relevant methods to the reporter interface (default-implemented, to avoid breaking existing 3rd-party reporters)
* Async processing is guarded with the `_REENTRANT` macro for GCC/Clang, and used by default on MSVC
* Added a macro `CATCH_CONFIG_DISABLE_BENCHMARKING` that removes all traces of benchmarking from Catch
Author: Joachim Meyer
Committed by: Martin Hořeňovský
Parent: 00347f1e79
Commit: ce2560ca95
docs/benchmarks.md (new file, 249 lines added)

@@ -0,0 +1,249 @@
# Authoring benchmarks

Writing benchmarks is not easy. Catch simplifies certain aspects, but you'll
always need to take care of several details yourself. Understanding a few
things about the way Catch runs your code will be very helpful when writing
your benchmarks.

First off, let's go over some terminology that will be used throughout this
guide.
- *User code*: user code is the code that the user provides to be measured.
- *Run*: one run is one execution of the user code.
- *Sample*: one sample is one data point obtained by measuring the time it takes
  to perform a certain number of runs. One sample can consist of more than one
  run if the clock available does not have enough resolution to accurately
  measure a single run. All samples for a given benchmark execution are obtained
  with the same number of runs.
## Execution procedure

Now I can explain how a benchmark is executed in Catch. There are three main
steps, though the first does not need to be repeated for every benchmark.

1. *Environmental probe*: before any benchmarks can be executed, the clock's
resolution is estimated. A few other environmental artifacts are also estimated
at this point, like the cost of calling the clock function, but they almost
never have any impact on the results.

2. *Estimation*: the user code is executed a few times to obtain an estimate of
the number of runs that should be in each sample. This also has the potential
effect of bringing relevant code and data into the caches before the actual
measurement starts.

3. *Measurement*: all the samples are collected sequentially by performing the
number of runs estimated in the previous step for each sample.

This already gives us one important rule for writing benchmarks for Catch: the
benchmarks must be repeatable. The user code will be executed several times, and
the number of times it will be executed during the estimation step cannot be
known beforehand since it depends on the time it takes to execute the code.
User code that cannot be executed repeatedly will lead to bogus results or
crashes.

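For instance, here is a minimal sketch of a benchmark that breaks this rule and
one that obeys it (`make_data` and `consume` are hypothetical stand-ins for
user code):

```c++
std::vector<int> data = make_data();

// BAD: the first run drains the vector, so subsequent runs measure
// something completely different (and eventually pop from an empty vector)
BENCHMARK("drain") {
    while (!data.empty()) { consume(data.back()); data.pop_back(); }
};

// BETTER: every run does the same work on its own copy
BENCHMARK("drain a copy") {
    auto local = data;
    while (!local.empty()) { consume(local.back()); local.pop_back(); }
    return local.size();
};
```

Note that the second version includes the copy in the measurement; the advanced
facilities described below can move such setup out of the measured code.
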
## Benchmark specification

Benchmarks can be specified anywhere inside a Catch test case.
There is a simple and a slightly more advanced version of the `BENCHMARK` macro.

Let's have a look at how a naive Fibonacci implementation could be benchmarked:
```c++
std::uint64_t Fibonacci(std::uint64_t number) {
    return number < 2 ? 1 : Fibonacci(number - 1) + Fibonacci(number - 2);
}
```
Now the most straightforward way to benchmark this function is to add a `BENCHMARK` macro to our test case:
```c++
TEST_CASE("Fibonacci") {
    CHECK(Fibonacci(0) == 1);
    // some more asserts..
    CHECK(Fibonacci(5) == 8);
    // some more asserts..

    // now let's benchmark:
    BENCHMARK("Fibonacci 20") {
        return Fibonacci(20);
    };

    BENCHMARK("Fibonacci 25") {
        return Fibonacci(25);
    };

    BENCHMARK("Fibonacci 30") {
        return Fibonacci(30);
    };

    BENCHMARK("Fibonacci 35") {
        return Fibonacci(35);
    };
}
```
There are a few things to note:
- As `BENCHMARK` expands to a lambda expression, it is necessary to add a semicolon
  after the closing brace (as opposed to the first experimental version).
- The `return` is a handy way to avoid the compiler optimizing away the benchmark code.

Running this already runs the benchmarks and outputs something similar to:
```
-------------------------------------------------------------------------------
Fibonacci
-------------------------------------------------------------------------------
C:\path\to\Catch2\Benchmark.tests.cpp(10)
...............................................................................
benchmark name                                  samples       iterations    estimated
                                                mean          low mean      high mean
                                                std dev       low std dev   high std dev
-------------------------------------------------------------------------------
Fibonacci 20                                            100       416439   83.2878 ms
                                                       2 ns         2 ns         2 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 25                                            100       400776   80.1552 ms
                                                       3 ns         3 ns         3 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 30                                            100       396873   79.3746 ms
                                                      17 ns        17 ns        17 ns
                                                       0 ns         0 ns         0 ns

Fibonacci 35                                            100       145169   87.1014 ms
                                                     468 ns       464 ns       473 ns
                                                      21 ns        15 ns        34 ns
```

### Advanced benchmarking
The simplest use case shown above takes no arguments and just runs the user code that needs to be measured.
However, by using the `BENCHMARK_ADVANCED` macro and adding a `Catch::Benchmark::Chronometer` argument after
the macro, some advanced features become available. The contents of the simple benchmarks are invoked once per run,
while the blocks of the advanced benchmarks are invoked exactly twice:
once during the estimation phase, and another time during the execution phase.

```c++
BENCHMARK("simple"){ return long_computation(); };

BENCHMARK_ADVANCED("advanced")(Catch::Benchmark::Chronometer meter) {
    set_up();
    meter.measure([] { return long_computation(); });
};
```

These advanced benchmarks no longer consist entirely of user code to be measured.
In these cases, the code to be measured is provided via the
`Catch::Benchmark::Chronometer::measure` member function. This allows you to set up any
kind of state that might be required for the benchmark but is not to be included
in the measurements, like making a vector of random integers to feed to a
sorting algorithm.

A single call to `Catch::Benchmark::Chronometer::measure` performs the actual measurements
by invoking the callable object passed in as many times as necessary. Anything
that needs to be done outside the measurement can be done outside the call to
`measure`.

The callable object passed in to `measure` can optionally accept an `int`
parameter.

```c++
meter.measure([](int i) { return long_computation(i); });
```

If it accepts an `int` parameter, the sequence number of each run will be passed
in, starting with 0. This is useful if you want to measure some mutating code,
for example. The number of runs can be known beforehand by calling
`Catch::Benchmark::Chronometer::runs`; with this one can set up a different instance to be
mutated by each run.

```c++
std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });
```

Note that it is not possible to simply use the same instance for different runs
and reset it between each run, since that would pollute the measurements with
the resetting code.

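To illustrate, here is a sketch of that anti-pattern, reusing `v` and the
hypothetical helpers from the snippet above; the reset work gets timed together
with the code under test:

```c++
// BAD: the reset runs inside the measured callable, so the reported
// time includes the resetting code, not just in_place_escape
meter.measure([&v](int) {
    v[0] = test_string();    // resetting pollutes the measurement...
    in_place_escape(v[0]);   // ...of the code we actually care about
});
```
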
It is also possible to just provide an argument name to the simple `BENCHMARK` macro to get
the same semantics as providing a callable to `meter.measure` with an `int` argument:

```c++
BENCHMARK("indexed", i){ return long_computation(i); };
```

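Since benchmarks are integrated with Catch's `GENERATE` macro, they can also be
parameterized by a generator instead of being written out one by one. A minimal
sketch, assuming the benchmark name expression may be built at runtime from the
generated value:

```c++
TEST_CASE("Fibonacci - generated") {
    // the test case body is re-entered once per generated value,
    // registering one benchmark for each input
    auto n = GENERATE(20, 25, 30, 35);

    BENCHMARK("Fibonacci " + std::to_string(n)) {
        return Fibonacci(n);
    };
}
```
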
### Constructors and destructors

All of these tools give you a lot of mileage, but there are two things that still
need special handling: constructors and destructors. The problem is that if you
use automatic objects they get destroyed by the end of the scope, so you end up
measuring the time for construction and destruction together. And if you use
dynamic allocation instead, you end up including the time to allocate memory in
the measurements.

To solve this conundrum, Catch provides class templates that let you manually
construct and destroy objects without dynamic allocation and in a way that lets
you measure construction and destruction separately.

```c++
BENCHMARK_ADVANCED("construct")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::storage_for<std::string>> storage(meter.runs());
    meter.measure([&](int i) { storage[i].construct("thing"); });
};

BENCHMARK_ADVANCED("destroy")(Catch::Benchmark::Chronometer meter) {
    std::vector<Catch::Benchmark::destructable_object<std::string>> storage(meter.runs());
    for (auto&& o : storage)
        o.construct("thing");
    meter.measure([&](int i) { storage[i].destruct(); });
};
```

|  | ||||
| `Catch::Benchmark::storage_for<T>` objects are just pieces of raw storage suitable for `T` | ||||
| objects. You can use the `Catch::Benchmark::storage_for::construct` member function to call a constructor and | ||||
| create an object in that storage. So if you want to measure the time it takes | ||||
| for a certain constructor to run, you can just measure the time it takes to run | ||||
| this function. | ||||
|  | ||||
| When the lifetime of a `Catch::Benchmark::storage_for<T>` object ends, if an actual object was | ||||
| constructed there it will be automatically destroyed, so nothing leaks. | ||||
|  | ||||
| If you want to measure a destructor, though, we need to use | ||||
| `Catch::Benchmark::destructable_object<T>`. These objects are similar to | ||||
| `Catch::Benchmark::storage_for<T>` in that construction of the `T` object is manual, but | ||||
| it does not destroy anything automatically. Instead, you are required to call | ||||
| the `Catch::Benchmark::destructable_object::destruct` member function, which is what you | ||||
| can use to measure the destruction time. | ||||
|  | ||||
### The optimizer

Sometimes the optimizer will optimize away the very code that you want to
measure. There are several ways to use results that will prevent the optimizer
from removing them. You can use the `volatile` keyword, or you can output the
value to standard output or to a file, both of which force the program to
actually generate the value somehow.

Catch adds a third option. The values returned by any function provided as user
code are guaranteed to be evaluated and not optimized out. This means that if
your user code consists of computing a certain value, you don't need to bother
with using `volatile` or forcing output. Just `return` it from the function.
That helps keep the code natural.

Here's an example:

```c++
// may measure nothing at all by skipping the long calculation since its
// result is not used
BENCHMARK("no return"){ long_calculation(); };

// the result of long_calculation() is guaranteed to be computed somehow
BENCHMARK("with return"){ return long_calculation(); };
```

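For comparison, here is a sketch of the handwritten `volatile` approach
mentioned above. Writing the result to a `volatile` object also forces the
computation to happen, at the cost of timing the extra store as well (the
result type of `long_calculation()` is assumed here):

```c++
volatile std::uint64_t sink = 0;

// a write to a volatile object is an observable side effect,
// so the optimizer cannot discard the computation producing the value
BENCHMARK("volatile sink") {
    sink = long_calculation();
};
```
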
However, there's no other form of control over the optimizer whatsoever. It is
up to you to write a benchmark that actually measures what you want and doesn't
just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code
to control optimization still works in Catch; and Catch makes return values
from user code into observable effects that can't be optimized away.

<i>Adapted from nonius' documentation.</i>
docs/command-line.md (modified)
@@ -20,7 +20,10 @@
[Specify a seed for the Random Number Generator](#specify-a-seed-for-the-random-number-generator)<br>
[Identify framework and version according to the libIdentify standard](#identify-framework-and-version-according-to-the-libidentify-standard)<br>
[Wait for key before continuing](#wait-for-key-before-continuing)<br>
[Specify multiples of clock resolution to run benchmarks for](#specify-multiples-of-clock-resolution-to-run-benchmarks-for)<br>
[Specify the number of benchmark samples to collect](#specify-the-number-of-benchmark-samples-to-collect)<br>
[Specify the number of benchmark resamples for bootstrapping](#specify-the-number-of-resamples-for-bootstrapping)<br>
[Specify the confidence interval for bootstrapping](#specify-the-confidence-interval-for-bootstrapping)<br>
[Disable statistical analysis of collected benchmark samples](#disable-statistical-analysis-of-collected-benchmark-samples)<br>
[Usage](#usage)<br>
[Specify the section to run](#specify-the-section-to-run)<br>
[Filenames as tags](#filenames-as-tags)<br>
@@ -57,7 +60,10 @@ Click one of the following links to take you straight to that option - or scroll
<a href="#rng-seed">                                    `    --rng-seed`</a><br />
<a href="#libidentify">                                 `    --libidentify`</a><br />
<a href="#wait-for-keypress">                           `    --wait-for-keypress`</a><br />
<a href="#benchmark-resolution-multiple">               `    --benchmark-resolution-multiple`</a><br />
<a href="#benchmark-samples">                           `    --benchmark-samples`</a><br />
<a href="#benchmark-resamples">                         `    --benchmark-resamples`</a><br />
<a href="#benchmark-confidence-interval">               `    --benchmark-confidence-interval`</a><br />
<a href="#benchmark-no-analysis">                       `    --benchmark-no-analysis`</a><br />
<a href="#use-colour">                                  `    --use-colour`</a><br />

</br>
@@ -267,13 +273,40 @@ See [The LibIdentify repo for more information and examples](https://github.com/
Will cause the executable to print a message and wait until the return/enter key is pressed before continuing -
either before running any tests, after running all tests - or both, depending on the argument.

<a id="benchmark-resolution-multiple"></a>
## Specify multiples of clock resolution to run benchmarks for
<pre>--benchmark-resolution-multiple <multiplier></pre>
<a id="benchmark-samples"></a>
## Specify the number of benchmark samples to collect
<pre>--benchmark-samples <# of samples></pre>

When running benchmarks the clock resolution is estimated. Benchmarks are then run for exponentially increasing
numbers of iterations until some multiple of the estimated resolution is exceeded. By default that multiple is 100, but
it can be overridden here.
When running benchmarks, a number of "samples" is collected. These samples are the base data for the later statistical analysis.
For each sample, a clock-resolution-dependent number of iterations of the user code is run; that iteration count is independent of the number of samples. Defaults to 100.

<a id="benchmark-resamples"></a>
## Specify the number of resamples for bootstrapping
<pre>--benchmark-resamples <# of resamples></pre>

After the measurements are performed, statistical [bootstrapping] is performed
on the samples. The number of resamples for that bootstrapping is configurable,
and defaults to 100000. The bootstrapping makes it possible to give estimates
for the mean and standard deviation. The estimates come with a lower bound and
an upper bound at the given confidence interval (which is configurable, and
defaults to 95%).

 [bootstrapping]: http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29

| <a id="benchmark-confidence-interval"></a> | ||||
| ## Specify the confidence-interval for bootstrapping | ||||
| <pre>--benchmark-confidence-interval <confidence-interval></pre> | ||||
|  | ||||
| The confidence-interval is used for statistical bootstrapping on the samples to | ||||
| calculate the upper and lower bounds of mean and standard deviation. | ||||
| Must be between 0 and 1 and defaults to 0.95. | ||||
|  | ||||
| <a id="benchmark-no-analysis"></a> | ||||
| ## Disable statistical analysis of collected benchmark samples | ||||
| <pre>--benchmark-no-analysis</pre> | ||||
|  | ||||
| When this flag is specified no bootstrapping or any other statistical analysis is performed. | ||||
| Instead the user code is only measured and the plain mean from the samples is reported. | ||||
|  | ||||
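Taken together, the benchmark options can be combined in a single invocation. A
hypothetical example (the binary name and tag filter are stand-ins):

<pre>./tests "[fibonacci]" --benchmark-samples 200 --benchmark-resamples 50000 --benchmark-confidence-interval 0.99</pre>
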
| <a id="usage"></a> | ||||
| ## Usage | ||||
|   | ||||
docs/configuration.md (modified)
@@ -149,6 +149,7 @@ by using `_NO_` in the macro, e.g. `CATCH_CONFIG_NO_CPP17_UNCAUGHT_EXCEPTIONS`.
    CATCH_CONFIG_DISABLE                    // Disables assertions and test case registration
    CATCH_CONFIG_WCHAR                      // Enables use of wchar_t
    CATCH_CONFIG_EXPERIMENTAL_REDIRECT      // Enables the new (experimental) way of capturing stdout/stderr
    CATCH_CONFIG_DISABLE_BENCHMARKING       // Disables the compile-time heavy benchmarking features

Currently Catch enables `CATCH_CONFIG_WINDOWS_SEH` only when compiled with MSVC, because some versions of MinGW do not have the necessary Win32 API support.