All the previous refactoring to make the assertion fast paths
smaller and faster also allows us to implement the fast paths
just with thread-local and atomic variables, without full mutexes.
However, the performance overhead of thread-safe assertions is
still significant for single threaded usage:
| slowdown | Debug | Release |
|-----------|--------:|--------:|
| fast path | 1.04x | 1.43x |
| slow path | 1.16x | 1.22x |
Thus, we don't make the assertions thread-safe by default, and instead
provide a build-time configuration option that the users can set to get
thread-safe assertions.
This commit is functional, but it still needs some follow-up work:
* We do not need full seq_cst increments for the atomic counters,
and using weaker ones can be faster.
* We brute-force updating the reporter-friendly totals from internal
atomic counters by doing it everywhere. We should properly trace
where this is needed instead.
* Message macros (`INFO`, `UNSCOPED_INFO`, `CAPTURE`, etc) are not
made thread safe in this commit, but they can be made thread safe
in the future, by building on top of this work.
* Add more tests, including with thread-sanitizer, and compiled
examples to the repository. Right now, these changes have been
compiled with tsan manually, but these tests are not added to CI.
Closes#2948