This keeps it out of the main include path when benchmarking is
enabled, somewhat reducing the compilation-time penalty.
Also moved some other functions into the .cpp file, especially
helpers that could be given internal linkage, and concretized some
iterator-templated code that only ever used
`std::vector<double>::iterator`.