Martin Hořeňovský d99eb8bec8
Optimize 64x64 extended multiplication implementation
Now we use intrinsics when possible, and fallback to optimized
implementation in portable C++. The difference is about 4x when
we can use intrinsics and about 2x when we cannot.

This should speed up our Lemire's algorithm implementation nicely.
2024-04-03 13:28:25 +02:00
..