Mastodawn

Lesley Lai Jul 11, 2024

I can't believe a simple change made my code 100x faster! I clearly did it the wrong way initially. I was concatenating multiple `vector`s into one. Initially, for each vector, I called `result.reserve(result.size() + vec.size())` and then performed a bunch of `push_back`s. It took an unacceptable 10s. Then I changed it to precalculate the final size of the result vector, perform a single `.resize()`, and then do a series of copies. The same operation now takes only ~100ms.

#cpp #cplusplus
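
For readers following along, the two approaches described in the post might look roughly like the sketch below. The element type `int`, the `parts` parameter, and the function names `concat_reserve_each` and `concat_resize_once` are assumptions made for illustration; the original code was not shared.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: raise the reservation once per input vector, then
// push_back. Any reserve() that exceeds the current capacity reallocates
// and moves everything accumulated so far.
std::vector<int> concat_reserve_each(const std::vector<std::vector<int>>& parts) {
    std::vector<int> result;
    for (const auto& vec : parts) {
        result.reserve(result.size() + vec.size());
        for (int x : vec) {
            result.push_back(x);
        }
    }
    return result;
}

// Hypothetical helper: compute the final size up front, resize once,
// then copy each input into its place.
std::vector<int> concat_resize_once(const std::vector<std::vector<int>>& parts) {
    std::size_t total = 0;
    for (const auto& vec : parts) {
        total += vec.size();
    }

    std::vector<int> result;
    result.resize(total);  // single allocation; elements are value-initialized
    auto out = result.begin();
    for (const auto& vec : parts) {
        out = std::copy(vec.begin(), vec.end(), out);
    }
    return result;
}
```

Both functions produce the same contents; they differ only in how often the result buffer may be reallocated.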

Lesley Lai

I'm not entirely sure if I missed anything. The problem is solved, but it's interesting, so I will investigate further. I suspect that calling `vec.reserve()` repeatedly within a hot loop is the problem.

Thomas Heller Jul 11, 2024

@lesley It is. Each new reserve has to perform an allocation and move the old contents to the newly allocated buffer.

Andre Weissflog Jul 11, 2024

@lesley is the speedup also that big in release mode? IIRC allocation in the C++ stdlib is notoriously slow in MSVC in debug mode.

Also, did you try doing a single .reserve() to the final size and push_back() instead of .resize() and copy?
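
The variant asked about here could be sketched as follows, assuming `int` elements and a hypothetical function name `concat_reserve_once`. Compared with `.resize()` and copy, it skips value-initializing the elements before they are overwritten.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: one reserve() to the final size, then push_back.
// Because the capacity already covers every element, none of the
// push_back calls should need to reallocate.
std::vector<int> concat_reserve_once(const std::vector<std::vector<int>>& parts) {
    std::size_t total = 0;
    for (const auto& vec : parts) {
        total += vec.size();
    }

    std::vector<int> result;
    result.reserve(total);
    for (const auto& vec : parts) {
        for (int x : vec) {
            result.push_back(x);
        }
    }
    return result;
}
```

Replacing the inner loop with `result.insert(result.end(), vec.begin(), vec.end())` would be an equivalent way to append each input.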

Lesley Lai Jul 11, 2024

@floooh I measured in release mode. I guess that I just hit the ".reserve in a loop" footgun, which I should have known to avoid.

Philip Trettner Jul 11, 2024

@lesley which stdlib are you using? Reserve has a famous footgun: should it reserve exactly the given amount or at least the given amount? The first is often the user's intuition, but then your case has quadratic runtime (each reserve reallocates). Many stdlibs grow to an exponential increase or the given value, whichever is higher, which fixes the problem. But it's not mandated as far as I know.
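
One way to check which behavior a given stdlib has is to count how often the capacity changes when `reserve` is called in a loop. The sketch below is only an illustration: the function name `count_reserve_capacity_changes` and its parameters are made up, and the number it returns depends on the implementation, since the standard only requires the capacity to be at least the requested amount.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical probe: append num_chunks chunks of chunk_size elements,
// calling reserve() before each chunk, and count how many of those
// reserve() calls changed the capacity (a proxy for reallocations).
std::size_t count_reserve_capacity_changes(std::size_t num_chunks,
                                           std::size_t chunk_size) {
    std::vector<int> result;
    std::size_t changes = 0;
    for (std::size_t i = 0; i < num_chunks; ++i) {
        const std::size_t before = result.capacity();
        result.reserve(result.size() + chunk_size);
        if (result.capacity() != before) {
            ++changes;
        }
        // The capacity now covers this chunk, so these calls do not reallocate.
        for (std::size_t j = 0; j < chunk_size; ++j) {
            result.push_back(0);
        }
    }
    return changes;
}
```

A result close to `num_chunks` suggests that `reserve` allocates exactly what was asked for, so every iteration copies the whole buffer; a much smaller result suggests the implementation over-allocates.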

Thomas Heller Jul 11, 2024

@lesley You should also try to not reserve at all and make use of the amortized O(1) complexity of push_back. It might lead to some memory overhead though.
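
To see the amortized behavior mentioned here, a small probe can count capacity changes when nothing is reserved at all. This is a sketch under assumptions: the function name `count_push_back_capacity_changes` is hypothetical, and the exact count depends on the growth factor the stdlib uses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical probe: push_back n elements without any reserve() and
// count how often the capacity changes. With geometric growth this
// count grows roughly logarithmically in n rather than linearly.
std::size_t count_push_back_capacity_changes(std::size_t n) {
    std::vector<int> result;
    std::size_t changes = 0;
    std::size_t last_capacity = result.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(0);
        if (result.capacity() != last_capacity) {
            ++changes;
            last_capacity = result.capacity();
        }
    }
    return changes;
}
```

The memory overhead mentioned above shows up as the final `capacity()` being larger than `size()`; calling `shrink_to_fit()` afterwards is a non-binding request to release the excess.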