I can't believe a simple change made my code 100x faster! I clearly did it the wrong way initially. I was concatenating multiple `vector`s into one. Initially, for each vector, I called `result.reserve(result.size() + vec.size())` and then performed a bunch of `push_back`s. It took an unacceptable 10s. Then I switched to precalculating the final size of the result vector, performing a single `.resize()`, and then doing a series of copies. The same operation now takes only ~100ms.

#cpp #cplusplus
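
For anyone following along, here is a minimal sketch of the two approaches described in the post. The function names and the `int` element type are made up for illustration; the original code was not shared, so this is a reconstruction under those assumptions, not the actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of the slow variant: reserve just enough
// room for the next input vector, then push_back its elements one by one.
std::vector<int> concat_reserve_each(const std::vector<std::vector<int>>& parts) {
    std::vector<int> result;
    for (const auto& vec : parts) {
        // May reallocate and move everything already in `result`.
        result.reserve(result.size() + vec.size());
        for (int x : vec) {
            result.push_back(x);
        }
    }
    return result;
}

// Hypothetical reconstruction of the fast variant: compute the final size,
// resize once, then copy each input vector into place.
std::vector<int> concat_resize_copy(const std::vector<std::vector<int>>& parts) {
    std::size_t total = 0;
    for (const auto& vec : parts) {
        total += vec.size();
    }

    std::vector<int> result;
    result.resize(total);  // single allocation, elements value-initialized

    auto out = result.begin();
    for (const auto& vec : parts) {
        out = std::copy(vec.begin(), vec.end(), out);
    }
    assert(out == result.end());  // every slot was written exactly once
    return result;
}
```

Both functions return the same contents; only the allocation pattern differs.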

I'm not entirely sure if I missed anything. The problem is solved, but it's interesting, so I will investigate further. I suspect that calling `vec.reserve()` repeatedly within a hot loop is the problem.
@lesley It is. Each new reserve has to perform an allocation and move the old contents to the newly allocated buffer.

@lesley is the speedup also that big in release mode? IIRC allocation in the C++ stdlib is notoriously slow in MSVC in debug mode.

Also, did you try doing a single `.reserve()` to the final size and `push_back()` instead of `.resize()` and copy?
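
A rough sketch of that variant, in case it helps. The function name and the `int` element type are placeholders, not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one reserve() to the final size, then push_back.
// After the reserve, capacity() >= total, so no push_back reallocates.
std::vector<int> concat_reserve_once(const std::vector<std::vector<int>>& parts) {
    std::size_t total = 0;
    for (const auto& vec : parts) {
        total += vec.size();
    }

    std::vector<int> result;
    result.reserve(total);  // single allocation, no elements constructed yet

    for (const auto& vec : parts) {
        for (int x : vec) {
            result.push_back(x);
        }
    }
    assert(result.size() == total);
    return result;
}
```

The inner loop could also be written as `result.insert(result.end(), vec.begin(), vec.end())`. Compared with `.resize()` and copy, this skips value-initializing the elements first, which may or may not be measurable for trivial types.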

@floooh I measured in release mode. I guess I just hit the ".reserve in a loop" footgun; I should have known better.
@lesley which stdlib are you using? Reserve has a famous footgun: should it reserve exactly the given amount or at least the given amount? The first is often the user's intuition, but in your case it leads to quadratic runtime (each reserve reallocates). Many stdlibs grow the capacity to either an exponential increase or the requested value, whichever is higher. That fixes the problem, but it's not mandated as far as I know.
@lesley You should also try to not reserve at all and make use of the amortized O(1) complexity of push_back. It might lead to some memory overhead though.
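
One way to check what a given stdlib actually does is to count capacity changes. This is a small sketch with a made-up helper name; the counts it returns depend on the implementation, so no expected numbers are given here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `chunks` groups of `chunk_size` elements and
// counts how often capacity() changes, i.e. how often the vector reallocates.
// With reserve_each == true it mimics the "reserve in a loop" pattern;
// with reserve_each == false it relies on push_back's own growth policy.
std::size_t count_reallocations(std::size_t chunks, std::size_t chunk_size,
                                bool reserve_each) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();

    // Record a reallocation whenever the capacity has changed.
    auto note = [&]() {
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    };

    for (std::size_t c = 0; c < chunks; ++c) {
        if (reserve_each) {
            v.reserve(v.size() + chunk_size);
            note();
        }
        for (std::size_t i = 0; i < chunk_size; ++i) {
            v.push_back(0);
            note();
        }
    }
    assert(v.size() == chunks * chunk_size);
    return reallocations;
}
```

If `count_reallocations(n, k, true)` comes out close to `n`, the stdlib reserves (roughly) exactly what was asked for and the loop moves the whole buffer once per chunk. If `count_reallocations(n, k, false)` is much smaller, plain `push_back` is growing the buffer geometrically.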