Working on an alternative std::vector implementation that's not contiguous and can be the base for my ankerl::unordered_dense::map. https://github.com/martinus/unordered_dense

The big advantage is it doesn't have allocation spikes. Surprisingly, insertion is faster because no elements have to be moved around #cpp

GitHub - martinus/unordered_dense: A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion

@martinus What's your implementation strategy? I'm using a segmented vector that uses increasing powers of two as block sizes, for the ability to safely store pointers to elements.

@foonathan I tried that too, even with CLZ intrinsics to find the index. For some reason that was still not as fast as this approach:

I allocate fixed-size segments, e.g. 4096 bytes (it's a template argument), and make sure that exactly 2^x elements fit into each one. The segments are stored in a std::vector. Indexing then becomes

return m_segments[idx >> num_bits][idx & mask];

where num_bits and mask are constexpr members.

@martinus Ah right, I've abandoned contiguous indices and use num_bits == 21 (2 MiB) regardless of actual block size.
@foonathan I might change my implementation so that it behaves like std::vector until the block size is reached (doubling size and moving elements), and then switches to always allocating same-size segments. Then big segment sizes aren't a problem even when a container doesn't need to hold many elements, and indexing stays the same speed
@martinus Wait, if you don't actually care about address stability, why have a segmented implementation? push_front?
@foonathan I don't want to have the memory allocation spikes when the vector gets full and relocates
@martinus Ah, I see. Yeah, then I reckon a hybrid approach might work.

> @foonathan Ah right, I've abandoned contiguous indices and use num_bits == 21 (2 MiB) regardless of actual block size.

I don't think I understand this. What I did previously was basically implement this here: https://plflib.org/matt_bentley_-_pot_blocks_bit_tricks.pdf

That way you can have contiguous indices. Finding the index of the actual bucket is more or less a CLZ operation, and finding the index within the bucket means removing the highest set bit

@martinus Yes, I had that before as well, but I also found it too slow. Since I don't need contiguous indices I basically treat every block as if it were 2 MiB - so the indices are something like 0, 1, 2, 3, 2097152, ...
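As I understand that scheme, it could be sketched like this (hypothetical names; every block gets a 2^21 index range whether or not it is full):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of non-contiguous indexing: every block is treated as if it
// held 2^21 elements, so indices jump ahead at each block boundary.
// (Illustrative names, not the actual implementation.)
constexpr std::size_t block_bits = 21;
constexpr std::size_t block_mask = (std::size_t{1} << block_bits) - 1;

constexpr std::size_t make_index(std::size_t block, std::size_t offset) {
    return (block << block_bits) | offset;
}
constexpr std::size_t block_of(std::size_t idx) { return idx >> block_bits; }
constexpr std::size_t offset_of(std::size_t idx) { return idx & block_mask; }
```

With this, the first element of block 1 gets index 2097152, matching the 0, 1, 2, 3, 2097152, ... sequence above.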
@foonathan Ah, thanks for explaining, that's a neat trick! I didn't think about just skipping indices
@martinus It is weird, especially when you're debugging. You normally don't see numbers that big in a debugging session for a small example... :D