@nosherwan @xblasco we are still discussing... It's not an easy topic.

@nosherwan @xblasco This article for #Python #dicts is interesting: https://tenthousandmeters.com/blog/python-behind-the-scenes-10-how-python-dictionaries-work/

For one, it reiterates that the time complexity is O(1), but then the benchmark graph shows look-up times that don't follow the theory.

The article explains why. CPU Caches.

But the underlying question is: should it be considered O(1) if in practice it doesn't behave like it?
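Here is a minimal sketch of the kind of benchmark behind the article's graph (my own quick version with made-up sizes and probe counts, not the article's code): time random look-ups at increasing dict sizes and watch the per-look-up cost creep up once the table stops fitting in the CPU caches.

```python
# Rough benchmark sketch (not the article's code): time random dict look-ups
# at growing sizes. The algorithm is O(1), but the measured per-look-up time
# rises once the table no longer fits in the CPU caches.
import random
import time

def avg_lookup_ns(n, probes=200_000):
    d = {i: i for i in range(n)}
    keys = [random.randrange(n) for _ in range(probes)]
    start = time.perf_counter()
    for k in keys:
        d[k]
    return (time.perf_counter() - start) / probes * 1e9

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10} items: {avg_lookup_ns(n):6.1f} ns per look-up")
```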


@deavid @nosherwan @xblasco

The algorithm is O(1) in theory and practice.

If we say that an algorithm is O(1), we mean that for any number of items in the dictionary, we can find a constant C which is an upper bound on the time it takes to do the look-up.
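Spelled out as a formula (my paraphrase of the above, with T(n) being the look-up time for a dictionary holding n items):

```latex
\exists\, C > 0 \;\; \forall\, n:\quad T_{\text{lookup}}(n) \le C
```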

As n grows we get more cache misses, until eventually 100% of look-ups are cache misses.

In that case I would expect the look-up time to be capped by the time it takes to fetch the key from RAM (a little below 500 ns).

For larger dictionaries you might even find a second jump. This will happen when the dictionary doesn't fit into RAM anymore. Most modern operating systems will start writing parts of the dictionary to the hard-drive. This is called swapping.

@erikdesmedt @nosherwan @xblasco I think you're right, but the upper bound on time doesn't seem like a good enough argument, because with a high enough upper bound you could claim O(log n) is O(1). For example, compare a bad hashmap implementation against a specialized btree in a scenario where the btree excels: put the same upper bound on both and the btree looks O(1) under the same rules, which is wrong.

I'd like to hear more ideas on how to prove hashmaps are O(1) in practice
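To make the "high enough upper bound" point concrete (my own back-of-the-envelope numbers, not from the article): for any physically plausible n, log2(n) is tiny, so a generous constant bound "covers" a balanced tree too, which is why the bound alone can't tell the two apart.

```python
# Back-of-the-envelope: log2(n) stays small for any realistic n, so a loose
# enough constant bound on look-up time would "prove" a balanced tree O(1) too.
import math

for n in (10**6, 10**9, 10**12, 2**64):
    print(f"n = {n:>20}: log2(n) = {math.log2(n):5.1f} comparisons")
# Even at 2**64 items that is only 64 comparisons, so any bound of roughly
# 64x the cost of one comparison makes O(log n) look "constant".
```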

@erikdesmedt @nosherwan @xblasco wait a second... If we use a btree, several accesses would be out of cache, while a hashmap needs only one RAM access. There's no way to make a tree perform in the time of a single memory access, so I guess we would always get an order of magnitude higher timings for the right side of the graph.
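A quick way to see that gap (a sketch; I'm using binary search over a sorted Python list as a stand-in for a tree, since both do roughly log2(n) dependent, mostly-uncached accesses, and the size and probe count are arbitrary):

```python
# Sketch: one hashed probe (dict) vs a binary search over a sorted list
# (a stand-in for a tree: ~log2(n) dependent, mostly-uncached accesses).
import bisect
import math
import random
import time

N, PROBES = 5_000_000, 200_000
keys = list(range(N))
d = dict.fromkeys(keys)
queries = [random.randrange(N) for _ in range(PROBES)]

start = time.perf_counter()
for q in queries:
    d[q]
dict_ns = (time.perf_counter() - start) / PROBES * 1e9

start = time.perf_counter()
for q in queries:
    bisect.bisect_left(keys, q)
search_ns = (time.perf_counter() - start) / PROBES * 1e9

print(f"dict look-up:  {dict_ns:6.1f} ns  (1 probe)")
print(f"binary search: {search_ns:6.1f} ns  (~{math.log2(N):.0f} accesses)")
```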

@erikdesmedt @nosherwan @xblasco And circling more, now it makes even more sense why B-Trees (not binary, just B) are used in #Postgres https://en.wikipedia.org/wiki/B-tree

It seems to me that in real life they don't perform that badly compared to hashmaps, even though access is O(log n) instead of O(1).

https://www.enterprisedb.com/postgres-tutorials/are-hash-indexes-faster-btree-indexes-postgres

(couldn't find a better article for PostgreSQL comparing both indexes, client count is not the metric I want)
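Some rough arithmetic on why the B-tree side isn't as bad as O(log n) sounds (my numbers; the fan-out of 300 keys per node is an assumption, roughly in line with what Postgres' 8 kB pages allow for small keys):

```python
# Rough arithmetic: with a large fan-out, a B-tree's height (the number of
# out-of-cache or on-disk node accesses per look-up) grows very slowly.
import math

FANOUT = 300  # assumed keys per node; the real value depends on key size
for n in (10**6, 10**8, 10**10):
    height = math.ceil(math.log(n, FANOUT))
    print(f"{n:>12} keys -> ~{height} node accesses per look-up")
```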


@deavid @nosherwan @xblasco Real timing graphs will never follow the mathematical theory. Looks like your graph is essentially flat (between 400ns and 500ns from 500k to 50M items), so yes: O(1)
@nedbat @nosherwan @xblasco totally agree. However, I fail to find where it is stated that Big-O is only a theoretical, formal notion and not about practice. I was hoping for Wikipedia, but if it's there I cannot find it or understand it.