Mastodawn

rain 🌦️ · May 20, 2024

Got unreasonably excited about this new, incredibly straightforward count-distinct algorithm. The CVM algorithm is a direct replacement for HyperLogLog, it nerd-sniped Donald Knuth for weeks, *and* it can easily be taught in an entry-level CS course.

h/t @munin
https://www.quantamagazine.org/computer-scientists-invent-an-efficient-new-way-to-count-20240516/

Computer Scientists Invent an Efficient New Way to Count | Quanta Magazine

By making use of randomness, a team has created a simple algorithm for estimating large numbers of distinct objects in a stream of data.

Quanta Magazine
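
For anyone who wants to see how little code this takes, here is a minimal sketch of the CVM estimator as I understand it from the published description. The function name `cvm_estimate` and the `capacity` and `rng` parameters are my own choices for illustration, not anything from the paper or the article, and I have not tuned `capacity` against the paper's error bounds.

```python
import random


def cvm_estimate(stream, capacity, rng=None):
    """Estimate the number of distinct items in `stream` with a bounded buffer.

    `capacity` (hypothetical parameter name) is the maximum buffer size;
    a larger buffer gives a tighter estimate.
    """
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    buffer = set()   # sampled items, each kept with probability p

    for item in stream:
        # Forget any earlier decision about this item, then re-sample it
        # so only its most recent occurrence counts.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)

        if len(buffer) >= capacity:
            # Buffer is full: keep each item with probability 1/2
            # and halve the sampling probability to match.
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2
            if len(buffer) >= capacity:
                # Nothing was evicted (probability 2**-capacity); the
                # published description treats this as a failure.
                raise RuntimeError("CVM thinning step failed to free space")

    # Each surviving item stands in for 1/p distinct items.
    return len(buffer) / p
```

When the stream has fewer distinct items than `capacity`, `p` never drops below 1 and the result is exact; beyond that it is a randomized estimate.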


Alastair Reid

@rain @munin thank you. I love approximate data structures and that is much simpler than the version of this that I had seen before.

(Sadly, I think I have only had a chance to use approximate data structures once in “real life”: approximate counting up to N using log2(log2(N)) bits.)
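
Counting up to N in roughly log2(log2(N)) bits sounds like the classic Morris counter, though that is my assumption about what was meant here. A minimal sketch, with the class and method names made up for illustration: only an exponent is stored, and it is bumped with probability 2^-exponent.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only an exponent.

    The exponent grows to about log2(N) after N increments, so storing it
    takes about log2(log2(N)) bits.
    """

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the exponent with probability 2**-exponent, so bumps
        # become exponentially rarer as the count grows.
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # 2**exponent - 1 is an unbiased estimate of the number of
        # increments, although a single counter has high variance.
        return 2 ** self.exponent - 1
```

In practice the variance of a single counter is large, so the usual trick is to average several independent counters.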