Mastodawn

rain 🌦️May 20, 2024

Got unreasonably excited about this new, incredibly straightforward count-distinct algorithm. The CVM algorithm is a direct replacement for HyperLogLog, it nerd-sniped Donald Knuth for weeks, *and* it can easily be taught in an entry-level CS course.

h/t @munin
https://www.quantamagazine.org/computer-scientists-invent-an-efficient-new-way-to-count-20240516/

Computer Scientists Invent an Efficient New Way to Count | Quanta Magazine

By making use of randomness, a team has created a simple algorithm for estimating large numbers of distinct objects in a stream of data.

Quanta Magazine

Show thread

rain 🌦️

It is a VERY simple algorithm. You can code it up in an hour or two, including tests.

Knuth's paper is a bit more dense, but still shows how utterly fascinated he was by it

https://www-cs-faculty.stanford.edu/~knuth/papers/cvm-note.pdf

Show thread

rain 🌦️May 20, 2024

Note: "count-distinct" means count the distinct number of elements with a stream of data, with less memory available than it would take to maintain an exact count. So it's more of an estimate, and CVM produces an estimate with really good bounds