String interning (deduplication) statistics for Adalanche running against an Active Directory with ~2K users, ~2K computers and local machine data collected from all of them (50K AD objects plus ~1.4M objects from the local machine data).

It's a 10X saving, and I'm pretty damned proud of this. It uses my Go string interning module: https://github.com/lkarlslund/stringdedup
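For readers new to string interning, here is a minimal sketch of the general technique in Go. It is not the stringdedup API: the `Interner` type, `NewInterner`, `Intern` and the `savedBytes` counter are hypothetical names for illustration. Each distinct value is stored once in a map, and every later equal string is swapped for that stored copy, so the duplicate's own bytes can be garbage collected once nothing else references them.

```go
package main

import (
	"fmt"
	"strings"
)

// Interner is a hypothetical, minimal string interner. It is NOT the
// stringdedup API; it only illustrates the general technique.
type Interner struct {
	canonical  map[string]string // value -> the single shared copy
	savedBytes int               // bytes in duplicate inputs replaced by a canonical copy
}

// NewInterner returns an empty Interner.
func NewInterner() *Interner {
	return &Interner{canonical: make(map[string]string)}
}

// Intern returns the canonical copy of s. The first time a value is seen it
// is cloned (so a small substring cannot pin a large parse buffer) and stored;
// later equal strings get that stored copy back instead.
func (in *Interner) Intern(s string) string {
	if c, ok := in.canonical[s]; ok {
		in.savedBytes += len(s)
		return c
	}
	c := strings.Clone(s)
	in.canonical[c] = c
	return c
}

func main() {
	in := NewInterner()
	// Simulate attribute values repeated across many objects.
	values := []string{"CN=Domain Users", "CN=Domain Users", "CN=Domain Users", "ObjectClass=user"}
	for i, v := range values {
		values[i] = in.Intern(v) // keep only the canonical copy
	}
	fmt.Println(len(in.canonical), in.savedBytes) // prints: 2 30
}
```

A plain map like this never forgets a value, which suits immutable data that is loaded once and kept for the whole run. Since Go 1.23 the standard library also has a `unique` package whose `unique.Make` canonicalizes comparable values, including strings.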

@lkarlslund Impressive! Does dedup do better than compression with this data set, or is compression too expensive?
@FritzAdalis Everything here is in memory, and it doesn't make sense to compress ten copies of the same immutable data; it's much better to just point them all at the same location in memory. For the binary AD dump files that Adalanche generates, I just LZ4 all of it, which is very efficient.