String interning (deduplication) statistics for Adalanche when running on an AD with ~2K users, ~2K computers and local machine data from all of them (resulting in 50K AD objects and ~1.4M objects from local machine data).

It's a 10X saving, and I'm pretty damned proud of it. It uses my Go string interning module: https://github.com/lkarlslund/stringdedup

@lkarlslund will this be pushed to the open source version (or is it already there)? At a conf, so lazy question.
@itisiboller not a crazy question. No, I think this will go into the Professional edition. Open Source is Active Directory ACLs and local machines only. There has to be a differentiator.
@lkarlslund Very fair, just wanted to play with it against my testlab.
@itisiboller I might do a demo version of the Professional version against a "known" lab or something, if there's interest. For instance the GOAD one from @Mayfly
@lkarlslund Impressive! Does dedup do better than compression with this data set, or is compression too expensive?
@FritzAdalis everything here is in memory, and it doesn't make sense to compress the same immutable data 10 times over. It's much better to just point it all at the same location in memory. For the AD dump binary files Adalanche generates, I just LZ4 all of it, which is very efficient.