Reproducing Hacker News writing style fingerprinting

https://antirez.com/news/150

Reproducing Hacker News writing style fingerprinting - <antirez>

This is an interesting and well-written post but the data in the app seems pretty much random.

Thank you, tptacek. Thanks to the Internet Archive's cached copy of the "pg" results from the post of three years ago, I was able to verify that the entries for "pg" are quite similar. Keep in mind that the technique captures only the statistical patterns in very common words, so you are not likely to see users that you believe are "similar" to yourself. Notably, montrose may well be a secondary account of PG, and it was also found as a cross reference in the original work of three years ago.

Also note that being the top match under vector similarity is not reciprocal: A can have B as its top-scoring item while B has many other items that are nearer to it, as in a 2D space where you have a cluster of points and one more point nearby but a bit apart from the cluster.
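To make the non-reciprocal part concrete, here is a minimal sketch in Python. It is not the code behind the app: the user names, the 2D vectors and the use of cosine similarity are all assumptions picked for illustration. Three users sit in a tight cluster and a fourth sits a bit apart from it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def unit(degrees):
    """Hypothetical 2D 'style vector' pointing at the given angle."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))

def top_match(name, vectors):
    """Return the other user whose vector is most similar to `name`."""
    others = (u for u in vectors if u != name)
    return max(others, key=lambda u: cosine(vectors[name], vectors[u]))

# Made-up data: a tight cluster plus one point a bit apart from it.
vectors = {
    "user_a": unit(10),
    "user_b": unit(11),
    "user_c": unit(13),
    "outlier": unit(30),
}

for name in vectors:
    print(name, "->", top_match(name, vectors))
```

With these made-up vectors the top match of "outlier" is "user_c", but the top match of "user_c" is "user_b", because "user_c" has closer neighbors inside its own cluster. The similarity score between two users is the same in both directions; it is the ranking that is not.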

Unfortunately I don't think this technique works very well for actually discovering duplicate accounts, because people often post just a few comments from fake accounts, so there is not enough data. The exception is when someone consistently uses another account to hide their identity.

EDIT: at the end of the post I added the visual representations of pg and montrose.