I wonder if anyone has made any software to detect keysmashing

(I mean, besides PawSense, kinda)

I mean, like, I could run it on my chat logs and see which lines were keysmashes

this chatlog is in... random order? huh?

I used Discrub to export a Discord chatlog for analysis, and it gave me a JSON of all the messages in that chat, but in random order.

weird! I mean I can just sort them by timestamp or ID (maybe?) but that's just the first time I've ever seen chatlogs that are not ordered

yep, sorting them by ID puts them in the right order.
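
Sorting by ID works because Discord IDs are snowflakes: everything above the low 22 bits is milliseconds since the start of 2015, so a bigger ID means a later message. A minimal sketch, assuming the export is a JSON list of objects that each carry a string `id` field (that layout and the function names are guesses for illustration, not Discrub's documented format):

```python
import json
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_time(message_id):
    """Recover the creation time embedded in a Discord snowflake ID."""
    ms = (int(message_id) >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def sort_messages(messages):
    """Sort message dicts chronologically by their numeric ID."""
    # IDs arrive as strings; compare them as integers so that "9" sorts
    # before "10" instead of after it.
    return sorted(messages, key=lambda m: int(m["id"]))


def load_sorted(path):
    """Load a (hypothetical) export file and return its messages in order."""
    with open(path, encoding="utf-8") as f:
        return sort_messages(json.load(f))
```

The `int()` conversion is the part that matters: sorting the raw ID strings only works while they all have the same number of digits.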

so weird

I also need a way to detect "awawas" and the single key braincrash, like:
<kitten> hhhhhhhhhhhhhhhhhhhh
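
A rough sketch of what those two detectors could look like, using plain regexes. The label names, the repeat threshold of 8, and the exact shape of an "awawa" are all assumptions made up for illustration:

```python
import re

# One non-space character repeated 8 or more times and nothing else on the line.
# The threshold is a guess; lower it and "zzz" or "hmmmm" style lines start matching.
SINGLE_KEY = re.compile(r"^(\S)\1{7,}$")

# Optional leading "a", then "wa" at least twice, optional trailing "w":
# matches "awawa", "wawawa", "awawaw", but not plain "awa".
AWAWA = re.compile(r"^a?(?:wa){2,}w?$", re.IGNORECASE)


def detect(text):
    """Return the (hypothetical) labels that apply to one chat line."""
    text = text.strip()
    labels = []
    if SINGLE_KEY.match(text):
        labels.append("is_single_key")
    if AWAWA.match(text):
        labels.append("is_awawa")
    return labels
```

Both patterns are anchored to the whole line on purpose, so a normal sentence that happens to contain "wawa" is left alone.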
pip install bottom-chat

hey look a keysmash generator, that's something:

https://www.dcode.fr/keyboard-smash

there's also a thing where people say lots of quick lines in a row, like:
<kitten> as
<kitten> gh
<kitten> buh
<kitten> jg

So I need to code it to look at timestamps too, and define some kind of threshold for sending a bunch of nonsensical mini-keysmashes in a row
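
Something like this could be a starting point for the timestamp side. It is only a sketch: the `Line` record and every threshold (5 second gap, 4 character lines, runs of 3 or more) are invented for illustration, and it only looks at timing and length, so deciding whether the lines are actually nonsense is left to the per-line detectors:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Line:
    author: str
    sent: datetime
    text: str


def find_bursts(lines, max_gap=timedelta(seconds=5), max_len=4, min_run=3):
    """Find runs of at least min_run short lines sent by one author in quick succession.

    `lines` must already be in chronological order.
    """
    bursts, run = [], []
    for line in lines:
        short = len(line.text) <= max_len
        continues = bool(
            run
            and line.author == run[-1].author
            and line.sent - run[-1].sent <= max_gap
        )
        if short and continues:
            run.append(line)
            continue
        # The current run (if any) ends here; keep it only if it was long enough.
        if len(run) >= min_run:
            bursts.append(run)
        run = [line] if short else []
    if len(run) >= min_run:
        bursts.append(run)
    return bursts
```

With these defaults, a quick "ok" / "yes" / "lol" would also count as a burst, which is why the result probably wants filtering through a wordlist or the keysmash detector afterwards.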

I'm embracing data-driven TDD:

FAILED test_lines.py::test_detectors[ghasdlfgjk-output4] - AssertionError: detect('ghasdlfgjk') is [], expected ['is_keysmash']

@kawa do you happen to have a handy regex for meows? or some examples of meows?

I discovered another platform doing this weird.
When you do an account export from tumblr, they give you a conversations.zip file which has HTML (not JSON or XML, HTML) for each person you've messaged.

and the messages are in reverse chronological order, with the most recent messages at the top

why

I'm going to have to parse this HTML and then parse the dates in it and then re-sort it!

ugh

one of the biggest problems facing the keysmash detector project is the fact that I'm doing it across tumblr/discord/matrix/mastodon/bluesky/irc.

That's a lot of different log types to need to ingest
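
One way to keep that manageable might be to have each platform's parser emit the same small record, so the detectors never see platform-specific data. A minimal sketch, assuming each parser yields its messages already sorted by time; the `Message` fields and the `interleave` helper are hypothetical names, not anything from an existing codebase:

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    sent: datetime  # timezone-aware, normalised to UTC by the parser
    platform: str
    author: str
    text: str


def interleave(*streams):
    """Merge several per-platform streams into one chronological list.

    Each stream must already be sorted by `sent`; heapq.merge then combines
    them without re-sorting everything.
    """
    return list(heapq.merge(*streams, key=lambda m: m.sent))
```

Normalising every timestamp to UTC inside the parsers keeps the merge from depending on whatever local-time conventions each export uses.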

@foone oh I was thinking it interleaved all the timelines, so you could have multiplatform keysmash drifting
@arrjay
If someone in the nineties had told me that some day someone would post the phrase "multiplatform keysmash drifting" on the Internet and all readers of that post would know what that meant, I would've been skeptical...
@foone
@foone at least you don't have to contend with Skype (the only platform I've ever seen which detected keysmashing/facerolling on the fly: the 'someone is typing' pencil animation would turn into 'angry breaking a pencil') which would get flummoxed on DST Fall Back, interleaving your post-jump messages with the clocktime of the pre-jump conversation, rather than keeping them in proper ULT order
@foone Discord's 'several people are typing' just doesn't hit the same...
@hyratel
'several people are typing ON THE SAME KEYBOARD'
@foone
@foone use lynx to parse it for you?
@RueNahcMohr I was gonna use Beautiful Soup for this, it's usually my go-to for extracting structured data from HTML
@foone
One tag makes you early
And one tag makes you late
And the one the lexer gave you
Don't fit any rules for dates
Go ask Alice
When her reader waits...
@foone sounds like the display engine with output to file
@RandomDamage I mean I guess? but it's displayed chronological in the app, it just starts from the latest-end.

@foone

Do you really have to ask at this point?

Malicious compliance. They have to provide these exports by law. Not just EU law, other countries outside of the EU implemented similar ones. But they do not want to see their product leave their platform. So they're making it as inconvenient and as useless as they can possibly get away with.

Same for why we have these dumb cookie banners. The simplest solution was always to stop the tracking as you'd no longer need any cookie banner then...

@agowa338 @foone is it though? It’s not like writing the parser is hard, it’s just annoying. Another platform wanting to ingest that data could do that without much more trouble than a sigh from the engineer getting the ticket.

@h5e @foone

But it being annoying is the most they can do. The law says it needs to be in "an industry standard format" or something like that. So they'll make it as annoying and as difficult as they can. Which is exactly what they did here...

@agowa338 @foone indeed they did, but I’m not sure that is the motivation. It could be, but it doesn’t seem worth the effort imo. I think they just whipped up the easiest way to get it done (for them).

@h5e @foone

if it was the easiest way for them then you'd get unformatted database dump files.

@foone So you can’t really do anything with the exported data. Nice. 
@foone HackerOne does this too when you export a report to PDF. Totally baffling.
@foone mew mrw mrrp mlem mowwr
@kawa thanks I'll stick all these in the corpus
@foone @kawa I’ve been told this is missing meow and mrraowww.
@foone @kawa this mostly works though it won't detect mlem: /(^|(?<=[ \n]))mr*[eao]*p?[wp]+/gmi
@foone one i wrote a while back: /(m[reaou]{2,}w|m[re]+w|mr+p)/ (though it has some false positives, notably "tomorrow")
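
For comparison, here is a small sketch that runs both regexes from the replies above over the meows collected in this thread, plus "tomorrow". The JavaScript flags are translated to Python's `re` flags (the `g` flag has no equivalent and is not needed for `search`). The third pattern, with a leading `\b`, is an untested-in-the-wild tweak added here for illustration, not something anyone in the thread proposed:

```python
import re

# First regex from the thread, with /mi mapped to Python flags.
FIRST = re.compile(r"(^|(?<=[ \n]))mr*[eao]*p?[wp]+", re.IGNORECASE | re.MULTILINE)

# Second regex from the thread, unchanged.
SECOND = re.compile(r"(m[reaou]{2,}w|m[re]+w|mr+p)")

# Assumption: requiring a word boundary before the "m" avoids mid-word hits.
SECOND_BOUNDED = re.compile(r"\b(m[reaou]{2,}w|m[re]+w|mr+p)")

CORPUS = ["mew", "mrw", "mrrp", "mlem", "mowwr", "meow", "mrraowww", "tomorrow"]


def hits(pattern, words):
    """Return the words in which the pattern matches anywhere."""
    return [w for w in words if pattern.search(w)]


if __name__ == "__main__":
    for name, pattern in [("first", FIRST), ("second", SECOND), ("bounded", SECOND_BOUNDED)]:
        print(name, hits(pattern, CORPUS))
```

On this corpus the first regex misses only "mlem", as advertised (and matches just the "moww" part of "mowwr"); the second misses "mlem" and "mowwr" and does hit "tomorrow"; the bounded variant drops the "tomorrow" hit.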