sometimes you see a piece of code so beautiful you simply must commit it to physical media in the most elegant way you know

(source, context)

#calligraphy

Putting aside the sheer absurdity of a multibillion-dollar machine learning company using regex to do sentiment analysis, an actual legit application for machine learning, it is so fantastically inefficiently written. Every time I wrote this out (like 2½ times) I see something new and horrible, and I'm not even what you'd call a regex wizard. Like, why wtf|wth and not wt(f|h)? Why are "horrible" and "awful" in there twice? Why fuck you|screw (this|you) and not (screw|fuck) (this|(yo)?u) ((bull)?shit|crap)? But there is no use trying to understand it because there is no thought or intention behind it. Simply a mindless word generator that got tossed at a problem again and again until the droppings accreted into something that worked well enough.

@nev i agree with you, but

i authored a system that used regex for sentiment analysis in english language pathology reports. we achieved an incredibly high sensitivity and specificity (precision and recall) and i'm really proud of the work

i think because of the very narrow problem space, and the ways in which pathologists tended to hedge their bets on diagnoses that might or might not be cancer, it was more straightforward.

i'd never attempt to use it writ large like this.

@wohali yeah, I don't think regex for sentiment analysis is an inherently terrible idea! I would think it a pretty clever, cheap solution in most other situations. It's just absurd when it's 1) an actual ML company that has basically unlimited resources to throw at the problem and 2) a tool actively used by tons and tons of people around the world.