Alright I'm speeding up my review of the phishing kits that my project, Phossil, collected. My current estimate is that over the last 5 years I've collected about 1,200 kits. I'm going to review these in two passes - first, just to identify whether they're legit phishing kits (not other malware, webshells, etc.). Later I'll review them for contents - leaked info, trends, targeted companies, etc. I'll be shitposting in this thread with anything funny that I see.
22 files worth of bot detection, but they still didn't detect mine 💅
Big merger just announced via the phishing world, just tremendous

My goofus partner is around and we're going to watch some TV, so will resume this thread shortly.

I think this is probably among the largest collections of phishing kits amassed by a private individual?

Obviously I'm going to make these public via torrent once I finish filtering them for all y'all. Phossil also picked up a lot of crap (about 70-75GB worth!) so I don't want to pollute the dataset with unrelated files.

I *promise* you do not need to make phishing kits this way, we have the technology, you don't need to clone an entire website's worth of assets and load them inline in *every* file.
10% of the way there! 900 files reviewed out of 8934, via automatic filtering (rules, not an LLM) + automatic unpacking + manual review of candidate files. Of these, 133 are confirmed phishing kits. Should be 1250-1450 kits total after reviewing the rest.
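For anyone who wants to sanity-check that projection: a quick sketch of the naive extrapolation, assuming the hit rate in the first 900 files holds for the rest (it may well not, which is presumably why the estimate above is a range). The function name is made up for illustration. The point estimate lands around 1,320, inside the 1250-1450 range.

```python
def project_total_kits(confirmed, reviewed, total_files):
    """Naive extrapolation: assume the confirmed-kit rate seen so far
    stays constant across all remaining files (an assumption)."""
    hit_rate = confirmed / reviewed
    return hit_rate * total_files


# Numbers from the post: 133 confirmed kits in 900 reviewed files, 8934 files total.
estimate = project_total_kits(133, 900, 8934)
print(round(estimate))
```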
Listening to SEX-FM (https://www.youtube.com/watch?v=3YaWnbKFJro) while rooting through some of the goofiest PHP I've ever seen is an unbelievable vibe

yes. the bee. they make ... hony

Fifteen percent done, fuck me. I'm doing 5% every 2-3h, so the remaining work is about 34-51 more hours of tilling at this wheel. Want to make sure the dataset I'm creating is 100% true-positive findings, obviously.

Definitely found some people's social security numbers and purged those. I'll need to do another review pass to look for any information which is not replaceable, though that will have to be tool-assisted because I don't have 100h left in the tank to do this hah.
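A rough idea of what the tool-assisted pass could look like for SSNs specifically: a regex sketch that flags candidates for manual review. Assumptions here: only the dashed `AAA-GG-SSSS` format is matched, and the function name is hypothetical. The lookaheads skip number ranges that are never issued (area 000, 666, or 900-999; group 00; serial 0000). It will miss undashed numbers and will still produce false positives, so it is a triage aid and not a guarantee.

```python
import re

# Dashed US SSN format only (assumption). Excludes never-issued ranges:
# area 000, 666, 900-999; group 00; serial 0000.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)


def find_ssn_candidates(text):
    """Return every substring that looks like a dashed SSN.
    These are candidates for human review, not confirmed SSNs."""
    return SSN_PATTERN.findall(text)


sample = "name=Jane Doe&ssn=123-45-6789&phone=555-1234"
print(find_ssn_candidates(sample))
```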

Oh hey fuck you to whoever left a zip bomb around for me, lmfao, good attempt
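If you're unpacking untrusted archives in bulk, a cheap pre-check helps. This is a minimal sketch using Python's `zipfile`, not necessarily what Phossil does: it reads the sizes declared in the archive's directory and bails if the total or the compression ratio looks absurd. The thresholds and function name are made-up assumptions, and declared sizes can be forged, so it's worth also capping bytes actually written during extraction.

```python
import zipfile

# Hypothetical thresholds, tune for your own dataset.
MAX_TOTAL_BYTES = 2 * 1024**3   # refuse anything claiming > 2 GiB unpacked
MAX_RATIO = 200                 # refuse anything claiming > 200:1 compression


def looks_like_zip_bomb(archive, max_total=MAX_TOTAL_BYTES, max_ratio=MAX_RATIO):
    """Heuristic check based on sizes declared in the zip directory.
    `archive` is a path or a seekable binary file object."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
    unpacked = sum(info.file_size for info in infos)
    packed = sum(info.compress_size for info in infos)
    if unpacked > max_total:
        return True
    # max(..., 1) avoids dividing by zero on empty archives.
    return unpacked / max(packed, 1) > max_ratio
```

Nested archives (zips inside zips) dodge this check, so it would need to be re-run at every unpacking level.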

Reviewed 20.9% of files now, signing off for the night!

Been adding more and more little tools to help approve/deny whether a file is a phishing kit automatically, with ... mixed success. It looks like long, convenient strings to search for, like Telegram API keys, are very rarely reused (damn), but signatures from kit authors are often left alone. Used that to expedite triage for 243 kits automatically.
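The signature trick is roughly this shape. A sketch only: the signature strings below are invented placeholders (not from real kits), the function names are hypothetical, and the real tooling may work differently. It walks an unpacked kit directory and reports which files contain a known author signature, case-insensitively.

```python
from pathlib import Path

# Placeholder signatures for illustration only, not taken from real kits.
AUTHOR_SIGNATURES = [
    b"coded by example_author",
    b"-=[ example kit v2 ]=-",
]


def find_signatures(data, signatures=AUTHOR_SIGNATURES):
    """Return the known signatures present in a file's raw bytes."""
    lowered = data.lower()
    return [sig for sig in signatures if sig.lower() in lowered]


def triage_kit(kit_dir, signatures=AUTHOR_SIGNATURES):
    """Map each file under kit_dir to the signatures found in it.
    Files with no hits are left out; an empty dict means 'needs manual review'."""
    hits = {}
    for path in Path(kit_dir).rglob("*"):
        if path.is_file():
            found = find_signatures(path.read_bytes(), signatures)
            if found:
                hits[str(path)] = found
    return hits
```

Reading whole files into memory is fine for typical PHP kits but would want a size cap for the junk files mixed into the collection.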

Total so far is now 531 distinct kits, which are ~1.4GB compressed.