My goofus partner is around and we're going to watch some TV, so will resume this thread shortly.
I think this is probably among the largest collections of phishing kits amassed by a private individual?
Obviously I'm going to make these public via torrent once I finish filtering them for all y'all. Phossil also picked up a lot of crap (about 70-75GB worth!) so I don't want to pollute the dataset with unrelated files.

Fifteen percent done, fuck me. I'm doing 5% every 2-3h, so the remaining work is about 34-51 more hours of toiling at this wheel. Want to make sure the dataset I'm creating is 100% true-positive findings, obviously.
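For anyone checking the math on that estimate, here's a rough sketch of the arithmetic. The function name and parameters are just made up for illustration; the only inputs are the numbers above (15% done, 5% per 2-3h).

```python
def remaining_hours(done_pct, step_pct, hours_per_step):
    """Estimate the hours left, given percent reviewed so far and the
    (low, high) hours that each step of `step_pct` percent takes."""
    steps_left = (100 - done_pct) / step_pct  # 85% left / 5% per step = 17 steps
    low, high = hours_per_step
    return steps_left * low, steps_left * high


low, high = remaining_hours(done_pct=15, step_pct=5, hours_per_step=(2, 3))
print(f"{low:.0f}-{high:.0f} hours left")  # prints "34-51 hours left"
```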
Definitely found some people's social security numbers and purged those. I'll need to do another review pass to look for any other irreplaceable personal information, though that will have to be tool-assisted because I don't have 100h left in the tank to do this hah.
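A tool-assisted pass could look something like the sketch below. This is not the actual tooling, just a minimal example that assumes the files are already extracted to a directory and that a regex for SSN-shaped strings is a good enough first filter. The names `SSN_LIKE`, `find_ssn_like`, and `scan_tree` are hypothetical, and anything it flags would still need a human look.

```python
import re
from pathlib import Path

# Matches SSN-shaped strings (AAA-GG-SSSS) and skips ranges that are never
# issued (area 000, 666, 9xx; group 00; serial 0000). This is a heuristic:
# it will miss unformatted numbers and can flag lookalike IDs.
SSN_LIKE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssn_like(text):
    """Return every SSN-shaped substring found in `text`."""
    return SSN_LIKE.findall(text)


def scan_tree(root):
    """Map each file path under `root` to its number of SSN-shaped hits."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            # errors="ignore" lets binary or oddly encoded files through.
            text = path.read_text(encoding="utf-8", errors="ignore")
            found = find_ssn_like(text)
            if found:
                hits[str(path)] = len(found)
    return hits
```

Running `scan_tree` over the extracted dataset would give a short list of files to review by hand instead of the whole pile.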
Reviewed 20.9% of files now, signing off for the night!
Been adding more and more little tools to help automatically approve/deny whether a file is a phishing kit, with ... mixed success. It looks like long, convenient strings to search for, like Telegram API keys, are very rarely reused (damn), but signatures from kit authors are often left alone. Used that to expedite triage for 243 kits automatically.
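Roughly, the signature trick can be sketched like this. It's a simplified illustration, not the real tools: it assumes each kit is extracted to its own directory, and the strings in `SIGNATURES` are made-up placeholders rather than real author tags. `matching_signatures` and `triage_kit` are hypothetical names too.

```python
from pathlib import Path

# Placeholder author signatures for illustration only (not real kit strings).
SIGNATURES = [b"coded by example_author", b"-= ExampleKit v2 =-"]


def matching_signatures(data, signatures=SIGNATURES):
    """Return the signatures that appear in `data`, ignoring case."""
    lowered = data.lower()
    return [sig for sig in signatures if sig.lower() in lowered]


def triage_kit(kit_dir, signatures=SIGNATURES):
    """Return 'approve' if any file in the kit carries a known signature,
    otherwise 'manual' so a human still reviews it."""
    for path in Path(kit_dir).rglob("*"):
        if path.is_file() and matching_signatures(path.read_bytes(), signatures):
            return "approve"
    return "manual"
```

Anything that comes back `manual` still goes through the normal review, so a missing signature never auto-rejects a kit.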
Total so far is now 531 distinct kits, which are ~1.4GB compressed.