@mekkaokereke @markallerton @ks @allenpg @evan
I just left a position where I was working with the teams that manage the software for tens of thousands of manual reviewers.
Even reports scheduled for manual review are ranked according to potential harm, and anything below the threshold isn't handled at all. We're talking millions of potentially actionable events per minute across the network.
It definitely doesn't scale.
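To make that triage concrete, here's a rough sketch of the mechanism: a harm-ranked queue with a hard cutoff. Every name and number in it (the threshold, the capacity, the scoring) is hypothetical, just to show where the math breaks down:

```python
import heapq
from dataclasses import dataclass, field

# A minimal sketch of the triage I'm describing. All names and numbers
# here are made up for illustration; this isn't any platform's real code.

@dataclass(order=True)
class Report:
    harm_score: float                      # model-estimated potential harm
    report_id: str = field(compare=False)  # id doesn't affect ranking

HARM_THRESHOLD = 0.7          # hypothetical cutoff: below this, never reviewed
REVIEWER_CAPACITY = 30_000    # hypothetical items/minute reviewers can clear

def triage(incoming: list[Report]) -> list[Report]:
    """Drop everything under the harm threshold, then hand reviewers
    only the highest-scoring reports up to their capacity."""
    eligible = (r for r in incoming if r.harm_score >= HARM_THRESHOLD)
    return heapq.nlargest(REVIEWER_CAPACITY, eligible)

# At ~1,000,000 incoming reports per minute, even a high threshold can
# leave far more eligible reports than reviewers can clear, so the
# rest are simply never handled. That's the scaling problem.
```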
And that's before you consider cultural issues. You've got English speakers in India reviewing AAVE conversations in the US, Arabic speakers in Indonesia reviewing Arabic posts from the Middle East, items being reviewed through translation. ML doesn't handle context well, but neither do people from different cultures. Ask anyone on Facebook and they'll tell you about a friend, or themselves, getting warned over something that was clearly a false positive.
I honestly don't know if it's a solvable problem, though I do think ML-assisted review could help.