Mastodawn

Joe Cooper 🇺🇦 🍉 Apr 15, 2023

I believe I just deleted the first secretly ChatGPT-generated comment on the forum for an OSS project I maintain. It was shaped like an answer, but didn't actually answer the question. Great grammar, sounded authoritative. But, wrong. The internet is absolutely going to become gray goo, and it's going to happen so fast.

Daniel Micay Apr 16, 2023

@swelljoe @dalias We're unfortunately seeing a lot of this on https://discuss.grapheneos.org/. It's being used by spammers to give accounts a history which is hard to differentiate from a legitimate user. They eventually start posting links. Some of them are posting links to fancy, well-designed sites for Android apps, but they aren't actually the real site for the app and are probably malware. In many cases the site looks much better than the real app's site and is better at describing features of the real app.

Daniel Micay Apr 16, 2023

@swelljoe @dalias I don't think they've been sneaky enough to edit the links into previous posts but that's a possibility. Even the very lazy spam is seemingly doing this to fit into the GrapheneOS forum, although that's still usually easy to spot. They've also been posting in old threads which makes it more painful. It's very difficult to deal with it as an open source project with only community moderators. We have funding to hire people but hiring moderators to deal with this seems wasteful.

Michael K Johnson Apr 16, 2023

@DanielMicay @swelljoe @dalias I had to turn off self-post-edit for low-utilization users on Maker Forums because of so many low-or-zero-value posts being edited into spam later. They would come back after 1-3 months to edit low-value posts into spam links. Ironically, because they were low-trust users, those links would be rendered as rel="nofollow" to explicitly deny SEO-juice from the links, but that didn't seem to matter to them.

I used other tools to identify likely spammers and audited a lot of posts to confirm that they were doing this before tightening controls.

I explicitly don't close down old threads because there's so much legitimate usage there; helpful updates on "how well this worked" after two years and such. But I could imagine auto-moderating necro-posting for low-trust users being helpful here.
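A rough sketch of what auto-moderating necro-posts from low-trust users could look like. Everything here is an assumption for illustration: the `needs_review` helper, the 180-day threshold, and the numeric trust levels are hypothetical and not taken from Discourse or any other forum software.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; tune to the forum's own traffic.
NECRO_AGE = timedelta(days=180)   # thread counts as "old" after this long without replies
MIN_TRUSTED_LEVEL = 2             # users below this level are treated as low-trust


def needs_review(trust_level: int, last_reply_at: datetime, now: datetime) -> bool:
    """Return True when a new reply should be held for moderator review.

    Only replies that revive an old thread AND come from a low-trust user
    are held, so established users can still post "how well this worked"
    follow-ups years later.
    """
    is_necro = (now - last_reply_at) > NECRO_AGE
    is_low_trust = trust_level < MIN_TRUSTED_LEVEL
    return is_necro and is_low_trust
```

Old threads stay open for everyone; only the combination of an old thread and a low-trust account triggers the hold.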

Daniel Micay

@mcdanlj @swelljoe @dalias Links in forum posts and user bios are entirely marked as rel="ugc nofollow" by Flarum which explicitly marks them as user generated content. The issue is that ugc and nofollow links still do help with SEO because the majority of links are marked that way at this point due to the prominence of social media sites, Wikipedia, etc. and search engines can't realistically consider them as not contributing any ranking power.

Michael K Johnson Apr 16, 2023

@DanielMicay @swelljoe @dalias TIL.

Fortunately, I haven't ignored those spam links; they are live for a few hours at most. But now I know that they may be evil but perhaps are less stupid than I thought. ☹

Daniel Micay Apr 17, 2023

@mcdanlj @swelljoe @dalias We've found that most of the emails they use are in the https://www.stopforumspam.com/ database but we can't currently use it because we don't want to leak information about people registering to a service. They offer the whole database for download but we don't have resources to spare improving the existing extension (https://discuss.flarum.org/d/17846-friendsofflarum-stopforumspam) for the forum software we use to support using a local database generated from these.

Stop Forum Spam

StopForumSpam - a database of known forum and blog spam, its sources and the email addresses reported

Cassandrich Apr 17, 2023

@DanielMicay @mcdanlj @swelljoe Could you modify it to just do a service query to a different address you control, and make a minimal API frontend for the database to answer those queries?

Michael K Johnson Apr 17, 2023

@dalias @DanielMicay @swelljoe Their API takes the whole email address and IP address and returns an answer. Wrapping it in a separate service wouldn't make a difference. You can also download complete lists up to twice per day if you want to do the checking locally.

It would be interesting to have a service based on one-way hashes of bits of data that returns all possible matches for those hashes, so that decisions can be made locally, while still reporting new spammers to contribute actual matches to the database.

They don't list a plug-in for Discourse, so I'd end up using it manually anyway. 🤷
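The hash-based idea above could be sketched roughly like this. This is not something StopForumSpam offers as far as this thread says; the `PrefixIndex` class, the SHA-256 choice, and the 5-character prefix are all assumptions made for illustration. The client sends only a short hash prefix, gets back every full hash in that bucket, and does the final comparison locally.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 5  # hex characters revealed to the service (assumed value)


def email_hash(email: str) -> str:
    """One-way hash of a normalised (trimmed, lower-cased) email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class PrefixIndex:
    """Hypothetical service side: buckets full hashes of known spammers by prefix."""

    def __init__(self, spam_emails):
        self._buckets = defaultdict(set)
        for email in spam_emails:
            digest = email_hash(email)
            self._buckets[digest[:PREFIX_LEN]].add(digest)

    def candidates(self, prefix: str) -> set:
        """Return every known full hash that starts with the given prefix."""
        return set(self._buckets.get(prefix, ()))


def is_known_spammer(email: str, index: PrefixIndex) -> bool:
    """Client side: query by prefix only, then match the full hash locally."""
    digest = email_hash(email)
    return digest in index.candidates(digest[:PREFIX_LEN])
```

The service only ever sees the prefix, which many unrelated addresses share, so a registration is not revealed by the lookup itself.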

Joe Cooper 🇺🇦 🍉 Apr 17, 2023

@mcdanlj @dalias @DanielMicay I found an old Discourse plugin, but it seems, at first glance, to only use the API, and it's been untouched for four years. The Discourse plugin API hasn't changed much AFAIK, so maybe it still works, but not being able to use a local copy of the DB is a problem. Would need work, I guess. https://meta.discourse.org/t/stop-forum-spam-plugin-auto-silence-known-spammers/121037/1

Stop Forum Spam Plugin (auto silence known spammers)

Overview The Stop Forum Spam plugin (unofficial) can help weed out human spammers who are able to bypass Discourse’s built-in spam tools (thanks to their awesome human powers). Right after a new user signs up on your forum (before they have time to post), this plugin will check the user’s email address, forum username, and/or IP address (depending on your plugin settings) against the Stop Forum Spam database. If the user is found in this database of known spammers, their user account will be im...

Discourse Meta

Cassandrich Apr 17, 2023

@mcdanlj @DanielMicay @swelljoe Not wrapping their service, wrapping your local copy of the DB with an API to emulate their service
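A minimal sketch of that kind of local frontend, using only the Python standard library. The `email` query parameter and the JSON response shape are assumptions for illustration; a real drop-in replacement would have to match whatever the forum extension actually sends and expects. `KNOWN_SPAM_EMAILS` is a placeholder standing in for the local copy of the database.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Placeholder for the locally held data (assumption, not real entries).
KNOWN_SPAM_EMAILS = {"spammer@example.com"}


def lookup(query_string: str) -> dict:
    """Answer a query such as 'email=someone%40example.com' from local data only."""
    params = parse_qs(query_string)
    email = params.get("email", [""])[0].strip().lower()
    # Hypothetical response shape; adjust to what the client expects.
    return {"success": 1, "email": {"value": email, "appears": int(email in KNOWN_SPAM_EMAILS)}}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(lookup(urlparse(self.path).query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Listen on an address you control; nothing is sent to a third party."""
    HTTPServer((host, port), Handler).serve_forever()
```

Calling `serve()` starts the listener; the extension would then be pointed at that address instead of the public API.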

Daniel Micay Apr 18, 2023

@dalias @mcdanlj @swelljoe Would need to write scripting to download their database, parse it and turn it into an SQLite database and then make a small web service out of it. It would be quite straightforward but I don't really want to spend a day on this.

They reuse the same IP addresses and emails for months or even years across many forums. They mainly seem to use Gmail addresses. It seems they index all the forums using each forum software and spam them systematically, in a way that is only partly automated.
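The parse-into-SQLite step mentioned above might look something like the sketch below. It assumes the downloaded list has already been fetched and unpacked into plain text with one email address per line; the actual file format and download step are not described in this thread, so that part is left out. The `spam_emails` table and both function names are hypothetical.

```python
import sqlite3


def build_db(lines, path=":memory:"):
    """Load an iterable of email lines into a SQLite table and return the connection.

    Assumes one email address per line (an assumption about the list format).
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS spam_emails (email TEXT PRIMARY KEY)")
    # Normalise entries and skip blank lines; duplicates are ignored.
    rows = ((line.strip().lower(),) for line in lines if line.strip())
    conn.executemany("INSERT OR IGNORE INTO spam_emails (email) VALUES (?)", rows)
    conn.commit()
    return conn


def is_listed(conn, email: str) -> bool:
    """Check a registering user's email against the local table."""
    cur = conn.execute(
        "SELECT 1 FROM spam_emails WHERE email = ?", (email.strip().lower(),)
    )
    return cur.fetchone() is not None
```

Because the lookup runs entirely against the local file, no information about people registering leaves the server.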