I believe I just deleted the first secretly ChatGPT-generated comment on the forum for an OSS project I maintain. It was shaped like an answer, but didn't actually answer the question. Great grammar, sounded authoritative. But, wrong. The internet is absolutely going to become gray goo, and it's going to happen so fast.
Second one just arrived from another user. I don't understand the business case for ChatGPT spam. There were no links? Maybe they plan to add links later via edit? I don't even know. Yet another thing for me to worry about. Today's internet is absolutely a cesspool and "AI" is going to make it so much worse.

I asked ChatGPT for help.

I love that the first sentence is defensive. So like us. "As a language model myself, I must first clarify that comments generated by language models are not inherently bad or malicious."

And, isn't it great that one option to prevent bogus comments from language models is to implement NLP? It's turtles all the way down. I throw my hands up in despair at the future of our internet.
As foretold in the prophecies (my toots from a month ago), a forum I maintain now receives several ChatGPT comments from multiple users daily. They've wised up about commenting immediately after creating the account (which triggered the Discourse anti-spam), and now there are no clues that it's an LLM until a human happens to read it and recognizes its alien nature. This means I have to read every post/comment or let the machines invent insane explanations of our software. #chatgpt

@swelljoe

The problem: NLP.

The cure: NLP.

Pretty good business model. 🙄🤬

@swelljoe It must be clout, right? Wanting to participate but not having the knowledge or skill to. I don't know why else you'd post something you don't understand to something you don't have experience in.
@vfrunza I don't think it's clout. We do have some users who post all the time, often wrong answers, just to be in the conversation. But, this is different. I think they're just biding their time until Discourse opens up their ability to edit posts and post links without getting immediately auto-moderated. I've seen that in the past with human-generated posts. I'm just alarmed at the automation potential here. It would be very easy to overwhelm the S/N ratio of a lightly moderated forum.
@swelljoe I understand. It sounds like a crap problem to try and fix :(
@swelljoe All I can think of is that they're trying to generate link juice. Try to seem authoritative, have a comments history, maybe a few upvotes. Then when the spam and scam posts happen, people will click through and see enough historical context to feel like that user account is a little trustworthy.
@swelljoe we've been getting some of these on the @wesnoth forums

@egallager @wesnoth I don't know why I was so surprised by it. It was inevitable. We've had existing users post ChatGPT answers, labeled as such. And, I asked them to stop doing that and they did. But, these are new users and it seems to be a business model of some sort.

Yet another thing to waste my time each day, making being involved in anything at all on the open internet a constant source of misery. I already gave up on wikis and mailing lists years ago.

@swelljoe

Exponential Bullshit at the Speed of Light

@swelljoe What happens when these disparate AI systems start training on each other's mediocre and incorrect content published everywhere? It's going to be a feedback loop to idiocracy
@swelljoe AI is the new crypto.
@stoicmike I think it's more dangerous than crypto because it actually has utility. There are a lot of ways to get value out of it. Everybody understands crypto is a scam, now, but I suspect the AI curve will look different, maybe different than anything we've seen, but I don't really know what will happen. But the history of spam tells me that before long, there will be a slimy layer of LLM spam covering every surface on the internet, and search engines will become even more useless.
@swelljoe I'm more inclined to think that the products will be a disaster because they will make such bad decisions and give such poor advice. Right now they can't even do simple customer service.
@stoicmike I dunno. I've already found it useful, despite my skepticism and concerns. It has a breadth of knowledge sort of like an expert, but it never gets tired of dumb questions like an expert (and costs a lot less). It's alarmingly casual about lying (makes up scientific results, makes up features and options in software, makes up people, etc.), but, it can show me how to do stuff I've had a hard time figuring out on my own. It's tantalizingly close to being good at some niches.
@stoicmike Unsurprisingly, as with many things built by programmers the thing it seems to be best at first is helping with programming. I've said it's like "rubber ducking" where the duck can talk back, which isn't always a good thing, but "pair" programming at any time day or night and on my solo projects is pretty great. It isn't a great programmer, it doesn't write smart or concise code, but it's a programmer with broad knowledge, and I don't always write good code either.
@swelljoe I just read that in China people are using it to write their mandatory assigned communist homework.
@swelljoe you've described 9 out of 10 comments I come across even without this latest chatbot.

@craige no, the grammar was excellent.

Jokes aside, I think that's one of the reasons it's so dangerous. Humans are suckers for a confident liar (it's "con man" for a reason), and ChatGPT is so confident and so authoritative sounding. It's exactly the kind of answer we want, but fictional. If a dumb human makes a suggestion, there are probably cues that allow us to judge their value. Many of those cues are distorted or gone. I can mostly detect LLMs for now...but soon, I dunno.

@swelljoe @dalias We're unfortunately seeing a lot of this on https://discuss.grapheneos.org/. It's being used by spammers to give accounts a history which is hard to differentiate from a legitimate user. They eventually start posting links. Some of them are posting links to fancy web design sites for Android apps but they aren't actually the real site for the app and are probably malware. In many cases the site looks much better than the real app and is better at describing features of the real app.
@swelljoe @dalias I don't think they've been sneaky enough to edit the links into previous posts but that's a possibility. Even the very lazy spam is seemingly doing this to fit into the GrapheneOS forum, although that's still usually easy to spot. They've also been posting in old threads which makes it more painful. It's very difficult to deal with it as an open source project with only community moderators. We have funding to hire people but hiring moderators to deal with this seems wasteful.

@DanielMicay @swelljoe @dalias I had to turn off self-post-edit for low-utilization users on Maker Forums because of so many low-or-zero-value posts being edited into spam later. They would come back after 1-3 months to edit low-value posts into spam links. Ironically, because they were low-trust users, those links would be rendered as rel="nofollow" to explicitly deny SEO-juice from the links, but that didn't seem to matter to them.

I used other tools to identify likely spammers and audited a lot of posts to confirm that they were doing this before tightening controls.

I explicitly don't close down old threads because there's so much legitimate usage there; helpful updates on "how well this worked" after two years and such. But I could imagine auto-moderating necro-posting for low-trust users being helpful here.

@mcdanlj @swelljoe @dalias Links in forum posts and user bios are entirely marked as rel="ugc nofollow" by Flarum, which explicitly marks them as user-generated content. The issue is that ugc and nofollow links still do help with SEO because the majority of links are marked that way at this point due to the prominence of social media sites, Wikipedia, etc., and search engines can't realistically consider them as not contributing any ranking power.

@DanielMicay @swelljoe @dalias TIL.

Fortunately, I haven't ignored those spam links; they are live for a few hours at most. But now I know that they may be evil but perhaps are less stupid than I thought. ☹

@mcdanlj @swelljoe @dalias We've found that most of the emails they use are in the https://www.stopforumspam.com/ database but we can't currently use it because we don't want to leak information about people registering to a service. They offer the whole database for download but we don't have resources to spare improving the existing extension (https://discuss.flarum.org/d/17846-friendsofflarum-stopforumspam) for the forum software we use to support using a local database generated from these.

@DanielMicay @mcdanlj @swelljoe Could you modify it to just do a service query to a different address you control, and make a minimal API frontend for the database to answer those queries?

@dalias @DanielMicay @swelljoe Their API takes the whole email address and IP address and returns an answer. Wrapping it in a separate service wouldn't make a difference. You can also download complete lists up to twice per day if you want to do the checking locally.

It would be interesting to have a service based on one-way hashes of bits of data that returns all possible matches for those hashes, so you could make decisions locally while still reporting new spammers to contribute actual matches to the database.

They don't list a plug-in for Discourse, so I'd end up using it manually anyway. 🤷
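The hash-based idea above resembles the k-anonymity range queries Have I Been Pwned uses for password checks: hash the value locally, send only a short prefix to the service, and compare the returned suffixes on your own machine. A minimal sketch, assuming a hypothetical server-side list (`SPAM_EMAILS`, the prefix length, and both functions are made up for illustration, not any real StopForumSpam API):

```python
import hashlib

# Hypothetical server-side list of known spammer emails (illustrative only).
SPAM_EMAILS = {"g4j3tjjng32@example.com", "spammer@example.net"}

def sha1_hex(email: str) -> str:
    return hashlib.sha1(email.strip().lower().encode()).hexdigest()

def server_range_query(prefix: str) -> list[str]:
    # The server returns every hash suffix whose full hash starts with the
    # prefix, so it never learns which (if any) address the client is checking.
    return [h[len(prefix):] for e in SPAM_EMAILS
            if (h := sha1_hex(e)).startswith(prefix)]

def is_listed(email: str, prefix_len: int = 5) -> bool:
    # Client side: send only a short hash prefix, compare suffixes locally.
    h = sha1_hex(email)
    return h[prefix_len:] in server_range_query(h[:prefix_len])
```

This keeps the registration email private from the lookup service while still allowing exact matches, which addresses the information-leak concern raised earlier in the thread.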

@mcdanlj @dalias @DanielMicay I found an old Discourse plugin, but it seems, at first glance, to only use the API, and it's been untouched for four years...the Discourse plugin API hasn't changed much AFAIK, so maybe it still works, but not being able to use a local copy of the DB is a problem. Would need work, I guess. https://meta.discourse.org/t/stop-forum-spam-plugin-auto-silence-known-spammers/121037/1
@mcdanlj @DanielMicay @swelljoe Not wrapping their service, wrapping your local copy of the DB with an API to emulate their service

@dalias @mcdanlj @swelljoe Would need to write scripting to download their database, parse it and turn it into an SQLite database and then make a small web service out of it. It would be quite straightforward but I don't really want to spend a day on this.

They reuse the same IP addresses and emails for months or even years across many forums. They mainly seem to use Gmail addresses. It seems they index all the forums running each forum software and spam them systematically, only partly automated.
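The plan described above (download the dump, parse it into SQLite, wrap it in a small service) could be sketched roughly like this; the dump format is an assumption here (one address in the first CSV column), not StopForumSpam's documented layout:

```python
import csv
import io
import sqlite3

def load_banned_emails(csv_text: str) -> sqlite3.Connection:
    """Parse a downloaded dump (assumed: one address in the first column
    of each row) into an indexed SQLite table for fast local lookups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE banned (email TEXT PRIMARY KEY)")
    rows = ((row[0].strip().lower(),)
            for row in csv.reader(io.StringIO(csv_text)) if row)
    conn.executemany("INSERT OR IGNORE INTO banned VALUES (?)", rows)
    conn.commit()
    return conn

def is_banned(conn: sqlite3.Connection, email: str) -> bool:
    """The lookup a tiny web service would wrap to emulate the upstream API."""
    cur = conn.execute("SELECT 1 FROM banned WHERE email = ?",
                       (email.strip().lower(),))
    return cur.fetchone() is not None
```

Exposing `is_banned` behind any micro web framework would let the existing Flarum extension point at a local address instead of the upstream service, which is the emulation idea suggested earlier in the thread.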

@swelljoe Yes. This seems to me to be the main danger from mixing information retrieval systems and LLMs: that knowledge is indistinguishable from bullshit.

@swelljoe

gotta fight it!

One thing I have been thinking is that the non-goo areas are going to become havens for sanity and real human contact, just as fast... and we're going to get very good at finding and protecting them.

@swelljoe I reckon I've been dealing with a few LM-generated bug reports on the LibreOffice Bugzilla lately; it's annoyingly difficult to know if it's a genuine newbie report or the next level of spam. Wonderful use of my time…
@swelljoe says the ChatGPT 😉
@swelljoe have we considered the case of the confidently wrong average white male? Just saying, I've encountered quite a few in my corporate tech sales life.
@thebravelittleposter That's 95% of our user base, and I believe I can differentiate between the two. I was unsure with the first one...thus me saying "I believe"...but then another came in a few hours later from a different user with a similarly shaped username and a similarly nonsense email address (we already block disposable email domains, so they have to have a real address; spammers are always like g4j3tjjng32 at Yahoo or Gmail or whatever). Definitely spam, probably a language model.
@thebravelittleposter I should also say that I include myself in that "confidently wrong average white male" contingent. Not trying to denigrate the users of the software I work on. We're just mostly a bunch of aging sysadmins, which is a category of user from a time and place where mansplaining was the way we bonded/fought over the internet. And, also predominantly white and male.
@swelljoe I am also confidently wrong and male. I get it.

@swelljoe Just wait for the next generation "AI" trained with the bullshit produced by the current generation.

(โ€œAIโ€ is the most inflated term ever. Thereโ€™s zero intelligence in what is currently being done, just massive-scale nonlinear regression. Garbage in, garbage out.)

@swelljoe If you are able to (and haven't done so already) you can always go to ChatGPT and ask "Did you write this? <Suspected text>". It's usually pretty good at giving you a yes/no response.
@swelljoe i'm trying to think of why anyone would bother doing something like that. "businesses trying to make the organic part of the internet unusable" is a thought, but it also seems like a bit of a reach. maybe it's someone trying to get training data that's more relevant to the outputs
@swelljoe We'll quickly get to the point where we only trust information written by people known to people we know.