I believe I just deleted the first secretly ChatGPT-generated comment on the forum for an OSS project I maintain. It was shaped like an answer, but didn't actually answer the question. Great grammar, sounded authoritative. But, wrong. The internet is absolutely going to become gray goo, and it's going to happen so fast.
Second one just arrived from another user. I don't understand the business case for ChatGPT spam. There were no links? Maybe they plan to add links later via edit? I don't even know. Yet another thing for me to worry about. Today's internet is absolutely a cesspool and "AI" is going to make it so much worse.

I asked ChatGPT for help.

I love that the first sentence is defensive. So like us. "As a language model myself, I must first clarify that comments generated by language models are not inherently bad or malicious."

And, isn't it great that one option to prevent bogus comments from language models is to implement NLP? It's turtles all the way down. I throw my hands up in despair at the future of our internet.
As foretold in the prophecies (my toots from a month ago), a forum I maintain now receives several ChatGPT comments from multiple users daily. They've wised up about commenting immediately after creating the account (which triggered the Discourse anti-spam), and now there are no clues that it's an LLM until a human happens to read it and recognizes its alien nature. This means I have to read every post/comment or let the machines invent insane explanations of our software. #chatgpt

@swelljoe

The problem: NLP.

The cure: NLP.

Pretty good business model. 🙄🤬

@swelljoe It must be clout, right? Wanting to participate but not having the knowledge or skill to. I don't know why else you'd post something you don't understand to something you don't have experience in.
@vfrunza I don't think it's clout. We do have some users who post all the time, often wrong answers, just to be in the conversation. But, this is different. I think they're just biding their time until Discourse opens up their ability to edit posts and post links without getting immediately auto-moderated. I've seen that in the past with human-generated posts. I'm just alarmed at the automation potential here. It would be very easy to overwhelm the S/N ratio of a lightly moderated forum.
@swelljoe I understand. It sounds like a crap problem to try and fix :(
@swelljoe All I can think of is that they're trying to generate link juice. Try to seem authoritative, have a comments history, maybe a few upvotes. Then when the spam and scam posts happen, people will click through and see enough historical context to feel like that user account is a little trustworthy.
@swelljoe we've been getting some of these on the @wesnoth forums

@egallager @wesnoth I don't know why I was so surprised by it. It was inevitable. We've had existing users post ChatGPT answers, labeled as such. And, I asked them to stop doing that and they did. But, these are new users and it seems to be a business model of some sort.

Yet another thing to waste my time each day, making being involved in anything at all on the open internet a constant source of misery. I already gave up on wikis and mailing lists years ago.

@swelljoe

Exponential Bullshit at the Speed of Light

@swelljoe What happens when these disparate AI systems start training on each other's mediocre and incorrect content published everywhere? It's going to be a feedback loop to idiocracy
@swelljoe AI is the new crypto.
@stoicmike I think it's more dangerous than crypto because it actually has utility. There are a lot of ways to get value out of it. Everybody understands crypto is a scam, now, but I suspect the AI curve will look different, maybe different than anything we've seen, but I don't really know what will happen. But the history of spam tells me that before long, there will be a slimy layer of LLM spam covering every surface on the internet, and search engines will become even more useless.
@swelljoe I'm more inclined to think that the products will be a disaster because they will make such bad decisions and give such poor advice. Right now they can't even do simple customer service.
@stoicmike I dunno. I've already found it useful, despite my skepticism and concerns. It has a breadth of knowledge sort of like an expert, but it never gets tired of dumb questions like an expert (and costs a lot less). It's alarmingly casual about lying (makes up scientific results, makes up features and options in software, makes up people, etc.), but, it can show me how to do stuff I've had a hard time figuring out on my own. It's tantalizingly close to being good at some niches.
@stoicmike Unsurprisingly, as with many things built by programmers the thing it seems to be best at first is helping with programming. I've said it's like "rubber ducking" where the duck can talk back, which isn't always a good thing, but "pair" programming at any time day or night and on my solo projects is pretty great. It isn't a great programmer, it doesn't write smart or concise code, but it's a programmer with broad knowledge, and I don't always write good code either.
@swelljoe I just read that in China people are using it to write their mandatory assigned communist homework.
@swelljoe you've described 9 out of 10 comments I come across even without this latest chatbot.

@craige no, the grammar was excellent.

Jokes aside, I think that's one of the reasons it's so dangerous. Humans are suckers for a confident liar (it's "con man" for a reason), and ChatGPT is so confident and so authoritative sounding. It's exactly the kind of answer we want, but fictional. If a dumb human makes a suggestion, there are probably cues that allow us to judge their value. Many of those cues are distorted or gone. I can mostly detect LLMs for now...but soon, I dunno.

@swelljoe @dalias We're unfortunately seeing a lot of this on https://discuss.grapheneos.org/. It's being used by spammers to give accounts a history which is hard to differentiate from a legitimate user. They eventually start posting links. Some of them are posting links to fancy web design sites for Android apps but they aren't actually the real site for the app and are probably malware. In many cases the site looks much better than the real app and is better at describing features of the real app.
@swelljoe @dalias I don't think they've been sneaky enough to edit the links into previous posts but that's a possibility. Even the very lazy spam is seemingly doing this to fit into the GrapheneOS forum, although that's still usually easy to spot. They've also been posting in old threads which makes it more painful. It's very difficult to deal with it as an open source project with only community moderators. We have funding to hire people but hiring moderators to deal with this seems wasteful.

@DanielMicay @swelljoe @dalias I had to turn off self-post-edit for low-utilization users on Maker Forums because of so many low-or-zero-value posts being edited into spam later. They would come back after 1-3 months to edit low-value posts into spam links. Ironically, because they were low-trust users, those links would be rendered as rel="nofollow" to explicitly deny SEO-juice from the links, but that didn't seem to matter to them.

I used other tools to identify likely spammers and audited a lot of posts to confirm that they were doing this before tightening controls.

I explicitly don't close down old threads because there's so much legitimate usage there; helpful updates on "how well this worked" after two years and such. But I could imagine auto-moderating necro-posting for low-trust users being helpful here.

@mcdanlj @swelljoe @dalias Links in forum posts and user bios are entirely marked as rel="ugc nofollow" by Flarum, which explicitly marks them as user-generated content. The issue is that ugc and nofollow links still do help with SEO because the majority of links are marked that way at this point due to the prominence of social media sites, Wikipedia, etc., and search engines can't realistically consider them as not contributing any ranking power.

@DanielMicay @swelljoe @dalias TIL.

Fortunately, I haven't ignored those spam links; they are live for a few hours at most. But now I know that they may be evil but perhaps are less stupid than I thought. ☹

@mcdanlj @swelljoe @dalias We've found that most of the emails they use are in the https://www.stopforumspam.com/ database but we can't currently use it because we don't want to leak information about people registering to a service. They offer the whole database for download but we don't have resources to spare improving the existing extension (https://discuss.flarum.org/d/17846-friendsofflarum-stopforumspam) for the forum software we use to support using a local database generated from these.

@DanielMicay @mcdanlj @swelljoe Could you modify it to just do a service query to a different address you control, and make a minimal API frontend for the database to answer those queries?

@dalias @DanielMicay @swelljoe Their API takes the whole email address and IP address and returns an answer. Wrapping it in a separate service wouldn't make a difference. You can also download complete lists up to twice per day if you want to do the checking locally.

It would be interesting to have a service based on one-way hashes of bits of data that returns all possible matches for those hashes, so you could make decisions locally while still reporting new spammers to contribute actual matches to the database.

They don't list a plug-in for Discourse, so I'd end up using it manually anyway. 🤷
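The hash-based idea above resembles the k-anonymity range queries Have I Been Pwned uses for password checks: hash the value locally, send only a short prefix to the service, and compare the returned suffixes on your own machine. A minimal sketch, assuming a hypothetical server-side list (`SPAM_EMAILS`, the prefix length, and both functions are made up for illustration, not any real StopForumSpam API):

```python
import hashlib

# Hypothetical server-side list of known spammer emails (illustrative only).
SPAM_EMAILS = {"g4j3tjjng32@example.com", "spammer@example.net"}

def sha1_hex(email: str) -> str:
    return hashlib.sha1(email.strip().lower().encode()).hexdigest()

def server_range_query(prefix: str) -> list[str]:
    # The server returns every hash suffix whose full hash starts with the
    # prefix, so it never learns which (if any) address the client is checking.
    return [h[len(prefix):] for e in SPAM_EMAILS
            if (h := sha1_hex(e)).startswith(prefix)]

def is_listed(email: str, prefix_len: int = 5) -> bool:
    # Client side: send only a short hash prefix, compare suffixes locally.
    h = sha1_hex(email)
    return h[prefix_len:] in server_range_query(h[:prefix_len])
```

This keeps the registration email private from the lookup service while still allowing exact matches, which addresses the information-leak concern raised earlier in the thread.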

@mcdanlj @dalias @DanielMicay I found an old Discourse plugin, but it seems, at first glance, to only use the API, and it's been untouched for four years...the Discourse plugin API hasn't changed much AFAIK, so maybe it still works, but not being able to use a local copy of the DB is a problem. Would need work, I guess. https://meta.discourse.org/t/stop-forum-spam-plugin-auto-silence-known-spammers/121037/1
@mcdanlj @DanielMicay @swelljoe Not wrapping their service, wrapping your local copy of the DB with an API to emulate their service

@dalias @mcdanlj @swelljoe Would need to write scripting to download their database, parse it and turn it into an SQLite database and then make a small web service out of it. It would be quite straightforward but I don't really want to spend a day on this.

They reuse the same IP addresses and emails for months or even years across many forums. They mainly seem to use Gmail addresses. It seems they index all the forums running each forum software and spam them systematically, only partly automated.
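The plan described above (download the dump, parse it into SQLite, wrap it in a small service) could be sketched roughly like this; the dump format is an assumption here (one address in the first CSV column), not StopForumSpam's documented layout:

```python
import csv
import io
import sqlite3

def load_banned_emails(csv_text: str) -> sqlite3.Connection:
    """Parse a downloaded dump (assumed: one address in the first column
    of each row) into an indexed SQLite table for fast local lookups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE banned (email TEXT PRIMARY KEY)")
    rows = ((row[0].strip().lower(),)
            for row in csv.reader(io.StringIO(csv_text)) if row)
    conn.executemany("INSERT OR IGNORE INTO banned VALUES (?)", rows)
    conn.commit()
    return conn

def is_banned(conn: sqlite3.Connection, email: str) -> bool:
    """The lookup a tiny web service would wrap to emulate the upstream API."""
    cur = conn.execute("SELECT 1 FROM banned WHERE email = ?",
                       (email.strip().lower(),))
    return cur.fetchone() is not None
```

Exposing `is_banned` behind any micro web framework would let the existing Flarum extension point at a local address instead of the upstream service, which is the emulation idea suggested earlier in the thread.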

@swelljoe Yes. This seems to me to be the main danger from mixing information retrieval systems and LLMs: that knowledge is indistinguishable from bullshit.

@swelljoe

gotta fight it!

One thing I have been thinking is that the non-goo areas are going to become havens for sanity and real human contact, just as fast... and we're going to get very good at finding and protecting them.

@swelljoe I reckon I've been dealing with a few LM-generated bug reports on the LibreOffice Bugzilla lately; it's annoyingly difficult to know if it's a genuine newbie report or the next level of spam. Wonderful use of my time…
@swelljoe says the ChatGPT 😉
@swelljoe have we considered the case of the confidently wrong average white male? Just saying, I've encountered quite a few in my corporate tech sales life.
@thebravelittleposter That's 95% of our user base, and I believe I can differentiate between the two. I was unsure with the first one...thus me saying "I believe"...but then another came in a few hours later from a different user with a similarly shaped username and a similarly nonsense email address (we already block disposable email domains, so they have to have a real address; spammers are always like g4j3tjjng32 at Yahoo or Gmail or whatever). Definitely spam, probably a language model.
@thebravelittleposter I should also say that I include myself in that "confidently wrong average white male" contingent. Not trying to denigrate the users of the software I work on. We're just mostly a bunch of aging sysadmins, which is a category of user from a time and place where mansplaining was the way we bonded/fought over the internet. And, also predominantly white and male.
@swelljoe I am also confidently wrong and male. I get it.

@swelljoe Just wait for the next generation "AI" trained with the bullshit produced by the current generation.

(โ€œAIโ€ is the most inflated term ever. Thereโ€™s zero intelligence in what is currently being done, just massive-scale nonlinear regression. Garbage in, garbage out.)

@swelljoe If you are able to (and haven't done so already) you can always go to ChatGPT and ask "Did you write this? <Suspected text>". It's usually pretty good at giving you a yes/no response.
@swelljoe i'm trying to think of why anyone would bother doing something like that. "businesses trying to make the organic part of the internet unusable" is a thought, but it also seems like a bit of a reach. maybe it's someone trying to get training data that's more relevant to the outputs
@swelljoe We'll quickly get to the point where we only trust information written by people known to people we know.