Mastodawn

Please stop sending me replies like "TIL I'm an LLM, because I do one of these seventeen things!" I WROTE TWO SENTENCES AND ONE OF THEM WAS ABOUT THIS

Fish Id Wardrobe 4d ago

@0xabad1dea anyone who got good essay marks at school probably learned to do at least two of the Seventeen Things.

Steve 4d ago

@fishidwardrobe @0xabad1dea definitely more than 2. Also native speakers of the British English variants used across the Commonwealth.

Cassandrich 4d ago

@lilstevie @fishidwardrobe @0xabad1dea If you learn those rules it's as a super simplified model suitable for elementary school exercises, and for the sake of being able to meaningfully break them. Not because writing is supposed to look like that.

ZahmbieND 4d ago

@fishidwardrobe @0xabad1dea Most of the patterns under the "Language and tone" section are actually good writing practices in the right context, such as writing a college essay, a news article, or a travel brochure, but an important part of writing is knowing what you're writing for, and adapting your language to that format. Most of the "Language and tone" category are language patterns that would be inappropriate in an encyclopedia format (such as Wikipedia), but may be perfectly fine elsewhere.

I'd bet a lot of these "tells" for AI-generated text also flag plenty of plagiarised edits, where something is copied directly from an article, travel brochure, etc. instead of being rephrased in objective language for an encyclopedia. I guess an argument could be made that generative AI is also just plagiarism with extra steps.

Anꞇóin Ó B. 4d ago

@0xabad1dea

I'm relieved in ways I can't express that I'm out of education entirely, because based on the attitudes I encountered, I have total confidence that I'd meet so many people with no revulsion at just submitting slop as assignments.

I've high confidence that many lecturers would not give a damn about falsely accusing students of submitting slop, to the point of handing responsibility for detecting slop to an LLM & feeding student work to that LLM.

Marcos Dione 4d ago

@0xabad1dea personal mental note: put the warning BEFORE the main line. Are we becoming trigger-happy, polarized?

StoneBear

@0xabad1dea AND FURTHERMORE.... 😁

phryk 🏴 4d ago

@0xabad1dea TIL I'm an LLM because I have reading but no comprehension. :P

dram🎀 4d ago

@0xabad1dea i guess social media users are the dual of llms, because instead of hallucinating false new information on output, they ignore true old information on input...

myname 4d ago

@0xabad1dea maybe they only read an ai summary of your two sentences

Shane Celis 4d ago

@0xabad1dea My personal preference is for em-dashes to have no spaces, so I’d be devastated if that were the tell for bots.

Greg Hills 4d ago

@shanecelis @0xabad1dea You needn't be devastated. 😀 However, as one who has suffered too long from watching software not wrapping text at em dashes, I personally shall continue to be heretical — and add spaces.

Shane Celis 4d ago

@winterknell @0xabad1dea Of all the Hills you had to come to this one—to die on. ;)

DamonHD 4d ago

@shanecelis @winterknell @0xabad1dea This hill I die on too! Trying to get consistent, acceptable behaviour across e.g. LaTeX and HTML encouraged me to wrap em- but not en-dashes in most cases!

Elric 4d ago

@0xabad1dea My love for em-dashes and markdown formatting is making me question my own existence.

@elricofmelnibone note that it specifically means markdown *on wikipedia*, which does not support markdown but rather a predecessor format

@0xabad1dea Can’t AI companies (and actors who want to be undetected) just feed this page as “patterns to avoid”?

@KyberNull wouldn't work super well, nor do AI companies particularly care if you notice AI generated text is AI generated text

Matilda Love 4d ago

@KyberNull @0xabad1dea i'm going only by intuition here, but i think it'd cause other, more obvious tells to pop up. also, these things are **really** bad at *not* doing some specified thing. (see "room with no elephants")

Dan Getz 1d ago

@KyberNull @0xabad1dea that would only help at the surface level, like using one word more or less often, but many of the real patterns described on that page are harder to avoid.

antsu 4d ago

@0xabad1dea TIL I'm an AI chatbot.

HyperSoop

@antsu @0xabad1dea chat geepeetee doesnt say "TIL" so youre probably good

@antsu I literally put "do not over-index" in the two-sentence-long post

antsu 4d ago

@0xabad1dea Apologies, you are absolutely correct! I seem to have overlooked the advice to not "over-index" — which was indeed included in your original prompt. As a large language model (LLM), my skills will continue to improve as technology advances and my training data set is expanded, making mistakes like this less frequent.

@antsu okay okay yes you got an unwilling laugh out of me

0xa7c9110 (CYBERSLAG) 4d ago

0x addressed friend

@0xabad1dea @antsu i can’t help but read “do not over-index” in the chatgpt “do not hallucinate” voice

polprog68k 4d ago

@0xabad1dea *points at em-dashes and emojis used as bullet points all over the text*

"these code points are too high for a human hand to type"

Eugene

@0xabad1dea #TIL that I'm just an LLM bot

I follow some Russian #typography rules that differ from the corresponding English ones, and, sadly, LLMs tend to follow the same rules when it comes to dashes.

In RU typography, the en dash is used to separate numbers and has no spaces on either side, like this: 123–456–789.

The em dash is used to separate parts of a sentence and should have a space on each side — like this

tajpulo 4d ago

@evgandr
@0xabad1dea Well… it is on the English Wikipedia referring to English text 😉

But on RU keyboards, you use the same Unicode codepoints, right? So U+002D for everything, right?

Eugene

@tajpulo @0xabad1dea Yep, it was just funny to treat myself like a bot 🤖 beep-boop 🙂. And I also wanted to write a bit about Russian typography

> you use the same Unicode codepoints

Yes, most people just use the same code point (-). But people who want to use typographic symbols properly can use Birman's layout (https://ilyabirman.net/typography-layout/) or the Compose key on X-server-based systems


tajpulo 4d ago

@evgandr
@0xabad1dea Ah, interesting. Thank you for sharing ☺️

Ryan 4d ago

@0xabad1dea TIL I'm an LLM, because I do one of these seventeen things!

Joshua Kaden 4d ago

@0xabad1dea Thanks for sharing this!

Danny 4d ago

@0xabad1dea haha, I am ai then!

M. Rug Sunshine 4d ago

@0xabad1dea LLM is like pollution now.

Chris 4d ago

@0xabad1dea Isn't this basically just another "Who's adapting faster" situation?

The more detailed the list becomes, the easier it is to simply adapt AI generated content to avoid these things, eventually making it harder and harder to tell if something is written by AI.

@christopherklay you're not wrong in that yes, you can use this list to intentionally improve AI writing to pass it off as authentic. However, in practice, AI companies don't care if you notice something you're reading is AI generated – they already got paid when it was generated – and people who are using an LLM to generate text because they either don't understand the subject themselves or don't speak the language well are also the least likely to be able to polish it up to the point no-one would be suspicious

toni✨🧠 4d ago

@0xabad1dea @christopherklay In my experience, people who use it to generate text also think ChatGPT writes great prose.

Chris 4d ago

@0xabad1dea The average social media comment, say, definitely wouldn't change because of this, but I'd argue that news sites, for example, could easily come up with "refining" steps.

The potential gain in viewers from not looking like AI, compared with the flood of obvious AI spam, would already be enough to justify a few extra steps.

Ted Mielczarek 4d ago

@christopherklay @0xabad1dea people generating AI slop content have already conceded that they don't really care about the content.

Chris 4d ago

@tedmielczarek @0xabad1dea People generating AI content to make money do care - if it makes them more money.

Dan Getz 1d ago

@christopherklay @0xabad1dea When you say "simply", that sounds to me like the siren song of AI: that you can just try anything, and maybe it will work. But if you read that list more closely, these are more like signs of defects, not just styles, so fixing them is not simple. A person who knows what they're talking about and wrote this way can simply edit their writing to be clearer and less clichéd. A person would have a harder time if they were trying to cover up the fact that they don't know as much as they want people to think they know. I suspect likewise an AI would need to be able to better retrieve and process actual information to avoid these mistakes.

calcifer

@0xabad1dea it’s fascinating to look at the breakdown between items that are “chatbots have this quirk of style that isn’t bad per se but is a tell”, items that are “chatbots write poorly in a fairly consistent way”, and items that are “chatbots just absolutely cannot follow certain Wikipedia styles and conventions”

Tristan 4d ago

@0xabad1dea we're getting closer to an antidote.

Vivia 🦆🍵

@0xabad1dea Funnily enough, several of those points are exactly how someone learning English as a foreign language is taught to write in order to get a higher exam score.

Jilder 4d ago

@0xabad1dea It's fascinating to me that you can tell where the AI learned things: all that florid language comes from advertising copy and isn't impartial enough to pass on Wiki. The listing comes from SEO techniques and traditional teaching tools for high-school essay writing. Once you move to other contexts, it'll be harder to use those as a tell.