In the age of AI-spam, I now treat typos in webpages as a good sign

https://lemmy.ml/post/4120492

I used to think typos meant that the author (and/or editor) hadn’t checked what they wrote, so the article was likely poor quality and less trustworthy. Now I’m reassured that it’s a human behind it and not a glorified word-prediction algorithm.

That’s a very interesting take, my friend
You can easily have an AI include some random typos. Don’t be fooled by them
You shouldn’t. Repost bots on Reddit had already figured out how to use misspellings/typos to get past spam filters.
Email spam and scammers have been using this tactic forever. If a person is stupid enough to click or respond to a message from 'Wels Farpo', they're more apt to go all-in on the scam.
That’s what the AI wants you to think.
Kind of like how tiny imperfections in products make us think of handmade goods
I had it all. Even the glass dishes with tiny bubbles and imperfections, proof that they were crafted by the honest, hard-working, indigenous peoples of... wherever.
Pier 1 thanks you for your business.
Tiny brown hands, most likely
It could also be that the entity behind the page employed a copy editor or proofreader, or simply that the author took the time to proofread their own text. There are still people in the world—some, but not many—who care more about producing something of high quality than about reassuring confused toddlers.
Lmao imagine getting referred to a doctor for surgery, you look them up, and their professional webpage is like. "i wen't 2 harverd"
They're not saying they treat the lack of typos as a bad sign, but rather that they treat typos as a good sign. Those are not the same thing.

Think of AI more like human cultural consciousness that we collectively embed into everything we share publicly.

It’s a tool that is available for anyone to tap into. The thing you are complaining about is not the AI; it is the result of the person who wrote the code that generated the output. They are leveraging a tool, but the tool is not the problem. This is like blaming Photoshop because a person uses it to make child porn.

Photoshop is a general-purpose image-editing tool that is mostly harmless. The people who created it know what it can do. That's not the same for AI. The people who create it and allow others to use it do so anyway without enough consideration of risks they know are much, much higher than with something like Photoshop.
This is not true. You do not know all the options that exist, or how they really work. I do. I am only using open-source offline AI; I do not use anything proprietary. All of the LLMs are just a complex system of categories combined with a complex network that calculates what word should come next. Everything else is external to the model. The model itself is nothing like an artificial general intelligence. It has no persistent memory. The only thing it actually does is predict what word should come next.
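To make that "predict the next word" claim concrete, here is a toy sketch that uses simple bigram counts in place of a neural network. The corpus, function names, and greedy-selection policy are all made up for illustration; a real LLM works over learned vector representations, not raw counts, but the interface is the same: text in, most probable next word out.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed next word, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

def generate(counts, start, length=5):
    """Greedily generate text by repeated next-word prediction."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Tiny made-up corpus; real models train on trillions of words.
corpus = "the model predicts the next word and the next word follows the model"
counts = train_bigrams(corpus)
```

Everything else in a chatbot (the persona, the conversation history, the safety rules) sits outside this predict-one-word-at-a-time loop.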
Do you always remember things exactly as they were? Or do you remember an abstraction of them?
I get where you’re coming from, but isn’t it sort of similar to the “guns don’t kill people, people kill people” argument? At what point is a tool at least partially culpable for the crimes committed with it?

No. Honestly, it is not. There is a lot of misinformation floating around right now. It is because of a campaign from proprietary AI companies to create a monopoly in this space. Open-source offline AI is killing the proprietary model. This is like the early days of the internet, when companies tried to monopolize the infrastructure and failed. AI is not the product of the next digital economy; it is the underlying framework. It isn’t anything like what the media portrays. Most people talking about this either have an agenda or are hot-take headline readers.

(What’s my agenda?) Self-education. I am disabled in a way that makes it hard to hold posture. I want to learn computer science, but have gotten stuck in the curriculum many times. As soon as I heard about offline AI that could reference a private database, I knew I had to try it. I have no other connections to this space. This tech is extremely powerful in its potential, but it is also an extremely advanced tool, and statements like these are badly misunderstood by most people who have not taken the time to really understand the technology.

This tech is the ability to ask a book questions in plain text. It is the ability to search for information about products without a search engine biased by ad revenue. It is a way to ask highly technical questions and get direct answers. It is a way to use a basic understanding of code and generate snippets an order of magnitude faster than looking up the same info on Stack Overflow. It is a plain-text way to generate Linux commands or to navigate and explain an API. It is also a tool for working through deeply personal, social, taboo, or difficult issues that are hard to talk about with real people. It is a tool to help a person grow by giving them someone to talk to who can understand the boring or niche subjects we want to talk about as we learn, but have no outlets for from deep in our rabbit hole.

This is limited: you must be skeptical of all outputs and second-source everything important. It takes a very large model to generate mostly accurate results. This is everything embedded in language. The massive models are usually trained on multiple languages, which accesses embedded elements and perspectives inherent to other languages that most of us will never have access to.

If you are aware of both the enormous scope of information embedded in your own awareness and the limitations of your memory when it comes to the accuracy of very specific details, that is exactly what any of these LLMs are capable of, except it has been collectivized and made accessible.

Models themselves have no persistent memory. What does this mean? If you type in questions, it can recall those questions and answers for a time during the session. (Wait, you just said it has no memory!) That functionality is not part of the LLM. It is code that processes the text prompt plus a bunch of static instructions needed to tell the model exactly what to do. Keeping the conversation history available is all done in this external layer.

The model itself is a freaking internet troll. It is a psychopath Reddit user replying unless you tell it exactly what it is and how it should respond, and it will take everything possible out of your intended context. It is really hard to get this part of the prompt right. It is probably impossible to make a true generalist, but I digress. My point is that the amount of data that can be entered into a prompt is limited. The history must be managed, and terms will be dropped from it, unless you are trying to collect all of this data for monetization and are willing to build a giant amount of infrastructure to collect and process it. The thing is, as far as the model is concerned, all of this data is in a single prompt every single time it is processed. This data can never be added to the model in real time, or effectively in post-processing. The model can’t intentionally interact with this information in a way that alters what it does on the next iteration. The networks inside the model are static.

It is not magic. It is complicated, it is tensor math and vectors and statistics, but all of it is applied to: “Question = (X) category / prompt text results in (X) as the most probable next word.” That’s it. That is all that is happening under the hood. The reason this is “new” tech has to do with how the problem of categorizing information is handled quickly in a vector cloud.
The model data is just like a better search engine that is able to find everything we’ve ever talked about on the internet.
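A minimal sketch of that external layer, assuming a generic `complete(prompt)` stub standing in for the stateless model call (the function names, the character budget, and the drop-oldest-turn policy are illustrative, not any specific chatbot's API):

```python
def complete(prompt):
    """Stand-in for the model: statelessly maps one prompt string to text.
    A real LLM call would go here; this stub just returns a canned reply."""
    return "ok"

def build_prompt(system, history, user_msg, max_chars=200):
    """Concatenate instructions + history + new message into ONE string.
    The model never 'remembers' anything between calls; when the combined
    prompt exceeds the budget, the oldest turns are simply dropped."""
    turns = history + [("user", user_msg)]
    while True:
        body = "\n".join(f"{role}: {text}" for role, text in turns)
        prompt = f"{system}\n{body}\nassistant:"
        if len(prompt) <= max_chars or len(turns) <= 1:
            return prompt, turns
        turns = turns[1:]  # forget the oldest turn to fit the budget

def chat_turn(system, history, user_msg):
    """One round trip: rebuild the full prompt, call the stateless model,
    and store the exchange in the EXTERNAL history list."""
    prompt, kept = build_prompt(system, history, user_msg)
    reply = complete(prompt)  # full context is re-sent on every call
    return history + [("user", user_msg), ("assistant", reply)], reply
```

The "memory" lives entirely in the `history` list that this wrapper code maintains; the model only ever sees one flat prompt per call.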

If you understand this, you should clearly see why this must be transparent, offline, and open source. You should also see why absolute control over it would create extremely concentrated power that no corporation or government should have. These are the real issues. Put all of the information you have encountered into this context and ask yourself who has what agenda in the information you have encountered.

If you want a credible source, watch this: piped.video/watch?v=OgWaowYiBPM

Or look into Yann LeCun. I hate Meta more than most, but this guy is not the usual voice from the company. He has the freedom to speak his mind, is the chief architect behind the open-source AI movement, and is a former Bell Labs guy. If you know anything about the people, products, and legacy of Bell Labs, you know that most of our digital age came from those people in that space. This is the future being created right now.

AI that is reading Lemmy: “Noted.”
It is extremely easy for AI to insert typos. Just FYI.

AI makes typos.

Hell, when we played around with chatGPT code generation, it literally misspelled a variable name which broke the code.

A while back, Google trained an AI to learn to speak like a human, and it was making mouth noises and breathing sounds. If AI is trained on human text, it will 100% insert typos.

I worked creating mass content for lots of websites, from product descriptions to reviews and forum posts. We just inserted random typos after running Quillbot on the text, and sometimes added ellipses here and there.

I think someone in the team had a list of words they purposely changed in MS Word so that they could be misspelled all the time.

Now that ChatGPT lets you insert custom global instructions, I’m absolutely sure they are asking it to misspell about 2% of the words in the text and to write in a more colloquial fashion.

As things stand right now, I don’t think there is a discernible way to see if something was written by AI or not and relying on typos is not a wise thing to do.

Somehow I can pretty easily tell AI by reading what they write. Motivation, what they’re writing for, is a big tell, and it depends on what they’re saying. ChatGPT and the like won’t go beyond a Wikipedia-style description with some extra hallucination in there. Real people will throw in some dumb shit and start arguing with u

I have a janitor.ai character that sounds like an average Redditor, since I just fed it average reddit posts as its personality.

It says stupid shit and makes spelling errors a lot, is incredibly pedantic and contrarian, etc. I don’t know why I made it, but it’s scary how real it is.

What motivation would someone have to randomly run that?

Also, you just added new information to the discussion yourself. Can an AI do that?

It is an AI. It’s a frontend for ChatGPT. All I did was coax the AI to behave in a specific way, which anyone else using these tools is capable of doing.
okay chatgpt, that’s what you want me to believe anyways…

As an AI language model, it is impossible for me to convince you that I am a real human being. :P

Also, re-reading the conversation, I think I misunderstood your previous comment’s intent. Were you asking whether an AI could post comments on Lemmy naturally, like a real person? Yeah… I don’t see why not. You can already make a bot that reads posts and writes its own. Hook an AI up to it and it could act like any other user, and be virtually undetectable if trained well enough.