People trying to train AIs are now complaining that all of the AI-generated data on the internet is making it hard for them to get quality training sets of natural language and images.

*bitter snickering*

@futurebird The main players have a big advantage: #Google can already detect #AI #content because they have been training #algorithms for so long, and the small players don't have that advantage. I would suggest using data from before #ChatGPT became popular with end consumers. The good thing for small AI companies is that they don't get robots.txt- and #IP-blocked (I think >15% of major sites are blocking the main AI scrapers), so they still have access to those data pools, which are also guaranteed not to be AI-generated.

@madeindex @futurebird
AFAIK, ChatGPT content can't be distinguished from "natural" content. I would like a source for that. Also, running a model for that would be quite energy-consuming and would raise costs just for indexing and finding proper training data. And lots of content is hybrid anyway. So I'm not convinced by that argument.

@Zeugs @futurebird

#Google already indexes and analyzes the Internet's content for their search engine.
Their algorithm has been detecting #AI content for years, but their terms are not strictly against it as long as the quality is good; that is why you will sometimes find AI content in the #search results.

I've read about it a couple of times; here is a source:
https://contenthacker.com/can-google-detect-ai-content/

@madeindex @futurebird
The OpenAI classifier is no more, and the article is from Feb 2023! There has not been that much improvement since, that's true.
But just because Google says you can use AI content doesn't mean they can detect it. With that policy, they don't even have to check. Putting all Google-crawled content through a GPT detector would be quite expensive, and these classifiers never worked reliably.

@Zeugs @futurebird
I would argue #AI detection is already part of what #Google does and requires no extra step, since they do natural language processing to understand the content anyway.
They can even read text in images via #OCR and understand the images themselves (which requires much more computing power).
Article #Spinning (AI-rewritten content) has been around for a long time, and I think they are very good at detecting it; they just don't seem to want to remove it.
April 3, 2024:
https://nealschaffer.com/can-google-detect-ai-content/