People trying to train AIs are now complaining that all of the AI-generated data on the internet is making it hard for them to get quality training sets of natural language and images.

*bitter snickering*

@futurebird The main players have a big advantage: #Google can already detect #AI #content because they have been training #algorithms for so long. The small players don't have that advantage. I would suggest using data from before #ChatGPT became popular with end consumers. The good thing for small AI companies is that they don't get blocked via robots.txt and #IP blocks (I think >15% of major sites are blocking the main AI scrapers), so they still have access to those data pools, which are also guaranteed not to be AI.
@madeindex @futurebird That's my concern, too. The internet could (should) have been all of us collectively participating for fun, our shared experiences providing the growth medium to make useful tools for everyone. If the internet itself isn't a viable place for shared data, then we're beholden to large companies that can afford to make their own data. The internet has unleashed nightshade in the communal garden, but the giants can afford to move to their own greenhouses.
@josephc @futurebird
The #internet is being taken over by the big corporations: most people already spend most of their time browsing their products, so they see and acquire the information those corporations want them to. The only solution seems to be #decentralization, but I'm not sure about the future there, as the corporations are replacing the computing power on people's devices with cloud power from their servers. If they control most of the computing power, they will also control this. Are most #Mastodon instances running on their own #hardware?