building a little log analyser thing and testing it out on my site and blog

holy shit, people are not joking about the AI bots. 92.2% of my total site traffic is from eight user agents directly affiliated with AI. another 6.1% is from SEO-related companies that have some sort of AI offering.

only 0.3% of my traffic comes from a regular browser. most of the remaining 1.4% is fedi servers pulling previews, plus some RSS readers grabbing posts.

I don't get a ton of traffic there anyway (it's not like thousands of people read my blog), but wow, that is far bleaker than I ever imagined.

one thing I did notice is that quite a few user agents point to bots that purport to be news aggregators or similar types of sites. when you check the site out, there's a flashy facade that looks like some sort of "collect your favourite stories and news sources" thing, except there's no signup, no login, nothing, and when you look up the company owners, they've got AI stuff all over their LinkedIn. I'm almost certain these are just fronts for training data collection.

@gsuberland your stats track what I’m seeing

@fbarton @gsuberland Ditto. They don't even seem to check whether the stuff they're pulling down has changed since last time. I was burning 10 GB a month serving exactly the same stuff to the same bots.

I've got some mitigation in place now.

@dtl @gsuberland ya’know… if you’re paying for egress, that’s very fair. Self-hosting means that’s a lot less important to me.