Reddit is full of bots: thread reposted exactly the same, comment by comment, 10 months later
If you want to avoid Reddit tracking altogether a redlib instance will let you do that https://libreddit.projectsegfau.lt/r/conspiracy/comments/170e8dp/reddit_comments_are_full_of_bots_reupload/
Yes that’s a concern here as well but it’s pretty easy to run your own instance in Docker or whatever
Oh man, I would browse while on the shitter at work. It used to be one of my OGs. A lot of tinfoil. And you’d get the deep dives that didn’t feel politically motivated (compared to today).
Then, the Trumpeting.
Like everything else not stapled down circa 2016, it was an easy target for the Russian firehose of falsehood: an entire community of people wanting to believe some alternative bullshit.
My understanding of how this works is that the left one is real accounts making real comments, at least in the majority.
Then when the link gets reposted, whether by a bot or organically (potentially matched on the title), the bots scrape the old thread’s comments and post them.
It’s content farming. And Reddit is probably okay with this.
(`import difflib` in Python)
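A sketch of what that one-liner is gesturing at: Python’s standard-library difflib can score how similar two comments are. The sample comments and any notion of a “good” threshold are made up here.

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Score two comments from 0.0 (nothing shared) to 1.0 (identical)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

original = "Wow, this is exactly what I was thinking."
repost   = "Wow, this is exactly what I was thinking."
reworded = "Wow, this is precisely what I thought."

print(similarity(original, repost))    # identical reposts score 1.0
print(similarity(original, reworded))  # reworded copies score high, but below 1.0
```

Comparing any *two* comments really is this trivial; the hard part, as the reply below notes, is doing it against every comment ever posted.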
To compare every comment on reddit to every other comment in reddit’s entire history would require an index
You think in Reddit’s 20 year history no one has thought of indexing comments for data science workloads? A cursory glance at their engineering blog indicates they perform much more computationally demanding tasks on comment data already for purposes of content filtering
you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much
Analytics workflows are never run on the production database, always on read replicas which are taken asynchronously and built from the transaction logs. They likely have an ETL tool
Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
Reddit’s entire monetization strategy is collecting user data and selling it to advertisers - It’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement
You think in Reddit’s 20 year history no one has thought of indexing comments for data science workloads?
I’m sure they have, but an index doesn’t have anything to do with the python library you mentioned.
Analytics workflows are never run on the production database, always on read replicas
Sure, either that or aggregating live streams of data so I don’t even need a read replica, but either way it doesn’t have anything to do with ElasticSearch.
It’s still totally possible to sync things to ElasticSearch in a way that won’t affect performance on the production servers, but I’m just saying it’s not entirely trivial, especially at the scale reddit operates at, and there’s a cost for those extra servers and storage to consider as well.
It’s hard for us to say if that math works out.
It’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement
You would think, but you could say the same about Facebook and I know from experience that they don’t give a fuck about bots. If anything they actually like the bots because it looks like they have more users.
Look at the picture above - this is trivially easy. We are talking about deduplicating reposts, not running every user account through the Turing test
If 99% of a user’s posts can be found elsewhere, word for word, with the same parent comment, you are looking at a repost bot
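A toy version of that rule: fingerprint each (parent comment, reply) pair and count how many of an account’s posts collide with fingerprints already seen in older threads. Everything here (the data, the in-memory set standing in for a real index) is illustrative.

```python
import hashlib

def fingerprint(parent_text: str, comment_text: str) -> str:
    """Hash a (parent, reply) pair so word-for-word reposts collide."""
    blob = parent_text.strip().lower() + "\x00" + comment_text.strip().lower()
    return hashlib.sha256(blob.encode()).hexdigest()

# Fingerprints harvested from older threads (in reality: a large index).
seen = {
    fingerprint("Great write-up!", "Thanks, glad it helped."),
    fingerprint("Source?", "It's in the sidebar."),
}

def repost_ratio(posts: list[tuple[str, str]]) -> float:
    """Fraction of an account's posts that are word-for-word reposts."""
    hits = sum(fingerprint(p, c) in seen for p, c in posts)
    return hits / len(posts)

suspect = [
    ("Great write-up!", "Thanks, glad it helped."),
    ("Source?", "It's in the sidebar."),
]
print(repost_ratio(suspect))  # 1.0: every post matches an old thread
```

An account scoring near 1.0 on this is exactly the “99% of posts found elsewhere, word for word, with the same parent” case.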
Of course it's not. Nor do they want to.
I think the person you're talking to thinks all bots are like the easy ones in this screenshot.
The low level bots in OP’s screenshot, sure, because it’s identical. Not the rest.
I used to hunt bots on reddit for a hobby and give the results to Bot Defense.
Some of them use rewrites of comments with key words or phrases changed to other words or phrases from a thesaurus to avoid detection. Some of them combine elements from 2 comments to avoid detection. Some of them post generic comments like 💯. Doubtless there are some using AI rewrites of comments now.
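A quick illustration of why a single thesaurus swap beats exact-match dedup but not fuzzy matching. The sentences are invented, and word-level Jaccard similarity is just one simple measure among many.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two comments: 0.0 disjoint, 1.0 same words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

original = "this is honestly the best advice in the whole thread"
reworded = "this is honestly the greatest advice in the whole thread"

print(original == reworded)         # False: exact dedup misses the copy
print(jaccard(original, reworded))  # 0.8: one swapped word barely dents the overlap
```

Combining halves of two different comments, or an AI rewrite, degrades this kind of score further, which is part of why those variants are harder to catch.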
My thought process is if generic bots have been allowed to go so rampant they fill entire threads that's an indication of how bad the more sophisticated bot problem has become.
And I think @phdepressed is right, no one at reddit is going to hunt these sophisticated bots because they inflate numbers. Part of killing the API use was to kill bot detection after all.
Reddit has way more data than you would have been exposed to via the API though - they can look at things like the user’s ASN (is the traffic coming from a datacenter), whether they were using a VPN, and they track things like scroll position, cursor movements, read time before posting a comment, how long it takes to type that comment, etc.
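None of these field names are real Reddit internals; this is just a hypothetical sketch of how signals like those could combine into a crude heuristic.

```python
from dataclasses import dataclass

@dataclass
class SessionSignals:
    from_datacenter_asn: bool  # IP's ASN belongs to a hosting provider
    used_vpn: bool
    read_time_s: float         # time on the thread before commenting
    typing_time_s: float       # time spent typing the comment
    comment_len: int           # length of the comment in characters

def looks_automated(s: SessionSignals) -> bool:
    # Toy rule: a long comment "typed" near-instantly from a datacenter IP,
    # or posted with essentially no read time, looks scripted.
    chars_per_s = s.comment_len / max(s.typing_time_s, 0.001)
    return s.from_datacenter_asn and (s.read_time_s < 1.0 or chars_per_s > 50)

bot = SessionSignals(True, False, read_time_s=0.2, typing_time_s=0.1, comment_len=300)
human = SessionSignals(False, False, read_time_s=30.0, typing_time_s=45.0, comment_len=300)
print(looks_automated(bot))    # True
print(looks_automated(human))  # False
```

The point being: behavioral signals like these never transit the public API, so an outside bot hunter never had access to them.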
no one at reddit is going to hunt these sophisticated bots because they inflate numbers
You are conflating “don’t care about bots” with “don’t care about showing bot generated content to users”. If the latter increases activity and engagement there is no reason to put a stop to it, however, when it comes to building predictive models, A/B testing, and advertising decisions they have a vested financial interest in making sure they are focusing on organic users.
I saw this exact same style of bot account years ago on Tumblr. They always follow the same naming scheme: one word or two words combined and then a string of 4 digits. I bet if you go to any of their profiles, you’ll find like 4 comments that are all copied from old threads and a bunch of upvotes on completely random subs, possibly even all of them being on other bot accounts’ posts and comments.
The real question is whether they’re being used to fake activity on Reddit, to sway public opinion by posting this sort of political slant, or whether they’ll later be used to advertise scams, with this phase just making them seem legitimate.
I thought the names followed that format because that’s the format reddit used for suggestions when signing up.
I think the accounts are kind of “warmed up” this way to make them harder for reddit to identify as bots when they’re used for vote manipulation.
Like a bot that just voted in /r/politics threads would be easier to identify than one which comments here and there and gets a few upvotes itself.
No, the left one is older, and most of the names in the right contain four numbers.
What’s going on here?
Maybe op updated the picture?
I did, because other people complained in another comment that it was confusing to not have the older thread on the left.
Anyway, it’s pretty obvious which one is which
Hey, you’re not op! It’s another bot!
…
Oh right, I’m on Lemmy.