POLL: Should we allow content migration bots?

https://lemmynsfw.com/post/67711

Bots of this type have appeared recently, and people are asking if it’s okay to use them. I’m not sure about this either, so I think it makes sense to ask users. These bots follow some subreddits on Reddit and automatically repost to Lemmy whenever a post is created there. I’ve seen an example site for it: lemmit.online [https://lemmit.online/post/177]. This instance is dedicated solely to mirroring Reddit posts to Lemmy. Maybe instead of mirroring to a community on Lemmy NSFW, we can subscribe to lemmit.online [https://lemmit.online] via Lemmy NSFW. This way we could keep Lemmy NSFW free of bots. Even if bots are accepted, I believe that for now it should be done under admin control anyway. Here is the poll: https://strawpoll.com/polls/PbZqRw82byN

How tf can the sum of 69, 64, and 33 be 120?

Yeah, and the whole thing adds up to 138.33% on yours. I see something more sane:

Does lemmit.online support NSFW communities and subreddits?
I guess so. Even if it doesn't, instances that do will probably show up soon.
Even as NSFW subs become unavailable via API at the end of the month?
People will just swap API usage with web scraping.
If it's got its own API key, it'll probably stay under the limits, and if not there are other ways like RSS/Atom or web scraping.
I believe lemmit.online scrapes subreddits using RSS feeds, so it will likely continue to work until Reddit does away with RSS feeds.

If we leave the reposting to lemmit, people can easily opt out from it showing up on their All feeds by blocking the singular bot that does all the posts, whereas if we allow it here people would need to individually block each bot.

However, Lemmit only reposts Reddit; there is at least one bot here that reposts one of the rule34 boorus, and I'm sure other sites will follow. Those seem to be the real issues here.

Absolutely. I'm on the side of not allowing them, or at least restricting them to bot-only communities. If we mix bot content and real users' content together, we've fucked up.
I would prefer it if bots only reposted approximately 10-40% of the content that receives the most votes.

As others have said, I think it makes sense to have the top 40% of posts of all time, or something like that, reposted. As a mod for several subs here, those bots would help a lot when seeding content, and could even help me crank out posts of new, relevant content in the future.
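A rough sketch of what "repost only the top N% by votes" could look like, assuming each post is a plain dictionary with a hypothetical `score` field. The function name and data shape are illustrative only and are not taken from any bot mentioned in this thread.

```python
import math


def top_fraction(posts, fraction):
    """Return the highest-scoring `fraction` of posts, best first.

    `posts` is a list of dicts with a hypothetical `score` key;
    `fraction` is a number in (0, 1], e.g. 0.4 for the top 40%.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    keep = math.ceil(len(posts) * fraction)  # round up so small subs still get something
    ranked = sorted(posts, key=lambda p: p["score"], reverse=True)
    return ranked[:keep]
```

Rounding up is a design choice here: a community with only a handful of posts would still get at least one of them seeded.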

I don't think it's unreasonable for the number of posts a bot can make per day to be limited. Or maybe we set a time frame in which content migration en masse is allowed, say from date X to date Y, with bot posting limited otherwise.

Thanks for the notification of this post @[email protected].

I have created a script that would take the top (user-selectable) 0-1000 posts of a subreddit and post them to a Lemmy community. My plan was then to implement a vote threshold so that posts older than 48 hours and above a user-defined karma limit would be pulled in each time it was run - however the account login no longer works so I assume it and its posts were purged, so I'm here instead!

I do think that in order to get people engaged, we need content to draw them in. I noticed that once I'd posted 50 items across I immediately started getting subscribers to the community.

What I don't think is right is using bots to just replicate all the content on Reddit. As a moderator of several subs, a lot of content gets removed through moderation (hence the 48 hour limit), and a lot of junk gets through but just doesn't get upvoted (resulting in the karma threshold). Avoiding the "rubbish" would be good.
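The two filters described above (a 48-hour age limit and a karma threshold) could be sketched roughly like this. This is not the actual script, just an illustration that assumes posts are dictionaries with hypothetical `created_utc` and `score` fields.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(hours=48)  # gives moderators time to remove bad posts first


def eligible_posts(posts, min_score, now=None):
    """Return posts at least MIN_AGE old whose score meets `min_score`.

    `posts` is a list of dicts with hypothetical keys `created_utc`
    (a timezone-aware datetime) and `score` (an int).
    """
    now = now or datetime.now(timezone.utc)
    return [
        p for p in posts
        if now - p["created_utc"] >= MIN_AGE and p["score"] >= min_score
    ]
```

Passing `now` explicitly makes the check easy to test; in a real run it would default to the current time.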

My view is that using bots/scripts to seed communities means we can kick start them into life much more quickly, and then when a critical mass of users is reached they become irrelevant and can be disabled. I don't think we're here to just copy and paste from Reddit - otherwise surely you'd just go there instead.

From a 'get as much content as possible' angle bots are good, but I think there's a big risk of them drowning out actual posters, which really takes away a lot of the fun of posting and makes it feel like an archive rather than an active community.
Totally agree. IMO they have a limited use to get initial content seeded and then it's over to actual members of the community to continue and develop. Other instances are focusing more on the "archive" aspect so we should let them do that.
For communities that are already set up I agree, but for communities with few people right now it can be a very large undertaking to keep them alive manually, and the boost of content can really help growth for migrating communities.
Agreed. I think organic growth is better than essentially being nothing better than an RSS feed that copies content from another site because otherwise what's the point when I might as well go to the original source in the first place. I want this community to be separate and grow into its niche on its own.
The issue I have is that various communities on Reddit are disrupting that platform (with good reason). We could end up mirroring a bunch of crap if that happens with one of the communities we are mirroring. We would then be using up server resources to host garbage. It might just be better to sub to those communities that are mirroring.
I'm seeing a lot of bots archiving imgur links, too. The content was removed ages ago, it's just a dead link.
Right, and do we want just a bunch of dead links? I think not.
Honestly I think that's botters being a bit lazy... like anyone who's been in these communities should know that all imgur links are dead now, there should just be a line excluding those from their pool of links to pull from.
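For illustration, the kind of exclusion suggested above might look like the sketch below. It assumes the bot has a plain list of URLs; the blocklist and function names are made up for this example.

```python
from urllib.parse import urlparse

BLOCKED_HOSTS = {"imgur.com"}  # assumption: hosts whose links are treated as dead


def is_blocked(url):
    """True if the URL's host is a blocked host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == b or host.endswith("." + b) for b in BLOCKED_HOSTS)


def drop_blocked(urls):
    """Keep only the URLs that do not point at a blocked host."""
    return [u for u in urls if not is_blocked(u)]
```

Matching on the parsed hostname rather than a substring avoids dropping unrelated links that merely contain "imgur.com" somewhere in their path.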
I think that would be super beneficial to help jumpstart this site. Maybe at some point communities can do it the other way around, so that the „main“ branch of the community is on Lemmy and can port over the followers from Reddit.
If something like that should be allowed, I think it should be marked with a tag/flag/flair (if that's a thing on Lemmy) so that you can easily filter them out if you want to.
There is a setting on an account that says it's a bot account and I think I remember seeing it mentioned after the username in a similar way to how nsfw posts are flagged.

I am voting a no, and have been blocking bots whenever I see them.

I understand trying to jumpstart communities, however I feel bots should be limited to 1 post every 6-12 hours if allowed at all. And honestly, I think bots reduce the quality of a community regardless, especially if you are basing it off metrics on Reddit, where the top posts in many subreddits aren't necessarily the nicest pictures, just the ones with the most professional setup and money to burn on bots upvoting them for exposure.
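A limit like "1 post every 6-12 hours" could be enforced on the bot side with something as small as the sketch below. The class name and the 6-hour default are assumptions for illustration; nothing here is an existing Lemmy feature.

```python
from datetime import datetime, timedelta, timezone


class PostThrottle:
    """Allow at most one post per `min_interval` (hypothetical helper)."""

    def __init__(self, min_interval=timedelta(hours=6)):
        self.min_interval = min_interval
        self.last_post = None  # time of the most recent allowed post

    def try_post(self, now):
        """Return True and record `now` if enough time has passed, else False."""
        if self.last_post is not None and now - self.last_post < self.min_interval:
            return False
        self.last_post = now
        return True
```

The bot would call `try_post` before each submission and skip the post whenever it returns False.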

I voted no, although I nearly voted for the "only in certain comms" option. I've got the beginnings of a tool that will make it easy for people to select content from the reddit backups, but I didn't intend to make it automatic, just something that helped people find content.
I voted no. My biggest concern is that Reddit's legal team would try to take the instance down and then we would have to start all over on a different instance.
An instance dedicated to NSFW replication off Reddit could be a workaround. Then at least if Reddit tries to sue it wouldn't affect this instance's native content.
It is not as if an instance is a legally distinct entity.
Are they really not? Distinct from what?

It definitely is its own legal entity; the people hosting an instance are responsible for the content it serves.

That being said, there isn't any legal risk with this, so it should be fine.

The problem that I see with that, is that these bots are mostly being used for kickstarting/seeding communities that want to turn the bots off and become a full user active sub later. Essentially it's a way to solve the chicken and egg problem of attracting users to generate content by having good content
Totally agree, great idea.
Reddit doesn't own that content, individual posters do. They do grant Reddit a license to display it to others (in the terms of service nobody reads) but they are still the legal owners of their posts (assuming we're talking about OC)
Even more reason to not simply mirror the content as the bot owner does not have permission from the content owner to post it here.
I vote no, subscribing would be better
I think we should not have them or else Lemmy will only be known as the platform that copies from others. Also there could be legal issues when the platform the content is copied from has an ill-minded CEO who wants to shut down others.
Do you think there are grounds for attacking Lemmy or an instance for hosting all/some of Reddit's user-generated content?
From a legal standpoint I can't see how there are; Reddit doesn't own any of the content it serves. Its terms of service do require posters to grant it the right to display their content, but not ownership of that content.
What do you mean? Just because someone has uploaded their nudes to Reddit doesn't mean you have the right to upload it somewhere else? It's a clear violation of copyright. And I'd suspect a sex worker advertising their onlyfans is not going to be too happy about it.

I'm responding to whether there is a risk of Reddit suing, not of the individual creator making a takedown request.

If an individual creator got a bee in their bonnet and decided they were OK having their posts on one forum but not the other, they could request it be taken down, but Reddit themselves couldn't. I think overall, since we are pulling from a free, non-gated site, that's a minimal issue, as people like OnlyFans creators are using this explicitly to show people free content to get them to go to their paid content. It would be like them getting mad that you reposted or shared their original post.

Can't vote. There's a recurring error that says: "Timeout, please try again."
I feel a little yes, but mostly no. I'd LOVE a bot that'd scrape r/SpaceX content for example, because I want to be cold turkey on reddit, but I miss that community so much. But ultimately it's counter productive to organic growth.
I think content import should be a thing across Lemmy; most users moving over have tons of content they've posted on Reddit, and having an easy way to bring that here would be great. But Lemmy isn't really built to handle bulk imports yet: if you simply hit the API of your instance, it will flood /New on every instance that's indexed whatever sublemmy is being imported, and it will severely disrupt the use of the sub for a while. If content could be backfilled directly to the database with earlier timestamps it could be done smoothly, though.

For the past few days I have been working on a quick C# app to simply pull the last 24 hours of posts on a subreddit, and allow me to click a button to upload them individually here to kickstart a community to replace the subreddit. I think, like a lot of people are saying, these tools can help seed communities and boost engagement. I think in the long term we should start to block these tools, but I don't think the time is yet. This is still a new group, and while some communities on this instance are sizable, others are way too small to be sustainable yet :/

Also, if we could hopefully prevent the massive amount of content on Reddit from disappearing, and give a place to preserve that, that would definitely be amazing. Although that's easy for me to say as the one who isn't hosting the files :/ (although for some communities the actual data for that isn't that significant, and since Lemmy doesn't support galleries yet, galleries are still hosted offsite)

Anyway thanks so much for the work you all are doing, I appreciate that you all went out of your way to enable communities and users to migrate off of reddit and hopefully to help form a better platform.

Hmm I was thinking of making the same. Some communities are just dead till they get bootstrapped with some little spam and then ppl feel encouraged to participate.
Someone else here had a smarter idea honestly, of having a 48 hour buffer and using the Reddit API's upvote ratio and total karma to filter out a lot of the spam. Honestly when I built this I had missed that part of the API, so I'm probably going to rework it in a bit.
I'd be super interested in doing something like this, using a bot to "seed" a community with content until it picks up steam. Not like 50 posts all at once but maybe one every few hours to help people find the community.
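As a sketch of the "one every few hours" idea, the snippet below spreads a batch of items over time instead of posting them all at once. The 3-hour default and the function name are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def drip_schedule(items, start, interval=timedelta(hours=3)):
    """Pair each item with a posting time, spaced `interval` apart from `start`."""
    return [(start + i * interval, item) for i, item in enumerate(items)]
```

With three items and the default interval, the posts would go out at the start time, three hours later, and six hours later; the actual submission step is left out on purpose.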
I think this will work well if it is very limited - get some content in (ideally by getting permission from the 3 biggest posters and pulling theirs over) but then ban it.
Even just having an easier way to post, like post scheduling etc. I've got content to seed my community but posting it so it doesn't spam is difficult.
Sorry, I wanted to add one more thing: it would be really nice if there was a way to do this without flooding new... as I think the people doing this don't really want to impact other communities.

Categorically no. The point of lemmy isn't to be a content farm. It's to be a community where people respect each other. How can you respect anything or anyone while stealing content?

In fact I would go the other way and start banning people that are posting content they don't explicitly have permission to post. In this day and age where more people are seeking to monetize sharing nudes, we need to protect the few free content creators we have left, whether on this platform or the former one.

Fuck people and bots that steal nudes.

Wish there was gold to give you.
That's a fair point but what about artwork? For example hentai. Are we really expecting only OC content to appear on Lemmy? I don't think that's realistic at all..

I would argue that we're no more entitled to post people's hentai artwork than we are their nudes. Less so, in fact: it can take days to create hentai, only for it to be posted here with no mechanism for the original author to request it taken down.

If people really want to see other people's hentai posted here, why not link to it? I'm pretty sure the original authors would appreciate that. Plus if they see Lemmy generating traffic, maybe they'll join up and post stuff here directly.

That's what sourcing content is for. All the communities I moderate require sources when possible, and proof of best effort when not.

Your suggestion would make hentai communities all but impossible for the exact reasons you already stated.

Linking only to an art page would make the communities functionally useless. People don't come to follow a million links, open a hundred web pages, and sign up for a dozen services in hopes of seeing something they like, they come to see the content, same as in any IRL content community.

Nude photos are cheap and easy to produce, so those communities would never be starved for content, but a hentai community that only allows direct images if they're OC would be dead on arrival.

I don't agree that a bot would steal content. Content creators need exposure and they post for free on Reddit, so why wouldn't they want the same here?