ok genuine crisis of faith here. I've got two games I've been working on, both are intended to be fully open source. One is nearly ready for a public release, the other is probably realistically a year or two out. I feel it is important for the source to be available for the former, and it might not yet be part of any training sets because it only has 17 stars on github, but I'm assuming it probably is already.

The latter is largely not online at all, because I simply haven't built it yet...

I was ambivalent about my code going into training sets, but now for a variety of exciting reasons I am no longer ambivalent. So here's the conflict: the only way I can think of to realistically prevent it from ever going into the plagiarism machine is to simply never make the source available. There we go. That's the crisis of faith.
This is all possible because we had a healthy commons and so that's the reward for building a healthy commons: enclosure by billionaires who are promising the end of everyone's ability to make a living in a creative field. If that really will come to pass, we'd have been better off if we never shared anything on the internet and that really hurts.
right now my best answer is just... not open source anything. or at least, hang on to it for a few years, see what happens with this bubble. if it pops and takes down half the west coast tech industry, nothing was lost and source code can go online again. if it doesn't though, well, i guess go ask the god mommy machine to hallucinate an interactive video for you or something i'm sure it'll be just as good 🙄

@aeva i'm looking into hosting my own code forge but i'm also not looking forward to fending off the scrapers.

might be best to just not share code for a while (or share directly over email or something).

@agentultra @aeva the scrapers only need to get through once, and they probably will

@agentultra @aeva I'm currently running a forgejo instance with anubis on the front of it, and several of the things @ric discusses in https://www.qweb.co.uk/blog/securing-your-servers-and-websites-from-aggressive-content-scrapers-and-ai-or-llm-bots at server level.

I also run a nepenthes instance to grab scrapers who fall in and feed them poison.

One thing I'm thinking of adding is some invisible links across all my sites that either go into the nepenthes trap, or if followed add the IP to a fail2ban jail.

So far, it seems to be working to keep them out. So far.
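
A minimal sketch of the invisible-link idea above, assuming the web server writes a common/combined-format access log and that the hidden links all point under one path. The `/trap/` prefix, the sample log lines, and the function name are hypothetical and not taken from the setup described in this thread. It only collects the addresses that followed a trap link; handing them to a ban mechanism is left out.

```python
import re

# Hypothetical trap path: an invisible link on each page would point here.
TRAP_PREFIX = "/trap/"

# Start of a common/combined-format access log line (assumed format):
# client IP, two ignored fields, bracketed timestamp, quoted request line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)'
)

def trapped_ips(log_lines, trap_prefix=TRAP_PREFIX):
    """Return the set of client IPs that requested a path under trap_prefix."""
    hits = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        # Lines that do not parse are skipped rather than treated as hits.
        if match and match.group("path").startswith(trap_prefix):
            hits.add(match.group("ip"))
    return hits

# Made-up sample lines using documentation-reserved IP ranges.
sample = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /trap/a1 HTTP/1.1" 200 512 "-" "bot"',
    '198.51.100.2 - - [01/Jan/2025:00:00:02 +0000] "GET /repo/README HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
print(sorted(trapped_ips(sample)))
```

Only clients that actually follow the hidden link end up in the set, so ordinary visitors who never see the link should not be affected.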

@aeva

i will email you the source if you're provably chill

@aeva I'm in the same boat. Kind of hard to feel like putting anything out in the open when it's going to get immediately stolen by the worst people in order to do ever growing harm to a bunch of things I love...
@aeva make all source requests be made in writing and physically mailed to a po box, and then have the source printed onto punch cards and mailed back in 20lb blocks
@pupxel why bother making the source available at that point?
@aeva @pupxel make the source available in electronic form but only under NDA
@ratsnakegames @aeva @pupxel litigation is expensive and uncertain even when it’s technically a control
@owen @aeva @pupxel "litigation is uncertain" goes both ways. If you only give the code to private citizens, and hold *them* liable if they leak the code to AI scrapers, most people will think twice before doing it
@ratsnakegames @owen @pupxel I'm not interested in being a copyright troll
@aeva @owen @pupxel i think this is protecting legitimate interests
@aeva @owen @pupxel i'm not even saying that you should actively sue people to shake money out of them - but sometimes you need to keep that option open as a deterrent
@ratsnakegames @owen @pupxel yeah the most i'm willing to do is the LICENSE.txt equivalent to a sign that says "if you can read this you're in range" which only works until people discover it doesn't have teeth
@aeva @ratsnakegames @pupxel threatening your customers because you’re concerned about some third party (here, the collection of AI scrapers) is also pretty hostile
@owen @aeva @pupxel every single contract in the world has negative consequences if you violate it.
@owen @aeva @pupxel also, if you give people something for free, they're not customers. Asking them to keep stuff reasonably confidential in return for something you are voluntarily granting them with no benefit to yourself is perfectly reasonable.
@ratsnakegames @owen @pupxel I'm not interested in spending my limited time on this earth in litigation
@owen @aeva @pupxel ofc Mr Anthropic should not get access even if Mr Anthropic signs the NDA
@aeva I think “source available on good request” after a conf talk etc is often a good option. Not everything needs to be widely public, not anymore, but collaborative communities are vital.
@coral @aeva Agreed, and also I've seen that turn into de facto gatekeeping. This shit is really hard...
@xgranade @coral I think that will only delay the source going online, not prevent it. Also, given they've got bots now that email maintainers to harass them into accepting contributions, I think it won't be so simple.
@aeva @coral "bots to harass" is kind of their whole modus operandi, yeah. I hate this timeline.
@xgranade @aeva I would not consider an email to be enough evidence of collaborative intent on its own to share much. For me, the goals are to support good work whilst sharing enough to interest specific external parties, like any researcher, and research-like norms can be used.
@coral @xgranade @aeva anecdotally not using github and adding a few nginx rules to blackhole obvious scrapers gets rid of the bot traffic
@coral @xgranade @aeva this is all you need
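
The rules themselves are not shown in the thread, so the following is only a sketch of the general idea of refusing requests from crawlers that identify themselves, written in Python rather than as nginx configuration. The token list is an illustrative assumption (these are User-Agent tokens that some crawlers publicly announce), and the function names are hypothetical. It does nothing against scrapers that fake a browser User-Agent.

```python
# Illustrative, non-exhaustive list of self-announced crawler tokens (assumption).
BLOCKED_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def is_obvious_scraper(user_agent):
    """True if the User-Agent string contains one of the blocked tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)

def status_for(user_agent):
    """Pick an HTTP status: 403 for self-identified crawlers, 200 otherwise."""
    return 403 if is_obvious_scraper(user_agent) else 200

print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(status_for("Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"))
```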
@xgranade @aeva it’s de jure gatekeeping! It’s the ivory tower, it’s the dark forest.
@coral @aeva I mean, yeah... it's gatekeeping hopefully in the service of the public good, but it easily turns into the pejorative sense of the term.

@aeva My next project won't be open source. The license has a few provisions that are meant to amount to a right to repair and free personal use. No redistribution or commercial redeployment.

It's mostly performative considering it's a web app without a build process, so I'm delivering the source to the browser anyways. And I've never seen evidence that anybody uses my projects. If I require people to contact me, maybe I'll find out somebody does use it.

@aeva
A few thoughts on this, short cuz I should be asleep but - you're far from the only person reacting this way right now unfortunately. There's a bunch of us that are withholding releases for the sake of keeping it out of the sludge machines. But the good part here at least is that you can always release it later.
@aeva
Another thought though is licenses. This relies on an optimistic view that the law is/will be an ally to us but if you use a license with explicit carveouts preventing feeding into The Machine™️, they will ignore it and they will slurp up the codebase, but in doing so it will create a poison pill allowing a legal case to be made in the future, realistically as a member of a class action. It's been... Hilariously easy to prove that your work has been added to datasets by replicating snippets.
@aeva I don't blame anyone for having zero faith in the law as an ally on this one though, so don't feel bad if you're pessimistic about it.
@MissAemilia @aeva if AI training is considered fair use, licenses do not matter. They cannot effectively restrict fair use.

@ratsnakegames @aeva
Yeah, therein's the rub, aint it. It really is hedging a bet on the law.

And for what it's worth, considering I'm holding off publishing some audio stuff, I'm definitely being pessimistic myself. I'm just really keeping the license thing as a hope.

@aeva this is kinda how it feels to me too

@aeva maybe back to old school style, mail out diskettes with the source code to people who ask?

doesn't affect those people then posting the code, but it might be fun. especially if you decide to use 5.25" floppies.

@aeva fingers crossed the energy cost spike of Current Geopolitical Fuckery will pop the bubble but otherwise yeah, I’d hold off on open sourcing.

@aeva Honestly, it's hard to really say anything at all. Corporations clearly don't give a flying fuck about licensing, but I suppose the best route forward is to favor GPL-style ones wherever possible (permissive licensing often becomes an okay for corporations to exploit your work).

Oh, and skip out on GitHub if you can. Use Codeberg or your own Forgejo instance if you have the resources.

While one can hope this bubble will pop and (ideally) take some of the tech industry with it, it's still a lose-lose situation.

@ColorfulCeleste I doubt using codeberg or any other public forgejo instance is effective at preventing source code from being scraped into training sets beyond the most opportunistic scraping, but github definitely is the fast path to the bad ending. I am also skeptical that github will continue to exist (without getting much much worse) after the bubble pop because the paid services are redundant to azure and the rest is a significant cost that won't be paying for itself anymore.
@ColorfulCeleste the number of people who replied to my thread to say they had been feeling the same thing was... well, it didn't make me feel better, ok let's put it this way. we are constantly bombarded by Problems, and the number of Actionable Problems is always lower than the number of Known Problems, much as the set of Real Problems and Perceived Problems don't fully overlap. Knowing which taxonomy is relevant is itself a form of agency even if the Problem is not Real and Actionable

@ColorfulCeleste if a particular Problem is Real but not directly Actionable, the most useful thing to do is then plan accordingly and put in any work you can do to minimize the damage.

so like Actionable problem is "bad thing will happen unless I do X" and not directly Actionable is "bad thing will happen, and I can survive it and/or help others survive it if I do X"

@ColorfulCeleste a significant amount of agreement and virtually non-existent dissent is not proof of Real or Actionable, but comparing notes with people you trust and respect can bring some clarity to what is probable or improbable
@aeva That makes a lot of sense. I doubt there's much else I can say that wouldn't just be regurgitating what has already been said in this thread, but I definitely see where you are coming from :(
@aeva I really need to bookmark interesting posts more often. There was a post with a solution that sounds like a good place to start. Paraphrasing: "If you want access to the source, talk to me. Make the human connection."
@aeva I guess you could put it up encrypted and hand out keys but that doesn't prevent someone else from feeding it to the AI machine.

@aeva

well, that.

Or we could do it like on the school yards some 30 years ago and exchange software – this time in the form of Git repositories – using physical media.

Bonus points for using ZIP-Disks, but USB thumb drives are okay, too, I guess.

@datenwolf @aeva Git repos don’t have to be public.

Although there is a certain appeal to falling back to sneakernet...

@sabrina @aeva

I'm fully aware that Git repos can be put behind logins. But then you'd have to exchange login credentials, which IMHO is lesser than the direct exchange of data in a face-to-face transaction in meat space (which incidentally also implements a clanker check as part of the transaction).

@datenwolf @aeva All my repos are on DAT cassettes.

@aeva yup

you can hide it from scrapers, but any healthy fork will be noticed and scraped eventually

@aeva yep, oof

The idea that the kind of people who took the piss with open source were just a minority was what I held on to. But it’s been completely blown up by the availability of tools that fully automate taking the piss

That and the fact that many programmers I used to respect don’t seem to think that’s a problem makes me want to retreat to a cave and never emerge

@aeva I wonder if anyone managed to nightshade text by abusing Unicode

@aeva they cannot really take the commons, though. BSD still exists, despite their permissive licence.

The enclosure happens where they take on key roles in key projects, and steer development in the direction they want, that is constant effort and cannot be automated.

@GyrosGeier yeah I used to also think that I can ignore the problem and it will go away, but I now understand that anything I make that ends up in the slop training set directly contributes to the credibility of these thieves, and so by continuing to ignore the problem you are very indirectly contributing to perpetuating it.
@GyrosGeier and there are some significant consequences to allowing the problem to metastasize, and some of them we are seeing already. the one weighing on my conscience the most at this particular moment is these tools being used by the military to pick random targets to bomb.
@GyrosGeier I think there is also a very real possibility that these tools may get to the point where they are able to consistently generate passably functional garbage, and we're just going to be wallowing through an endless sea of liquid shit for the rest of our lives. You don't need "AGI" for that to happen, you just need the cost/benefit scale to tip far enough in one direction. You also won't have a career in tech anymore that isn't cleaning up superfund sites.
@GyrosGeier and that liquid shit will leak into your precious BSD. maybe the funding will dry up and that will be the only way to keep development going. or maybe it'll just die too.
@aeva oof, yeah
🙁