Regarding #Github, #Microsoft, and "AI"...
There was a lot of unease when Microsoft, famous for hating open-source / free software for so long, bought Github. The worry was Microsoft would return to their anti-competitive ways and use Github as a cudgel against its perceived ~~enemies~~ competitors. But no, they declared, Github would forever remain independent of Microsoft, running for the benefit of the world. There was no reason to #distrust them.
Then, of course, recently they memory-holed those promises, and said Github would no longer be #independent. Instead, it would operate as part of Microsoft - specifically as part of their #AI unit.
Why AI? Well, to use Github's mountain of code and user data to train their #LLM, of course. The LLMs that are #copyright violations at mass scale, and that, despite any PR claims, really are intended to allow customers to replace expensive, #knowledgeable #software #engineers with their #plagiarism machine.
In response, many have closed or stated their intention to close their Github accounts.
To that, I say: don't. Closing it sends a brief message, and nothing else.
Instead, keep your #account and #repositories. Create new ones. Not for your real use - check in your buggy, non-working #code. Check in #broken code generated by their (or other) LLM. Pollute the #training #data. Merge idiotic unrelated PRs. Check in stuff that doesn't build, doesn't run.
Throw those #sabots in the machine. Make it serve a purpose.
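For anyone who'd rather generate the polluting junk than write it by hand: the core trick behind several of these tarpits (nepenthes included) is Dissociated-Press-style Markov babble over a seed corpus. A minimal Python sketch of the idea, with a placeholder corpus:

```python
import random
from collections import defaultdict

def build_chain(corpus, order=2):
    """Map each `order`-word prefix to the words observed following it."""
    words = corpus.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        chain[prefix].append(words[i + order])
    return chain

def babble(chain, length=50, seed=None):
    """Random-walk the chain to emit locally plausible nonsense."""
    rng = random.Random(seed)
    prefix = rng.choice(list(chain))
    out = list(prefix)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(prefix):]))
        if not followers:  # dead end: restart from a random prefix
            followers = chain[rng.choice(list(chain))]
        out.append(rng.choice(followers))
    return " ".join(out)

# Placeholder corpus; feed it real prose for better-looking slop.
corpus = "the quick brown fox jumps over the lazy dog " * 20
print(babble(build_chain(corpus), length=20))
```

Point it at real prose and the output stays locally plausible but globally meaningless, which is exactly what you want a scraper to swallow.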
This afternoon, I got close to what I wanted to achieve in terms of load-balancing between the two #AI #sabots I have running.
I had originally planned to use #OpenBSD's #OpenHTTPD or #RelayD to do the job, but support for the #HAProxy #PROXY protocol was the limiting factor… so I went with #nginx instead.
One thing I haven't worked out yet is how to pass the client IP via the PROXY protocol to an HTTP back-end. It seems I can do it for a generic TCP stream, but not for HTTP.
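For the record, the generic TCP case that does work looks roughly like this in nginx's stream module (addresses here are illustrative placeholders, not my real config); as far as I can tell there's no equivalent outgoing directive for an http-block proxy_pass.

```nginx
# nginx stream module: relay raw TCP and prepend a PROXY protocol
# header carrying the real client address.
stream {
    server {
        listen 8443;
        proxy_pass 203.0.113.11:443;  # back-end (illustrative address)
        proxy_protocol on;            # emit PROXY v1 header to the back-end
    }
}
```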
The alternative is to set X-Forwarded-For, and have the back-ends trust it, like they trust PROXY for the gateway's IPv4 address for #sniproxy.
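In nginx terms, that alternative looks something like this on both sides; a sketch with illustrative placeholder addresses, not my actual config:

```nginx
# Front-end gateway: forward the real client address in headers.
server {
    listen 80;  # TLS config omitted for brevity
    server_name sabot.vk4msl.com;

    location / {
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://203.0.113.11:8080;  # a sabot back-end (illustrative)
    }
}

# Back-end: only believe X-Forwarded-For when it arrives from the gateway.
server {
    listen 8080;
    set_real_ip_from  203.0.113.1;  # the gateway's address (illustrative)
    real_ip_header    X-Forwarded-For;
    real_ip_recursive on;
}
```

The set_real_ip_from / real_ip_header pair (ngx_http_realip_module) is what makes the trust explicit: headers from any other source are ignored.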
But… it works: hit https://sabot.vk4msl.com/ and you'll get either sabot01 (which uses nepenthes) or sabot02 (which uses iocaine). Since neither cares about the URI, I can bounce the client between them.
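The bouncing itself is just nginx's default round-robin upstream; a sketch, again with illustrative addresses:

```nginx
# Default round-robin upstream: alternate requests between the tarpits.
upstream sabots {
    server 203.0.113.11:8080;  # sabot01, nepenthes (illustrative address)
    server 203.0.113.12:8080;  # sabot02, iocaine (illustrative address)
}

server {
    listen 80;  # TLS config omitted for brevity
    server_name sabot.vk4msl.com;

    location / {
        proxy_pass http://sabots;
    }
}
```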
This did get me thinking though: if enough of us did it, we could have an #AISabotAsAService for websites to redirect/link to when they think they're being scraped by an AI bot.
We could provide a pool of servers that would provide the link maze. Front-end proxies would just bounce you between all the pool members, feeding your bot nonsense.
I just managed to get the first of my #sabots going to help clog up AI.
This is running on the node I resurrected yesterday. Single CPU VM, 1GiB RAM… AlpineLinux 3.21.
It is using this tool:
https://zadzmo.org/code/nepenthes/
Not the best documented in terms of installation requirements… I've fed it some of my blog posts as corpus input. I note every third link it generates is a 404, too.
Now that I've put the bait out, I just wait.
Might try another engine out and deploy a VM for that elsewhere. Then maybe I can load-balance between them.
Thinking about this in the shower (as you do)… I've now got a surplus of compute power.
I was running with just one 16-core node with 14 VMs on it. After discovering that the old boards I had were actually repairable, I now have one 8-core node back, with another 8-core node sitting on the table, almost ready to join it.
That, in addition to a little MSI Core i3 machine I bought, means I've got 16 cores mostly doing nothing, and will soon have nearly 24 cores free. If I pull my finger out and finally commission that AMD Epyc board I have… there's another 8… that'd give me 5 nodes in total.
That makes me think of #AI #sabots…
https://tldr.nettime.org/@asrg/113867412641585520 -- lists 5 different AI sabot tools.
My thinking is maybe to load a VM onto each node with a different AI sabot engine, and load balance between them.
The question is: where's a good, ethical place to source the seed material from? I can throw in my own (and I will), but I think I'll need a little more variance than that.
Is there some training material that people are willing to donate for this cause?

Attached: 1 image

## **Sabot in the Age of AI**

A list of offensive methods & strategic approaches for facilitating (algorithmic) sabotage, framework disruption, & intentional data poisoning.

### **Selected Tools & Frameworks**

- **Nepenthes** — [Endless crawler trap.](https://zadzmo.org/code/nepenthes)
- **Babble** — [Standalone LLM crawler tarpit.](https://git.jsbarretto.com/zesterer/babble)
- **Markov Tarpit** — [Traps AI bots & feeds them useless data.](https://git.rys.io/libre/markov-tarpit)
- **Sarracenia** — [Loops bots into fake pages.](https://github.com/CTAG07/Sarracenia)
- **Antlion** — [Express.js middleware for infinite sinkholes.](https://github.com/shsiena/antlion)
- **Infinite Slop** — [Garbage web page generator.](https://code.blicky.net/yorhel/infinite-slop)
- **Poison the WeLLMs** — [Reverse proxy for LLM confusion.](https://codeberg.org/MikeCoats/poison-the-wellms)
- **Marko** — [Dissociated Press CLI/lib.](https://codeberg.org/timmc/marko/)
- **django-llm-poison** — [Serves poisoned content to crawlers.](https://github.com/Fingel/django-llm-poison)
- **konterfAI** — [Model-poisoner for LLMs.](https://codeberg.org/konterfai/konterfai)
- **Quixotic** — [Static site LLM confuser.](https://marcusb.org/hacks/quixotic.html)
- **toxicAInt** — [Replaces text with slop.](https://github.com/portasynthinca3/toxicaint)
- **Iocaine** — [Defense against unwanted scrapers.](https://iocaine.madhouse-project.org)
- **Caddy Defender** — [Blocks bots & pollutes training data.](https://defender.jasoncameron.dev)
- **GzipChunk** — [Inserts compressed junk into live gzip streams.](https://github.com/gw1urf/gzipchunk)
- **Chunchunmaru** — [Go-based web scraper tarpit.](https://github.com/BrandenStoberReal/Chunchunmaru)
- **IED** — [ZIP bombs for web scrapers.](https://github.com/NateChoe1/ied)
- **FakeJPEG** — [Endless fake JPEGs.](https://github.com/gw1urf/fakejpeg)
- **Pyison** — [AI crawler tarpit.](https://github.com/JonasLong/Pyison)
- **HalluciGen** — [WP plugin that scrambles content.](https://codeberg.org/emergentdigitalmedia/HalluciGen)
- **Spigot** — [Hierarchical Markov page generator.](https://github.com/gw1urf/spigot)

---

*This is a living resource — regularly updated to reflect the shifting terrain of collective techno-disobedience and algorithmic Luddism.*