Mastodawn

a very weeny construct 💀 Dec 4

caddy: proxy, fastcgi, builtin static fileserver, traffic shaping requires a module that doesn't do quite what i want. getting tired of guessing my way around caddyfile syntax. don't need its magic certificate management.
- use 'tc' for traffic shaping? big learning curve
- front it with haproxy? lots of redundant features, feels heavy

dang i gotta draw up a feature matrix or something

a very weeny construct 💀 Dec 5

it's pretty weird that it took me this long to actually do but

tonight i have set up for the first time a program running on a computer inside my home, that people may access like a normal website, without learning my cable modem's ip address in the process, and if someone starts ddosing me i can just unplug and let the household continue watching videos unaware

(i'm having my @colocataires vps proxy traffic through a tailscale vpn to my closet fileserver)

a very weeny construct 💀 Dec 5

safe(r) home-hosting by reverse proxy from a little computer in a datacenter is one of those things that seems like complex esoteric engineering from afar

but once you've experienced it, and then again when you've set it up yourself, all of a sudden it makes sense and is totally normal and a whole mess of possibilities for what you can cheaply and casually build on the internet blasts wide open

like the first time you experience nerd astral projection

a very weeny construct 💀 Dec 7

llm scrapers ignoring my robots.txt and pounding on my small website 28 times per second, 24/7. 600kbps of my available bandwidth wasted just on markov trash

it's easy to imagine how they'll ddos any service that does a bit of compute on each request

a very weeny construct 💀 Dec 8

it's not super exciting but if you're the kind of weirdo who wants to look at my vm's gauges, they are viewable here:

https://telemetry.orbital.rodeo/

i have been cobbling it together using collectd, rrdtool, and scripts instead of the far more reasonable and popular prometheus / grafana combo. because it might be more lightweight? haven't measured

for now it updates only when i run the command, so don't sit there wondering

no light mode or explanatory text (yet) soz

a very weeny construct 💀 Dec 9

i learned how to make haproxy throttle iocaine's output so the scrapers continue to download delicious poison but now only at 56kbps (down from 600kbps) 🎉

a very weeny construct 💀 Dec 9

oof, it's somewhat heavy though. went from about 3% avg cpu use to about 6%

the throttling that haproxy does just gets buffered up by caddy in front of it and the result is a long initial delay before a fast transmission of data. like latency

which could probably be implemented more simply with a sleep statement somewhere

i wonder what other strategies i can use to slow down crawlers. thinking random connection drops or http errors 429, 402, and 451

a very weeny construct 💀 Dec 11

somewhere on my to-do list: look into how yunohost compares to just installing various bits of software on your vps. is it heavier, easier, less customizable, can you put other stuff alongside it, etc.

a very weeny construct 💀 Dec 17

problem: once detected, how best to slap back at ai scrapers? return poison quickly? tarpit? throttled poison drip? drop their ip's packets at the firewall?

idea: drop packets during business hours to free up bandwidth for legit visitors; fast poison otherwise to collect ip addresses for next day's ip ban. "party all night sleep all day" strategy
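
A minimal sketch of the "party all night sleep all day" idea, assuming business hours of 09:00 to 17:00 local time (the thread does not say which hours). The names `scraper_action` and `handle_scraper` are made up for illustration; in practice the "drop" branch would be a firewall rule and the "poison" branch would be iocaine.

```python
from datetime import time


def scraper_action(now, open_at=time(9, 0), close_at=time(17, 0)):
    """Pick a response for a detected scraper based on the time of day.

    `open_at` and `close_at` are assumed business hours, not values from the thread.
    """
    return "drop" if open_at <= now < close_at else "poison"


def handle_scraper(ip, now, tomorrow_bans):
    """Apply the policy and remember poisoned addresses for the next day's ban."""
    action = scraper_action(now)
    if action == "poison":
        # Serving poison keeps the scraper talking, so its address gets recorded.
        tomorrow_bans.add(ip)
    return action
```

Keeping the decision in one small function makes it easy to swap the hours or add a weekend rule later.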

a very weeny construct 💀 Dec 17

lots of bot traffic hitting port 80 (http) on my vm just to get redirected to port 443 (https) where they get a "go away, bot" error

who am i keeping port 80 open for?

who types in "orbital.rodeo," lands on http, and doesn't know to or can't try https instead?

many people use hsts and abandon 80

caddy auto-magically puts a redirect on 80 for my sites but i'm increasingly annoyed by its magic. wanna go back to haproxy

think i'll shut it

HTTP Strict Transport Security - Wikipedia

a very weeny construct 💀 Dec 19

oh, whoopsie

if an ipv4->ipv6 proxy is telling my webserver the clients' ipv4 addresses using proxy protocol, blocking those ipv4 addresses at the webserver firewall isn't going to do much 🤦

i have a v4 address now, i was just lazy about reconfiguring dns to send v4 web traffic direct to my vm instead of through the v4-v6 proxy. time to get on that

a very weeny construct 💀 Dec 19

oh that's more like it
after blocking ai scrapers at the firewall, cut my cpu use down to 1% and bot traffic to almost nothing
you love to see it!

a very weeny construct 💀 Dec 20

🤔 i wonder how many innocents i'll accidentally shut out if i adopt a policy of, "any /24 prefix with 3 or more scrapers within it dooms the lot"?

🤔 i could set up a "pls let me back in" automation. tell me my biceps are eleven out of ten in this web-form and you get added to an inclusion list that takes effect before the block list
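
A rough sketch of the "3 or more scrapers dooms the /24" policy, to get a feel for how it behaves on a list of offending addresses. It assumes IPv4 only, and `doomed_prefixes` is a hypothetical helper name, not part of any tool mentioned in the thread. The sample addresses in the usage note are documentation ranges.

```python
import ipaddress
from collections import defaultdict


def doomed_prefixes(scraper_ips, prefix_len=24, threshold=3):
    """Return prefixes containing at least `threshold` distinct scraper addresses.

    Assumes every entry in `scraper_ips` is an IPv4 address string.
    """
    by_prefix = defaultdict(set)
    for ip in scraper_ips:
        # strict=False masks off the host bits, giving the enclosing prefix.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        by_prefix[net].add(ip)
    return sorted(net for net, ips in by_prefix.items() if len(ips) >= threshold)
```

Running this over an existing block list before turning the policy on would show how many /24s (256 addresses each) get swept up, which is one way to estimate the collateral damage the post wonders about.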

a very weeny construct 💀 Dec 23

i could implement both of those defense mechanisms

reduce bookkeeping on my part by being a bit overeager about blocking whole prefixes instead of individual ip addresses

definitely want to do something like @alex's butlerian jihad where i block all networks from any ASN abusing my sites

but also, have a cooldown that sends traffic from blocked prefixes to a "let me back in" form that allowlists individual addresses

a very weeny construct 💀 Dec 23

oh cool, while i wasn't paying attention anubis has grown dataset poisoning features like what iocaine does and a (paid) collaborative reputation database mechanism

Making sure you're not a bot!

a very weeny construct 💀 Dec 23

haha oops i accidentally banned my own ip. fixed it but guessing i'll have to flush the ban lists and rebuild in case i caught any more i shouldn't have

one super nice thing i'm doing this time around is using a wireguard-based vpn for all my ssh'ing. so even when i blocked my own ip address my ssh session was unaffected and i could fix it. and zero log spam from vulnerability scanners constantly trying the door 😌

a very weeny construct 💀 Dec 23

i want to block any requests from google and facebook; also i want to block any isp who would tolerate scrapers

the database of ip range ("prefix") assignments is downloadable but it's big. 590 entries just for as32934 (facebook). too big to just dump into the firewall

but there's often nothing between multiple records for any given asn. maybe i could treat that as a single range, which would let me express the set of ranges to block more concisely 🤔
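
One way to try the "express the ranges more concisely" idea is Python's standard `ipaddress.collapse_addresses`, which merges prefixes that overlap or sit exactly side by side. This is only a sketch: `collapse` is a hypothetical wrapper, it expects one address family per call, and it does not bridge gaps between records (that would need allocation data to be sure nobody else owns the space in between).

```python
import ipaddress


def collapse(prefixes):
    """Merge overlapping or exactly adjacent prefixes into the fewest CIDR blocks.

    All prefixes must be the same IP version; mixing v4 and v6 raises TypeError.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(net) for net in ipaddress.collapse_addresses(nets)]
```

For example, `collapse(["192.0.2.0/25", "192.0.2.128/25"])` yields a single `192.0.2.0/24`. Comparing the length of the input list with the length of the result gives a quick measure of how much a per-ASN prefix dump shrinks.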

a very weeny construct 💀 Jan 7

"too big to just dump into the firewall"

whoopsie, that was a wrong assumption on my part based on a bad time i had with way too many iptables firewall rules created by fail2ban many years ago

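these days i'm using nftables and its set structure to hold ip addresses. sets live in fast kernel lookup structures (hash tables, or a red-black tree / pipapo for interval sets, iiuc, rather than the trie the routing tables use), and you can dump addresses in there all day long. with the interval flag and auto-merge it will manage merging them into ranges, and with a timeout it will auto expire them if you want. works great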

a very weeny construct 💀 Jan 27

so upthread i was surprised that you can just shovel truckloads of ip addresses into nftables' "set" structure, for blockin' purposes

but i want to do stuff like detect if several addresses within some autonomous system's range are coordinating for shenanigans, and block the whole damn asn

this example, on the nftables wiki itself, loads a whole ass maxmind geoip db into nftables' "map" structure and my first reaction was "surely not"

https://wiki.nftables.org/wiki-nftables/index.php/GeoIP_matching

GeoIP matching - nftables wiki

a very weeny construct 💀 Jan 27

i mean are there limits? how many rules and addresses can i dump into nftables tables, chains, maps and sets (which, iiuc, all live in the kernel) before it crashes

a very weeny construct 💀 Jan 27

anyway it's goblin week, or it was recently? so maybe imma try implementing automatic, immediate, asn matching and blocking in nftables rules 😈

a very weeny construct 💀 Mar 18

woah cool i just learned about the nftables feature concatenations

i'm already 🤩 about nftables' very fast sets and maps but today i learned that you can store essentially tuples of data in them

which in some cases can let you test multiple conditions at once, replacing multiple rules with a fast set-membership check

Concatenations - nftables wiki

a very weeny construct 💀 Mar 27

about 1.5 days after asking iocaine to not just poison but also block ai scrapers masquerading as browsers, i have about 36000 ip addresses blocked at the firewall

this is for a site that is not advertised anywhere, disliked by search engines, and contains maybe 10 blog posts that rarely change. AND which preemptively blocks several whole gafam corporate ASNs so not even counting them

so i expect more popular sites are seeing many multiples of this traffic

a very weeny construct 💀 Mar 27

anyway, thinking again about how to analyze this ever growing set of blocked ai scraper addresses, most of which are probably "residential ips."

calculate for each asn the percentage of its ip range that i've blocked, and above a certain threshold block the whole range? (that would be more efficient than recording every single bad address)
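
A small sketch of the per-ASN percentage idea. It assumes a mapping from ASN to its announced prefixes is already available (for example, parsed from the downloadable prefix database mentioned upthread); the function names, the data layout, and the documentation-range sample values in the note are all hypothetical. IPv4 only, and the lookup is a plain nested loop, so it is meant for experimenting rather than for 100k+ addresses.

```python
import ipaddress


def asn_block_fractions(asn_prefixes, blocked_ips):
    """Return {asn: fraction of that ASN's address space that is blocked}."""
    blocked = {ipaddress.ip_address(ip) for ip in blocked_ips}
    fractions = {}
    for asn, prefixes in asn_prefixes.items():
        # Collapse first so overlapping announcements are not counted twice.
        nets = list(ipaddress.collapse_addresses(
            ipaddress.ip_network(p) for p in prefixes))
        total = sum(net.num_addresses for net in nets)
        hits = sum(1 for ip in blocked if any(ip in net for net in nets))
        fractions[asn] = hits / total if total else 0.0
    return fractions


def asns_over_threshold(fractions, threshold):
    """List the ASNs whose blocked fraction meets or exceeds `threshold`."""
    return sorted(asn for asn, frac in fractions.items() if frac >= threshold)
```

With `{"AS64500": ["192.0.2.0/30"]}` and two blocked addresses inside it, the fraction comes out as 0.5. Large residential ISPs have millions of addresses, so a workable threshold is probably tiny; counting per prefix instead of per ASN may be a more sensitive signal.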

a very weeny construct 💀 Mar 27

ideas contd.:

have an unblocked subdomain where a legit user of a blocked ip might fill out a form and click a "let me back in" button to get onto an allow-list

double extra forever-ban anybody that uses the "get me back in" button then starts snarfing down poison again
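
The allow-list and forever-ban rules above can be sketched as a tiny state machine. This is an in-memory illustration only; `AccessList` and its method names are invented here, and a real version would presumably live in nftables sets or whatever backs the web form.

```python
class AccessList:
    """Block list with a 'let me back in' allow-list and a permanent ban tier."""

    def __init__(self):
        self.blocked = set()
        self.allowed = set()
        self.forever = set()

    def saw_poison_request(self, ip):
        """Record that `ip` requested a poison page."""
        if ip in self.allowed:
            # Got let back in, then went straight back to the poison: permanent ban.
            self.allowed.discard(ip)
            self.forever.add(ip)
        else:
            self.blocked.add(ip)

    def let_me_back_in(self, ip):
        """Handle the form submission; returns False if the ban is permanent."""
        if ip in self.forever:
            return False
        self.allowed.add(ip)
        return True

    def is_blocked(self, ip):
        """The permanent list wins, then the allow-list, then the block list."""
        if ip in self.forever:
            return True
        if ip in self.allowed:
            return False
        return ip in self.blocked
```

The order of checks in `is_blocked` is the point: the allow-list takes effect before the ordinary block list, but nothing overrides the forever-ban.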

a very weeny construct 💀 Mar 27

also, at some point, i want to bring iocaine to work. i'm on easy mode now because idgaf about my site's visibility to search engines

but what to do when boss requests that when customers ask their ai bullshit to order from our website on their behalf, maybe i shouldn't reply with an HTTP redirect into the fucking sun with gigabytes of foul invective zip bomb for the response body

a very weeny construct 💀 Mar 27

ideas contd.:

live-updated status page listing all the ip addresses i've blocked, in nice formats for easy import into firewalls, tools for consuming and contributing to said databases

serious looking landing page for blocked addresses "your ip address is sending malicious traffic to this domain and has been reported. check for compromise immediately."

live-updated ASN leaderboard naming and shaming those with the most ip addresses used by ai scrapers
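
A sketch of two of the ideas above: exporting the block list in a form that `nft -f` can load, and ranking ASNs by how many blocked addresses they account for. The table and set names (`filter`, `blocklist`) and the `ip_to_asn` mapping are assumptions for illustration; the target set would need to exist already, with the interval flag if prefixes are included.

```python
from collections import Counter


def nft_add_element(addresses, family="inet", table="filter", set_name="blocklist"):
    """Render one `add element` command for an existing nftables set.

    The table and set names are placeholders, not the ones used in the thread.
    """
    items = sorted(set(addresses))
    if not items:
        raise ValueError("nothing to export")
    return f"add element {family} {table} {set_name} {{ {', '.join(items)} }}"


def asn_leaderboard(ip_to_asn, top=10):
    """Rank ASNs by number of blocked addresses; `ip_to_asn` maps address to ASN."""
    return Counter(ip_to_asn.values()).most_common(top)
```

Writing the output of `nft_add_element` to a file gives something other people could load with `nft -f`, provided they create a matching set first.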

a very weeny construct 💀 Mar 28

ideas contd.:

undo my mild mitigation against syn flood, crank the synack retries back up, and collect the ip addresses guilty of doing it. for blocking

✅ make caddy 'abort' the connection after one ioproxy poison reply, which closes the socket and blocks ip addresses faster

a very weeny construct 💀 Mar 28

ideas contd.:

i'm extremely doubtful that most isps will give any shits at all about complaints that llm bots are using their network to destroy websites

i was thinking upthread about an error message to show to legit users of residential ips who get blocked from services; showing them a scolding message like "your ip has been sending malicious traffic"

but maybe more effective will be to direct them, with contact info, to their own isp's abuse line

a very weeny construct 💀 Mar 29

ideas cont'd.:

poison url generator that encodes the spider's address, so when the headless browsers on residential ips begin scraping them we know which big tech cos are buying access to residential ip address proxies to disguise themselves

there are so many! 133k addresses in my firewall now. starting to wonder if maybe ip blocklists are untenable and i need a blocked-by-default-policy with a request-access mechanism instead
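
For the poison url generator idea, here is one possible sketch: pack the spider's address into a URL-safe token and sign it with an HMAC so that a later request for that URL can be traced back to the original crawler. This is an assumption about how it could work, not how iocaine generates its links. `SECRET`, `make_token`, and `read_token` are hypothetical names, and the address is only encoded, not encrypted, so anyone who decodes the token can read it.

```python
import base64
import hashlib
import hmac
import ipaddress

SECRET = b"change-me"  # hypothetical signing key; keep the real one private
TAG_LEN = 8            # truncated HMAC length in bytes (an arbitrary choice)


def _tag(packed):
    return hmac.new(SECRET, packed, hashlib.sha256).digest()[:TAG_LEN]


def make_token(ip):
    """Encode the spider's address plus a short HMAC tag as a URL path segment."""
    packed = ipaddress.ip_address(ip).packed
    return base64.urlsafe_b64encode(packed + _tag(packed)).decode().rstrip("=")


def read_token(token):
    """Return the original address, or None if the token is malformed or forged."""
    try:
        raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    except ValueError:
        return None
    packed, tag = raw[:-TAG_LEN], raw[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(packed)):
        return None
    return str(ipaddress.ip_address(packed))
```

If a token minted for one address later shows up in a request from a different address, that pairs the original spider with whoever ended up fetching the link, which is the correlation the post is after.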

a very weeny construct 💀 Mar 29

i've got iocaine set to block only new connections when an ip requests a poison page. that keeps their ip from returning, but doesn't kick them off my server immediately

i tried adding an abort to the end of iocaine's handle_response block in caddy, (and rebooted)

i think what i'm seeing now is scrapers successfully getting kicked out at the first request, but their sockets now get stuck in fin-wait-1 state until they time out

Luddicus Mus Mar 29

@pho4cexa You might want to try the current head of the iocaine-3.x branch, that has a fix that addresses the FIN WAIT state: adding a ct state vmap { established : accept, related : accept, invalid : drop } rule to the start of the blocking chain.

(You can add something similar manually too, but iocaine will drop & recreate the chain on restart)

a very weeny construct 💀 Mar 30

@algernon oof i totally forgot that i had restarted iocaine since the time that i manually stuck in that rule and it was no longer taking effect. thanks!

a very weeny construct 💀 Mar 30

@algernon running iocaine compiled from there (hash 7bb5447) is having trouble starting for me

when I run:

sudo nft destroy table inet iocaine
sudo iocaine -c /opt/iocaine/etc/iocaine

it successfully builds its table, chain filter with rules, and four sets but then quits with:

[...] failed to initialize firewall options="VaccineSpecs { [...] }" error="nftables already initialized"
Error: : init script not found, at [...]/means_of_production/mod.rs:291:24

Luddicus Mus Mar 30

@pho4cexa Hrm. Interesting! Are you using the built in script, or NSoE?

I suspect it is a race condition, but not sure yet.

a very weeny construct 💀 Mar 30

@algernon the builtin. the only config in my firewall kdl is the enable. everything else is defaults

Luddicus Mus Mar 30

@pho4cexa Mmmmh. Do you have multiple http-handlers that use a script with firewall enabled?

If so, I can reproduce, and a fix will be coming shortly.

a very weeny construct 💀 Mar 30

@algernon ah! i think i do! (had to step away from the office for a few.) i have metrics and the haproxy spoa enabled.

Luddicus Mus Mar 30

@pho4cexa Yep, that's it then. If you're comfortable with patching:

diff --git a/iocaine-powder/src/vaccine/linux.rs b/iocaine-powder/src/vaccine/linux.rs
index a8a9792..03a9388 100644
--- a/iocaine-powder/src/vaccine/linux.rs
+++ b/iocaine-powder/src/vaccine/linux.rs
@@ -44,10 +44,6 @@ static BLOCK_METRICS: LazyLock<IntCounterVec> = LazyLock::new(|| {
 
 impl Vaccine {
     fn init_nftables(options: &VaccineSpecs) -> Result<()> {
-        if TABLE_NAME.get().is_some() {
-            return Err(VibeCodedError::message("nftables already initialized").into());
-        }
-
         let mut nft = Nftables::new();
 
         command(&mut nft, format!("add table inet {}", options.table_name))?;
@@ -136,6 +132,10 @@ impl Vaccine {
     }
 
     pub fn init(options: &VaccineSpecs) -> Result<()> {
+        if TABLE_NAME.get().is_some() {
+            return Ok(())
+        }
+
         Self::init_nftables(options)?;
 
         let (queue_tx, mut queue_rx) = mpsc::unbounded_channel::<IpAddr>();

This addresses the issue. The fix I will commit will likely end up being a bit different (it will have some sanity checks instead of blindly returning if already initialized), but it should unblock you until I get around to fixing it in git.

Luddicus Mus Mar 30

@pho4cexa Aand the fix is now on the iocaine-3.x branch.

Thanks for the report! Would've been embarrassing to cut a release (even if "only" a release candidate!) with this bug present!
