statistics:
hits 1037549
after filter 13530
bot rate 98.70%
addrs filtered 6018
UAs filtered 225
paths filtered 1566
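
for anyone checking the math: the bot rate figures here are consistent with "share of hits removed by the filters", i.e. 1 - (after filter / hits). A minimal sketch of that arithmetic, assuming that definition (the function name is made up for illustration):

```python
def bot_rate(hits: int, after_filter: int) -> str:
    """Share of hits removed by the filters, formatted as a percentage."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    return f"{(1 - after_filter / hits) * 100:.2f}%"


print(bot_rate(1037549, 13530))  # 98.70%
```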

something something dead internet theory

got another few valid hits on my bitflip experiment:
statistics:
hits 53731
after filter 19
bot rate 99.96%
addrs filtered 200
UAs filtered 8
paths filtered 86

new hits:
- one from an Android 9 device on Rogers (ipv6) using gmail webview
- 4 from google-owned IPs (!): three tracking pixels from blogger domains, and one pagespeed proxy request

I am slightly intrigued by the google IPs - do they run a lot of gear without ECC memory?

update since the 8th:

hits 110397
after filter 27
bot rate 99.98%
addrs filtered 382
UAs filtered 13
paths filtered 141

new hits:
- two hits from distinct AWS IPv4s ~2 seconds apart, to a Gmail asset URL
- two more hits to the exact same tracking pixel URL from before (same referrer as well), one from DigitalOcean and another from a residential .VN ISP
- one hit to the default Google profile picture (referrer accounts.google.com) from a possible proxy in .PK
- one hit to a placeholder image used in the Google Photos app (com.google.android.apps.photos in the UA) from a residential IPv6 in .VN
- one hit to a user's google profile picture from a residential IPv4 in .IN (referrer speedtypingonline.com)

it's getting a little trickier to filter out all the weird noise: my regex rules are starting to get kinda cluttered, and I didn't build in any way of commenting/documenting them. I think I will pick up another batch of 15 domains next paycheck, as there is still a surprising amount of activity even with my small sample set so far.
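
one possible way to keep regex rules documented is to store each rule next to a note saying why it exists. This is only a sketch: the `Rule` type, the field names ("addr", "ua", "path") and the example patterns are all assumptions made up for illustration, not the actual rule set.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    field: str    # request field to match: "addr", "ua" or "path" (assumed names)
    pattern: str  # regular expression, searched anywhere in the field
    note: str     # why the rule exists


# Hypothetical example rules, not the real filter list.
RULES = [
    Rule("ua", r"(?i)zgrab|masscan|nuclei", "well-known scanner user agents"),
    Rule("path", r"^/(\.env|\.git/|wp-login\.php)", "generic exploit probes"),
    Rule("addr", r"^192\.0\.2\.", "placeholder: TEST-NET-1 documentation range"),
]

COMPILED = [(rule, re.compile(rule.pattern)) for rule in RULES]


def first_match(request: dict) -> Optional[Rule]:
    """Return the first rule matching the request, or None to keep the hit."""
    for rule, regex in COMPILED:
        if regex.search(request.get(rule.field, "")):
            return rule
    return None
```

returning the matching rule (rather than just True/False) means the debug log of blocked requests can print the note alongside each dropped hit.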
next steps will also be to start logging all DNS queries - it seems like 99.9% of the garbage traffic is hitting the base domain, while all the interesting stuff is hitting well-known subdomains. I can see this sort of analysis being a lot harder for non-CDN domains that don't have unique subdomains...
i'll probably try to cobble together a custom DNS server for this and run the nodes on my anycast routers; perhaps there will be some interesting geographical bias in where corrupt requests come from once more data is available. I am also wondering if there are other services besides HTTP{,S} running on googleusercontent.com - does anybody know if that's a thing?
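
the query-logging part of a custom DNS server mostly comes down to pulling the question name out of each raw packet. Below is a rough sketch of just that step, using only the standard library. It assumes plain UDP queries with an uncompressed name in the question section; the function names and the `lh3` example subdomain are illustrative assumptions, and a real server would still need to read packets off a socket and send answers back.

```python
import struct


def parse_question(packet: bytes) -> tuple:
    """Return (qname, qtype) for the first question in a raw DNS query."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte DNS header")
    (qdcount,) = struct.unpack_from("!H", packet, 4)
    if qdcount < 1:
        raise ValueError("no question section")
    labels = []
    pos = 12
    while True:
        if pos >= len(packet):
            raise ValueError("truncated name")
        length = packet[pos]
        pos += 1
        if length == 0:  # root label terminates the name
            break
        if length > 63:  # compression pointers are not handled in this sketch
            raise ValueError("unsupported label length")
        label = packet[pos:pos + length]
        if len(label) != length:
            raise ValueError("truncated label")
        # Keep the bytes as sent, so odd casing or flipped bits stay visible.
        labels.append(label.decode("ascii", errors="replace"))
        pos += length
    if pos + 4 > len(packet):
        raise ValueError("truncated question")
    qtype, _qclass = struct.unpack_from("!HH", packet, pos)
    return ".".join(labels), qtype


def is_base_domain(qname: str, base: str) -> bool:
    """True if the query is for the bare registered domain, not a subdomain."""
    return qname.rstrip(".").lower() == base.lower()
```

logging `is_base_domain` next to each query would make it easy to split the base-domain noise from the subdomain hits.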

+11 day update:

hits 228081
after filter 51
bot rate 99.98%
addrs filtered 634
UAs filtered 14
paths filtered 208

notable or interesting hits:
- a hit from a facebook crawler
- several hits for GCP block storage downloads from Nepal
- several pagespeed hits for Horse Talk
- a few hits from a chromecast dongle with a *lot* of flips in the URL, poor thing must be really suffering
- a number of hits for user profile pictures from what I think is PUBG mobile? game=ShadowTrackerExtra, engine=UE4, version=4.18.1-0+++UE4+Release-4.18, platform=IOS, osver=26.3.1
- some unknown unity app? UnityPlayer/2019.4.40f1, libcurl/7.80.0-DEV
- classic Opera with the Presto rendering engine, on a 32 bit Linux machine in Egypt!
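
for hits like the chromecast one, "how many flips" can be put into numbers by XORing the received bytes against the intended ones and counting the set bits. A small sketch, assuming the intended URL is known and has the same length as the one received (the function name is made up):

```python
def flipped_bits(seen: bytes, expected: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(seen) != len(expected):
        raise ValueError("inputs must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(seen, expected))


print(flipped_bits(b"coogleusercontent", b"googleusercontent"))  # 1
```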

getting paid next week and will pick up another batch of domains, which should hopefully increase the hit rate. as originally expected, it mostly seems to be mobile devices, but there have been a few desktops and servers in the data so far.

+5 day update:

hits 252444
after filter 75
bot rate 99.97%
addrs filtered 760
UAs filtered 15
paths filtered 243

notable hits:
- a whole lot more hits from Unity apps, all for the same URL, in quick succession from all over the world. I suspect a game server experienced a bitflip and served the bad URL to many clients
- doc-0s-7g-apps-viewer.guc.c/viewer/secure/pdf/ via CF Warp, from an iPhone
- several hits to different URLs from a ~2016-2018 Sony Bravia TV running Android 9 (cpu: MT5891) from a Japanese IPv4 address
- several more profile photos w/ referrer accounts.g.c
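
a pattern like the Unity one (one URL, many clients, short time span) could be flagged automatically. The sketch below groups hits by URL and reports URLs requested by several distinct addresses within a time window. The `(timestamp, addr, url)` tuple layout, the function name and the default thresholds are assumptions, not how the logs are actually stored.

```python
from collections import defaultdict


def find_bursts(hits, window=60.0, min_clients=5):
    """Map each bursty URL to the most distinct clients seen within `window` seconds."""
    by_url = defaultdict(list)
    for ts, addr, url in hits:
        by_url[url].append((ts, addr))

    bursts = {}
    for url, events in by_url.items():
        events.sort()
        start = 0
        best = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans at most `window` seconds.
            while events[end][0] - events[start][0] > window:
                start += 1
            clients = {addr for _, addr in events[start:end + 1]}
            best = max(best, len(clients))
        if best >= min_clients:
            bursts[url] = best
    return bursts
```

many clients fetching the same corrupted URL at once points at a flip upstream of them (server or cache), while a lone client points at the device itself.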

it's the end of the month and payday, which means I can add another ~10 domains to the experiment and collect more data! :D

just registered the next batch of domains, now at 36 of 81 total / 78 available (a few of the variants were already registered)
remaining cost is $465.36.. maybe finish it up next month 😔
now the proud owner of coogle and goofle user content dot com
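
for context on where names like coogle and goofle come from: each one differs from the original by a single flipped bit in one character (g is 0x67; flipping bit 0x04 gives c, flipping bit 0x01 gives f). A rough sketch of enumerating such variants, assuming only lowercase letters, digits and inner hyphens count as valid; the function name is made up, and the resulting count depends on those validity rules, so it may not match the totals quoted above.

```python
import string

ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def bitflip_labels(label: str) -> list:
    """Labels that differ from `label` by exactly one flipped bit and stay valid."""
    variants = []
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped not in ALLOWED:
                continue  # also drops uppercase, which DNS treats as the same name
            candidate = label[:i] + flipped + label[i + 1:]
            if candidate.startswith("-") or candidate.endswith("-"):
                continue  # a label cannot start or end with a hyphen
            variants.append(candidate)
    return variants


variants = bitflip_labels("googleusercontent")
print("coogleusercontent" in variants, "goofleusercontent" in variants)  # True True
```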

+4 day update:

statistics:
hits 330360
after filter 543
bot rate 99.84%
addrs filtered 1242
UAs filtered 17
paths filtered 369

traffic has popped off significantly with this new batch of domains, along with a whole lot more scanner traffic that I think is mostly filtered now.

notable hits:
- hundreds of hits from the same Level3/Lumen IPv4 for /proxy/<encoded> endpoints
- a handful of drive-thirdparty.guc.com hits for video players
- one of the first hits from a semi-modern device: SM-A032M (Samsung Galaxy A03 Core, released 2021), Android 13
- GmsCore/261133035 which looks like an alternative Play Services implementation

I should probably browse through the debug logs of blocked requests to make sure I'm not accidentally filtering anything legit, but it seems unlikely