If you are wondering what web crawlers want from your website, particularly those that fake their identity, ignore robots.txt and more, these are the paths they requested thousands of times over the last 15 days:

/ip
/aaa.php
/.git/config
/xmlrpc.php
/wp-login.php
/sitemap.xml
/ioxi-o.php
/no_branch
/about.php
/admin.php
/defaults.php
/edit.php
/goods.php
/.env~
/wp-configs.php
/hehe.php
/_profiler/phpinfo
/favicon.ico
/load.php
/flower.php
/file.php
/index.php
/menu.php
//xmlrpc.php?rsd
//blog/wp-includes/wlwmanifest.xml
//web/wp-includes/wlwmanifest.xml

In other words: mostly attempts to hack your WordPress installation. Also interesting to see attempts at getting at backup files for Python environments.

#WebMaster #WordPress
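
A list like the one above can be tallied from the server's access log. This is a minimal sketch, assuming a combined-format log with a quoted request line; the function names and the list of WordPress markers are illustrative, not taken from any particular tool.

```python
import re
from collections import Counter

# Matches the quoted request in a combined-format access log line,
# e.g. "GET /wp-login.php HTTP/1.1". The log format is an assumption.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def count_paths(log_lines):
    """Return a Counter of requested paths (query strings included)."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def looks_like_wordpress_probe(path):
    """Heuristic: the path mentions a well-known WordPress file or directory."""
    markers = ("xmlrpc.php", "wp-login.php", "wp-includes", "wp-config")
    return any(marker in path for marker in markers)
```

Calling `count_paths(open("access.log"))` (the file name is a placeholder) and then `.most_common(30)` on the result gives a ranking of the most requested paths.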

And Windows is finally dying, at least among the visitors to my tiny personal website. Its share used to be much higher; the drop is likely explained by the shift to mobile for web browsing.

OS | # hits | percent | data transferred
* Windows | 5020 | 34.78% | 42.67 MB
* Crawlers | 3190 | 22.10% | 60.98 MB
* MacOS | 2794 | 18.73% | 27.88 MB
* Linux | 1045 | 7.24% | 10.94 MB
* iOS | 938 | 6.50% | 14.85 MB
* Android | 833 | 5.77% | 19.83 MB
* Unknown | 664 | 4.60% | 13.43 MB

Crawlers, if only I could stop them without stopping everyone else.

@albertcardona recently I’ve just started to block entire Google Cloud IP ranges. Getting so sick of this shit.

@Htbaa

Mind sharing the rule for blocking it?

@albertcardona I run fail2ban with a couple of jails that check for xmlrpc.php calls (no one uses that, so after a couple of requests I just ban) and frequent wp-login.php calls. These aren't too exciting.

I also made a honeypot of sorts. The default host is where I see a lot of this probing, so again, after several 404 responses I just ban the client. There's no reason to visit that default host, and if you do, a regular client won't trigger any 404s.
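
To illustrate the logic only (this is not my actual fail2ban configuration), here is a rough Python sketch of the "ban after several hits" idea. The class name, the threshold of 3 and the 10-minute window are made-up values; in fail2ban the equivalent knobs are `maxretry` and `findtime`.

```python
from collections import defaultdict, deque


class ProbeTracker:
    """Flag a client once it makes too many suspicious requests in a time window."""

    def __init__(self, max_hits=3, window_seconds=600):
        self.max_hits = max_hits              # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self._hits = defaultdict(deque)       # ip -> timestamps of suspicious hits

    def is_suspicious(self, path, status):
        """A 404, or any request for xmlrpc.php / wp-login.php, counts as a hit."""
        path_only = path.split("?", 1)[0]
        return status == 404 or path_only.endswith(("xmlrpc.php", "wp-login.php"))

    def record(self, ip, path, status, timestamp):
        """Record one request; return True if the client should now be banned."""
        if not self.is_suspicious(path, status):
            return False
        hits = self._hits[ip]
        hits.append(timestamp)
        # Forget hits that are older than the time window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) >= self.max_hits
```

Each parsed log line would be fed to `record(ip, path, status, timestamp)` with the timestamp in seconds; the actual banning (a firewall rule, for example) is left out of the sketch.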

The entire ranges? That's a manual action.
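
If you wanted to check addresses against a list of ranges in code, a minimal sketch with Python's standard `ipaddress` module could look like the following. The CIDR blocks are placeholders from the reserved documentation address space, not real Google Cloud ranges, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges from the documentation address space (RFC 5737),
# NOT real cloud provider ranges; substitute the ranges you want to block.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(ip, blocked=BLOCKED_RANGES):
    """Return True if the address falls inside any of the blocked networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in blocked)
```

In practice the blocking itself belongs in the firewall; a check like this is mostly useful for auditing logs to see how much traffic a given set of ranges accounts for.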