Robert Rothenberg (@[email protected])
At work, we've decided to block the Common Crawl bot from our websites, because its index is used to train generative #AI systems. We've also blocked or severely limited requests from IP ranges associated with various cloud providers, because those requests usually come from unidentified bots. 1/n
