Starting to see (and getting a bit excited about) some components of openwebsearch.eu, and I was wondering if the EU will finally get its own Common Crawl-like dataset (commoncrawl.org).

It seems the crawling results aren't publicly accessible yet, and there's already some discussion about GDPR implications.

At this pace, we're still far from being able to compete with US-scale open data efforts 🤦‍♂️

#europe #commoncrawl #openwebsearch

🔗 https://pipeline.shared-search.eu/
🔗 https://pipeline.shared-search.eu/explain/license.html

Crawl Pipeline

Shared effort to extract useful data from search engine crawls.

@a True about privacy, but you have to understand that when Google was built, there wasn't any concern about privacy.

Nowadays, everyone talks about privacy and it makes it harder for anyone to come up with something new.

It will surely take time for the EU itself to bring something to life while adhering to its own GDPR.

The good thing is that personal data won't be used to support their companies. I'd pay for Google if it didn't hoard my personal data.

Someone once said, "If it's free, you are the product."