@drmorrisj The monetization aspect is based on their licensing, which is excellent. Plus, the value prop of concrete search engine results that are internal and custom is great IP that can add to business workflows and continuity. That is my concrete, non-abstract take on YaCy and why people may want to use it. It also makes for a good RAG pipeline and structured data (JSON), and they have a nice API. #open source value leader #solr dump #nutch
There is a ton of info here, but I spider it and it indexes, so you can find what you are looking for quicker. This is about 7 GB. #lib archive #index #nutch #solr #common crawl

What are your favorite / the best #WebCrawlers for broad / #WebScale #crawling?

I've built a list but am looking for anything I missed: https://github.com/davidshq/awesome-search-engines/blob/main/WebCrawlers.md

Main options I've found include #Apache #Nutch, #StormCrawler, #Scrapy, #Norconex, #PulsarR, #Heritrix, and #sparkler

#question #search #SearchEngines
