Can a search engine be selfhosted?
I mean, you can easily host a meta-search engine like Searx, Searx-ng, Whoogle, etc. I run Searx-ng; it forwards your queries to multiple engines and shows you the aggregated results.
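For reference, Searx-ng publishes an official Docker image, so a minimal single-container setup looks something like this (the port mapping, volume path, and `BASE_URL` are just example values, not anything this thread prescribes):

```shell
# Run SearXNG from the official image; settings persist in ./searxng
docker run -d --name searxng \
  -p 8888:8080 \
  -v "${PWD}/searxng:/etc/searxng" \
  -e "BASE_URL=http://localhost:8888/" \
  searxng/searxng
```

After it starts, the web UI is on http://localhost:8888/ and config lives in `./searxng/settings.yml`.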
To host your own from scratch, you’d need to crawl and index every site yourself. It’s certainly doable, but it would take a lot of time and effort.
Solr
triggered
Even the main search engines don’t index the entire internet’s content these days, and their databases are already truly massive. Writing a basic web crawler that produces a search index isn’t all that hard (I used to set it as a programming exercise for applicants), but dealing with the volume of data on the entire internet, and storing it in a way that yields a worthwhile search engine, just isn’t feasible on home hardware; I suspect it would run to many terabytes.

The result wouldn’t just be a little worse than the big engines, it would be dramatically worse unless you threw substantial resources at it, including enormous amounts of network bandwidth, which would have your ISP questioning your “unlimited 1 Gbps fibre” contract. It would probably take years to get decent and, at best, would always be many months out of date.
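To illustrate the “basic crawler as a programming exercise” point: the core of it is just a breadth-first fetch loop feeding an inverted index. This sketch uses an in-memory dict of fake pages instead of real HTTP so it runs offline; the URLs and page contents are made up for the example:

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Toy "web": URL -> HTML. A real crawler would fetch these over HTTP.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">b</a> apples and oranges',
    "http://b.example/": '<a href="http://a.example/">a</a> oranges and pears',
}

class PageParser(HTMLParser):
    """Collects outgoing links and visible text from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)

def crawl(seed):
    """Breadth-first crawl from seed, building term -> {urls} inverted index."""
    index = defaultdict(set)
    frontier, seen = [seed], set()
    while frontier:
        url = frontier.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(PAGES[url])        # real crawler: urlopen(url).read()
        frontier.extend(parser.links)  # queue newly discovered pages
        for term in re.findall(r"[a-z]+", " ".join(parser.text).lower()):
            index[term].add(url)
    return index

index = crawl("http://a.example/")
print(sorted(index["oranges"]))  # both toy pages mention "oranges"
```

The hard part, as the comment above says, isn’t this loop; it’s scaling the fetch queue, storage, and freshness to billions of pages.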
It doesn’t seem practical to self-host, given the need to download and index every single page of the internet; it’s a truly massive-scale problem.