This Guy Has Built an Open Source Search Engine as an Alternative to Google in His Spare Time

https://lemm.ee/post/23315957

“I found it very weird that there essentially is no way to browse the web in an open manner. So that’s what I am trying to build,” the founder of Stract said.

“Sign up for free access to this post”

No.

To save reading the paywalled article, the site is at stract.com

I’ve only done a single search but it gave me a summary at the top, and some discussion forums in a different format. I’m impressed so far!

It’s a free account, like the one you made so you can write your comment. I’d hardly call it paywalled.
Tbh I just saw it needed a login and scrolled back up to the link without reading further, so I was obviously a bit hasty in calling it a paywall.

I did a few searches but had terrible results.

Searching for “Tokyo”, I got a summary about some Indonesian food chain. I had to scroll down quite a bit to get info about the city.

It looks interesting, but seems far from ready.

I found the GitHub for it: github.com/StractOrg/stract/tree/main

What I still can’t figure out (in my very shallow dive into the repo) is whether it’s a meta search engine like SearXNG or whether it does its own crawling and builds its own search index.

Anyone know?

From the readme, it uses its own index:

Fully independent search index.

Also here’s a related discussion: github.com/StractOrg/stract/discussions/136

Storing warc files? · StractOrg stract · Discussion #136

Hey there! I noticed in this reply you (Mikkel) mention that you're crawling to WARC files. Are you keeping these around after indexing? I imagine the storage requirements for that would get quite ...

#YaCy is an open source crawler that you can run and feed Searx with. I recall some searx instances that run their own YaCy. YaCy can also share indexes with other YaCy instances.
For everyone complaining about 404media needing an account for the posts, they explain their reasoning here: www.404media.co/why-404-media-needs-your-email-ad…
We Need Your Email Address

AI stealing our work. The collapse of social networks. The need to pay journalists to produce impactful journalism. Here is why we are asking for your email address to read 404 Media.

They’re fully within their rights to restrict access to their content, just as everyone complaining is fully within their rights to not give up their email to access content.

I realize independent media financing is a huge struggle right now, and the quality of journalism has been in a downward spiral for decades. Clearly, the current system is unsustainable; I agree with 404media on that much. I wholeheartedly disagree with restricting access to information as a solution, as that seems completely opposed to what journalism should aim to achieve.

For most of its history, journalism has been locked behind a paywall. I think it’s a bit disingenuous to claim that this principle is against the idea of journalism. Journalism, and especially good journalism, is expensive; under a capitalist system, it’s entirely normal to ask for your work to be valued through monetary means.

That said, I’m most annoyed because no one is actually talking about Stract, just about how 404media decided to lock the article.

Just because it worked in the past doesn’t mean it should continue that way. Also, neighbors and companies tended to share the same newspaper back then.

Writing was also a much rarer skill in the past.

Newspapers are available in public libraries.

We don’t live in history anymore, we live in the present. Our relationship to information and journalism is not the same as it was in the past, for better and for worse.

In the past, a typical individual would have access to maybe a handful of news sources. You’d pay for the printing and delivery of a physical newspaper and that was going to be the extent of the journalism you were exposed to. I don’t think it’s realistic to think one should subscribe to every news source they’re likely to encounter online. I’d also counter that radio journalism was one of the main sources of information in the 20th century and had no such paywalls.

That said, I’m most annoyed because no one is actually talking about Stract, just about how 404media decided to lock the article

You know how that could have been avoided? If the link actually contained any useful information about Stract instead of being a sign-up page :P

Yeah, that’s an automatic no for me on all of their articles. I hope they eventually see posts like this and realize they’re shooting themselves in the foot.
Another interesting open source search engine is Mwmbl
GitHub - mwmbl/mwmbl: An open source, non-profit web search engine

Thanks for sharing.

I will say I’m pretty glad to see a search engine which actually is not just a meta search engine. I wish Kagi would attempt this rather than partnering with Brave.

One thing I find odd, though, is why the engines trying to build their own index don’t adopt the adversarial strategy that Brave Search has used: while relying on other indexes, collect what people actually click on and use it in your own index. I will note that I do not support Brave.
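
The click-collection idea in the comment above can be sketched roughly as follows. This is only an illustration of the general technique, not a description of how Brave Search or any other engine actually implements it. The class name `ClickFeedbackIndex` and the example URLs are hypothetical.

```python
from collections import Counter, defaultdict


class ClickFeedbackIndex:
    """Hypothetical store of (query, clicked URL) counts used to rerank results."""

    def __init__(self):
        # Normalized query text -> Counter of clicked URLs.
        self._clicks = defaultdict(Counter)

    @staticmethod
    def _normalize(query):
        return query.strip().lower()

    def record_click(self, query, url):
        """Remember that someone clicked `url` after searching for `query`."""
        self._clicks[self._normalize(query)][url] += 1

    def rerank(self, query, external_results):
        """Order results from an external index by how often they were clicked."""
        counts = self._clicks.get(self._normalize(query), Counter())
        # sorted() is stable, so results with equal click counts keep
        # the order the external index gave them.
        return sorted(external_results, key=lambda url: -counts[url])


feedback = ClickFeedbackIndex()
external = ["a.example", "b.example", "c.example"]  # made-up external ranking

feedback.record_click("Tokyo", "c.example")
feedback.record_click("tokyo ", "c.example")
feedback.record_click("tokyo", "b.example")

reranked = feedback.rerank("tokyo", external)
untouched = feedback.rerank("osaka", external)
```

A real system would also have to decide how to collect clicks without tracking individual users, which this sketch ignores entirely.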

It’s not just a meta search. They do have their own index. And Brave is only one of a dozen-ish external indexes they also use.
Yes. Kagi doesn’t partner with Brave. They use Brave’s search index.

The open source SearXNG is good enough for me so far. Any reason to switch to Stract?

Stract and SearXNG are two entirely different projects. SearXNG is just using other search engines to power itself - it’s known as a meta search engine. Stract has its own index that does not use other search engines to power itself.
If Stract is any good, it would be a nice combo to make it work in SearXNG.
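
For anyone unsure about the distinction drawn above, here is a toy sketch of the two approaches. It is not taken from the Stract or SearXNG code. All function names, pages, and backends are made up for illustration, and a real engine adds crawling, ranking, and much more.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Own-index approach: map each lowercase term to the pages containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_own_index(index, query):
    """Return the pages that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


def meta_search(backends, query):
    """Meta approach: forward the query to other engines and merge their results."""
    merged, seen = [], set()
    for backend in backends:
        for url in backend(query):
            if url not in seen:  # drop duplicates returned by several backends
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical crawled pages for the own-index case.
pages = {
    "example.org/tokyo": "Tokyo is the capital of Japan",
    "example.org/food": "Tokyo themed food chain in Indonesia",
    "example.org/osaka": "Osaka is a city in Japan",
}
index = build_inverted_index(pages)
own_results = search_own_index(index, "tokyo japan")


# Stand-ins for external search engines; a real meta search engine
# would make network requests here.
def backend_a(query):
    return ["a.example/1", "shared.example/x"]


def backend_b(query):
    return ["shared.example/x", "b.example/2"]


meta_results = meta_search([backend_a, backend_b], "tokyo japan")
```

The practical difference is where the data lives: the first approach has to crawl and store pages itself, while the second depends entirely on what the upstream engines return.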

For anyone wondering about how they’ll eventually address financial sustainability if Stract takes off:

Stract is currently not monetized in any way, but its website says it will eventually have contextual ads tied to specific search terms but that it will not track its users, which is similar to the system DuckDuckGo uses. Stract also plans on offering ad-free searches to paying subscribers.

I’d pay for independent, non meta, ad-free search. I bet a more straightforward approach is more energy efficient as well. In the meantime, big tech is running a gazillion processes on our data to suck every bit of wealth they can out of our existence through their free (in its littlest sense) products.

I’d pay for independent, non meta, ad-free search.

Haven’t tested it yet, but have seen it mentioned several times here on Lemmy:

kagi.com

Kagi Search - A Premium Search Engine

Better search results with no ads. Welcome to Kagi (pronounced kah-gee), a paid search engine that gives power back to the user.

Yes, I’ve seen Kagi mentioned quite often here on Lemmy.

Though Kagi seems Tor-unfriendly, maybe.

Indeed. I cannot reach this link from Tor:

help.kagi.com/kagi/…/search-sources.html

I use Kagi and love it.
Kagi is a meta search engine though. They just make calls to Google, Yandex, Brave, etc., cut the ad rot, and sprinkle some secret spice on top.