kagi seems nice. i'm conflicted between yet another fucking subscription and the obvious rationale for it

EDIT: lol i had no idea kagi was cancelled already, sorry, didn't get the memo

@aeva Just use a custom search URL to get working results rather than paying another bad reseller of Google/Bing results.

See also: https://goblin.technology/@dgold/statuses/01KS27XN21KCBNMPWN93GRQXA8

Gracchus Babeuf Bourguignon (@[email protected])

Lot of Kagi-shilling going on again after the latest nonsense from Google.

Please remember:
- Kagi requires you to sign in to use it;
- So every search can be uniquely linked to your account;
- If you pay for that account, this links to you as a real person.
- Kagi's HQ is in Palo Alto, California, and is thus subject to US Laws;
- This means their logs and records can be expropriated without a warrant by Federal Authorities;
- Those authorities are currently under the control of an actual fascist demagogue.
- Kagi began life as an AI-First Company. The name is a portmanteau of K and AGI -- Artificial General Intelligence.

They are not a credible actor, and they are in no way safe for searching for (e.g.) Reproductive Health Providers.

goblin.technology
@dalias wait what, they're just another reseller???!!
@aeva They supposedly "buy indexes from Google", whatever that means. My understanding is that they do a little more doctoring of the results than direct resellers like ddg do, but AIUI they are not actually crawling and indexing the web like a real search engine.
@dalias tbh, as much as I'd love to see a source for that claim, I have an easy time believing it: their results were surprisingly similar to DuckDuckGo's in my brief test so far, though I also found stuff I couldn't find via DuckDuckGo, so idk.
@dalias "how on earth do you make a new search engine today" is a super interesting problem, since a significant part of the web sits behind some kind of anti-bot challenge screen. I figured they must have made an under-the-table deal with Cloudflare or something. Aggregating Google and Bing would be much simpler, though.
@aeva @dalias Some also use Common Crawl, a dataset that is also used for training LLMs.
@dreid @dalias oh! I did not know about Common Crawl. Ever since I came across Marginalia I've been curious to experiment with making my own search engine, but I've been unsure how to go about the data-gathering part.
@aeva @dalias I have this pet theory that a personal search engine that basically only indexes stuff you've browsed to would satisfy like 75% of my search engine usage, and searching Wikipedia would handle like another 10%.
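A rough sketch of what that personal index could look like, assuming you can already get (URL, page text) pairs out of your browsing history somehow; that export step isn't shown and isn't any particular browser's API. `PersonalIndex` is a hypothetical name, and the ranking is plain TF-IDF over an inverted index, not what Marginalia or any real engine does.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class PersonalIndex:
    """Hypothetical in-memory inverted index over pages you have browsed to."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)  # term -> {url: count}
        self.doc_terms: dict[str, set[str]] = {}  # url -> terms it contributed
        self.doc_len: dict[str, int] = {}  # url -> number of tokens on the page

    def add_page(self, url: str, text: str) -> None:
        """Index a page, or re-index it on a repeat visit."""
        for term in self.doc_terms.pop(url, set()):
            del self.postings[term][url]  # drop stale postings from the previous visit
        tokens = tokenize(text)
        counts = Counter(tokens)
        for term, count in counts.items():
            self.postings[term][url] = count
        self.doc_terms[url] = set(counts)
        self.doc_len[url] = len(tokens)

    def search(self, query: str, k: int = 5) -> list[tuple[str, float]]:
        """Return up to k (url, score) pairs ranked by a summed TF-IDF score."""
        n_docs = len(self.doc_len)
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            docs = self.postings.get(term)
            if not docs:
                continue
            idf = math.log(1 + n_docs / len(docs))  # rarer terms weigh more
            for url, count in docs.items():
                scores[url] += (count / self.doc_len[url]) * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Usage would be calling `add_page` from whatever hook sees your page visits and `search` from a local search box; everything stays on your own machine.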
@dreid @aeva Starting with Wikipedia and treating all references that have stayed there a long time without being edited out as legitimate crawl paths would seed a very viable crawl and PageRank-replacement root for noncommercial searches.
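One way to read that idea as code is a sketch in the spirit of TrustRank / personalized PageRank: the random walk only ever teleports to seed URLs, here the external references that have survived in Wikipedia articles for at least `min_age_days`. The input format (a dict of reference URL to the date it was first seen, containing only references still present) and the 365-day default are assumptions; deriving those dates from revision history isn't shown, and the function names are hypothetical.

```python
from datetime import date


def seed_urls(first_seen: dict[str, date], today: date, min_age_days: int = 365) -> set[str]:
    """Assumed input: references still present in some article -> date first added.
    Keep only those that have survived at least min_age_days."""
    return {url for url, seen in first_seen.items() if (today - seen).days >= min_age_days}


def personalized_pagerank(
    links: dict[str, list[str]],
    seeds: set[str],
    damping: float = 0.85,
    iters: int = 50,
) -> dict[str, float]:
    """Power iteration for PageRank whose random jumps land only on the seed set."""
    if not seeds:
        raise ValueError("need at least one seed URL")
    nodes = set(links) | seeds | {v for outs in links.values() for v in outs}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            outs = links.get(n, [])
            if outs:
                share = damping * rank[n] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # Dangling page (no outlinks): hand its rank back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[n] / len(seeds)
        rank = nxt
    return rank
```

Pages reachable only through link farms that no long-lived Wikipedia reference leads into end up with zero rank, which is roughly the property that would make this attractive as a PageRank replacement for noncommercial search.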