kagi seems nice. i'm conflicted between yet another fucking subscription and the obvious rationale for it

EDIT: lol i had no idea kagi was cancelled already sorry didn't get the memo

@aeva Just use a custom search URL to get working results rather than paying another bad reseller of Google/Bing results.

See also: https://goblin.technology/@dgold/statuses/01KS27XN21KCBNMPWN93GRQXA8

Gracchus Babeuf Bourguignon (@dgold@goblin.technology):

Lot of Kagi-shilling going on again after the latest nonsense from Google. Please remember:

- Kagi requires you to sign in to use it;
- So every search can be uniquely linked to your account;
- If you pay for that account, this links to you as a real person.
- Kagi's HQ is in Palo Alto, California, and is thus subject to US Laws;
- This means their logs and records can be expropriated without a warrant by Federal Authorities;
- Those authorities are currently under the control of an actual fascist demagogue.
- Kagi began life as an AI-First Company. The name is a portmanteau of K and AGI -- Artificial General Intelligence.

They are not a credible actor, and they are in no way safe for searching for (e.g.) Reproductive Health Providers.

@dalias wait what they're just another reseller???!!
@aeva They supposedly "buy indexes from Google", whatever that means. My understanding is that they do a little more doctoring of the results than direct resellers like ddg do, but AIUI they are not actually crawling and indexing the web like a real search engine.
@dalias so tbh as much as I'd love to see a source for that claim, i have an easy time believing it because their results were surprisingly similar to duck duck go's in my brief test of it so far, though I also found stuff I wasn't able to find via duck duck go so idk.
@dalias the problem of "how on earth do you make a new search engine today" is a super interesting problem since a significant part of the web has some kind of anti-bot challenge screen. I figured they must have had to make an under-the-table deal with Cloudflare or something. Aggregating google and bing would be much simpler though.
@aeva @dalias Some also use CommonCrawl, which is also a dataset used for training LLMs.
@dreid @dalias oh! I did not know about common crawl. Ever since I came across marginalia I've been curious to experiment with making my own search engine, but I've been unsure how to go about the data gathering part.
@aeva @dalias I have this pet theory that a personal search engine that basically only indexes stuff you've browsed to would satisfy like 75% of my search engine usage. And searching wikipedia would handle like 10% more.
@dreid @dalias I have a similar theory, that one could produce a really good search engine out of something along the lines of a federated bookmark sharing system, and the web-of-trust strat would be a practical curation strategy. The hard parts are unfortunately not the technical aspects however. If I didn't have to work for a living I'd probably take this on as a major personal project, but given the major time sink it would be I don't know how to fit it into my life otherwise.
@dreid @dalias I've been thinking about it for a long time though. Maybe I'll snap one day and try building it anyway :/
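
For anyone who wants to poke at the "only index stuff you've browsed to" idea, here is a rough sketch of the smallest version: an in-memory inverted index over page text you already have. Everything in it is an assumption for illustration (the `PersonalIndex` class, its method names, the example URLs); it does no crawling, ranking, federation, or trust scoring, and a search only returns pages containing every query term.

```python
import re
from collections import defaultdict


class PersonalIndex:
    """Toy inverted index over pages the user has already visited (hypothetical design)."""

    def __init__(self):
        # Maps each lowercase term to the set of URLs whose text contains it.
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Split into lowercase alphanumeric terms; punctuation is discarded.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, text):
        # Record the URL under every term that appears in the page text.
        for term in self.tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        # Return the URLs that contain all query terms (boolean AND).
        terms = self.tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results


index = PersonalIndex()
# Placeholder URLs and text, standing in for pages from browsing history.
index.add_page("https://example.org/servo", "Servo is an embeddable browser engine")
index.add_page("https://example.org/crawl", "Common Crawl publishes web crawl data")
print(index.search("browser engine"))
```

Sharing bookmarks between people would sit on top of something like this, with each person's index staying local unless they choose to publish parts of it.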

@aeva @dreid @dalias Sweet! Another person thinking about this problem!

I've been wondering about it too, especially how to build the indexers and the trust system. Going to bookmarks/sites already visited might be a decent place to start (although there are privacy issues involved when considering federation)

@brib @dreid @dalias one of my inspirations is firefox's bookmark tagging system, which replaced folder categorization for me a long time ago. Over the years I've had ideas for features that mostly amount to wishing I could partition private bookmarks in their own categories (eg, work, health care stuff, pronz, etc) that could be locked/hidden separately
@brib @dreid @dalias my clearest vision for this would probably require making a custom browser though, and the ui design clarity is super important
@brib @dreid @dalias I'm really hoping that servo will end up being nice for embedding and try to have a stable API, that could be huge for making this not a crazy undertaking
@aeva @brib @dalias I too do not desire to work in the firefox codebase.
@dreid @brib @dalias servo is its own thing now 🤷‍♀️

@aeva @brib @dalias I know! I have been financially supporting their development for a while. Except for that brief period where they were considering changing their AI contribution policy.

I just meant, I am also kind of waiting for servo to be good. :)

@dreid @brib @dalias oh! yes. :) I agree completely then