kagi seems nice. i'm conflicted between yet another fucking subscription and the obvious rationale for it

EDIT: lol i had no idea kagi was cancelled already sorry didn't get the memo

@aeva Just use a custom search URL to get working results rather than paying another bad reseller of Google/Bing results.
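
For anyone unsure what a custom search URL looks like in practice: most browsers let you register a template in which a placeholder is swapped for the typed query. A minimal sketch, assuming the common `%s` placeholder convention; the `search.example` endpoint and the `expand_search_url` helper are made up for illustration and do not refer to any particular engine.

```python
from urllib.parse import quote_plus


def expand_search_url(template: str, query: str) -> str:
    """Replace the %s placeholder with the URL-encoded query."""
    if "%s" not in template:
        raise ValueError("template needs a %s placeholder")
    return template.replace("%s", quote_plus(query))


# Hypothetical endpoint: substitute whichever search URL you actually use.
TEMPLATE = "https://search.example/?q=%s"

url = expand_search_url(TEMPLATE, "servo embedding api")
print(url)
```

Extra query parameters (filters, result type, language) can simply be baked into the template string, which is usually the whole point of a custom URL.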

See also: https://goblin.technology/@dgold/statuses/01KS27XN21KCBNMPWN93GRQXA8

Gracchus Babeuf Bourguignon (@dgold@goblin.technology)

Lot of Kagi-shilling going on again after the latest nonsense from Google. Please remember:
- Kagi requires you to sign in to use it;
- So every search can be uniquely linked to your account;
- If you pay for that account, this links to you as a real person.
- Kagi's HQ is in Palo Alto, California, and is thus subject to US Laws;
- This means their logs and records can be expropriated without a warrant by Federal Authorities;
- Those authorities are currently under the control of an actual fascist demagogue.
- Kagi began life as an AI-First Company. The name is a portmanteau of K and AGI -- Artificial General Intelligence.

They are not a credible actor, and they are in no way safe for searching for (e.g.) Reproductive Health Providers.

@dalias wait what they're just another reseller???!!
@aeva They supposedly "buy indexes from Google", whatever that means. My understanding is that they do a little more doctoring of the results than direct resellers like ddg do, but AIUI they are not actually crawling and indexing the web like a real search engine.
@dalias so tbh as much as I'd love to see a source for that claim, i have an easy time believing it because their results were surprisingly similar to duck duck go's in my brief test of it so far, though I also found stuff I wasn't able to find via duck duck go so idk.
@dalias "how on earth do you make a new search engine today" is a super interesting problem, since a significant part of the web sits behind some kind of anti-bot challenge screen. I figured they must have had to make an under-the-table deal with Cloudflare or something. Aggregating google and bing would be much simpler though.
@aeva @dalias Some also use CommonCrawl, which is also a dataset used for training LLMs.
@dreid @dalias oh! I did not know about common crawl. Ever since I came across marginalia I've been curious to experiment with making my own search engine, but I've been unsure how to go about the data gathering part.
@aeva @dalias I have this pet theory that a personal search engine that basically only indexes stuff you've browsed to would satisfy like 75% of my search engine usage. And searching wikipedia would handle like 10% more.
@dreid @dalias I have a similar theory, that one could produce a really good search engine out of something along the lines of a federated bookmark sharing system, and the web-of-trust strat would be a practical curation strategy. The hard parts are unfortunately not the technical aspects however. If I didn't have to work for a living I'd probably take this on as a major personal project, but given the major time sink it would be I don't know how to fit it into my life otherwise.
@dreid @dalias I've been thinking about it for a long time though. Maybe I'll snap one day and try building it anyway :/
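
The web-of-trust idea above can be roughed out in a few lines. This is only a sketch under invented assumptions: trust falls off by a fixed `decay` factor per hop from yourself, a URL's score is the summed trust of everyone who bookmarked it, and the graph, names, and URLs are placeholders. A real system would need revocation, sybil resistance, and the privacy handling mentioned in the thread.

```python
from collections import defaultdict


def trust_scores(graph, me, decay=0.5, max_hops=3):
    """Breadth-first walk from `me`; each hop multiplies trust by `decay`."""
    scores = {me: 1.0}
    frontier = [me]
    for _ in range(max_hops):
        next_frontier = []
        for user in frontier:
            for peer in graph.get(user, ()):
                if peer not in scores:  # keep the shortest-path score
                    scores[peer] = scores[user] * decay
                    next_frontier.append(peer)
        frontier = next_frontier
    return scores


def rank_bookmarks(bookmarks, scores):
    """Order URLs by the total trust of the users who bookmarked them."""
    totals = defaultdict(float)
    for user, urls in bookmarks.items():
        weight = scores.get(user, 0.0)
        if weight == 0.0:
            continue  # people outside your web of trust contribute nothing
        for url in urls:
            totals[url] += weight
    return sorted(totals, key=lambda u: (-totals[u], u))


# Hypothetical data: "me" trusts ana, ana trusts bo, nobody trusts spammer.
graph = {"me": ["ana"], "ana": ["bo"], "spammer": ["me"]}
bookmarks = {
    "ana": {"https://a.example"},
    "bo": {"https://a.example", "https://b.example"},
    "spammer": {"https://spam.example"},
}

scores = trust_scores(graph, "me")
ranked = rank_bookmarks(bookmarks, scores)
print(ranked)
```

Trust only flows outward along edges you or your contacts created, so an account that merely claims to trust you gets no weight. That is the curation property the idea relies on.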

@aeva @dreid @dalias Sweet! Another person thinking about this problem!

I've been wondering about it too, especially how to build the indexers and the trust system. Going to bookmarks/sites already visited might be a decent place to start (although there are privacy issues involved when considering federation)

@aeva @dreid @dalias I'm unemployed so (in theory) have a fair bit of time on my hands. What's lacking is prioritisation, I jump on a shiny new project every week
@brib @dreid @dalias one of my inspirations is firefox's bookmark tagging system, which replaced folder categorization for me a long time ago. Over the years I've had ideas for features that mostly amount to wishing I could partition private bookmarks in their own categories (eg, work, health care stuff, pronz, etc) that could be locked/hidden separately
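
A rough data-structure sketch of that idea: tags instead of folders, plus a partition per bookmark that can be hidden as a unit. The `Bookmark` and `BookmarkStore` names are invented for illustration and have nothing to do with Firefox's actual storage. Note that "locked" here only filters results; a real implementation would presumably encrypt each partition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bookmark:
    url: str
    tags: frozenset
    partition: str = "public"


@dataclass
class BookmarkStore:
    items: list = field(default_factory=list)
    # Partitions currently visible; everything else is treated as locked.
    unlocked: set = field(default_factory=lambda: {"public"})

    def add(self, url, tags, partition="public"):
        self.items.append(Bookmark(url, frozenset(tags), partition))

    def by_tag(self, tag):
        """Return URLs carrying `tag`, skipping locked partitions."""
        return [
            b.url
            for b in self.items
            if tag in b.tags and b.partition in self.unlocked
        ]


store = BookmarkStore()
store.add("https://docs.example/api", {"docs", "rust"})
store.add("https://intranet.example/wiki", {"docs"}, partition="work")

print(store.by_tag("docs"))   # the work bookmark stays hidden
store.unlocked.add("work")
print(store.by_tag("docs"))   # both bookmarks are visible now
```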
@brib @dreid @dalias my clearest vision for this would probably require making a custom browser though, and the ui design clarity is super important
@brib @dreid @dalias I'm really hoping that servo will end up being nice for embedding and try to have a stable API, that could be huge for making this not a crazy undertaking
@aeva @brib @dalias I too do not desire to work in the firefox codebase.
@dreid @brib @dalias servo is its own thing now 🤷‍♀️

@aeva @brib @dalias I know! I have been financially supporting their development for a while. Except for that brief period where they were considering changing their AI contribution policy.

I just meant, I am also kind of waiting for servo to be good. :)

@dreid @brib @dalias oh! yes. :) I agree completely then
@brib @aeva @dalias Yeah I'm pretty much ignoring federation here. My idea is that this is purely locally run software. The indexer is part of your browser, the search interface is part of your browser, except for fanout to blessed service specific search like wikipedia (or anything you can make a search shortcut for) as a fallback/enhancement.
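
To make the local-only design concrete, here is a toy version: an in-memory inverted index over pages you have visited, with a fallback that hands the query to Wikipedia's search URL when nothing local matches. The `LocalIndex` class and `search_with_fallback` are hypothetical names. A browser-integrated version would need persistence, ranking, and text extraction from rendered pages, none of which is shown.

```python
import re
from collections import defaultdict
from urllib.parse import quote_plus


class LocalIndex:
    """Toy inverted index: term -> set of URLs whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, text):
        for term in self.tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return URLs containing every query term (boolean AND)."""
        terms = self.tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


def search_with_fallback(index, query):
    """Prefer local hits; otherwise return a Wikipedia search URL."""
    hits = index.search(query)
    if hits:
        return hits
    return ["https://en.wikipedia.org/w/index.php?search=" + quote_plus(query)]


index = LocalIndex()
index.add_page("https://servo.org/", "Servo is an embeddable web engine")
index.add_page("https://marginalia-search.com/", "An independent search engine")

print(search_with_fallback(index, "web engine"))
print(search_with_fallback(index, "common crawl"))
```

The fallback only builds a URL and makes no network request, which keeps the sketch offline and matches the "search shortcut" framing.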

@dreid @aeva @dalias I suspect it does a decent job at avoiding the slop too.

I wonder how hard it will be to build an index that way