kagi seems nice. i'm conflicted between yet another fucking subscription and the obvious rationale for it

EDIT: lol i had no idea kagi was cancelled already sorry didn't get the memo

@aeva Just use a custom search URL to get working results rather than paying another bad reseller of Google/Bing results.

See also: https://goblin.technology/@dgold/statuses/01KS27XN21KCBNMPWN93GRQXA8

Gracchus Babeuf Bourguignon:

Lot of Kagi-shilling going on again after the latest nonsense from Google. Please remember:
- Kagi requires you to sign in to use it;
- So every search can be uniquely linked to your account;
- If you pay for that account, this links to you as a real person.
- Kagi's HQ is in Palo Alto, California, and is thus subject to US Laws;
- This means their logs and records can be expropriated without a warrant by Federal Authorities;
- Those authorities are currently under the control of an actual fascist demagogue.
- Kagi began life as an AI-First Company. The name is a portmanteau of K and AGI -- Artificial General Intelligence.

They are not a credible actor, and they are in no way safe for searching for (e.g.) Reproductive Health Providers.

@dalias wait what they're just another reseller???!!
@aeva They supposedly "buy indexes from Google", whatever that means. My understanding is that they do a little more doctoring of the results than direct resellers like ddg do, but AIUI they are not actually crawling and indexing the web like a real search engine.

@dalias @aeva

Yeah. DDG etc. are search "brokers" in a way - but they license index/crawled results from Bing etc.

Running the engines to index/crawl _the whole web_ is a real computationally expensive affair. There've been a few attempts at peer-2-peer or indie indexing but it's a hard problem to solve so those don't go very far.

@tezoatlipoca @aeva Exactly. It's an actual hard problem, and SV startup types don't like hard problems. They like flashy things that latch on to media cycles to sell to gullible readers of the tech press.
@dalias @aeva ergo, the end result (like always, everywhere, all the time) is there are a few deeply entrenched incumbents that stay in business by reselling services to everyone else. :/

@tezoatlipoca @aeva And like social networking, this is a problem that fundamentally will never be solved by capitalist businesses and hierarchical power structures.

Thinking a search startup has any hope of solving what's wrong with Google is as foolish as thinking Bluesky had any hope of solving what was wrong with birdchan.

Like with the fedi, the only way this problem will be solved is with real decentralization of power.

@tezoatlipoca @dalias @aeva Can we just go back to "home pages"?

Like, a massive list of "human curated stuff to click on and ctrl-F" would actually be pretty useful with "the state of Things"...

@meejah @tezoatlipoca @dalias they call them "awesome lists" these days

@aeva @meejah @dalias

Hrm. I have been writing a tool that manages awesome lists and I didn't realize what they were called.

Here's the demo site: https://lists.awadwatt.com/index.html
Here's the github: https://github.com/tezoatlipoca/GeFeSLE-server#readme

The css is bad but it's functional. Idea being lists can be changed, but infrequently; hosting a static html page is lightweight. List change -> update static html.
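
For anyone curious what "list change -> update static html" can look like, here is a minimal sketch in Python. It is not taken from GeFeSLE; the function name `render_list_page` and the (label, url) item shape are assumptions made up for illustration.

```python
from html import escape


def render_list_page(title, items):
    """Render one curated list as a standalone static HTML page.

    `items` is a sequence of (label, url) pairs. Everything is escaped so
    that list content cannot inject markup into the page.
    """
    rows = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in items
    )
    return (
        "<!doctype html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{rows}\n  </ul>\n</body></html>\n"
    )
```

The idea would be to call this only when a list is edited and write the result to disk, so serving the list never touches anything but a plain file.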


@tezoatlipoca @meejah @dalias there's a lot of projects on github that are just readmes that are curated lists of projects within a topic and for some reason they're all called "Awesome <Topic>" e.g. https://github.com/reHackable/awesome-reMarkable

I don't recall ever seeing them outside of github, so idk if it's like, a github thing or something. Mostly I mention it because it's a contemporary thing where people are taking the time to make curated lists for niche topics.


@aeva @meejah @dalias

I guess my concern about github is that, like Google docs or whatever, you don't OWN it; so if someone doesn't like your awesome list, it gets taken down.

My project's goals were:
- single binary self-hostable (assuming you can figure out the dns and rev.proxy stuff)
- list name + your host.domain==url of list. persistent
- access granularity to list level; public to private

Github is same on last two, easier on the first; but you have to adhere to someone else's T&Cs.

@tezoatlipoca @meejah @dalias oh yeah it's problematic for sure. I think those lists ended up there because github's search is tragically better for discovery than a real search engine, but I really wish this all were more indie web.
@tezoatlipoca @meejah @dalias we direly need an indie web search engine that can handle more complex queries than marginalia search
@aeva @tezoatlipoca @dalias kagi's "small web" stuff seems good .. But yeah in general I definitely agree.

@aeva @tezoatlipoca @meejah @dalias Hey if you have any particular queries that you find aren't working well please let me know about them, those types of problem cases are often very good either for finding bugs or as benchmarks, as a goal to work toward when adding new capabilities.

Like either here, or email me, or make an issue on GH. I'd appreciate it.

@marginalia @tezoatlipoca @meejah @dalias off the top of my head i just remember that things get dicey with more than two search terms.
@marginalia @tezoatlipoca @meejah @dalias the search that i had the most problems with recently on *every* search engine was "midi mandolin" (or to be precise, a midi controller that takes the shape and general characteristics of a mandolin (of which only two exist afaict), and definitely not anything about a knife or daw samples)
@marginalia @tezoatlipoca @meejah @dalias i think i managed to coax out an old article via marginalia about a roland pickup that could do midi, i don't remember how i got there. kagi surfaced a few semi-relevant forum posts that i didn't find with duck duck go, but most of the useful information for my survey of existing devices came from image searches instead
@marginalia @tezoatlipoca @meejah @dalias afaict there's only two things that fit the bill: a gaudy 10 string electric abomination that doesn't seem to be available for purchase anyway, and a prototype button grid on a pcb in the arrangement of the top of a fretboard

@aeva @tezoatlipoca @dalias I'm thinking like "original Yahoo!" etc

...but yeah, thanks! I do have several "awesome" github repos etc bookmarked. Mostly this is just whining / pining for the Better Times ;)

@dalias so tbh as much as I'd love to see a source for that claim, i have an easy time believing it because their results were surprisingly similar to duck duck go's in my brief test of it so far, though I also found stuff I wasn't able to find via duck duck go so idk.
@dalias the problem of "how on earth do you make a new search engine today" is a super interesting problem since a significant part of the web has some kind of anti-bot challenge screen. I figured they must have had to make an under the table deal with cloud flare or something. Aggregating google and bing would be much simpler though.

@aeva It's interesting and difficult, but I don't think anywhere near as astronomically difficult as Google and Microsoft want you to believe it is.

You don't need to hammer every site every day. For the most part, information worth indexing does not change frequently. You also don't need to even bother with a site once you've determined it's SEO-slop.
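
A rough sketch of that revisit policy, in Python. The interval bounds, the starting interval, the halving/doubling rule and the `is_slop` flag are all assumptions picked for illustration, not something any existing crawler is known to use.

```python
from dataclasses import dataclass

# Illustrative bounds only: never revisit more than daily or less than twice a year.
MIN_INTERVAL_DAYS = 1.0
MAX_INTERVAL_DAYS = 180.0


@dataclass
class SiteState:
    interval_days: float = 7.0  # current gap between visits
    is_slop: bool = False       # set once the site has been judged SEO-slop


def next_interval(state, content_changed):
    """Return the number of days until the next visit, or None to never revisit.

    Sites that keep changing get visited more often; sites that stay the same
    back off exponentially; sites flagged as slop are dropped entirely.
    """
    if state.is_slop:
        return None
    if content_changed:
        state.interval_days = max(MIN_INTERVAL_DAYS, state.interval_days / 2)
    else:
        state.interval_days = min(MAX_INTERVAL_DAYS, state.interval_days * 2)
    return state.interval_days
```

With a policy like this, a page that never changes costs a couple of fetches per year rather than one per day.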

@aeva @dalias Some also use CommonCrawl, which is also a dataset used for training LLMs.
@dreid @dalias oh! I did not know about common crawl. Ever since I came across marginalia I've been curious to experiment with making my own search engine, but I've been unsure how to go about the data gathering part.
@aeva @dalias I have this pet theory that a personal search engine that basically only indexes stuff you've browsed to would satisfy like 75% of my search engine usage. And searching wikipedia would handle like 10% more.
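
The core of such a personal index can be very small. Below is a minimal sketch in Python of an inverted index over pages you have visited; the class name `VisitedIndex`, the crude tokenizer and the all-terms-must-match rule are assumptions for illustration, and how the browser would hand over page text is not shown.

```python
import re
from collections import defaultdict


class VisitedIndex:
    """Tiny inverted index: maps each term to the set of URLs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _terms(text):
        # Lowercase and split on anything that is not an ASCII letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, text):
        """Index the text of a page the user has visited."""
        for term in self._terms(text):
            self._postings[term].add(url)

    def search(self, query):
        """Return the URLs that contain every term in the query."""
        terms = self._terms(query)
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results
```

There is no ranking here at all; with an index limited to your own browsing history, plain intersection may already be enough to find "that page I read last month".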
@dreid @dalias I have a similar theory, that one could produce a really good search engine out of something along the lines of a federated bookmark sharing system, and the web-of-trust strat would be a practical curation strategy. The hard parts are unfortunately not the technical aspects however. If I didn't have to work for a living I'd probably take this on as a major personal project, but given the major time sink it would be I don't know how to fit it into my life otherwise.
@dreid @dalias I've been thinking about it for a long time though. Maybe I'll snap one day and try building it anyway :/
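
One way to picture the web-of-trust ranking part, as a sketch in Python. It assumes you already have a per-user trust weight and each user's shared bookmarks; the function name `rank_bookmarks` and the simple additive scoring are made up for illustration, and the hard parts (federation, deriving trust, privacy) are deliberately left out.

```python
def rank_bookmarks(trust, bookmarks):
    """Rank URLs by the summed trust of the people who bookmarked them.

    trust:     dict mapping user -> weight (0.0 means untrusted)
    bookmarks: dict mapping user -> iterable of bookmarked URLs
    Users missing from `trust` contribute nothing, so strangers cannot
    push links into your results.
    """
    scores = {}
    for user, urls in bookmarks.items():
        weight = trust.get(user, 0.0)
        if weight <= 0:
            continue
        for url in set(urls):  # count each user at most once per URL
            scores[url] = scores.get(url, 0.0) + weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Usage would look like `rank_bookmarks({"alice": 1.0, "bob": 0.5}, shared)`, where `shared` is whatever bookmark data your peers have published to you.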
@aeva @dreid @dalias ohhhh hey, that sounds exactly like my current personal project I'm working on...

(it is very slow going but)

@aeva @dreid @dalias Sweet! Another person thinking about this problem!

I've been wondering about it too, especially how to build the indexers and the trust system. Going to bookmarks/sites already visited might be a decent place to start (although there are privacy issues involved when considering federation)

@aeva @dreid @dalias I'm unemployed so (in theory) have a fair bit of time on my hands. What's lacking is prioritisation, I jump on a shiny new project every week
@brib @dreid @dalias one of my inspirations is firefox's bookmark tagging system, which replaced folder categorization for me a long time ago. Over the years I've had ideas for features that mostly amount to wishing I could partition private bookmarks in their own categories (eg, work, health care stuff, pronz, etc) that could be locked/hidden separately
@brib @dreid @dalias my clearest vision for this would probably require making a custom browser though, and the ui design clarity is super important
@brib @dreid @dalias I'm really hoping that servo will end up being nice for embedding and try to have a stable API, that could be huge for making this not a crazy undertaking
@aeva @brib @dalias I too do not desire to work in the firefox codebase.
@dreid @brib @dalias servo is its own thing now 🤷‍♀️

@aeva @brib @dalias I know! I have been financially supporting their development for a while. Except for that brief period where they were considering changing their AI contribution policy.

I just meant, I am also kind of waiting for servo to be good. :)

@dreid @brib @dalias oh! yes. :) I agree completely then
@brib @aeva @dalias Yeah I'm pretty much ignoring federation here. My idea is that this is purely locally run software. The indexer is part of your browser, the search interface is part of your browser, except for fanout to blessed service specific search like wikipedia (or anything you can make a search shortcut for) as a fallback/enhancement.

@dreid @aeva @dalias I suspect it does a decent job at avoiding the slop too.

I wonder how hard it will be to build an index that way

@aeva @dreid This is absolutely something that could be built on the distributed autonomous identity system I have sketched in my mind. It's intended to support having all sorts of trust graphs with different semantics and different concepts of trust built on top of it.
@dreid @aeva Starting with Wikipedia and treating all references that have been there a long time without being edited out as legitimate crawl paths, would seed a very viable crawl and pagerank-replacement root for noncommercial searches.
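
As a sketch of that seeding rule in Python: given (url, date added) pairs for article references, keep only the ones that have survived long enough. The function name `seed_urls`, the one-year default and the input shape are assumptions for illustration; extracting the dates from article revision history is assumed to happen elsewhere and is not shown.

```python
from datetime import date, timedelta


def seed_urls(references, today, min_age_days=365):
    """Pick crawl seeds from references that have survived in an article.

    references: iterable of (url, added_on) pairs, where added_on is the
                date the reference first appeared and was not later removed
    Returns the sorted, de-duplicated URLs at least `min_age_days` old.
    """
    cutoff = today - timedelta(days=min_age_days)
    return sorted({url for url, added_on in references if added_on <= cutoff})
```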
@aeva @dalias If it's "just" a browser feature it also addresses the javascript and bot problems.
@dreid @aeva @dalias sounds vaguely like searxng?
@aeva @dalias it's a mix of multiple indexes. Not sure what the full list is, though it includes Brave
@aeva @dalias Wikipedia: "As of April 2024, Kagi listed that its sources for search results were derived from Google, Brave Search, Mojeek and Yandex." https://en.wikipedia.org/wiki/Kagi

@lnl @dalias marvelous, thank you for the source

@dalias @aeva https://help.kagi.com/kagi/search-details/search-sources.html#search-sources

They claim to have their own index (but they also say words about including results from "all major" search engines as well?)
