Mastodawn

kagi seems nice. i'm conflicted between yet another fucking subscription and the obvious rationale for it

EDIT: lol i had no idea kagi was cancelled already, sorry, didn't get the memo

@aeva Just use a custom search URL to get working results rather than paying another bad reseller of Google/Bing results.

Gracchus Babeuf Bourguignon

Lot of Kagi-shilling going on again after the latest nonsense from Google. Please remember:
- Kagi requires you to sign in to use it;
- So every search can be uniquely linked to your account;
- If you pay for that account, this links to you as a real person.
- Kagi's HQ is in Palo Alto, California, and is thus subject to US Laws;
- This means their logs and records can be expropriated without a warrant by Federal Authorities;
- Those authorities are currently under the control of an actual fascist demagogue.
- Kagi began life as an AI-First Company. The name is a portmanteau of K and AGI -- Artificial General Intelligence.
They are not a credible actor, and they are in no way safe for searching for (e.g.) Reproductive Health Providers.

goblin.technology

aeva May 20

@dalias wait, what, they're just another reseller???!!

Cassandrich May 20

@aeva They supposedly "buy indexes from Google", whatever that means. My understanding is that they do a little more doctoring of the results than direct resellers like ddg do, but AIUI they are not actually crawling and indexing the web like a real search engine.

aeva May 20

@dalias so tbh, as much as I'd love to see a source for that claim, i have an easy time believing it: their results were surprisingly similar to DuckDuckGo's in my brief test so far. Though I also found stuff I wasn't able to find via DuckDuckGo, so idk.

aeva May 20

@dalias "how on earth do you make a new search engine today" is a super interesting problem, since a significant part of the web sits behind some kind of anti-bot challenge screen. I figured they must have had to make an under-the-table deal with Cloudflare or something. Aggregating Google and Bing would be much simpler, though.

dreid May 20

@aeva @dalias Some also use Common Crawl, a dataset that is also used for training LLMs.

aeva May 20

@dreid @dalias oh! I did not know about Common Crawl. Ever since I came across Marginalia I've been curious to experiment with making my own search engine, but I've been unsure how to go about the data-gathering part.

dreid May 20

@aeva @dalias I have this pet theory that a personal search engine that basically only indexes stuff you've browsed to would satisfy like 75% of my search engine usage. And searching Wikipedia would handle like 10% more.
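
A minimal sketch of that pet theory, assuming you already have (URL, page text) pairs for the pages you've visited (pulling them out of browser history or a cache is not shown): an in-memory inverted index that only returns pages containing every query term, ranked by a crude total term count. `PersonalIndex`, `add_page`, and `search` are hypothetical names, not an existing tool.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PersonalIndex:
    """Hypothetical in-memory inverted index over pages you have visited."""

    def __init__(self) -> None:
        # token -> {url: how many times the token appears on that page}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)

    def add_page(self, url: str, text: str) -> None:
        """Index one visited page (this sketch ignores pages whose content changes)."""
        for token, count in Counter(tokenize(text)).items():
            self.postings[token][url] = count

    def search(self, query: str) -> list[str]:
        """Return URLs containing every query term, highest total term count first."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect the posting lists so only pages matching all terms survive.
        urls = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            urls &= set(self.postings.get(term, {}))
        scores = {u: sum(self.postings[t][u] for t in terms) for u in urls}
        # Ties are broken alphabetically by URL so results are deterministic.
        return sorted(scores, key=lambda u: (-scores[u], u))


idx = PersonalIndex()
idx.add_page("https://example.org/a", "Rust borrow checker notes")
idx.add_page("https://example.org/b", "Rust, rust, async notes")
print(idx.search("rust notes"))  # ['https://example.org/b', 'https://example.org/a']
```

Raw term counts favor long pages; swapping in TF-IDF or BM25 weighting would be the usual next step.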

Cassandrich

@dreid @aeva Starting with Wikipedia, and treating every reference that has stayed there a long time without being edited out as a legitimate crawl path, would seed a very viable crawl and a PageRank-replacement root for noncommercial searches.
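
One way to read this suggestion as code, offered only as a sketch: long-lived Wikipedia citations become the seed set, and instead of plain PageRank you run personalized PageRank, whose random jumps land only on those seeds (the same idea behind TrustRank). The `Reference` record, the 365-day cutoff, and the function names are hypothetical, and getting citation ages out of Wikipedia's edit history is not shown.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    url: str
    days_on_wikipedia: int  # hypothetical: how long the citation has survived unedited


def trusted_seeds(refs: list[Reference], min_days: int = 365) -> set[str]:
    """Keep only references that have stayed on Wikipedia for at least min_days."""
    return {r.url for r in refs if r.days_on_wikipedia >= min_days}


def seeded_pagerank(links: dict[str, list[str]], seeds: set[str],
                    damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Personalized PageRank whose random jumps land only on the trusted seeds.

    `links` maps each crawled page to the pages it links to.
    """
    if not seeds:
        return {}
    pages = set(links) | {t for targets in links.values() for t in targets} | seeds
    teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: hand its rank back to the seeds.
                for s in seeds:
                    new[s] += damping * rank[page] / len(seeds)
        rank = new
    return rank


refs = [Reference("https://a.example", 900), Reference("https://b.example", 30)]
seeds = trusted_seeds(refs)  # only a.example survives the 365-day cutoff
links = {
    "https://a.example": ["https://c.example"],
    "https://c.example": ["https://a.example", "https://spam.example"],
    "https://b.example": ["https://spam.example"],
}
rank = seeded_pagerank(links, seeds)
for url, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {url}")
```

Pages that no seed leads to, like `b.example` here, score zero no matter how many links they hand out among themselves.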