This idea that somehow search engines _can_ arbitrate "truth" is just so… not how any of this works or could even conceivably work.

The reason that search engines "backstop" with Wikipedia is because Wikipedia is a giant curated and mostly-audience-appropriate collection of knowledge.

Knowing what is "true" is so incredibly nontrivial.

@hrefna @futurebird there is also this vast disconnect: a lot of search engines, and people more broadly, seem to think there is a single “right” answer to any given query.

When, I would argue, in nearly all cases there is not. Every query (and the person making it) has a context that may or may not be known to the search engine, and that context can make the best answer or link for them differ from the best one for someone else.

This is true even for seemingly obvious questions like “how many hours in a day?”

@Rycaut

I'm becoming increasingly resigned to the fact that I'm going to need to write my own search engine.

It was never reasonable to expect any service, much less a free one, to really do full-text search; the data are too big.

That's the sad truth behind all this suggesting & dumbing down of results. Sometimes even species names get "corrected" to something more popular.

But if I want everything I've read and written to be searchable, I'm just gonna need to do it myself.

@futurebird yup. For my local content, Apple’s built-in search is halfway decent, but it doesn’t get anything like my posts here or elsewhere unless I figure out a way to archive them locally (which wouldn’t be a half-bad idea to figure out, though it's non-trivial).

I do wish there were a search that trusted users enough to default to searching for the exact query instead of trying to mangle it (autocomplete on iOS and on desktops doesn’t help).

@Rycaut @futurebird
Big Same to this whole thread.

I've made inroads, but I still have more work to do to switch my writing to a POSSE workflow:

https://emilygorcenski.com/post/posse-comitatus-twitter-as-a-syndication-engine/

A personal search engine will be a crucial piece of this.

On the theme of programmers being of service to regular folks in community, I've also been wondering how to make this accessible to non-techies. Besides, if it's simple and comprehensible, I might not grow weary of maintaining it for myself, so those goals align nicely.

@Rycaut @futurebird
On the subject of a public/community search engine: with an opt-in approach to #Fediverse indexing, I've been wondering if something as simple as '00s-era PageRank would let us bootstrap a search engine off of fedi posts.

Fold OpenScience journals and public libraries into the Fediverse. Use community moderation to fight SEO abuses.

/riff

Pie in the sky, I know. But as Google (and Amazon) further enshittify, it gets easier to catch up to where search was back when it worked.
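For anyone curious what "'00s-era PageRank" would even mean over fedi posts, here is a minimal sketch of PageRank by power iteration. It assumes a hypothetical graph in which each post ID maps to the post IDs it links to, boosts, or replies to; the post IDs are made up, and the damping factor of 0.85 is just the commonly used value, not something anyone in this thread specified.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict: node -> list of nodes it points to."""
    # Collect every node that appears either as a source or as a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by "dangling" nodes (no outgoing links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each node splits its damped rank equally among the nodes it points to.
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical posts: who links to / boosts / replies to whom.
    links = {
        "post_a": ["post_b", "post_c"],
        "post_b": ["post_c"],
        "post_c": ["post_a"],
        "post_d": ["post_c"],
    }
    scores = pagerank(links)
    for post, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{post}: {score:.3f}")
```

The scores always sum to 1, so they can be used directly as a ranking signal. The community-moderation part (fighting SEO abuse) would have to happen on the graph itself, e.g. by dropping edges from accounts flagged as spam, which this sketch does not attempt.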

@futurebird
@Rycaut

If it's just things you write, and you scrape your accounts' posts as you write them, you can probably use Elasticsearch. But that only works going forward; finding all your old posts would be harder.

(though you could probably use ES to search your Twitter etc. archive)

I'm not deeply familiar with ES, but I'm pretty sure this is what it's meant for, though getting the data into the right shape might take work.

@futurebird having a copy of "everything you have read and written" available would be the hard part (e.g. is there even a browser add-on that locally archives every page you visit?).

Once you have the data, you could easily grep it, and move to more indexed forms of searching if needed.

@NireBryce
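To make the "grep first, index later" idea concrete, here is a minimal sketch in plain Python. It assumes your archive has already been loaded into a dict mapping a (hypothetical) file name to its plain text. `grep` does a brute-force exact-substring match with no query rewriting, and `build_index`/`search` are a toy inverted index standing in for what something like Elasticsearch does at much larger scale; none of this is Elasticsearch's API.

```python
import re
from collections import defaultdict


def grep(docs, phrase):
    """Brute force: IDs of docs containing the exact phrase, untouched by any 'correction'."""
    return sorted(doc_id for doc_id, text in docs.items() if phrase in text)


def tokenize(text):
    # Lowercased runs of word characters; a deliberately simple tokenizer.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: token -> set of doc IDs whose text contains that token."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """IDs of docs containing every query token (AND semantics), order-insensitive."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical archive contents.
    docs = {
        "note1.txt": "Notes on Quercus alba acorn timing.",
        "note2.txt": "Quercus rubra leaves turn red in fall.",
        "post3.txt": "Search engines keep correcting species names.",
    }
    print(grep(docs, "Quercus alba"))
    index = build_index(docs)
    print(search(index, "quercus"))
    print(search(index, "names species"))
```

`grep` rescans every document on every query, which is fine for a personal archive until it isn't; the index trades a one-time build step for lookups that only touch the documents containing each term.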

@platonides @futurebird @NireBryce I think a bookmarklet that indexes pages might be a quick start.