This idea that somehow search engines _can_ arbitrate "truth" is just so… not how any of this works or could even conceivably work.

The reason that search engines "backstop" with wikipedia is because wikipedia is a giant curated and mostly-audience-appropriate collection of knowledge.

Knowing what is "true" is so incredibly nontrivial.

@hrefna @molly0xfff I was just thinking yesterday “maybe the weird ontology people were right all along”
@hrefna @molly0xfff it’s almost as if facts are socially constructed! Thanks Bruno Latour!
@hrefna lol 100% truth. Find me a Knowledge Panel on Google that isn't basically a straight rip from Wikipedia - doesn't exist.
@developit @hrefna r/FuckCars did it a few times; Google was pulling the "knowledge" from oil propaganda think tanks instead.

@developit @hrefna While much is stolen from Wikipedia, there are other sources for the knowledge panel, notably Google's local business database.

https://www.google.com/search?q=berkeley+bowl

@hrefna I believe the goal for search engines is easier, focusing attention on credible information. That's a lower bar and has been done for decades using techniques like TrustRank:
http://ilpubs.stanford.edu:8090/770/
Google has used this since the beginning -- they didn't use PageRank for long because it got spammed immediately -- and it's also in Google's mission statement, making information accessible (despite adversaries flooding the zone) and useful (focusing attention on the best stuff).
Combating Web Spam with TrustRank - Stanford InfoLab Publication Server
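For readers who haven't met the idea, here is a minimal sketch of the trust-propagation step that the TrustRank paper builds on: a PageRank-style iteration where the random jump only ever lands on a small, hand-vetted seed set. The graph, the page names, and the `trust_rank` helper are all made up for illustration, and this is a simplification of the paper, not a description of what any search engine actually runs.

```python
def trust_rank(links, trusted_seeds, damping=0.85, iterations=50):
    """Biased PageRank: trust starts at the seeds and flows along outlinks.

    links: dict mapping page -> list of pages it links to (hypothetical data).
    trusted_seeds: pages a human has vetted as credible.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    seeds = [p for p in trusted_seeds if p in pages]
    if not seeds:
        raise ValueError("need at least one trusted seed in the graph")

    # The jump distribution is concentrated on the seeds (plain PageRank
    # would spread it uniformly over every page instead).
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    score = dict(jump)

    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its trust back to the seed set.
                for s in seeds:
                    nxt[s] += damping * score[page] / len(seeds)
        score = nxt
    return score


# Toy web: three reputable pages that cite each other, plus a spam ring
# that links heavily to itself but is never linked from a trusted page.
links = {
    "encyclopedia": ["university", "newspaper"],
    "university": ["encyclopedia"],
    "newspaper": ["encyclopedia", "university"],
    "spam1": ["spam2", "spamshop"],
    "spam2": ["spam1", "spamshop"],
    "spamshop": ["spam1"],
}
scores = trust_rank(links, trusted_seeds=["encyclopedia"])
for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:13s} {value:.3f}")
```

In this toy graph the spam ring ends up with no trust at all, however densely it links to itself, because no trusted page ever points into it. That is the "lower bar" in miniature: nothing here decides what is true, it only decides where attention flows from sources someone already vouched for.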

@glinden @hrefna The really unfortunate thing is that the spammers never really got the memo about how things changed, so they’re still spamming every single forum and web form that looks like a comment entry in the hope that the resulting links won’t be rel="nofollow", and it’s made the Internet a much worse place overall.

I mean it’s the spammers’ fault, but like, it’d be nice if their efforts were even meaningful towards their goal and not just shitting on lawns unnecessarily.

@hrefna One quirk I've found is that the reliability of any given Wikipedia page is directly proportional to the popularity of the subject. High-traffic pages about common topics are well curated. Obscure topics less so.
@Alan_Au in my niche subject area I've found very few errors, and the page views are probably on the order of a dozen per year, excluding me. Maybe this is only true of science wikis. I am mostly looking up compounds and uncommon analytical methods.

@hrefna @futurebird there is also this vast disconnect - a lot of search engines and people broadly seem to think that there is a single “right” answer to any given query.

When, I would argue, in nearly all cases there is not. Every query (and the person making the query) has a context which may or may not be known to the search engine and which can make the best answer/link for their context differ from others.

This is true even of seemingly obvious questions like “how many hours in a day?”

@hrefna @futurebird

My example “how many hours in a day” is one of my favorite “trick” questions - there is no single right answer to this. It entirely depends on unknown context - where you are, what day it is, what year it is, in some cases what your political affiliations are. All of which can make the answer differ.

(To explain - daylight saving time and time zones, and every change to either, are political decisions - and where borders are fought over, the time can differ)
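A small sketch of the point above, assuming we measure a "day" as the time between one local midnight and the next. The `day_length_hours` helper is hypothetical, and the UTC offsets are typed in by hand (New York in 2023: UTC-5 in winter, UTC-4 in summer); a real program would look them up in a time zone database, which is exactly the context a search engine doesn't have.

```python
from datetime import date, datetime, timedelta, timezone


def day_length_hours(day, utc_offset_at_start, utc_offset_at_end):
    """Hours between local midnight on `day` and the next local midnight.

    The two offsets (in hours) are what the local clock is using at each
    midnight; they differ on days when the clocks change.
    """
    start = datetime(day.year, day.month, day.day,
                     tzinfo=timezone(timedelta(hours=utc_offset_at_start)))
    nxt = day + timedelta(days=1)
    end = datetime(nxt.year, nxt.month, nxt.day,
                   tzinfo=timezone(timedelta(hours=utc_offset_at_end)))
    # Subtracting aware datetimes compares the actual instants in UTC.
    return (end - start) / timedelta(hours=1)


# New York, 2023 (offsets entered by hand for illustration):
print(day_length_hours(date(2023, 3, 12), -5, -4))   # clocks spring forward
print(day_length_hours(date(2023, 11, 5), -4, -5))   # clocks fall back
print(day_length_hours(date(2023, 7, 1), -4, -4))    # ordinary summer day
```

The three calls give 23, 25, and 24 hours respectively, and that is for a single city in a single year, before anyone changes the rules.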

@Rycaut

I'm becoming increasingly resigned to the fact that I'm going to need to write my own search engine.

It was never reasonable to expect any service much less a free one to really do full text search; the data are too big.

That's the sad truth behind all this suggesting & dumbing down of results. Sometimes even species names get "corrected" to something more popular.

But if I want to be able to have everything I've read and written searchable I'm just gonna need to do it myself.

@futurebird yup. For my local content Apple’s built in search is halfway decent but doesn’t get anything like my posts here or elsewhere unless I figure out a way to archive them locally (which wouldn’t be a half-bad idea to figure out though non-trivial)

I do wish there was search that trusted users enough to default to searching for the exact query vs trying to mangle it (autocomplete on iOS and on desktops doesn’t help)

@Rycaut @futurebird
Big Same to this whole thread.

I've made inroads, but have more work to switch my writing to a POSSE workflow:

https://emilygorcenski.com/post/posse-comitatus-twitter-as-a-syndication-engine/

A personal search engine will be a crucial piece of this.

On the theme of programmers being of service to regular folks in community, I've also been wondering how to make this accessible to non-techies. Besides if it's simple and comprehensible then I might not weary of maintaining it for myself, so those align nicely.

POSSE Comitatus: Twitter as a Syndication Engine · EmilyGorcenski.com

After a couple months away from Twitter, I’m reactivating the account. Not for posting, but rather to use it as a syndication engine. Twitter remains the


@Rycaut @futurebird
On the subject of public / community search engine. With an opt-in approach to #Fediverse indexing I've been wondering if something as simple as '00s era Page Rank would allow us to bootstrap a search engine off of fedi posts.

Fold OpenScience journals and public libraries into the Fediverse. Use community moderation to fight SEO abuses.

/riff

Pie in the sky, I know. But as Google (and Amazon) further enshittifies it's less hard to catch up to when search worked.

@futurebird
@Rycaut

if it's just things you write and you scrape your accounts' posts as you write them, you can probably use elasticsearch. but that only works going forward, finding all your old posts would be harder

(though you could probably use ES to search your Twitter etc archive)

I'm not deeply familiar with ES but I'm pretty sure this is what it's meant for, though getting the data into the right shape might take work

@futurebird having available a copy of "everything you have read and written" would be the hard part (e.g. is there even a browser addon that archives locally every page you visit?).

Once you have the data, you could easily grep it. And move to more indexed forms of searching if needed.
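As a rough idea of what "more indexed forms of searching" could look like once the archive exists, here is a minimal sketch of an inverted index with exact-phrase matching and no spelling "correction". The `TinyIndex` class, the post IDs, and the sample posts are invented for illustration; a real setup would more likely lean on an existing engine such as Elasticsearch or SQLite full-text search rather than this toy.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Word -> set of document IDs, plus the raw text for phrase checks."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search_exact(self, phrase):
        """Return IDs of documents containing the words of `phrase`
        consecutively and in order. The query is never rewritten."""
        words = tokenize(phrase)
        if not words:
            return []
        # The index narrows the candidates; only those get scanned in full.
        candidates = set.intersection(
            *(self.postings.get(w, set()) for w in words)
        )
        n = len(words)
        hits = []
        for doc_id in sorted(candidates):
            tokens = tokenize(self.docs[doc_id])
            if any(tokens[i:i + n] == words
                   for i in range(len(tokens) - n + 1)):
                hits.append(doc_id)
        return hits


index = TinyIndex()
index.add("post-1", "Saw Camponotus pennsylvanicus foraging on the oak today.")
index.add("post-2", "The Camponotus floridanus colony is doing well.")

print(index.search_exact("camponotus pennsylvanicus"))
print(index.search_exact("Camponotus"))
```

The first query matches only `post-1` and the second matches both posts. The species name is looked up exactly as typed, which is the behaviour people in this thread are asking for; the cost is that a typo finds nothing.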

@NireBryce

@platonides @futurebird @NireBryce i think a bookmarklet that indexes might be a quick start

@Rycaut @hrefna @futurebird
at the very least, the number of hours in a day depends on what rotating body it's based on (example: Mars, Earth, and Venus have different rotational periods), and when in the lifetime of the body (example: Earth's rotational period has been growing longer, and was only about 22 hours long back in the Devonian), and please don't even talk about time zone and/or calendrical issues ...

@hrefna FWIW, Wikipedia shows up in so many search results not because the engineers (or anyone else) "decided to include it". Wikipedia has a LOT of trust signals pointing to the site and frequently to specific pages. Those trust signals will surface Wikipedia near or at the top for many queries.

The search industry hasn't really discussed this in a decade, but I believe Google actually dialed BACK some of the trust signals to keep Wikipedia from dominating SERPs.

@hrefna and when naive researchers believe the statistical package can tell them whether something's real or not?!
@hrefna
Wikipedia is "like an encyclopedia" in the same way that Madonna is like a virgin

@hrefna Does God exist? I can definitely imagine a world in which I might try to come up with an answer to that question by entering it in a search engine. I can not—I refuse to—imagine a world in which I simply take the search engine's word for the answer.

That goes double now that the search engine would search for "Does Greenland exist?" instead (without telling me) if it thought that was more popular, and it will go triple once the ChatGPT knockoffs finish taking over search.

@hrefna
Absolutely nontrivial. But is there a singular truth?
From a historiographic perspective, consider the historian's mantra of primary vs secondary [& tertiary] sources. We all know that primary sources [eye or ear witness accounts] can vary - the watchers', listeners', or participants' recollections are coloured by context, cultural points of view, experiences, power hierarchies. Thus versions of the truth exist.
But there is still truth vs untruth or outright lies [a la Trump]

@hrefna so well put, thanks for that. It seems like an impossible problem, to try and present truth.

E.g. if you look at the big story by Kevin Roose in the NYT, you could take the view that the AI was doing its best to generate the response he was looking for