Meta. OpenAI. Google.

Your AI chatbot is not *hallucinating*.

It's bullshitting.

It's bullshitting, because that's what you designed it to do. You designed it to generate seemingly authoritative text "with a blatant disregard for truth and logical coherence," i.e., to bullshit.

@ct_bergstrom annoyingly the chatbots don’t HAVE to behave this way. They are missing relatively straightforward steps.
- After its main response for the user, it needs to parse it like a prompt from the user.
- It needs to identify statements of fact, names, dates, math, geography, news, etc.
- Then classic search-engine lookup, reference lookup, calculator, maps, etc. queries can generate a confidence score for each part. Label accordingly in the final output.
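The steps above could be sketched roughly as the pipeline below. This is a toy illustration only, not anyone's actual system: the claim extractor is a crude regex, and the dictionary `KNOWN_FACTS` is a hypothetical stand-in for real search/calculator/maps backends.

```python
import re

# Hypothetical stand-in for real search, calculator, and maps lookups.
# In a real system each claim type would be routed to a different backend.
KNOWN_FACTS = {
    "the jwst launched in 2021": 0.95,
    "2 + 2 = 4": 1.0,
}

def extract_claims(response: str) -> list[str]:
    """Very rough claim spotter: sentences containing a digit or an
    all-caps name are treated as checkable statements of fact."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    return [s for s in sentences if re.search(r"\d|[A-Z]{2,}", s)]

def confidence(claim: str) -> float:
    """Look the claim up; unknown claims get a low default score."""
    return KNOWN_FACTS.get(claim.lower().rstrip("."), 0.2)

def label_response(response: str) -> str:
    """Append a confidence label to each checkable claim, leaving
    non-factual sentences untouched."""
    out = response
    for claim in extract_claims(response):
        score = confidence(claim)
        out = out.replace(claim, f"{claim} [confidence: {score:.0%}]")
    return out

print(label_response("The JWST launched in 2021. 2 + 2 = 4. I like tea."))
# The JWST launched in 2021. [confidence: 95%] 2 + 2 = 4. [confidence: 100%] I like tea.
```

The hard parts a real implementation would face are exactly the ones debated below: where the lookup backends come from, and whether the resulting score is trustworthy or opaque.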
@mickdarling @ct_bergstrom that would result in high BS scores, or hidden BS scores, since "if it's on the Internet it is true" doesn't apply anymore (did it ever?). They're selling the idea that LLMs help; BS scores wouldn't help sell it.

@jt_rebelo @ct_bergstrom

All the search engines have old-school tools for validating certain content on the web. Various sites are rated as more accurate than others, and that has been tweaked and modified over the last two decades. That's not the hard part.

The hard part is just as you say: they are trying to convince people "This one new tool is the answer!" and slapping a 59% confidence metric on a statement about the JWST doesn't look nearly impressive enough.

@mickdarling @ct_bergstrom yes, I know that every search engine already has it (with better or worse results), and I agree that is the easy part (a Sisyphean effort, nonetheless). My problem with the hard part is that the confidence metric will always be opaque to us (unless some open-source code is used and outed) and might even be an AI/LLM's own assessment. They're as fallible as we are on that, because we were the ones who programmed them and the ones who fed them information.

@jt_rebelo @ct_bergstrom

This is how Google could maintain its Search supremacy, and its golden goose: list all the sites that back up the LLM's output with a page of links, or something more like the sidebar that Bing is using.

I would love an open-source rating system, but you are right: it is hard to deobfuscate their inner workings.

My company, Tomorrowish, built a rating system for Tweets more than a decade ago, and though it gave good results, finding out WHY was very tough.

@jt_rebelo @ct_bergstrom And that was our own internal tool we had trouble understanding!

Admittedly, we were a tiny team running a startup on a shoestring, constantly trying to adjust to pull in some more revenue, SO we didn't have a lot of time to rigorously investigate something that already worked.
🤷

I think a Google- or Microsoft-level company could pull it off with their personnel and resources.