Wow... while we were all making fun of Google's Bard demo for making some small mistakes about the James Webb Space Telescope, it turns out the Bing demo was wildly hallucinating, making up financial comparisons between Gap and Lululemon! https://dkb.blog/p/bing-ai-cant-be-trusted
Bing AI Can't Be Trusted

Microsoft knowingly released a broken product for short-term hype.

These are some seriously misleading errors!

> Lululemon’s gross margin is given as “58.7%”, which is a hallucinated value that doesn’t appear in their financial document. The real value is 55.9%.
>
> Lululemon’s operating margin is 19%, not 20.7%.
>
> Lululemon’s diluted earnings per share is $2.00 not $1.65.
>
> Cash and cash equivalents is wrong for Gap (should be $679 million) but correct for Lululemon.
>
> Inventory is wrong for Gap (should be $3.04 billion) but correct for Lululemon.

@simon Let's test AI in production, the best kind of testing!
@simon so you mean they are all crap? 😆
@simon can't wait for this whole situation to be written off as a collective hallucination
@simon @mattjhodgkinson It isn’t a small mistake. It’s how these work. There is no verification of anything they produce, breaking expectations of users everywhere.
@SloanLA @simon There needs to be anchoring in verifiable information built in to make these tools of any use.

@mattjhodgkinson @SloanLA the wild thing here is that's supposed to be how the Bing one works!

It runs regular searches and, according to the leaked prompts at least, instructs the language model to use only those facts in its output and to provide citations.

Problem is, you can't actually tell a language model to do that - it's still going to predict plausible-sounding but made-up next tokens, because that's how language models work
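A minimal sketch of the pattern being described here. The function name and prompt wording are my own invention, not Microsoft's actual system prompt; the point is that the "use only these facts" instruction is just more text in the context window, with nothing mechanically enforcing it:

```python
def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Assemble a retrieval-augmented prompt: numbered search snippets
    plus an instruction to answer from those facts alone and cite them.
    The instruction is plain text -- the model is still free to predict
    made-up numbers that never appeared in any source."""
    sources = "\n".join(
        f"[{i + 1}] {snippet}" for i, snippet in enumerate(snippets)
    )
    return (
        "Answer using ONLY the numbered sources below, and cite them "
        "like [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical snippet, echoing the correct figure from the quoted post.
prompt = build_grounded_prompt(
    "What was Lululemon's gross margin?",
    ["Lululemon reported a gross margin of 55.9%."],
)
print(prompt)
```

Whatever comes back from the model would then need to be checked against those sources - which is exactly the verification step that's missing.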

@simon @mattjhodgkinson @SloanLA yeah, this will go down in history as a (totally predictable) BS use case. But it does look like LLMs can be used on top of proper search. Check out this recent paper by FAIR: https://arxiv.org/abs/2302.04761
Toolformer: Language Models Can Teach Themselves to Use Tools

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
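The mechanism the abstract describes - the model emitting API-call markers in its text, which get executed and spliced back in - can be sketched like this. The `[Calculator(...)]` marker syntax follows the paper's examples, but this executor is a toy illustration, not the authors' code, and the numbers are made up:

```python
import re

def run_calculator_calls(text: str) -> str:
    """Find Toolformer-style [Calculator(...)] markers in generated
    text, execute the arithmetic, and splice the result back in."""
    def execute(match: re.Match) -> str:
        expr = match.group(1)
        # Only allow simple arithmetic so eval() is safe here; leave
        # anything else untouched rather than executing arbitrary code.
        if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
            return match.group(0)
        return f"{eval(expr):g}"
    return re.sub(r"\[Calculator\(([^)]*)\)\]", execute, text)

# Illustrative only - these figures are invented for the example.
print(run_calculator_calls("Margin is [Calculator(100 * 559 / 1000)]%."))
```

The key difference from the Bing setup: the number in the final text comes from running the tool, not from the model's next-token guess.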

@simon @SloanLA You can lead an LLM to sources, but you can’t make it think.
@simon Anyone who has ever used Bing as a search engine is completely unsurprised. It seems to have a built-in randomiser. There is a reason that it’s allowed in China; like, good luck finding anything on Bing. Bing AI was always going to be psychedelic babbling.
@simon GPTs have no episodic memory, so I guess they'll keep hallucinating. The transformer predicts a vector that is mostly a general idea, and the final step is basically the decoder of a VAE, so it will generate plausible-sounding stuff from any general idea. The way to improve would be to remember training data, which search engines are already kind of doing, and transformers are query/key/value based, so it should not take too long.
@simon The memes about this from the inside have been Rather Good.