These are some seriously misleading errors!
> Lululemon’s gross margin is given as “58.7%”, which is a hallucinated value that doesn’t appear in their financial document. The real value is 55.9%.
>
> Lululemon’s operating margin is 19%, not 20.7%.
>
> Lululemon’s diluted earnings per share is $2.00, not $1.65.
>
> Cash and cash equivalents is wrong for Gap (should be $679 million) but correct for Lululemon.
>
> Inventory is wrong for Gap (should be $3.04 billion) but correct for Lululemon.
@mattjhodgkinson @SloanLA the wild thing here is that's supposed to be how the Bing one works!
It runs regular searches and, according to the leaked prompts at least, instructs the language model to use only those facts in its output, and to provide citations
Problem is you can't actually tell a language model to do that - it's still going to predict plausible-sounding made-up next tokens, because that's how language models work
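For context, the approach described in those leaked prompts amounts to something like this (a minimal sketch; the function name and prompt wording are my hypothetical reconstruction, not Bing's actual implementation):

```python
def build_grounded_prompt(query, search_results):
    """Assemble a prompt that injects search results and *instructs* the
    model to answer only from them, with citations. Crucially this is
    just an instruction in the prompt - nothing prevents the model from
    predicting tokens that aren't grounded in the results."""
    numbered = "\n".join(
        f"[{i + 1}] {snippet}" for i, snippet in enumerate(search_results)
    )
    return (
        "Answer using ONLY the numbered search results below, "
        "and cite them as [n].\n\n"
        f"Search results:\n{numbered}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The model sees the injected facts and the instruction, but its output is still sampled from next-token probabilities - which is exactly how "58.7%" can appear in an answer when the document says 55.9%.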

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
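The core mechanic in the paper is that the model emits API calls as inline text, which are then executed and their results spliced back into the sequence. A minimal sketch of that execution step (the bracket syntax follows the paper's examples; the tool registry and function names here are illustrative, not the paper's code):

```python
import re

# Toy tool registry - Toolformer's actual tools include a calculator,
# a Q&A system, two search engines, a translator, and a calendar.
TOOLS = {
    # eval with empty builtins, restricted to simple arithmetic (demo only)
    "Calculator": lambda expr: str(round(eval(expr, {"__builtins__": {}}), 2)),
}

CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_api_calls(text):
    """Find inline API calls like [Calculator(400/1400)] in generated
    text, run the named tool, and splice the result back in as
    [Calculator(400/1400)->0.29], Toolformer-style."""
    def run(match):
        tool, args = match.group(1), match.group(2)
        return f"[{tool}({args})->{TOOLS[tool](args)}]"
    return CALL_PATTERN.sub(run, text)
```

The point is the contrast with the Bing approach above: instead of instructing the model to stay factual, Toolformer trains it to hand the factual lookup to an external tool whose output is deterministic.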