Bing's "AI" demo also contained a ton of factual errors: https://fedi.simonwillison.net/@simon/109859342792302510
Simon Willison (@[email protected])

Wow… while we were all making fun of Google's Bard demo for making some small mistakes about the James Webb Space Telescope, it turns out the Bing demo was wildly hallucinating made-up financial comparisons between Gap and Lululemon! https://dkb.blog/p/bing-ai-cant-be-trusted

GPT doesn't know anything, it just generates really convincing-looking text. Everybody needs to calm down about it being the future of search.
@seldo I think the best thing I’ve seen it do so far is give me a template for a formal email
@jefflembeck @seldo it was able to solve an incident at work for me. I plugged in two lines of the stack trace and gave it the symptoms. It gave the obvious things that we already looked at and then it asked us about a parameter that wasn’t set. If this parameter wasn’t set, then our symptoms occurred. Sure enough, a missing config fixed our issue.
@adrianacala @jefflembeck Isn't that just what you'd get from reading posts on a search engine though?
@seldo @jefflembeck Yes, if you know exactly what to look for; in our case there was enough noise to push this down in the search results. We all looked. Once it suggested the missing config, I was able to google it and confirm what ChatGPT said.
@seldo People need to wrap their heads around the fact that GPT models are telling you what the most likely response to your question would be, not actually going out and finding the answer.
@samdcbu We need to stop calling it AI.

@seldo I don't expect regular search to always return reliable information either.

To me it's already more useful for many queries.

If I need a reminder of what that Docker flag was again, ChatGPT is already better than Google or StackOverflow.

@seldo I think Tom Scott framed it nicely: we're on an S-curve of evolution of a new type of AI. Whether it's going to plateau as a bullshit generator or become a massive disruption depends on where you assume we already are on the S-curve. I bet we're early, and there are still many ways to improve accuracy.

https://www.youtube.com/watch?v=jPhJbKBuNnA

I tried using AI. It scared me.

@kornel Many people seem to have such a harsh take on ChatGPT and the like. Maybe it's expectations. But to me, it is so, so good in its current state, and the future looks bright.

@hboon When it's positioned as an AI that knows all the answers, the expectations are understandably high.

I wonder if people will get a sense of what it's good for and what it isn't, so the framing will change from "LLM AI isn't perfect" to "LLM AI isn't a good tool for this task" (good for writing whimsical poems, not math and logic puzzles).

@kornel true. I hope hype doesn't kill it.

But this specific example:

> not math and logic puzzles

I'm not sure it can't (soon). If it can write small programs and if it can turn prose into code, it might well be able to do it and frighteningly soon.

@kornel This bothered me, so I tried a simple math problem from Reddit where it's known to fail. The numeric answer it gave ("Therefore, the simplified fraction is 1350/39.") is wrong (it should be "450/13"), but when I ran the Python program it returned (after fixing a syntax error in the last line), it gave the correct answer:

> The simplified fraction is 450 / 13
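For reference, the arithmetic is easy to verify without trusting either answer. This is not the program ChatGPT generated, just a minimal standalone check using Python's standard library:

```python
from fractions import Fraction

# Fraction automatically reduces to lowest terms:
# gcd(1350, 39) = 3, so 1350/39 reduces to 450/13.
simplified = Fraction(1350, 39)
print(f"The simplified fraction is {simplified.numerator} / {simplified.denominator}")
# prints: The simplified fraction is 450 / 13
```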

@hboon No, it really can't. The tokenization and attention model of LLMs makes them really bad at math. It may be fixable eventually, but what you get today is almost brute-force memorization, not computation.

Programming language generation fares a bit better, because it's translation and symbolic manipulation, which are more of a language task than a math task.

There are some attempts to inject computational ability, e.g. Toolformer: https://arxiv.org/pdf/2302.04761.pdf
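The idea in that paper is roughly that the model emits an inline tool call in its output, and a wrapper evaluates it and splices the result back in, so the arithmetic is done by a real calculator rather than by the model. A minimal sketch of that pattern (the `[Calculator(...)]` marker syntax follows the paper's examples; the regex-based dispatcher here is just a hypothetical illustration, not the paper's implementation):

```python
import re

def run_tools(text: str) -> str:
    """Replace [Calculator(expr)] markers in model output with the
    evaluated result, so arithmetic is computed, not predicted."""
    def evaluate(match: re.Match) -> str:
        expr = match.group(1)
        # Restrict eval to plain arithmetic for safety in this sketch.
        if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
            return match.group(0)  # leave unrecognized calls untouched
        return str(eval(expr))
    return re.sub(r"\[Calculator\(([^)]*)\)\]", evaluate, text)

print(run_tools("2 plus 2 is [Calculator(2+2)]"))
# prints: 2 plus 2 is 4
```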

@seldo not so impressive, we've all been in meetings with folks like that
@seldo I asked it to write an academic paper about the origins of one of the musical instruments I play. Not only was it massively factually incorrect, but the paper itself was riddled with internal contradictions