Bing's "AI" demo also contained a ton of factual errors: https://fedi.simonwillison.net/@simon/109859342792302510
Simon Willison (@[email protected])

Wow… while we were all making fun of Google's Bard demo for making some small mistakes about the James Webb Space Telescope, it turns out the Bing demo was wildly hallucinating made-up financial comparisons between Gap and Lululemon! https://dkb.blog/p/bing-ai-cant-be-trusted

GPT doesn't know anything, it just generates really convincing-looking text. Everybody needs to calm down about it being the future of search.
@seldo I think the best thing I’ve seen it do so far is give me a template for a formal email
@jefflembeck @seldo it was able to solve an incident at work for me. I plugged in two lines of the stack trace and gave it the symptoms. It gave the obvious things that we already looked at and then it asked us about a parameter that wasn’t set. If this parameter wasn’t set, then our symptoms occurred. Sure enough, a missing config fixed our issue.
@adrianacala @jefflembeck Isn't that just what you'd get from reading posts on a search engine though?
@seldo @jefflembeck Yes, if you know exactly what to look for; in our case there was enough noise to push this down in the search results. We all looked. Once it suggested the missing config, I was able to google it and confirm what ChatGPT said.
@seldo People need to wrap their heads around the fact that GPT models are telling you what the most likely response to your question would be, not actually going out and finding the answer.
@samdcbu We need to stop calling it AI.

@seldo I don't expect regular search to always return reliable information either.

To me it's already more useful for many queries.

If I need a reminder of what that Docker flag was again, ChatGPT is already better than Google or StackOverflow.

@seldo I think Tom Scott framed it nicely: we're on an S-curve of evolution of a new type of AI. Whether it's going to plateau as a bullshit generator or become a massive disruption depends on where you assume we already are on the S-curve. I bet we're early, and there are still many ways to improve accuracy.

https://www.youtube.com/watch?v=jPhJbKBuNnA

I tried using AI. It scared me.

@kornel Many people seem to have such a harsh take on ChatGPT and the like. Maybe it's expectations. But to me, it is so, so good in its current state, and the future looks bright.

@hboon When it's positioned as an AI that knows all the answers, the expectations are understandably high.

I wonder if people will get a sense of what it's good for and what it isn't, so the framing will change from "LLM AI isn't perfect" to "LLM AI isn't a good tool for this task" (good for writing whimsical poems, not math and logic puzzles).

@kornel true. I hope hype doesn't kill it.

But this specific example:

> not math and logic puzzles

I'm not sure it can't (soon). If it can write small programs and if it can turn prose into code, it might well be able to do it and frighteningly soon.

@kornel This bothered me, so I tried a simple math problem from Reddit where it's known to fail. The numeric answer it gave ("Therefore, the simplified fraction is 1350/39.") is wrong (it should be "450/13"), but when I ran the Python program it returned (after fixing a syntax error in the last line), it gave the correct answer:

> The simplified fraction is 450 / 13
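For reference, the arithmetic is easy to verify without trusting either answer. This is not the program ChatGPT generated, just a minimal standalone check using Python's standard library:

```python
from fractions import Fraction

# Fraction automatically reduces to lowest terms:
# gcd(1350, 39) = 3, so 1350/39 reduces to 450/13.
simplified = Fraction(1350, 39)
print(f"The simplified fraction is {simplified.numerator} / {simplified.denominator}")
# prints: The simplified fraction is 450 / 13
```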

@hboon No, it really can't. The tokenization and attention model of LLMs makes them really bad at math. It may be fixable eventually, but what you get today is almost brute-force memorization, not computation.

Programming language generation fares a bit better, because it's translation and symbolic manipulation, which are more of a language task than a math task.

There are some attempts to inject computational ability, e.g. Toolformer: https://arxiv.org/pdf/2302.04761.pdf
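The idea in that paper is roughly that the model emits an inline tool call in its output, and a wrapper evaluates it and splices the result back in, so the arithmetic is done by a real calculator rather than by the model. A minimal sketch of that pattern (the `[Calculator(...)]` marker syntax follows the paper's examples; the regex-based dispatcher here is just a hypothetical illustration, not the paper's implementation):

```python
import re

def run_tools(text: str) -> str:
    """Replace [Calculator(expr)] markers in model output with the
    evaluated result, so arithmetic is computed, not predicted."""
    def evaluate(match: re.Match) -> str:
        expr = match.group(1)
        # Restrict eval to plain arithmetic for safety in this sketch.
        if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
            return match.group(0)  # leave unrecognized calls untouched
        return str(eval(expr))
    return re.sub(r"\[Calculator\(([^)]*)\)\]", evaluate, text)

print(run_tools("2 plus 2 is [Calculator(2+2)]"))
# prints: 2 plus 2 is 4
```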

@seldo not so impressive, we've all been in meetings with folks like that
@seldo I asked it to write an academic paper about the origins of one of the musical instruments I play. Not only was it massively factually incorrect, but the paper itself was riddled with internal contradictions