Whenever I wonder what cool application I would personally build with #GPT models, I keep coming back to the problem that there's zero guarantee of "worst-case" performance.

When I put on my software engineering hat, my first thought is always to ask: what could go wrong, and how? With #LLM|s, the answer is that things can go wrong in completely unpredictable ways; we just try to make that statistically unlikely.

1/3
#nlproc #gpt4 #chatgpt

Using #GPT to build a product is like programming a calculator that gives the right answer 90% of the time, but in 8% of cases fails in subtle and hard-to-notice ways, and in the remaining 2% it claims to be a potato farmer, insults the user, or deletes your hard drive.

Yet somehow people are okay with that, because when it's in the 90%, it's a really really awesome calculator?

2/3

Of course that's not a fair analogy (none is) — #GPT models can do the most impressive things that no other software could do before. But the problem of worst-case behavior remains, and I personally am totally put off by that.

I love the potential of #AI models for creative uses, I just don't see myself wanting to build any other kind of serious application with them at this point. And I'm surprised that so many people don't seem to care.

3/3

@mbollmann Yeah that's my problem too. I've been prototyping various programs and the edge case failure is too big of a problem.

Seems like the only real option is "assistant"-type programs, which is still a huge niche, though a very different one from standard programs. I do think this could be improved with really good validation tools, e.g. GPT writes a script for a very specific niche and then it's rigorously tested, though for now that's more effort than it's worth.
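A minimal sketch of the validation idea mentioned above: only accept model-generated code after it passes a suite of known input/output checks. Everything here (the function name, the sample generated sources) is hypothetical, and `exec` would need real sandboxing in any serious use.

```python
# Hypothetical sketch: gate LLM-generated code behind a test suite
# before it is ever used. All names here are illustrative.

def validate_generated_code(source: str, func_name: str, test_cases: list) -> bool:
    """Exec the generated source in a fresh namespace and run the named
    function against known (args, expected) pairs; reject on any failure."""
    namespace = {}
    try:
        exec(source, namespace)  # NOTE: needs proper sandboxing in production
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False

# Example: suppose the model was asked for a Celsius-to-Fahrenheit converter.
good = "def c_to_f(c):\n    return c * 9 / 5 + 32\n"
bad = "def c_to_f(c):\n    return c * 5 / 9 + 32\n"  # subtly wrong

cases = [((0,), 32), ((100,), 212), ((-40,), -40)]
print(validate_generated_code(good, "c_to_f", cases))  # True
print(validate_generated_code(bad, "c_to_f", cases))   # False
```

The catch, as noted, is that writing the test cases is often as much work as writing the niche script yourself, which is why this only pays off when the checks are reusable.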

@wraptile @mbollmann I'm actually very thankful that LLMs tend to work better as assistants. It leads to outcomes where humans are augmented through their collaboration with the tool rather than replaced. Hopefully this limitation of LLMs continues for a long time!

I think we'll find that this niche turns out to be huge.

@mbollmann Agreed. Especially if you're working with paying customers or data prone to bias, not being able to control the black box is problematic. You can put a Labs or Alpha label on it, or position it as a suggestion, but some risk of brand-damaging hallucinations remains. Will users come to accept the weirdness without blame? Unsure #GPT4
@mbollmann Well said. While I'm skeptical by nature, I also see value in "soft reliability" tasks: brainstorming, creativity, surfacing findings in huge data sets, etc. But hard decisions should not be made or blindly accepted. Even in the support chat bot use case, I'd fear that they'd give a wrong answer in 1% or 0.1% of cases. A human-curated FAQ that covers 90%, and human chat for the rest, is still better, imho. But I'm also old, not a capitalist, and underestimate people's carelessness 🤷

@fabian I've seen several reports of researchers being contacted about papers they didn't write, because ChatGPT claimed that they did.

That's a relatively harmless failure case, but it's easy to imagine bigger problems resulting from people putting too much blind faith in a model's output. Especially if it's so convincingly presented.

@mbollmann Just as an assistant, helping write pieces of code or suggesting them, it works quite well. That is how many people use it, I think?
@ErikJonker Oh, absolutely. But even there you have to be alert to carefully check the suggestions, and they can be wrong in very subtle ways. I feel many people have too much blind faith in the output.

@mbollmann

Think like a business clown, not like an engineer.

You need something that seems to work on the surface and is cheap. Then you sell it for a profit, and if people begin to cry you say: sorry, people! Software's always been shitty, everybody knows that. We might fix this in the next release if you're lucky.

Meanwhile your competitors, stupid enough to put actual work and effort into their product, silently leave the market because they are prohibitively expensive.

@mbollmann I mean, we live in a world where the single most important cryptography software is written in C. And it shows. And everyone is fine with that because that's how the world works.
Good thing AI wasn't around when they were inventing calculators! Because then we'd have calculators today that go like "how much is 7 by 23? LOL a lot."
@mbollmann I don't know about the potato farmer scenario, but failing in rare cases in subtle and hard to notice ways is what happens with the vast majority of software that I've ever used.