Whenever I wonder what cool application I would personally build with #GPT models, I keep coming back to the problem that there's zero guarantee of "worst-case" performance.

When I put on my software engineering hat, my first thought is always to ask "what could go wrong, and how?" The answer for #LLM|s is that things could go wrong in completely unpredictable ways; we just try to make that statistically unlikely.

1/3
#nlproc #gpt4 #chatgpt

Using #GPT to build a product is like programming a calculator that gives the right answer 90% of the time, but in 8% of cases fails in subtle and hard-to-notice ways, and in the remaining 2% it claims to be a potato farmer, insults the user, or deletes your hard drive.

Yet somehow people are okay with that, because when it's in the 90%, it's a really, really awesome calculator?

2/3

Of course, that's not a fair analogy (none is): #GPT models can do the most impressive things, things no other software could do before. But the problem of worst-case behavior remains, and I personally am totally put off by that.

I love the potential to use #AI models for creative purposes; I just don't see myself wanting to build any other kind of serious application with them at this point. And I'm surprised that so many people don't seem to care.

3/3

@mbollmann Yeah, that's my problem too. I've been prototyping various programs, and edge-case failures are too big a problem.

It seems like the only real option is "assistant"-type programs, which is still a huge niche, though a very different one from standard programs. I do think this could be improved with really good validation tools, e.g. GPT writes a script for a very specific niche and then it gets rigorously tested, though for now that's more effort than it's worth.

@wraptile @mbollmann I'm actually very thankful that LLMs tend to work better as assistants. It leads to outcomes where humans are augmented through their collaboration with the tool rather than replaced. Hopefully this limitation of LLMs continues for a long time!

I think we'll find that this niche turns out to be huge.