One of the things I'm finding so interesting about large language models like GPT-3 and ChatGPT is that they're pretty much the world's most impressive party trick.

All they do is predict the next word based on previous context. It turns out that when you scale the model above a certain size it can give the false impression of "intelligence", but that's a total fraud.

It's all smoke and mirrors! The intriguing challenge is finding useful tasks you can apply them to in spite of the many, many footguns.

And in case this post wasn't clear: I'm all-in on large language models. They confidently pass my personal test for whether a piece of technology is worth learning:

"Does this let me build things that I could not have built without it?"

What I find interesting is that - on the surface - they look like they solve a lot more problems than they actually do, partly thanks to the confidence with which they present themselves.

Figuring out what they're genuinely good for is a very interesting challenge.

@simon The applicable problem space is even fuzzier because there are some things they do well, but not consistently. To me that means the problem space has to be limited to areas where the cost of a false positive/negative is low.

@dbreunig right - there are so many potential applications where you might get good results 90% of the time and utter garbage 10% of the time. That would be completely unacceptable for something like loan application evaluation, but might be fine for something like bulk sentiment analysis to identify general trends.

There's a lot of depth just to learning how to identify places where the technology is a good fit.

@simon Agreed. So far all the appropriate apps fit into a) toys, b) selector interfaces (letting the user 'approve' final output) or c) fully mediated output (an artist in Photoshop tweaking the bad parts of AI infill that audiences never see). I'm sure there are more modes.

@dbreunig one area I'm particularly excited about is data extraction: given a huge jumble of badly OCRd documents, can a language model be used to ask questions of each scanned page to extract relevant facts from them?

If you put data entry people on a task like that you'd also get a portion of errors.
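As a sketch of what that per-page extraction loop could look like - everything here is hypothetical, including `call_llm`, which stands in for whatever completion API you'd actually use:

```python
def build_extraction_prompt(page_text, question):
    """Ask the model to answer strictly from the supplied page text."""
    return (
        "Answer the question using only the text below. "
        "If the answer is not present, reply with exactly UNKNOWN.\n\n"
        f"Text:\n{page_text}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def extract_fact(page_text, question, call_llm):
    """call_llm is a placeholder: any function that takes a prompt
    string and returns the model's completion as a string."""
    answer = call_llm(build_extraction_prompt(page_text, question)).strip()
    return None if answer == "UNKNOWN" else answer
```

Running each scanned page through `extract_fact` once per question, and treating `None` as "needs human review", keeps the failure mode visible instead of silently wrong.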

@simon Also agreed. Bureaucracy - where forms and formulas have tried to interface with humans as if they were APIs - and its artifacts is where the potential energy lies. Anywhere there are rote tasks and boredom is the X marking the spot.

I call it the ‘Brazil Antidote’ model. Doesn’t get enough attention because it’s not sexy. I’m going to found the Boring AI Working Group.

@dbreunig I find it pretty interesting that a while ago web scraping was rebranded as "robotic process automation" and quietly became a multi-billion dollar market https://en.m.wikipedia.org/wiki/Robotic_process_automation

@simon Over a decade ago I built a consumer surveying tool inside a big media company. The trick was that all survey questions were open-ended, which dramatically cut down on spam responses, because writing how you actually felt was easier than lying (unlike, say, blindly clicking a radio button). I then fed the responses through MTurk and similar services to cluster them into multiple choice after the fact. It would be perfect for this stuff.

@simon @dbreunig The use case you describe is commonly done with BERT, which is another language model, though I think it works in a different way. I would be curious to see how ChatGPT compares, since BERT was designed more directly for that but ChatGPT is newer and larger.
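For what it's worth, the usual BERT-style recipe for this is: embed each response as a vector, then cluster the vectors. Here's a minimal sketch of the clustering half, with toy 2-D points standing in for real sentence embeddings - in practice you'd get the vectors from a BERT-style encoder (e.g. sentence-transformers; that choice is my assumption, not something from this thread):

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Tiny k-means over plain lists of floats. In a real pipeline the
    vectors would be sentence embeddings from a BERT-style encoder."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centroids[i]))
            clusters[nearest].append(v)
        # Recompute centroids as the mean of each cluster.
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)]
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Toy "embeddings": two obvious groups of survey responses.
groups = kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], k=2)
```

Each resulting cluster would then get a human-written label, turning the open-ended answers into after-the-fact multiple choice.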
@plznuzz @dbreunig I'd love to read more about using BERT for this kind of project - I wouldn't know where to even start with that right now