If you try to use ChatGPT and similar language models as search engines they're going to lie to you, a lot, and you're at risk of writing off the whole space as hype

The trick is to learn what they're useful for and how to take advantage of them, which is actually quite a lot of work

@simon Every time someone says "you've just got to learn what these glorified Markov chains are good for" it sounds a lot like "but not ALL cryptocurrencies are Ponzi schemes, some are nice"

@jwz it's weird, I'm very firmly in the "cryptocurrencies are a waste of everyone's time" camp, but the more time I spend with large language models the more convinced I am that they are going to let me solve all kinds of problems that I couldn't solve before

Nailing down exactly what those problems are is a lot more involved than I think most people expect though

@simon What kind of problems?

Even if there are upsides, the downsides seem pretty severe. These systems are optimized for bullshit and lies, like, that's their core competency to which all use degrades. See also "facial recognition is the plutonium of AI". https://jwz.org/b/yjMP

@jwz Extracting structured data from unstructured text is one promising angle - I'm interested in the potential for investigative data journalism, for problems like turning 20,000 ad-hoc poor quality scanned police complaint reports into actionable information, without spending six months on human-powered data entry first
@simon @jwz without spending six months on human-powered data entry first, you will never be able to know if the GPT result is a series of hallucinations, like so much of its output

@amyhoy @jwz I'm talking about prompts like "Here is a copy and pasted police report. Return JSON with the names of the mentioned officers and the date of the incident"

My hunch here is that spot checks on the results could help tell if it's working well enough, and that the end results would reach the same level of accuracy as asking human data entry people (who are also fallible) to do the same task

If there's a better way to do this than using a language model I'm interested to hear it
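
(A minimal sketch of the kind of extraction call being described here, assuming the OpenAI Python client — the model name, the JSON-only instruction, and the spot-check helper are illustrative, not anything from the thread:)

```python
import json
import random

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = """Here is a copy and pasted police report. Return JSON with
the names of the mentioned officers and the date of the incident.
Reply with JSON only.

{report}"""

def extract(report_text):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model would do
        messages=[{"role": "user", "content": PROMPT.format(report=report_text)}],
    )
    # json.loads will raise if the model wraps the JSON in prose,
    # which is itself a useful signal that the prompt needs work
    return json.loads(response.choices[0].message.content)

def spot_check(records, n=20):
    # pull a random sample for a human to verify against the original
    # scans, per the spot-checking idea above
    return random.sample(records, min(n, len(records)))
```

Whether the error rate in that sample is acceptable is exactly the judgment call argued over in the rest of the thread.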

@simon @amyhoy
But how could you ever trust that data? You are asking for *facts* but the system is optimized to produce *believable answers* which are not at all the same thing.

Suppose the system optimizes its march to the goal by just making up some numbers that subtly (or not so subtly) tilt the data one way or another. Now you've built a black box to confirm your biases.

And the black box, by its nature, cannot "show its work" without lying.

@jwz @amyhoy The black box thing is why I'm finding this whole space so utterly beguiling

I hate that it's a black box. But I've spent my entire career working with computers that do exactly what you tell them... and now I'm faced with one that very much does not do that

It's like someone's given me a spell that raises actual dragons from another dimension and challenged me to try and tame them!

@jwz @amyhoy I can't see this tech being un-invented, so the interesting question to me now is what I can build with it now that I couldn't build before - and what are the new, genuinely valuable problems I can solve for people
@simon @amyhoy I've got no time for that attitude, whether applied to dangerous software or chemical weapons. We regulate things that cause harm.

@jwz @simon @amyhoy Actually, we generally don't do a good job of regulating things that hurt working-class people.

@ian @jwz @simon @amyhoy It is actually really good at syntactic problems. That seems fine to me. If you go looking for the truth, that might be misguided.
@ian @simon @sayrer @jwz there are so many cases of it doing wrong (very basic!!) math and “explaining” why its wrong stuff is right. not to mention anything more complex than 2+2=4, like word problems. so i would say no, it isn’t very good at syntax problems.

@amyhoy @ian @sayrer @jwz It's a next-token-predicting language model, so using it for math is very much the wrong application of it - that's one of the many reasons I keep trying to convince people that these things are deceptively difficult to use effectively

The idea that a computer can be bad at math is very counter-intuitive!

@simon @amyhoy @ian @sayrer @jwz I think a key part here is making sure those of us who are critical are specific and fluent in the systems we’re criticizing, rather than blanket dismissal that sounds glib because we’re fast-forwarding to the conclusion instead of showing our work.
@simon @amyhoy @ian @sayrer @jwz What Molly White (and to some degree, Moxie) did in breaking down the faults and flaws of crypto assertions did far more to hasten good regulation than any amount of “it a bunch of dumb scams!” ranting did. Simon’s path here seems more likely to yield effective harm reduction.

@anildash @simon @amyhoy @ian @sayrer @jwz I just wish it were easier to separate the wheat from the chaff.

As someone who is generally positive toward this technology, I can still benefit from thoughtful critiques on its efficacy informed by experts who understand its limits.

Even amongst those who see value here, it’s important to understand what we’re dealing with and how far it can be taken safely.

@jeff @anildash @simon @amyhoy @ian @sayrer @jwz But much of the discourse seems like motivated reasoning, as it’s perceived as posing a risk to certain professions.

For many it seems less about the technology’s limits and more that they don’t _want_ it to be/get good.

@jeff @anildash @simon @amyhoy @ian @sayrer @jwz what would it mean for this tech to “be good”? Its entire *purpose* is to be a bullshit fountain. Like, at minimum, to be able to provide reliable outputs, the training data would need to be editorially flagged as “true” or “false”, not an undifferentiated slurry of Internet Words, which is such a monumental undertaking as to make the whole effort no longer cost-effective.
@jeff @anildash @simon @amyhoy @ian @sayrer @jwz as specified, an LLM’s job is always to repeat common misconceptions or likely errors, not to produce accurate results. By construction its erroneous outputs will always be maximally unsurprising so as to subvert spot-checking; its stipulated goal is just to make the same mistakes the median human would make, just… faster

@glyph @jeff @anildash @simon @amyhoy @ian @jwz

Is this right? I think it seems ok. I don't use Twisted, but it seems about right and I could fix anything I don't like or that is in fact incorrect.

@sayrer @glyph @anildash @simon @amyhoy @ian @jwz I don’t use Python so I can’t say, but I have had it write significantly more complex command line tools that do something similar in Swift, and they worked.

Granted, it took a few follow-up replies to get it right, and I benefit from being a Swift dev.

Still faster than writing from scratch, and I have no reason to assume it won’t improve in time.

@jeff @glyph @anildash @simon @amyhoy @ian @jwz Right, so you can get something like "create-react-app", but more versatile. I don't really object to project templates, but they don't write programs either.
@sayrer @jeff @anildash @simon @amyhoy @ian @jwz reproducing small, uncontroversial examples is something that it undoubtedly excels at. In this case it’s reproduced a bad, legacy way of accomplishing this, but I can’t fault it for that; the overwhelming majority of historical training data would present it that way. (In fact this is so short it’s nearly plagiarized from historical documentation)
@sayrer @jeff @anildash @simon @amyhoy @ian @jwz FWIW I do strongly agree with Anil here — my minor gripes here are not going to lead to substantive policy outcomes, for that we will definitely need an “AI is going just great”; and, no disrespect to molly, but AI is a less target-rich environment than blockchain (it’s hard to imagine a *more* target-rich environment than that) so this is going to be a bigger lift
@glyph @jeff @anildash @simon @amyhoy @ian @jwz so, the key here is that you can have it elaborate. I wrote "Can you write one..." and it understood that. That is a good advance, and I think it's a mistake to focus on the answers, which I agree are mostly precooked.
@sayrer @jeff @anildash @simon @amyhoy @ian @jwz and this is definitely wrong, in kind of a funny way
@glyph @jeff @anildash @simon @amyhoy @ian @jwz I'm cool with that, I've never written a program in Twisted, but I knew you wrote it. How is it wrong?

@sayrer (removing the large CC list here, because I don't think this is of quite so broad an interest)

1. it never sets a content type, so it's not controlling the interpretation of the response
2. it's manually doing quoting rather than using the built-in twisted.web.template
3. it's assuming it's emitting HTML but it doesn't enclose anything in an HTML document
4. the request isn't necessarily in UTF-8, so there's maybe some wiggle room for an encoding-confusion attack here

@sayrer it's also doing some stylistic stuff wrong that other examples do wrong: it's using listenTCP rather than endpoints, and it's not encapsulating its main function in an `if __name__ == '__main__':` block, just executing it at the top level of the script, so it can't be imported as a module. And since it's doing listenTCP it can't do HTTPS (which is, I should say, the commonest security problem)
@sayrer it's just the sort of thing that I would expect a cut-rate doesn't-really-know-Twisted consultant to come in and do
@sayrer it also should probably be in an .rpy file so you can use `twist web` and not run it directly with python but that's really nitpicking :)
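
(For contrast, a minimal sketch of the shape glyph is describing above — endpoints instead of listenTCP, an explicit content type, and a `__main__` guard. The resource and port are invented for illustration, not taken from the generated example:)

```python
from twisted.internet import endpoints, reactor
from twisted.web import resource, server

class Hello(resource.Resource):
    # a leaf resource: no child lookup, just render
    isLeaf = True

    def render_GET(self, request):
        # point #1 above: declare the content type explicitly
        request.setHeader(b"content-type", b"text/plain; charset=utf-8")
        return b"hello\n"

def main():
    # an endpoint description instead of reactor.listenTCP, so serving
    # TLS is just a different string (e.g. "ssl:8443:...") rather than
    # a rewrite
    endpoint = endpoints.serverFromString(reactor, "tcp:8080")
    endpoint.listen(server.Site(Hello()))
    reactor.run()

if __name__ == "__main__":
    main()
```
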
@glyph Right, it's actually interesting! I knew it was doing the Python wrong (it's always wrong, and they seem to just make a habit of changing the rules...), and all of the HTTP stuff was a little too simple (I'm in the HTTP 1.1 acks), but I thought it was funny that it got so close.

@glyph Your #1 and #4 are not required, since I think you would hit the chardet stuff, and #3 seems pretty picky (the HTML5 algorithm would automatically insert the needed elements). #2 I can't speak to, but I believe you that it's not idiomatic.

So you have a thing that would actually totally work, but has been arrived at in a strange way.

@glyph Now, of course the next step is to put this generated code on the internet and see what happens.
@glyph @sayrer @jeff @anildash @jwz @amyhoy @simon @ian but it didn’t understand, it just computed what a likely response would have been, if one had been included in its training set
@ShadSterling @glyph @jeff @anildash @jwz @amyhoy @simon @ian I agree with what you say about the response, but it did understand that it was to follow up on the previous effort, without any explicit nouns from me. If you try that kind of thing with the various talking cylinders, you will not get that (maybe they're better now, I don't use them daily).
@glyph @sayrer @jeff @anildash @simon @amyhoy @ian @jwz If the premise is "this is a tool that produces close, but wrong, answers," I think that could still be useful. Basically, it's useful in any space where verification/fixing is cheaper than authoring. I could probably use it to answer beginner questions, since most of my time with those is spent typing the answer.
@agocke @glyph @jeff @anildash @simon @amyhoy @ian @jwz oh, precisely. I’m surprised that some others are so hostile here, when they could automate a large variety of repetitive computer questions. these aren’t necessarily stupid questions, but you tend to deal with a lot of the same ones if you are nice enough to answer at all.
@anildash @jeff @jwz @sayrer @glyph @ian @simon @amyhoy I think it’s still fair to be critical. That use case is far narrower than the marketing. And importantly, it never allows you to remove expert oversight.
@glyph most evolving frameworks suffer from this. You google, and find the recommendations from 5 years ago, and only during a code review does someone call you out for being a dinosaur
@Migueldeicaza yes, hence I can’t fault it; this isn’t a *controversial* choice, it’s just the one that most people would choose with some light research. (Heck, probably a bunch of up-to-date docs still describe things this way, it’s hard to do comprehensive updates on a shoestring volunteer budget). Just an example of how LLMs are idea popularity-contest collages and not reasoning beings you can ask for correct answers

@glyph @jeff @anildash @simon @amyhoy @ian @sayrer @jwz
One thing we’ve tested is its ability to generate a large batch of MCQs on a topic that we (as domain experts) can then rapidly prune down to an effective set of questions. In that context the ‘bullshit fountain’ is very useful: writing effective distractors that are viable but _wrong_ is hard to do, while the results are relatively easy to check.

Perfect accuracy isn’t necessarily required for every useful activity; it just has to be more effective than doing the work by hand.
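
(A sketch of the batch-generation prompt that workflow implies — the topic, counts, and wording are invented for illustration, not taken from the post:)

```python
MCQ_PROMPT = """Write {n} multiple-choice questions about {topic}.
Each question needs one correct answer and three distractors that
are plausible but definitely wrong. Mark the correct answer."""

# Generate a surplus and let the domain experts prune: ask for 50,
# keep the 20 whose distractors actually survive expert review.
print(MCQ_PROMPT.format(n=50, topic="acid-base chemistry"))
```
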

@glyph @anildash @simon @amyhoy @ian @sayrer @jwz I’ve used that “bullshit fountain” to write a reasonably complex Swift command line tool to accept input from the user, interact with data sets on AWS S3, and spit back results.

I verified the code and its output.

I find this “it doesn’t work” stuff to be a bit overly dismissive. It’s useful for SOME things, clearly.

@jeff @anildash @simon @amyhoy @ian @sayrer @jwz I didn’t say it doesn’t “work”, I said its definition for “good” is unclear. it’s currently useful to produce outputs that correspond to the median internet user writing on a particular topic, regardless of accuracy. Perhaps the median swift programmer can write an AWS CLI with no particular common security errors, in which case I’m sure your code works great.
@jeff @anildash @simon @amyhoy @ian @sayrer @jwz like the dials on radium watches really did glow! That sliver of utility was not in dispute. But the overall cost-benefit was not worth it. Here, the cost is that once you scale up past trivial examples and convince yourself that it can be unsupervised (or inevitably succumb to review fatigue from the humans in the loop), it will immediately start producing worse quality on more complex tasks

@jeff @glyph @anildash @simon @amyhoy @ian @sayrer
What I (and others) have been saying is not "it doesn't work". I at least am saying:

1) It does not do what you think it does;

2) The thing that you appear to want is absolutely not a thing that it does;

3) It is extremely skilled at lying to you about point #2.

[link: Building A Virtual Machine inside ChatGPT — Engraved]

@glyph well, it's kinda "good" if you actually want bullshit.

Like "letter to X about Y and" Z" or "news-post about X dying" where you'd get a letter with all the default greetings/boilerplate and stuff.

Yes, you'd have to clear up the actual content. But probably less so than if you'd reuse the last letter you'd written, like many people do.

Don't think anything technical is a good application of that bullshit-fountain - but many people spend a lot of time manually generating bullshit.

@drazraeltod @glyph
This right here is the main point, isn't it? Think of all those people who are afraid of losing their job to ChatGPT. It's a tacit acknowledgement that their job involves primarily bullshitting. There's even a whole book about that https://en.m.wikipedia.org/wiki/Bullshit_Jobs

@glyph @jeff @anildash @simon @amyhoy @ian @sayrer @jwz Yup. This is why people in really niche domains are impressed, because the domain knowledge fed into the system is lots of stuff where the content is all in agreement. But as soon as you get out to general information it all falls apart without being able to add some kind of accuracy scoring