Mastodawn

Kevin Beaumont 1d ago

The power of ChatGPT

Sean Reifschneider 1d ago

@GossiTheDog Cannot confirm. ChatGPT Thinking-Standard.

Lou Cyx 22h ago

@jafo @GossiTheDog even then, a reliable source of information should be consistent, meaning both Kevin and you should have gotten the same result, but we all know LLMs aren't consistent (even when the same user asks the same question) so if anything, you added more evidence proving we should avoid LLMs 🤷🏻‍♀️

Sean Reifschneider 12h ago

@loucyx @GossiTheDog I don't know about you, but I learned long ago not to blindly trust the tools I use, on the Internet and elsewhere. I use tools understanding their limitations, and check the work. In this case, it seemed like outside sources corroborated the assertions ChatGPT made. I can't speak to Kevin's answer, because there was no information on WHAT ChatGPT was given; as I said, I used "Thinking-Standard" to get my answer, YMMV if you use other models.

ben

@jafo @loucyx @GossiTheDog but your mileage should not vary. that's the point.

getting a different answer each time is what makes these tools not fit for purpose. if they return the right answer some of the time but you never know which times, what's the point in them?

Lou Cyx 11h ago

@benjamineskola @jafo @GossiTheDog 100% this! If they were always right or always wrong it would be one thing, but the only constant is that they are always confident about their answer (whether it’s right or wrong), which is what makes them dangerously unreliable.

And this isn’t even getting into the whole detrimental effect they have on cognitive analysis and reasoning for LLM consumers.

Kate 11h ago

@benjamineskola @GossiTheDog @loucyx @jafo It’s the difference between lying and bullshitting. Lying at least has a regard for what is true. Bullshitting doesn’t care if it is correct or not.

Sean Reifschneider 6h ago

@benjamineskola Your mileage will *ALWAYS* vary if you use different classes of tools. You're welcome to complain about how a hand saw doesn't produce the same results as a commercial bandsaw at milling down a tree. BUT, saying the tool is failing and then not noting what model you are using is deceptive.

ben 5h ago

@jafo Sorry, but this is nonsensical on multiple levels.

These aren’t ‘different classes of tools’. It’s not comparing a hand saw with a power saw; it’s comparing two different models of power saw. Perhaps they are different quality but they’re not fundamentally different.

And you will get different results each time you ask *even with the same model*. You can’t ever guarantee that any one model will give the same results. You tried with a different model and happened to get a better result than the OP; but someone else might try again with the same model as you and get a wrong result again. The criticism was not just ‘this is wrong’ and fixable by using a ‘better’ model to get the right answer; the criticism is that this entire class of tool cannot be relied upon to produce a correct answer.

It will sometimes give a right output — but sometimes it won’t, and you can’t predict when that will be. And that’s in the very nature of the tool.

Sean Reifschneider 4h ago

@benjamineskola Ignoring much of what you say, because it's an "agree to disagree" situation. But if you're aware that different models produce drastically different results, what is the use in posting something from a model that is known to give worse results? I'm assuming that's what happened to start this thread.

There are plenty of problems with the AI tools (by that I mean models as well as agents). It's more beneficial to discuss those things than to concoct failures.