Mastodawn

Alina Yossimouse Jun 24, 2023

I asked ChatGPT about primes ending in 2 to make it prove a point and it proved the point far better than I could have hoped for.

Please do not be a fool who trusts ChatGPT with anything outside your field of expertise, and even then, double- or triple-check what it tells you if you must use it.
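For reference, here is the point behind the question. Any whole number ending in 2 is even, so 2 itself is the only prime ending in 2. The minimal Python sketch below brute-forces this up to an arbitrary limit. The helper names are illustrative and do not come from the original post.

```python
def is_prime(n: int) -> bool:
    """Trial division: adequate for the small numbers checked here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primes_ending_in(digit: int, limit: int) -> list[int]:
    """All primes below `limit` whose last decimal digit is `digit`."""
    return [n for n in range(2, limit) if n % 10 == digit and is_prime(n)]


print(primes_ending_in(2, 10_000))  # [2]
```

Raising the limit doesn't change the result. Every other candidate ending in 2 is divisible by 2.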

abananabag

@alinanorakari
That's pretty funny. Lest anyone think this was an unfair test of ChatGPT, mathematics is one of its core domains of expertise. (At least that's what ChatGPT told me, but maybe I should know better than to believe a pathological liar twice.)

A while ago I asked it for the 100th digit of π, and it was hilariously aggressive in insisting that there is no 100th digit. It seemed to base that on the fact that π doesn't repeat and there are fewer than 100 distinct digits, but I think I broke it when I asked about base 100. It eventually informed me that there isn't even a first digit of π, either.

I will note that the answers it gave you are both (a) shorter and (b) less arrogant-sounding. ChatGPT used to be incredibly rude, unable to admit, much less contemplate, that it might be wrong.

I think the problem was that they trained it on transcripts from very smart people. It learned to mimic their charmless assertions and condescending style, but with none of their knowledge.
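For the record, π does have a 100th digit. The Python sketch below computes it using Machin's formula, π/4 = 4·arctan(1/5) − arctan(1/239), with fixed-point integer arithmetic. This is one standard method among several, chosen here for brevity. The function names and the 10 guard digits are illustrative assumptions.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Return arctan(1/x) * scale via its Taylor series, using integers only."""
    total = term = scale // x  # first term: 1/x
    x2 = x * x
    n, sign = 1, -1
    while term:
        term //= x2  # next odd power: 1/x^(2n+1)
        total += sign * (term // (2 * n + 1))
        sign, n = -sign, n + 1
    return total


def pi_digits(n: int) -> str:
    """Return pi as a string with n digits after the decimal point."""
    guard = 10  # extra digits to absorb truncation error from integer division
    scale = 10 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    s = str(pi_scaled // 10 ** guard)
    return s[0] + "." + s[1:]


digits = pi_digits(100)
print(digits[:12])  # 3.1415926535
print(digits[-1])   # 100th digit after the decimal point: 9
```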

Andrej Shadura Jun 24, 2023

@abananabag, it’s even funnier than I had anticipated 😃

okanogen VerminEnemyFromWithin Jun 24, 2023

@andrew_shadura @abananabag
Is that your final, final answer?

.morris Jun 25, 2023

@andrew_shadura @abananabag wow. That's a great illustration of the limits of an LLM: patterns of words, not patterns of concepts.

Shiri Bailem Jun 28, 2023

@abananabag @alinanorakari this is a good point to make, though I disagree:

ChatGPT's area of expertise is *conversation* and nothing else. Everything else is incidental to its design (though they keep working to improve the quality of its output). To be precise, its focus is on generating what a reply would look like.

This is why it sometimes gets a reputation for being argumentative: if the message it's replying to looks upset, it treats that as the start of an argument and predicts that the reply would be argumentative.

If you ask it for prime numbers, it knows the response looks like a bunch of numbers.

It does well with programming because code is just another sort of language pattern.

Likewise with answering general-knowledge questions, because the best-looking response is an accurate one.

But that's also why it hallucinates (makes up false information): "I don't know" is not treated as a good response by the system.

abananabag Jun 28, 2023

@shiri @alinanorakari Thank you for your insights. I'm curious how you know that "the best looking response is an accurate one".

Also, "code is just another language pattern" seems questionable to me. I'm coming from a computer science perspective where "code" is considered to be closely akin to mathematics and rather different from any natural language.

Shiri Bailem Jun 30, 2023

@abananabag
Mathematics is also considered a language. Note that it can do all right at explaining or creating basic formulas; it just does poorly at executing them.

As for "best-looking response", that's part of the training process and why they routinely talk about improving accuracy. Its "motivation" is high training scores, and during training it is given a higher score when the information is accurate. This doesn't mean it's always accurate, just that it favors accuracy.
@alinanorakari

abananabag Jun 30, 2023

@shiri @alinanorakari Wait... Are you pulling my leg by posting responses written by ChatGPT? If not, I think we may have fundamentally different conceptions of how Large Language Models work.

Shiri Bailem Jun 30, 2023

@abananabag @alinanorakari They are somewhat black-box entities: we train their responses by feeding in large quantities of data, then temper those responses through manual training. In the manual training especially, we set a tone for what's expected of the AI.

The scoring system creates a sort of implicit motivation for the AI, it's designed to "want" a higher score and learns from the training what answers give it higher scores and learns patterns from that.

The nature of these models is that they generate what looks like human responses. Making an AI with broad general-knowledge expertise is a whole different beast (of which this would be just one component), especially when that AI also needs to hold a conversation with the user.

And as for things like mathematics being a language, that's well established, and it's also why their programming and math expertise is accidental. They didn't initially design these models for that; they just fed them craploads of data, and the models incidentally picked up those skills from the training set. Both math and programming statements are just forms of instructions, only rigid ones. And because they're rigid, they're probably actually easier for LLMs to understand.

It's similar to how they have a wide range of language understanding: they're not natively English or anything like that. They had a wide array of languages thrown at them, and their understanding of any language is learned from the dataset rather than hard-coded. So it knows English and German much the same way it knows Python and C++.

Alina Yossimouse Jun 29, 2023

@shiri @abananabag it does well with code syntax over small blocks, but it really struggles with global syntax (e.g. type safety, concurrency, object lifetimes, immutability) as well as semantics, and it knows nothing about pragmatics

Shiri Bailem Jun 29, 2023

@alinanorakari @abananabag much the same as it does over longer conversations lol