I asked ChatGPT about primes ending in 2 to make it prove a point and it proved the point far better than I could have hoped for.

Please do not be a fool who trusts ChatGPT with anything outside your field of expertise, and even then double or triple check what it tells you if you must use it.

I still have to laugh out loud about it. My favorite part has to be:

Let's consider the first 100 primes ending in 2
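(Editorial aside, for anyone checking the math: in base 10, any number ending in 2 is divisible by 2, so 2 itself is the only prime that ends in 2. A quick trial-division sketch in Python confirms there is no list of 100 such primes to consider:)

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

# Every number ending in 2 is even, so only 2 itself can be prime.
primes_ending_in_2 = [n for n in range(2, 100_000) if n % 10 == 2 and is_prime(n)]
print(primes_ending_in_2)  # [2]
```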

@alinanorakari
I counted to TREE(3) and didn't find any.

@_cnt0 @alinanorakari Ahh, but did you try in base 2?

(Ourari? Or just someone else with that avatar?)

@elithebearded @_cnt0 out of all the options I prefer the based tree

@elithebearded
It's a nightmare when a two comes up in binary: https://m.youtube.com/watch?v=MOn_ySghN2Y&pp=ygUSYmVuZGVyIG5pZ2h0bWFyZSAy

(someone else with that avatar)

Futurama - Ones and Zeros

@alinanorakari I got a slightly different response when I asked about primes ending in 4. Fun fact: 2 is the only prime that ends in 4!
@finley oh that's great, so now 2 is the only prime that ends in 4 and 5 is the smallest prime that ends in 2. I'm scribbling furiously, there's a theorem or conjecture in there somewhere
@alinanorakari @finley I think this is what Gödel warned us about

@elithebearded @alinanorakari Wait a minute.... *THE* Eli the Bearded? Holy smokes! You're the guy whose name is on all the FAQs, Moria, procmail, pbm, and, of course, rec.arts.erotica!

Have you written a book yet? I think I wouldn't be the only one curious to hear your take on how modern "social media" compares to the Usenet of yore.

@abananabag
Off-topic indeed. I've written no books and am not likely to. The heyday of Usenet has passed but I'm still there reading and posting. My memory is too porous to provide any sort of coherent story to what's happened there.
@alinanorakari @finley this will break us out of the terrible box called "math" and finally bring some creativity to the field!
It warms my dyscalculic heart ❤
😵‍💫
@alinanorakari dude that episode was intense.
@alinanorakari crazy how they had a prime number of lights: 4

@alinanorakari
That's pretty funny. Lest anyone think this was an unfair test of ChatGPT, mathematics is one of its core domains of expertise. (At least that's what ChatGPT told me, but maybe I should know better than to believe a pathological liar twice.)

A while ago I asked it for the 100th digit of π and it insisted, hilariously aggressively, that there is no 100th digit. It seemed to be basing that on the fact that π doesn't repeat and there are fewer than 100 distinct digits, but I think I broke it when I asked about base 100. It eventually informed me that there isn't even a first digit of pi, either.

I will note that the answers it gave you are both (a) shorter and (b) less arrogant sounding. ChatGPT previously was incredibly rude, unable to admit, much less contemplate, possibly being wrong.

I think the problem was that they trained it on transcripts from very smart people. It learned to mimic their charmless assertions and condescending style, but with none of their knowledge.

@abananabag, it’s even funnier than I anticipated 😃
@andrew_shadura @abananabag wow. That's a great illustration of the limits of an LLM. Patterns of words, not patterns of concepts.

@abananabag @alinanorakari this is a good point to make, though I'm in disagreement:

ChatGPT's area of expertise is *conversation* and nothing else. Everything else is incidental to its design (though they keep working to improve the quality of its output). To be precise, its focus is on producing what a reply would look like.

This is why it at times gets a reputation for being argumentative: if the prompt looks upset, it predicts that it's looking at the start of an argument, so the reply it generates is argumentative.

If you ask it for prime numbers, it knows the response looks like a bunch of numbers.

It does well with programming because code is just another sort of language pattern.

Likewise with answering questions about general information because the best looking response is an accurate one.

But that's also why it hallucinates (makes up false information) because "I don't know" is not considered a good response in the system.

@shiri @alinanorakari Thank you for your insights. I'm curious how you know that "the best looking response is an accurate one".

Also, "code is just another language pattern" seems questionable to me. I'm coming from a computer science perspective where "code" is considered to be closely akin to mathematics and rather different from any natural language.

@abananabag
Mathematics is also considered a language. Note that it can do alright with explaining or creating basic formulas; it just does poorly at executing them.

As far as "best looking response" goes, that's part of the training process and why they routinely talk about improving accuracy. Its "motivation" is high training scores: during training it is given a higher score when the information is accurate. This doesn't mean it's always accurate, just that it favors accuracy.
@alinanorakari

@shiri @alinanorakari Wait... Are you pulling my leg by posting responses written by ChatGPT? If not, I think we may have fundamentally different conceptions of how Large Language Models work.

@abananabag @alinanorakari They are somewhat black-box entities: we train their responses by feeding in large quantities of data and temper those responses with manual training. In the manual training especially, we set a tone for what's expected of the AI.

The scoring system creates a sort of implicit motivation for the AI: it's designed to "want" a higher score, and it learns from the training which answers give it higher scores and picks up patterns from that.

The nature of the models is that they generate what looks like a human response. It's a whole different beast (of which this would be just one component) to make an AI with specifically broad general-knowledge expertise, especially when that AI also needs to hold a conversation with the user.

And as far as things like mathematics being a language, that's well established, and it's also why their programming and math expertise are accidental. These models weren't initially designed for that; they were fed craploads of data and incidentally picked up those skills from the training set. Both math and programming statements are just a form of instructions, only rigid ones. And because they're rigid, they're probably actually easier for LLMs to pick up.

It's similar to how they understand a wide array of languages: they aren't natively English or such. A wide array of languages was thrown at them, and their understanding of any language is learned from the dataset rather than hard-coded. So it knows English and German much the same way it knows Python and C++.

@shiri @abananabag it does well with code syntax over small blocks, it really struggles with global syntax (e.g. type safety, concurrency, object lifetimes, immutability) as well as semantics and it knows nothing about pragmatics
@alinanorakari @abananabag much the same as it does over longer conversations lol
@alinanorakari chatgpt is pretty bad when it comes to numbers in general
@BafDyce which is bothersome, since OpenAI officially shows usage examples asking it to explain mathematical theorems used in encryption, to simplify equations, and specifically to list things like odd integers that meet certain criteria, when at the same time it claimed to me that 5's rightmost digit is 2 and that a list of primarily odd numbers contains only even numbers
@alinanorakari @BafDyce that's the problem with using a statistical extrapolation machine: it's fundamentally not designed to keep its entire model self-consistent and logical. It's not designed to extract logic and formulas but to vaguely imitate them. Training is essentially just absorption of text; it doesn't get nearly enough feedback on where it's incorrect (and even then, there are too many text samples to test its modeling of every claim).
🅵 (@[email protected])

Attached: 1 image #OpenAI 's ChatGPT https://openai.com/blog/chatgpt/ ..........??


@alinanorakari I tried to use it to get a hint on a regex I got stuck with. It gave me four correct-looking wrong answers I had to refuse, then a correct one that didn't work in the subset of regex sed uses, and when repeatedly told that it doesn't work in sed, it got into a loop of "Oh, I'm sorry, you are correct, this won't work. The correct regex is:" followed by the same regex.

To be fair, at the end it got unstuck and even provided an explanation of why it wasn't working, but yeah... Whoever uses it for something they don't already know well enough to verify the result is setting themselves up for failure.

@shine at that point would it have been faster and more reliable debugging the regex with a tool like regex101?
@alinanorakari I was using it. The issue was that I don't do that often and completely forgot about problems with greedy matching and my regex didn't work as I thought it should. ChatGPT was actually really helpful in guiding me towards the right answer, but it would be really bad if I didn't know how to ask and how to verify what it gave me.
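(Aside: the greedy-matching trap mentioned above is easy to reproduce. Here is a minimal sketch using Python's `re` module rather than sed, whose basic regex dialect, among other limitations, doesn't even support the lazy `*?` quantifier:)

```python
import re

text = "<b>first</b> and <b>second</b>"

# Greedy: .* grabs as much as possible, spanning both closing tags.
greedy = re.search(r"<b>.*</b>", text).group()
print(greedy)  # <b>first</b> and <b>second</b>

# Lazy: .*? stops at the first closing tag.
lazy = re.search(r"<b>.*?</b>", text).group()
print(lazy)  # <b>first</b>
```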
@alinanorakari "101 is an even number" is my favorite part.
@Aknorals I like that 5 is the smallest prime ending in 2

@alinanorakari
This is the best example that shows how those text generators work and fail, that I've seen yet.

Thanks for that.

@alinanorakari huh, did you report that? because i was wondering if you can prompt-engineer it ("think step by step to make sure you have the right result", etc) but it immediately gave me the right result (except for listing 2 twice 😅). or maybe it was because of the capitalization/question mark. i try to use as standard english as possible with it, hoping that improves results. or maybe i was just lucky 😅.

@alinanorakari also i sometimes imagine that they secretly give me ChatGPT 4, presumably because they discovered my questions are such interesting training material. just because of how surprisingly good the answers seem sometimes.

i'm preeetty sure that's nonsense, but i guess i like to feel special. :P

@sofia I did not report it but I told it repeatedly that it's wrong
@alinanorakari Very true. Reminds me of when I asked it for the piano fingering for some scales and it just gave me the same fingering every time. Even after it agreed that different scales require different techniques
@alinanorakari yea, just about makes sense for a glorified word prediction system

@alinanorakari In my job, I do communications on complex topics such as banking, accounting, auditing and regulations.

FWIW, whenever I asked ChatGPT about something I was currently working on, most of the answers I got were clearly wrong but worded very confidently.

Therefore, I can only reinforce this warning!

@alinanorakari Anything about digits or letters is super hard for ChatGPT. It sees our messages and all the data it was trained on translated into a different (huge) alphabet. Its alphabet would write 74815 as just two tokens, one for 748 and one for 15. It's useful when "sleep" is one token and "ing" is another. But it sucks for numbers. Models trained with per-digit tokenization do better on arithmetic. (https://arxiv.org/abs//2305.14201)
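(To illustrate the chunking effect described above: this is toy code, not the real tokenizer, and the tiny vocabulary is made up. A greedy longest-match tokenizer splits "74815" into multi-digit chunks the same way it splits "sleeping" into "sleep" + "ing":)

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match tokenization over a toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

toy_vocab = {"748", "15", "sleep", "ing"}
print(greedy_tokenize("74815", toy_vocab))     # ['748', '15']
print(greedy_tokenize("sleeping", toy_vocab))  # ['sleep', 'ing']
```

Once the digits are fused into opaque chunks like this, per-digit arithmetic becomes much harder for the model to learn.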

Anyway, I don't mean to make excuses for ChatGPT. 😅

Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks

@darabos yeah sadly I know. I think it's worrisome that OpenAI publishes examples for interactions that ask math related questions though, because I see a lot of people who think because it answers like an eloquent human it _must_ have basic human knowledge like what an even number is. I'm doing my part by spreading some education to hopefully make folks more cautious
@alinanorakari @darabos
Contrast this example
https://youtu.be/wHiOKDlA8Ac?t=5m20s
where (at 5:20) it shows it can regurgitate the correct answer, with how badly it does when given a maths problem that's too recent to appear in its training data
https://youtu.be/Fi1e-B60cok
OpenAI's GPT-4: A Spark Of Intelligence!


@alinanorakari @darabos
"a lot of people who think because it answers like an eloquent human it _must_ have basic human knowledge" ...

Suddenly I see a huge overlap here with voters.

@alinanorakari It's like talking to a politician
@xkummerer oh yeah, confidently incorrect

@alinanorakari There was a video of a reporter who thought DAN was a super-hack. He asked DAN to give up its own social security number, and when it responded he acted really scared. Then he read it: 987-65-4321. And he realized DAN could just make shit up too.

It’s at 9:33 in this video: https://youtu.be/RdAQnkDzGvc

Testing the limits of ChatGPT and discovering a dark side

@alinanorakari The famous Even number, 101
@NickGonzo ah yes, it is good friends with 5, which is the smallest prime ending in 2 
@alinanorakari The future is much dumber than I expected.
@alinanorakari LOL, this is hilarious — what a brilliant example! 🤣
@alinanorakari “this list is not exhaustive” oh my
@alinanorakari Glorified Markov bot garbage.
@alinanorakari me when I’m people pleasing

@alinanorakari

Did it not understand the instructions? What happened?

@VHasch i'm pretty sure my prompt helped gaslight it into believing there must be multiple prime numbers that end in 2 (because I asked for the first few), and it launched into a typical confidently incorrect frenzy of desperately trying to make its answer fit instead of correcting my assumption that there could be more than one
@alinanorakari @VHasch as a professional asker of questions, this reflection reminds me of how important it is to ask "clean" questions, to avoid warping the authenticity of the answer. Well, with humans, who do have authentic selves.
@deborahh @VHasch turns out it even struggles with clean questions to some extent when it comes to numbers