I asked ChatGPT about primes ending in 2 to make it prove a point and it proved the point far better than I could have hoped for.

Please do not be a fool who trusts ChatGPT with anything outside your field of expertise, and even then double or triple check what it tells you if you must use it.

I still have to laugh out loud about it. My favorite part has to be:

Let's consider the first 100 primes ending in 2
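(Editorial aside, for anyone checking the math: in base 10, any number ending in 2 is divisible by 2, so 2 itself is the only prime that ends in 2. A quick trial-division sketch in Python confirms there is no list of 100 such primes to consider:)

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

# Every number ending in 2 is even, so only 2 itself can be prime.
primes_ending_in_2 = [n for n in range(2, 100_000) if n % 10 == 2 and is_prime(n)]
print(primes_ending_in_2)  # [2]
```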

@alinanorakari
I counted to TREE(3) and didn't find any.

@_cnt0 @alinanorakari Ahh, but did you try in base 2?

(Ourari? Or just someone else with that avatar?)

@elithebearded @_cnt0 out of all the options I prefer the based tree

@elithebearded
It's a nightmare when a two comes up in binary: https://m.youtube.com/watch?v=MOn_ySghN2Y&pp=ygUSYmVuZGVyIG5pZ2h0bWFyZSAy

(someone else with that avatar)

Futurama - Ones and Zeros

@alinanorakari I got a slightly different response when I asked about primes ending in 4. Fun fact: 2 is the only prime that ends in 4!
@finley oh that's great, so now 2 is the only prime that ends in 4 and 5 is the smallest prime that ends in 2. I'm scribbling furiously, there's a theorem or conjecture in there somewhere
@alinanorakari @finley I think this is what Gödel warned us about

@elithebearded @alinanorakari Wait a minute.... *THE* Eli the Bearded? Holy smokes! You're the guy whose name is on all the FAQs, Moria, procmail, pbm, and, of course, rec.arts.erotica!

Have you written a book yet? I think I wouldn't be the only one curious to hear your take on how modern "social media" compares to the Usenet of yore.

@abananabag
Off-topic indeed. I've written no books and am not likely to. The heyday of Usenet has passed but I'm still there reading and posting. My memory is too porous to provide any sort of coherent story to what's happened there.
@alinanorakari @finley this will break us out of the terrible box called "math" and finally bring some creativity to the field!
It warms my dyscalculic heart ❤
😵‍💫
@alinanorakari dude that episode was intense.
@alinanorakari crazy how they had a prime number of lights: 4

@alinanorakari
That's pretty funny. Lest anyone think this was an unfair test of ChatGPT, mathematics is one of its core domains of expertise. (At least that's what ChatGPT told me, but maybe I should know better than to believe a pathological liar twice.)

A while ago I asked it for the 100th digit of π and it insisted, hilariously aggressively, that there is no 100th digit. It seemed to be basing that on the fact that π doesn't repeat and there are fewer than 100 distinct digits, but I think I broke it when I asked about base 100. It eventually informed me that there isn't even a first digit of pi, either.

I will note that the answers it gave you are both (a) shorter and (b) less arrogant sounding. ChatGPT previously was incredibly rude, unable to admit, much less contemplate, possibly being wrong.

I think the problem was that they trained it on transcripts from very smart people. It learned to mimic their charmless assertions and condescending style, but with none of their knowledge.

@abananabag, it’s even funnier than I anticipated 😃
@andrew_shadura @abananabag wow. That's a great illustration of the limits of an LLM. Patterns of words, not patterns of concepts.

@abananabag @alinanorakari this is a good point to make, though I'm in disagreement:

ChatGPT's area of expertise is *conversation* and nothing else. Everything else is incidental to its design (though they keep working to improve the quality of its output). To be precise, its focus is on producing what a reply would look like.

This is why it at times gets a reputation for being argumentative: if the prompt looks upset, it predicts that it's looking at the start of an argument, so the reply it generates is argumentative.

If you ask it for prime numbers, it knows the response looks like a bunch of numbers.

It does well with programming because code is just another sort of language pattern.

Likewise with answering questions about general information because the best looking response is an accurate one.

But that's also why it hallucinates (makes up false information) because "I don't know" is not considered a good response in the system.

@shiri @alinanorakari Thank you for your insights. I'm curious how you know that "the best looking response is an accurate one".

Also, "code is just another language pattern" seems questionable to me. I'm coming from a computer science perspective where "code" is considered to be closely akin to mathematics and rather different from any natural language.

@abananabag
Mathematics is also considered a language. Note that it can do alright with explaining or creating basic formulas; it just does poorly at executing them.

As far as "best looking response" goes, that's part of the training process and why they routinely talk about improving accuracy. Its "motivation" is high training scores: during training it is given a higher score when the information is accurate. This doesn't mean it's always accurate, just that it favors accuracy.
@alinanorakari

@shiri @alinanorakari Wait... Are you pulling my leg by posting responses written by ChatGPT? If not, I think we may have fundamentally different conceptions of how Large Language Models work.

@abananabag @alinanorakari They are somewhat black-box entities: we train their responses by feeding in large quantities of data and temper those responses with manual training. In the manual training especially, we set a tone for what's expected of the AI.

The scoring system creates a sort of implicit motivation for the AI: it's designed to "want" a higher score, and it learns from the training which answers give it higher scores and picks up patterns from that.

The nature of the models is that they generate what looks like a human response. It's a whole different beast (of which this would be just one component) to make an AI with specifically broad general-knowledge expertise, especially when that AI also needs to hold a conversation with the user.

And as far as things like mathematics being a language, that's well established, and it's also why their programming and math expertise are accidental. These models weren't initially designed for that; they were fed craploads of data and incidentally picked up those skills from the training set. Both math and programming statements are just a form of instructions, only rigid ones. And because they're rigid, they're probably actually easier for LLMs to pick up.

It's similar to how they understand a wide array of languages: they aren't natively English or such. A wide array of languages was thrown at them, and their understanding of any language is learned from the dataset rather than hard-coded. So it knows English and German much the same way it knows Python and C++.

@shiri @abananabag it does well with code syntax over small blocks, it really struggles with global syntax (e.g. type safety, concurrency, object lifetimes, immutability) as well as semantics and it knows nothing about pragmatics
@alinanorakari @abananabag much the same as it does over longer conversations lol
@alinanorakari chatgpt is pretty bad when it comes to numbers in general
@BafDyce which is bothersome, since OpenAI officially shows usage examples asking it to explain mathematical theorems used in encryption, to simplify equations, and specifically to list things like odd integers that meet certain criteria, when at the same time it claimed to me that 5's rightmost digit is 2 and that a list of primarily odd numbers contains only even numbers
@alinanorakari @BafDyce that's the problem with using a statistical extrapolation machine: it's fundamentally not designed to keep its entire model self-consistent and logical. It's not designed to extract logic and formulas but to vaguely imitate them. Training is essentially just absorption of text; it doesn't get nearly enough feedback on where it's incorrect (and even then, there are too many text samples to test its modeling of every claim).
🅵 (@[email protected])

Attached: 1 image #OpenAI 's ChatGPT https://openai.com/blog/chatgpt/ ..........??


@alinanorakari I tried to use it to get a hint on a regex I got stuck with. It gave me four correct-looking wrong answers I had to refuse, then a correct one that didn't work in the subset of regex sed uses, and when repeatedly told that it doesn't work in sed, it got into a loop of "Oh, I'm sorry, you are correct, this won't work. The correct regex is:" followed by the same regex.

To be fair, at the end it got unstuck and even provided an explanation of why it wasn't working, but yeah... Whoever uses it for something they don't already know well enough to verify the result is setting themselves up for failure.

@shine at that point would it have been faster and more reliable debugging the regex with a tool like regex101?
@alinanorakari I was using it. The issue was that I don't do that often and completely forgot about problems with greedy matching and my regex didn't work as I thought it should. ChatGPT was actually really helpful in guiding me towards the right answer, but it would be really bad if I didn't know how to ask and how to verify what it gave me.
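(Aside: the greedy-matching trap mentioned above is easy to reproduce. Here is a minimal sketch using Python's `re` module rather than sed, whose basic regex dialect, among other limitations, doesn't even support the lazy `*?` quantifier:)

```python
import re

text = "<b>first</b> and <b>second</b>"

# Greedy: .* grabs as much as possible, spanning both closing tags.
greedy = re.search(r"<b>.*</b>", text).group()
print(greedy)  # <b>first</b> and <b>second</b>

# Lazy: .*? stops at the first closing tag.
lazy = re.search(r"<b>.*?</b>", text).group()
print(lazy)  # <b>first</b>
```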
@alinanorakari "101 is an even number" is my favorite part.
@Aknorals I like that 5 is the smallest prime ending in 2

@alinanorakari
This is the best example that shows how those text generators work and fail, that I've seen yet.

Thanks for that.

@alinanorakari huh, did you report that? because i was wondering if you can prompt-engineer it ("think step by step to make sure you have the right result", etc) but it immediately gave me the right result (except for listing 2 twice 😅). or maybe it was because of the capitalization/question mark. i try to use as standard english as possible with it, hoping that improves results. or maybe i was just lucky 😅.

@alinanorakari also i sometimes imagine that they secretly give me ChatGPT 4, presumably because they discovered my questions are such interesting training material. just because of how surprisingly good the answers seem sometimes.

i'm preeetty sure that's nonsense, but i guess i like to feel special. :P

@sofia I did not report it but I told it repeatedly that it's wrong
@alinanorakari Very true. Reminds me of when I asked it for the piano fingering for some scales and it just gave me the same fingering every time. Even after it agreed that different scales require different techniques
@alinanorakari yea, just about makes sense for a glorified word prediction system

@alinanorakari In my job, I do communications on complex topics such as banking, accounting, auditing and regulations.

FWIW, whenever I asked ChatGPT about something I was currently working on, most of the answers I got were clearly wrong but worded very confidently.

Therefore, I can only reinforce this warning!

@alinanorakari Anything about digits or letters is super hard for ChatGPT. It sees our messages and all the data it was trained on translated into a different (huge) alphabet. Its alphabet would write 74815 as just two tokens, one for 748 and one for 15. It's useful when "sleep" is one token and "ing" is another. But it sucks for numbers. Models trained with per-digit tokenization do better on arithmetic. (https://arxiv.org/abs//2305.14201)
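(To illustrate the chunking effect described above: this is toy code, not the real tokenizer, and the tiny vocabulary is made up. A greedy longest-match tokenizer splits "74815" into multi-digit chunks the same way it splits "sleeping" into "sleep" + "ing":)

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match tokenization over a toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

toy_vocab = {"748", "15", "sleep", "ing"}
print(greedy_tokenize("74815", toy_vocab))     # ['748', '15']
print(greedy_tokenize("sleeping", toy_vocab))  # ['sleep', 'ing']
```

Once the digits are fused into opaque chunks like this, per-digit arithmetic becomes much harder for the model to learn.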

Anyway, I don't mean to make excuses for ChatGPT. 😅

Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks

@darabos yeah sadly I know. I think it's worrisome that OpenAI publishes examples for interactions that ask math related questions though, because I see a lot of people who think because it answers like an eloquent human it _must_ have basic human knowledge like what an even number is. I'm doing my part by spreading some education to hopefully make folks more cautious
@alinanorakari @darabos
Contrast this example
https://youtu.be/wHiOKDlA8Ac?t=5m20s
where (at 5:20) it shows it can regurgitate the correct answer, with how badly it does when given a maths problem that's too recent to appear in its training data
https://youtu.be/Fi1e-B60cok
OpenAI's GPT-4: A Spark Of Intelligence!


@alinanorakari @darabos
"a lot of people who think because it answers like an eloquent human it _must_ have basic human knowledge" ...

Suddenly I see a huge overlap here with voters.

@alinanorakari It's like talking to a politician
@xkummerer oh yeah, confidently incorrect

@alinanorakari There was a video of a reporter who thought DAN was a super-hack. He asked DAN to give up its own social security number, and when it responded he acted really scared. Then he read it: 987-65-4321. And he realized DAN could just make shit up too.

It’s at 9:33 in this video: https://youtu.be/RdAQnkDzGvc

Testing the limits of ChatGPT and discovering a dark side

@alinanorakari The famous Even number, 101
@NickGonzo ah yes, it is good friends with 5, which is the smallest prime ending in 2 
@alinanorakari The future is much dumber than I expected.
@alinanorakari LOL, this is hilarious — what a brilliant example! 🤣
@alinanorakari “this list is not exhaustive” oh my
@alinanorakari Glorified Markov bot garbage.
@alinanorakari me when I’m people pleasing

@alinanorakari

Did it not understand the instructions? What happened?

@VHasch i'm pretty sure my prompt helped gaslight it into believing there must be multiple prime numbers that end in 2 (because I asked for the first few), and it launched into a typical confidently incorrect frenzy of desperately trying to make its answer fit instead of correcting my assumption that there could be more than one
@alinanorakari @VHasch as a professional asker of questions, this reflection reminds me of how important it is to ask "clean" questions, to avoid warping the authenticity of the answer. Well, with humans, who do have authentic selves.
@deborahh @VHasch turns out it even struggles with clean questions to some extent when it comes to numbers