Generative AI is garbage, exhibit 99:

#ai #math

@atoponce that is dangerous garbage, next it will apply bodmas....
@atoponce 🔥 mathematics 2 just dropped
@atoponce LMAO what's next, a quarter pounder being larger than a third? 💀
@alexthecat @atoponce Like this, for example? I don't know if it's true, but sadly I have zero problem believing it. I've even seen two calculators give different results for a very simple calculation, and one of them was wrong. Well, obviously one was wrong, but you know what I mean.
@atoponce You should have asked to express both of them as a fraction and compare them, but I guess that too would fail catastrophically.
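The fraction approach suggested above is easy to check by hand. A minimal sketch in Python, using the standard library's `fractions` module to put both numbers over a common denominator:

```python
from fractions import Fraction

# Express both decimals as exact fractions, as the comment suggests.
a = Fraction("9.9")    # 99/10, i.e. 990/100
b = Fraction("9.11")   # 911/100

# With a common denominator of 100: 990 hundredths vs 911 hundredths.
print(a > b)           # True: 9.9 is the larger number
```

Exact rational arithmetic sidesteps both the LLM confusion and floating-point rounding.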
@atoponce of course that glorified markov chain can't do math
@atoponce
Hmm, well, 9.11 is wider.
@atoponce
And an answer from Copilot just now:
@atoponce
This time I asked which number has the greater numerical value. It gives a correct explanation, but stalls out before concluding that 9.9 is greater, almost as if it doesn't want to contradict its previous answer.
@violet @atoponce Truly, we have been freed from the labor of demonstrating our own ignorance, biases, and stupidity, and the end must be near.

@violet @atoponce

I asked Gemini. It was wrong too.
I asked it for its drafts. It got it right in 1 of 3.

@violet
Ah, the classic "Ah, the classic" that instantly gives away AI slop
@atoponce Conversely, why not both
@atoponce I agree, 9.11 is a big deal. Never heard of 9.9. You haven't specified which metric to use when comparing.
@atoponce but it got lost in the explanation for sure
@atoponce but at least it's politically correct
@atoponce It’s not “garbage” exactly. It’s not correct maths, but for example Michael Rosen’s “Hairy Tales and Nursery Crimes” is full of “factual errors”, and no-one should think that is “garbage”. The only issue here is if you try to use generative AI to give you correct answers, which is like trying to get an oboe to tell you who composed Handel’s Messiah.
@johnaldis @atoponce someone should tell the marketers this because they don't seem to know
@atoponce
I'm also trash: I'm bad at math and don't know the multiplication table, let alone how to multiply and divide fractions.
@DearFox @atoponce But you know it, so you are not trash.
@atoponce
Well, for the sake of truth, they write it themselves "ChatGPT can make mistakes. Check important info."
@atoponce
GPT, you got some brain worms.
@rose_alibi
@jhoward @atoponce @rose_alibi Don’t worry, it’s at least reliably good at reducing our available water supply.
@aral
That's OK. I'm hearing great things about Brawndo.
@atoponce @rose_alibi
@atoponce
I don't know what you're talking about.
9.11 is clearly bigger than 9.9.
9.11 is 4 characters, and 9.9 is only 3.
#EverythingIsAString
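For what it's worth, the string joke above can be made literal. A toy sketch (the variable names are just for illustration) showing that "bigger" depends entirely on which comparison you pick:

```python
# Three ways to compare "9.11" and "9.9", only one of them numeric.
s1, s2 = "9.11", "9.9"

print(len(s1) > len(s2))      # True: 4 characters vs 3, the joke's metric
print(s1 < s2)                # True lexicographically: '1' < '9' at index 2
print(float(s1) < float(s2))  # True numerically: 9.11 < 9.9
```

Length, lexicographic order, and numeric value all disagree here, which is part of why the question trips up a text-completion system.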
@atoponce hey look it’s a word calculator not a calculator calculator
@albnelson @atoponce yeah, why would we expect a huge, optimized linear algebra machine to be able to do arithmetic?
@kajord @atoponce see, algebra is where we went wrong in the first place. Never should have mixed up letters and numbers.
@atoponce They want to hook this up to gene sequencing machines. Why do these tech bros exist? Nature may try to eliminate them by eliminating us all.
@atoponce "Gee, I wonder why our probe slammed into the surface of the planet?"
@atoponce This is actually a tokenization error. 9.11 looks larger than 9.9 because 11 tokenizes as a single unit and 11 is usually larger than 9.
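The tokenization theory above can be sketched in a few lines. This is only a toy illustration (`toy_tokenize` is a made-up helper; real BPE tokenizers split text differently), showing how comparing the post-decimal chunks as integers produces the wrong answer:

```python
def toy_tokenize(s):
    # Pretend the decimal part is a single token: "9.11" -> ["9", ".", "11"]
    whole, _, frac = s.partition(".")
    return [whole, ".", frac]

a, b = toy_tokenize("9.11"), toy_tokenize("9.9")

# Comparing the fractional tokens as whole numbers gives the LLM's answer:
print(int(a[2]) > int(b[2]))          # True: 11 > 9, so "9.11 looks larger"

# Restoring place value gives the right one:
print(float("9.11") > float("9.9"))   # False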
@cadey @atoponce In other words, despite all efforts to make math work better with LLMs, like adding Python support, it's still bad at it. Also it inherited the overconfidence from the dataset, which should include Reddit.
@atoponce it’s not garbage, it’s not intelligent. It has useful applications, but arithmetic isn’t one apparently. It’s not wholly surprising. LLMs model natural language. Arithmetic isn’t natural language.
@ry_ @atoponce That would basically mean Copilot in Excel is nearly pointless, unless it does something smarter than the rest of Copilot for M365
@toriver @atoponce I’ve not used Copilot, but I assume its numerical output doesn't come directly from the attention mechanism of an LLM. E.g. it could use an LLM to extract context from the data, which is then fed into numerical routines, or use an LLM to offer code suggestions. None of these is an LLM directly doing maths.

@atoponce I wonder: you know how virtual assistants are given feminine names and voices (Siri, Alexa)? And you know how there is a persistent false belief that women are somehow worse at math than men?

I have to wonder whether that combination of biases has any influence on the programmers who create these LLMs? I mean on top of all of the other biases and misunderstandings they already have about neuroscience and language? Are they creating their own stereotype of a ditzy secretary?

@UncivilServant @atoponce It would help if "virtual assistants" used a name and voice appropriate to a 5-year-old child.
There was an old TV program called "Kids Say the Darndest Things", with clips shown on The Bill Cosby Show (well before he was arrested):
https://www.youtube.com/watch?v=G1voLZyI0SM
Art Linkletter's Kids Say The Darndest Things | 1995 Special with Bill Cosby (CBS)

@UncivilServant this has nothing to do with biases; llms don't produce correct answers, they produce statistically-probable text completion. @atoponce
@atoponce i had my head in Semantic Versioning land just before reading this so i was like "yup. 9.11 is bigger than 9.9" while simultaneously thinking "something's not right here"
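The semver intuition above is correct under versioning rules, where dot-separated components are compared as integers. A minimal sketch (`as_version` is a made-up helper, not a real semver library) of the two readings side by side:

```python
def as_version(s):
    # Version-string comparison: split on dots, compare components as ints.
    return tuple(int(part) for part in s.split("."))

print(as_version("9.11") > as_version("9.9"))  # True: (9, 11) > (9, 9)
print(float("9.11") > float("9.9"))            # False as plain numbers
```

So "9.11 is bigger than 9.9" is a perfectly valid answer to a different question.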
@atoponce wow you are either faking it or a really bad prompt engineer: https://chatgpt.com/share/15cf4411-f272-4ebe-a90f-4ddfc93a78bc

@Hexa there's always one promptfondler in the thread who doesn't understand that you can't get fully repeatable answers from the confabulation engine, and that any answer to that question is a valid answer within the llm paradigm, whether it's correct or not.

(there's also another promptfondler who thinks that the problem is just in one particular llm, not in the way llm works)

@atoponce

@mawhrin @atoponce fair point. I apologize. Also “promptfondler” 😆 I’ve never heard that one.
@Hexa
they are calling you names
@jenzi I don’t care, let them not use AI and think it sucks and call people names. Doesn’t affect me. I know how to use it and enjoy it for professional and personal growth.
@mawhrin @Hexa @atoponce came for the promptfondler, stayed for the confabulation engine
Its initial response is ‘correct’, but only if the items being compared are version strings.
@atoponce actually 1/6 is bigger than 9/11
@atoponce For the example - ChatGPT botching arithmetic - it actually passed the Turing test. Once in a store, I ordered 2.2 lb of some deli item, and the scale registered 2.02. The guy behind the counter called 2.20 "two point twenty" and 2.02 "two point two". The scale always showed two digits past the decimal point. This guy basically made the same mistake as ChatGPT.
@bzdev nah. don't anthropomorphise a statistical engine.
@bzdev @atoponce well, the guy was omitting the "hundredths" in reading decimals, for short as it was always two decimal digits. That's not wrong at all, given the context, and also different from the error in chatgpt interaction.
@joe_vinegar @atoponce Not exactly: he called .02 "point 2" and .20 "point twenty", hence the confusion. As with the ChatGPT example, the problem was in part not realizing that there are an infinite number of implied zeros after the last digit provided.
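The "implied zeros" point above is exactly what exact decimal arithmetic captures. A small sketch using Python's standard `decimal` module:

```python
from decimal import Decimal

# 9.9 is 9.90 is 9.900...: trailing zeros don't change the value.
print(Decimal("9.9") == Decimal("9.90"))  # True
# Once both numbers have two decimal places, the comparison is obvious:
print(Decimal("9.90") > Decimal("9.11"))  # True: 90 hundredths vs 11
```

Padding to a common number of decimal places is the human version of the same fix.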
@atoponce Oh great, we have a tool that uses energy to simulate stupid. As if we didn't have enough already.

@RAlpenstern @atoponce

In the meantime, they have fixed this issue. But I think we only have to dig a little deeper now.

@echopapa @atoponce I just tried it on chatGPT yesterday asking it to calculate 9.9 - 9.10 and it tried to convince me it was -0.2

@RAlpenstern @atoponce

seems to vary:

> please calculate 9.9 - 9.10

The result of subtracting 9.10 from 9.9 using Python is approximately 0.8. The small discrepancy (0.8000000000000007) is again due to floating-point arithmetic precision in computers.

anyway, I prefer using a calculator and not an LLM.
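The floating-point discrepancy quoted above is reproducible in plain Python, since binary floats can't represent 9.9 or 9.1 exactly; the standard `decimal` module avoids it:

```python
from decimal import Decimal

# Binary floating point: the famous tiny error the LLM's tool reported.
print(9.9 - 9.10)                        # 0.8000000000000007

# Exact decimal arithmetic: no rounding artifact.
print(Decimal("9.9") - Decimal("9.10"))  # 0.80
```

So the 0.8000000000000007 is a genuine Python float result, not something the LLM made up; the LLM's failure is in the earlier 9.11-vs-9.9 comparison, not here.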