Generative AI is garbage, exhibit 99:

#ai #math

@atoponce that is dangerous garbage, next it will apply bodmas....
@atoponce 🔥 mathematics 2 just dropped
@atoponce LMAO what's next, a quarter pounder being larger than a third? 💀
@alexthecat @atoponce Like this, for example? I don't know if it's true, but sadly I have zero problem believing it. I've even seen two calculators give different results for a very simple calculation, and one of them was wrong. Well, obviously one was wrong, but you know what I mean.
@atoponce You should have asked to express both of them as a fraction and compare them, but I guess that too would fail catastrophically.
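The fraction approach suggested above is easy to check by hand. A minimal sketch in Python, using the standard library's `fractions` module to put both numbers over a common denominator:

```python
from fractions import Fraction

# Express both decimals as exact fractions, as the comment suggests.
a = Fraction("9.9")    # 99/10, i.e. 990/100
b = Fraction("9.11")   # 911/100

# With a common denominator of 100: 990 hundredths vs 911 hundredths.
print(a > b)           # True: 9.9 is the larger number
```

Exact rational arithmetic sidesteps both the LLM confusion and floating-point rounding.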
@atoponce of course that glorified markov chain can't do math
@atoponce
Hmm, well, 9.11 is wider.
@atoponce
And an answer from Copilot just now:
@atoponce
This time I asked which number has the greater numerical value. It gives a correct explanation, but stalls out before concluding that 9.9 is greater, almost as if it doesn't want to contradict its previous answer.
@violet @atoponce Truly, we have been freed from the labor of demonstrating our own ignorance, biases, and stupidity, and the end must be near.

@violet @atoponce

I asked Gemini. It was wrong too.
I asked it for its drafts. It got it right in 1 of 3.

@violet
Ah, the classic "Ah, the classic" that instantly gives away AI slop
@atoponce Conversely, why not both
@atoponce I agree, 9.11 is a big deal. Never heard of 9.9. You haven't specified which metric to use when comparing.
@atoponce but it got lost in the explanation for sure
@atoponce but at least it's politically correct
@atoponce It’s not “garbage” exactly. It’s not correct maths, but for example Michael Rosen’s “Hairy Tales and Nursery Crimes” is full of “factual errors”, and no-one should think that is “garbage”. The only issue here is if you try to use generative AI to give you correct answers, which is like trying to get an oboe to tell you who composed Handel’s Messiah.
@johnaldis @atoponce someone should tell the marketers this because they don't seem to know
@atoponce
I'm also trash: I'm bad at math and don't know the multiplication table, let alone how to multiply and divide fractions.
@DearFox @atoponce But you know it, so you are not trash.
@atoponce
Well, for the sake of truth, they write it themselves "ChatGPT can make mistakes. Check important info."
@atoponce
GPT, you got some brain worms.
@rose_alibi
@jhoward @atoponce @rose_alibi Don’t worry, it’s at least reliably good at reducing our available water supply.
@aral
That's OK. I'm hearing great things about Brawndo.
@atoponce @rose_alibi
@atoponce
I don't know what you're talking about.
9.11 is clearly bigger than 9.9.
9.11 is 4 characters, and 9.9 is only 3.
#EverythingIsAString
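For what it's worth, the string joke above can be made literal. A toy sketch (the variable names are just for illustration) showing that "bigger" depends entirely on which comparison you pick:

```python
# Three ways to compare "9.11" and "9.9", only one of them numeric.
s1, s2 = "9.11", "9.9"

print(len(s1) > len(s2))      # True: 4 characters vs 3, the joke's metric
print(s1 < s2)                # True lexicographically: '1' < '9' at index 2
print(float(s1) < float(s2))  # True numerically: 9.11 < 9.9
```

Length, lexicographic order, and numeric value all disagree here, which is part of why the question trips up a text-completion system.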
@atoponce hey look it’s a word calculator not a calculator calculator
@albnelson @atoponce yeah, why would we expect a huge, optimized linear algebra machine to be able to do arithmetic?
@kajord @atoponce see, algebra is where we went wrong in the first place. Never should have mixed up letters and numbers.
@atoponce They want to hook this up to gene sequencing machines. Why do these tech bros exist? Nature may try to eliminate them by eliminating us all.
@atoponce "Gee, I wonder why our probe slammed into the surface of the planet?"
@atoponce This is actually a tokenization error. 9.11 looks larger than 9.9 because 11 tokenizes as a single unit and 11 is usually larger than 9.
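The tokenization theory above can be sketched in a few lines. This is only a toy illustration (`toy_tokenize` is a made-up helper; real BPE tokenizers split text differently), showing how comparing the post-decimal chunks as integers produces the wrong answer:

```python
def toy_tokenize(s):
    # Pretend the decimal part is a single token: "9.11" -> ["9", ".", "11"]
    whole, _, frac = s.partition(".")
    return [whole, ".", frac]

a, b = toy_tokenize("9.11"), toy_tokenize("9.9")

# Comparing the fractional tokens as whole numbers gives the LLM's answer:
print(int(a[2]) > int(b[2]))          # True: 11 > 9, so "9.11 looks larger"

# Restoring place value gives the right one:
print(float("9.11") > float("9.9"))   # False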
@cadey @atoponce In other words, despite all efforts to make math work better with LLMs, like adding Python support, it's still bad at it. Also it inherited the overconfidence from the dataset, which should include Reddit.
@atoponce it’s not garbage, it’s not intelligent. It has useful applications, but arithmetic isn’t one apparently. It’s not wholly surprising. LLMs model natural language. Arithmetic isn’t natural language.
@ry_ @atoponce That would basically mean Copilot in Excel is nearly pointless, unless it does something smarter than the rest of Copilot for M365
@toriver @atoponce I’ve not used Copilot, but I assume its numerical output doesn't come directly from the attention mechanism of an LLM. E.g. it could use an LLM to extract context from the data, which is then fed into numerical routines, or use an LLM to offer code suggestions. None of these is an LLM directly doing maths.

@atoponce I wonder: you know how virtual assistants are given feminine names and voices (Siri, Alexa)? And you know how there is a persistent false belief that women are somehow worse at math than men?

I have to wonder whether that combination of biases has any influence on the programmers who create these LLMs? I mean on top of all of the other biases and misunderstandings they already have about neuroscience and language? Are they creating their own stereotype of a ditzy secretary?

@UncivilServant @atoponce It would help if "virtual assistants" used a name and voice appropriate to a 5-year-old child.
There was an old TV program called "Kids Say the Darndest Things", with clips shown on The Bill Cosby Show (well before he was arrested):
https://www.youtube.com/watch?v=G1voLZyI0SM
Art Linkletter's Kids Say The Darndest Things | 1995 Special with Bill Cosby (CBS)

@UncivilServant this has nothing to do with biases; llms don't produce correct answers, they produce statistically-probable text completion. @atoponce
@atoponce i had my head in Semantic Versioning land just before reading this so i was like "yup. 9.11 is bigger than 9.9" while simultaneously thinking "something's not right here"
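The semver intuition above is correct under versioning rules, where dot-separated components are compared as integers. A minimal sketch (`as_version` is a made-up helper, not a real semver library) of the two readings side by side:

```python
def as_version(s):
    # Version-string comparison: split on dots, compare components as ints.
    return tuple(int(part) for part in s.split("."))

print(as_version("9.11") > as_version("9.9"))  # True: (9, 11) > (9, 9)
print(float("9.11") > float("9.9"))            # False as plain numbers
```

So "9.11 is bigger than 9.9" is a perfectly valid answer to a different question.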
@atoponce wow you are either faking it or a really bad prompt engineer: https://chatgpt.com/share/15cf4411-f272-4ebe-a90f-4ddfc93a78bc

@Hexa there's always one promptfondler in the thread who doesn't understand that you can't get fully repeatable answers from the confabulation engine, and that any answer to that question is a valid answer within the llm paradigm, whether it's correct or not.

(there's also another promptfondler who thinks that the problem is just in one particular llm, not in the way llm works)

@atoponce

@mawhrin @atoponce fair point. I apologize. Also “promptfondler” 😆 I’ve never heard that one.
@Hexa
they are calling you names
@jenzi I don’t care, let them not use AI and think it sucks and call people names. Doesn’t affect me. I know how to use it and enjoy it for professional and personal growth.
@mawhrin @Hexa @atoponce came for the promptfondler, stayed for the confabulation engine
Its initial response is ‘correct’, but only if the items being compared are version strings.
@atoponce actually 1/6 is bigger than 9/11
@atoponce For the example - ChatGPT botching arithmetic - it actually passed the Turing test. Once in a store, I ordered 2.2 lb of some deli item, and the scale registered 2.02. The guy behind the counter called 2.20 "two point twenty" and 2.02 "two point two". The scale always showed two digits past the decimal point. This guy basically made the same mistake as ChatGPT.
@bzdev nah. don't anthropomorphise a statistical engine.
@bzdev @atoponce well, the guy was omitting the "hundredths" in reading decimals, for short as it was always two decimal digits. That's not wrong at all, given the context, and also different from the error in chatgpt interaction.
@joe_vinegar @atoponce Not exactly: he called .02 "point 2" and .20 "point twenty", hence the confusion. As with the ChatGPT example, the problem was in part not realizing that there are an infinite number of implied zeros after the last digit provided.
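The "implied zeros" point above is exactly what exact decimal arithmetic captures. A small sketch using Python's standard `decimal` module:

```python
from decimal import Decimal

# 9.9 is 9.90 is 9.900...: trailing zeros don't change the value.
print(Decimal("9.9") == Decimal("9.90"))  # True
# Once both numbers have two decimal places, the comparison is obvious:
print(Decimal("9.90") > Decimal("9.11"))  # True: 90 hundredths vs 11
```

Padding to a common number of decimal places is the human version of the same fix.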
@atoponce Oh great, we have a tool that uses energy to simulate stupid. As if we didn't have enough already.

@RAlpenstern @atoponce

In the meantime, they have fixed this issue. But I think we only have to dig a little deeper now.

@echopapa @atoponce I just tried it on chatGPT yesterday asking it to calculate 9.9 - 9.10 and it tried to convince me it was -0.2

@RAlpenstern @atoponce

seems to vary:

> please calculate 9.9 - 9.10

The result of subtracting 9.10 from 9.9 using Python is approximately 0.8. The small discrepancy (0.8000000000000007) is again due to floating-point arithmetic precision in computers.

anyway, I prefer using a calculator and not an LLM.
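The floating-point discrepancy quoted above is reproducible in plain Python, since binary floats can't represent 9.9 or 9.1 exactly; the standard `decimal` module avoids it:

```python
from decimal import Decimal

# Binary floating point: the famous tiny error the LLM's tool reported.
print(9.9 - 9.10)                        # 0.8000000000000007

# Exact decimal arithmetic: no rounding artifact.
print(Decimal("9.9") - Decimal("9.10"))  # 0.80
```

So the 0.8000000000000007 is a genuine Python float result, not something the LLM made up; the LLM's failure is in the earlier 9.11-vs-9.9 comparison, not here.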