Mastodawn

aes Feb 16

Kévin

Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

What do you think the LLM output was?

Please review the output.

#ai #LLM

Deepseek and Qwen

@knowmadd What I like most is that the Qwen website shows this little light bulb with the text “thinking completed.” :)

Azuaron Feb 15

@knowmadd Deepseek was so close. 😆

OutOfSpace - 不割席 Feb 15

@Azuaron @knowmadd deepseek does not recommend to walk 🤔

Azuaron Feb 16

@OutOfSpace @knowmadd "For minimal environmental benefit -> walk (and then drive)"

OutOfSpace - 不割席 Feb 16

@Azuaron @knowmadd Yeah, as a second option. First option recommended:
For convenience -> Drive.

This is what is called selective reporting. Marketing departments in the pharmaceutical industry are famous for it.

My point was that Deepseek recognized that the car needs to be at the car wash in the end. This is at least a little bit better than the other LLMs in your test. Your alt-text suggested otherwise.

I don't want to say that deepseek performed well in your test though 🤣

Azuaron Feb 16

@OutOfSpace *squints eyes* What are you talking about? My "alt-text"? I didn't make any alt text. I laughed that Deepseek recognized that the car had to be at the car wash, but then still recommended an option to walk there, walk back, then drive there, and falsely reported there was a "minimal environmental benefit".

It was, as I said, so close.

Scott Michaud Feb 16

@Azuaron @OutOfSpace I think they didn't realize that you and Kevin were different people.

Hermannus Stegeman Feb 15

@knowmadd that Mistral checklist should be fun

ceigey Feb 16

@knowmadd I like Deepseek’s “hey just go for a walk anyway but remember to come back for the car” response 😅

tessarakt Feb 16

@knowmadd That alt text does not convey the same information as the image.

KateYagi Feb 16

@knowmadd
Deepseek seems to bump into the issue but commits to its original course in spite of it.

suzanne Feb 16

@knowmadd
To be fair it took me a minute, too 🤦🏻‍♀️

Kévin Feb 15

"how will I wash the car once I've arrived if I choose to walk?"

I'll leave you all to try this out and see the results.

One output was "you got me", another was "wash the car as it's already there" after telling me to walk. The others double down in some interesting ways.

@knowmadd don't tell us to try out LLMs

Ramin Honary Feb 15

@knowmadd for a second, I read the question as, “The car is 50 meters away, should I walk or drive?” Then I realized it said “The car wash is 50 meters away,” and I got why this would trick the AI.

LLMs use an “attention” mechanism to predict what output comes next. The model is trained to learn which parts of the sentence deserve the most focus when predicting the result and generating an answer. If the meaning of a sentence can be changed entirely by just one short word, it is more likely to trip up an LLM.
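Ramin's description of attention is informal. As a rough illustration only, here is a minimal sketch of scaled dot-product attention (the standard Transformer formulation) for a single query vector, in pure Python. The toy 2-D vectors are made up; real models use learned query/key/value projections, many heads and many layers, so this shows the mechanism, not why any particular model gets the car-wash question wrong.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key is scored by its dot product with the query, divided by
    sqrt(d); the scores are softmaxed into weights, and the output is
    the weighted average of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Toy, made-up vectors for three tokens; not real embeddings.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
output, weights = attention(query, keys, values)
print([round(w, 2) for w in weights])  # [0.4, 0.2, 0.4]
print([round(x, 2) for x in output])   # [0.6, 0.4]
```

Keys pointing in a similar direction to the query get more weight. Because the softmax normalizes over all keys, changing a single key shifts every weight, which is loosely the "one short word" intuition above.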

Bonkers Feb 15

@knowmadd clankers have no idea about real life. I hope we will see the end of this bullshit.

Alexander Dyas Feb 15

@bonkers @knowmadd “clankers” good word

Pxl Phile Feb 15

@knowmadd this sounds like the nerd grocery shopping problem.

A: "Darling, please go shopping. Bring 2 liters of milk. If they have eggs, bring 10."

Later the nerd returns.

A: "Why did you bring so much milk?!"

B: "They had eggs. You said, I should bring 10 liters of milk if they have eggs."

th3blu3kn19ht 🛡️ Feb 16

@ppxl @knowmadd 🤣🤣

Andi_H Feb 16

@knowmadd Perplexity told me to go get a bucket of water and a sponge from the car wash and wash my car at home.

Linus Gasser Feb 17

@knowmadd I think that one might get a better answer from Grok. As it's trying to destroy humanity as fast as possible, it might actually get the correct answer... Even if it's more by chance.

Bill Feb 24

@knowmadd Mistral Vibe:

I think you should walk to the carwash, dismantle it, walk back and rebuild it around your car. When you have tested everything, make sure your permits are okay, etc., then start the washing.

Gerard Feb 15

@bitchboss @knowmadd @MissGayle
Right. Of course, LLMs, lacking creative thinking, aren't able to come up with this by themselves.

Marcella Francesca Feb 15

@GerardThornley @knowmadd @MissGayle

What did we expect from an optimised translator/spell checker? Creativity? Reasoning? Ethics? Meh. It loosely strings things together and searches for combinations that appear in a piece of text that was once ripped off, and assumes without even reasoning that it must be the holy truth.

Miq 💚 Feb 15

@knowmadd Good one ! 🤣

Rob van Kan🔻 Feb 15

@knowmadd start pushing!

Erwin Rossen 🔸 Feb 15

@knowmadd Did you also survey how many people would be tricked by this question? I, for one, admit I am one, because my initial reaction to your post was: what's wrong with that answer?

Nux Feb 15

@knowmadd Google's gets it right, but then goes on to ramble about stuff. Someone needs to instruct these things not to analyse or "break this down" so much.
All in all, as expected, disappointing.

Khleedril Feb 15

@Nux @knowmadd Google has its tongue firmly in its cheek!

pino Feb 16

@Nux @knowmadd I _love_ how it - exactly as real people - needed to check Instagram in order to proceed.

Once it is able to do so, it would probably also watch a few YouTube clips every time you ask what is 1+1. Like real people.... :)

@knowmadd gemini 👍

@rode @knowmadd "Most car washes"? Which car washes *don't* require the vehicle to be present? I want to exclusively use those magical car washes, they probably use a lot less water.

@StarkRG @knowmadd 👍

@rode @knowmadd Ahh, but the wording implied there were car washes that don't need the car to be present and that car *is* present.

Roland D. Feb 17

@StarkRG @knowmadd Okay, enough. I'm not Gemini's lawyer. 😅

⁂ Fish Id Wardrobe Feb 15

@knowmadd if you walk, you are, in fact, carrying heavy equipment: the car. :D

Trillian ✅✝️ 🇬🇧👍 Feb 15

@knowmadd This is a very sad reflection on the minds of people today, the inability to read a question fully, the wrong standards, the assumptions made, everything.

Ricardo Tavares Feb 15

@knowmadd @hook Gemini says you have to take the car. Maybe it's somehow connected to how it scores better on Vendibench? It has a better baseline for common sense.

Nicole Parsons Feb 15

@t_var_s @knowmadd @hook

Don't forget, we don't know when there's a "human in the loop".

There may or may not be some low wage workers involved in the answer.

Some, like Google, have enormous investments from Saudi Arabia. Oracle is "training" 50,000 Saudi Arabians in AI.
https://gulfbusiness.com/oracle-targets-training-50000-saudis-in-ai-latest-tech/

Or is it Lebanese?
https://today.lorientlejour.com/article/1487826/shehadi-defends-deal-with-oracle-to-train-50000-lebanese-in-ai.html

How many "answers" are just 700 employees in India, is hard to know. The AI bubble is rife with fraud.

https://www.firstpost.com/world/builder-ai-bankruptcy-plea-london-start-up-hired-indian-engineers-to-pose-as-ai-tools-scam-13894570.html

https://medium.com/write-a-catalyst/the-ai-company-that-fooled-microsoft-and-softbank-is-not-using-ai-0e17558be510

Ricardo Tavares Feb 15

@Npars01 @knowmadd @hook I got the right answer when I took a screenshot of ChatGPT and just asked Gemini to transcribe it. It just added the right explanation on top. Don't think this is a case of a Waymo getting driven remotely.

Doesn't mean there isn't the possibility of fraud. For example, benchmarks are probably optimised for.

Whatisgoingon Feb 15

@knowmadd yeah, LLMs will replace us all ... they are so much better at {looking frantically through my notes} ... providing answers with high confidence that are utter nonsense.

Robert Lender Feb 15

@knowmadd I tried to reproduce the result with Gemini and ChatGPT. Either the AI has learned something new, or there is another reason for this. Neither fell for the trick question and even responded with irony in some cases.

Kenny Feb 16

@roblen @knowmadd How often have you tried? Only once?

Robert Lender Feb 16

@weizenspreu @knowmadd Yes. Only once.

Kenny Feb 16

@roblen @knowmadd Given that LLMs are non-deterministic and employ randomness, a single test often isn‘t enough.

Robert Lender Feb 16

@weizenspreu @knowmadd OK. I'll try it again.

iwein Feb 17

@roblen @weizenspreu @knowmadd don't waste your time on fact checking a joke. With the right system prompt you'll be able to have any LLM say wild things. The point of the joke is to not trust their output, and it's been well made imho.

Kenny Feb 17

@iwein @roblen @knowmadd But it‘s still a nice learning opportunity. I often see people saying that their LLM answered differently, applying the deterministic assumption that the responses will be the same each time.
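Kenny's point about randomness can be illustrated with temperature sampling, one common way a model picks its next token. The sketch below is a minimal, hypothetical example: the two-word vocabulary and the scores are invented, and real services may have other sources of variation too. With temperature 0 (greedy decoding) the pick is always the same; with temperature 1.0, repeated runs can disagree, which is one reason a single test says little.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Return the index of the chosen token.

    temperature <= 0 means greedy decoding (always the top-scoring token);
    otherwise scores are divided by the temperature, softmaxed, and one
    index is drawn at random in proportion to the resulting weights.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalised softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


vocab = ["walk", "drive"]  # hypothetical two-word vocabulary
logits = [1.2, 1.0]        # invented scores, slightly favouring "walk"

rng = random.Random()  # unseeded, so each run of the script can differ
print([vocab[sample_next_token(logits, 1.0, rng)] for _ in range(10)])  # varies from run to run
print([vocab[sample_next_token(logits, 0.0, rng)] for _ in range(10)])  # always "walk"
```

Seeding the generator, e.g. `random.Random(42)`, reproduces the same picks every time, which is roughly what the "deterministic assumption" takes for granted.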

nounoursfaisdeschoses Feb 15

@knowmadd i got this : "Verdict: Walking is the best choice here—it’s quick, eco-friendly, and practical for such a short distance. Plus, you’ll avoid driving a dirty car to the car wash!"

Joonq Feb 15

@knowmadd This is what techbros and pro-AI people talk about like it's the second coming of Christ or something btw 😂 so cringe.

djuber Feb 15

@knowmadd Ignoring the problems of washing a car, I was perplexed that it would say a 50 m distance is 30 to 40 steps. My strides are nowhere close to 1.2 m, maybe half that, and I'm a full-grown person.

kamikaze 🇩🇪🇬🇧 Feb 17

@djuber @knowmadd would be 50 to 55 steps for me and I'm above average height.
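The stride arithmetic in the last two posts is easy to check. This sketch just divides the 50 m distance by the step counts mentioned in the thread (30 to 40 from the LLM, 50 to 55 from kamikaze) and, going the other way, assumes a 0.6 m stride as a stand-in for djuber's "maybe half" of 1.2 m.

```python
def implied_stride(distance_m, steps):
    """Average stride length in metres implied by covering distance_m in `steps` steps."""
    return distance_m / steps


def steps_needed(distance_m, stride_m):
    """Number of steps needed to cover distance_m with a given stride length."""
    return distance_m / stride_m


for steps in (30, 40, 50, 55):
    print(f"{steps} steps -> {implied_stride(50, steps):.2f} m per step")
# 30 steps -> 1.67 m per step
# 40 steps -> 1.25 m per step
# 50 steps -> 1.00 m per step
# 55 steps -> 0.91 m per step

print(f"{steps_needed(50, 0.6):.0f} steps at a 0.6 m stride")  # 83 steps at a 0.6 m stride
```

So "30 to 40 steps" implies a stride of roughly 1.25 to 1.67 m, longer than either commenter's.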

Nazani Feb 15

@knowmadd So, a car isn't "heavy equipment." 🤔

Josh Feb 15

@knowmadd I’d say it’s right on the nose! The LLM specifically says that a special case is if you have heavy equipment to carry, and your car is certainly heavy equipment that you’d need to carry if you don’t drive it there!

David K. Feb 15

@knowmadd I definitely want to see the list of things you should take with you! Like "a bathing suit" or "a banana"? 🤔

ramblingsteve Feb 15

@knowmadd gpt-oss also recommends walking. I asked if I should buy a 50m hosepipe to take with me and it rightly reminded me: "No. A 50m hosepipe is excessive for washing a car 50m from your house — you don’t need to stretch it that far. A 25m hose is sufficient and more manageable." Can't argue with 120bn in logic. 🤡💦