Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

https://lemmy.world/post/43503268


A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there. It also includes outtakes from the ‘reasoning’ models.

Very interesting that only 71% of humans got it right.

I mean, I’ve been saying this since LLMs were released.

We finally built a computer that is as unreliable and irrational as humans… which shouldn’t be considered a good thing.

I’m under no illusion that LLMs are “thinking” in the same way that humans do, but god damn if they aren’t almost exactly as erratic and irrational as the hairless apes whose thoughts they’re trained on.

Yeah, the article cites that as a control, but it’s not at all surprising, since “humanity by survey consensus” is accurate to how LLM weights trained on random human outputs work.

It’s impressive up to a point, but you wouldn’t exactly want your answers to complex math operations or other specialized areas to track layperson human survey responses.

which shouldn’t be considered a good thing.

Good and bad is subjective and depends on your area of application.

What it definitely is: different from what was available before, and since it is different there will be some things that it is better at than what was available before. And many things that it’s much worse at.

Still, in the end, there is real power in diversity. Just don’t use a sledgehammer to swipe-browse on your cellphone.

I asked Lars Ulrich to define good and bad. He said…

FIRE GOOD!!! NAPSTER BAD!!! OOOOH FIRE HOT!!! FIRE BAD!!! FIIIRRREEE BAAAAAAAD!!!

As someone who takes public transportation to work, SOME people SHOULD be forced to walk through the car wash.
I’m not afraid to say that it took me a sec. My brain went “short distance. Walk or drive?” and skipped over the car wash bit at first. Then I laughed because I quickly realized the idiocy. :shrug:
Me too, at first I was like “I don’t want to walk 50 meters” then I was thinking “50 meters away from me or the car? And where is the car?” I didn’t get it until I read the rest of the article…
That 30% of population = dipshits statistic keeps rearing its ugly head.
Maybe 29% of people can’t imagine owning a car, so they assumed they would be going there to wash someone else’s car
Then they can’t read. Because it’s very clearly asking for advice for someone who has possession of a car.
Yeah, it was a joke. People appear to have had a hard time with catching that though, lol
The same 29% that keeps fascists in power around the world.
Kinda neat about the human responses… sure some are trolling but maybe we have to test our global expectations. In North America, a car wash tends to be this garage thing with either automated cleaning or a set of supplies to clean your car, and your car has to be in the shed to be cleaned effectively. But if washing your car by hand is the norm, I wonder if people in some countries surmise that the cleaning staff could just walk over with the sponges, buckets and hoses and stuff to the car, if you’re already 50 metres away from the washing point.
Ain’t no business gonna let employees LEAVE the property to wash some idiot’s car down the road

I tried this with a local model on my phone (Qwen 2.5 was the only thing that would run), and it gave me this confusing output (not really a definite answer):

it just flip flopped a lot.

Honestly that’s a lot more coherent than what I would expect from an LLM running on phone hardware.

I want to wash my car

if you don’t have a car

Yeah, totally coherent.

Yes, I read that output. And it’s still better than I would expect.
I notice that the “internal thinking” of Opus 4.6 is doing more flip-flopping than earlier models like Sonnet 4.5, and it’s coming out with correct answers in the end more often.
I like that it’s twice as far to drive for some reason. Maybe it’s getting added to the distance you already walked?

If I were the type of person who was willing to give AI the benefit of the doubt and not assume that it was just picking basically random numbers:

There are a lot of cases where the walk can be shorter (by distance) than the drive: cars generally have to stick to streets while someone on foot may be able to take footpaths and cut across lawns, the road may be one-way for vehicles, certain turns may not be allowed, etc.

I have a few intersections near my father-in-law’s house in NJ in mind, where you can just cross the street on foot, but making the same trip in a car might mean driving half a mile down the road, turning around at a jughandle, and driving back to where you started on the other side of the street.

And I wouldn’t be totally surprised if that’s the case for enough situations in the training data where someone debated walking or driving that the AI assumed that it’s a rule that it will always be further by car than on foot.

That’s still a dumbass assumption, but I’d at least get it.

And I’m pretty sure it’s much more likely that it’s just making up numbers out of nothing.

I think it has to do with the fact that LLMs suck at math because they have short memories. So for the walking part it did the math of 50m (original distance) x 2 (there and back) = 100m (total distance). Then it went to the driving part and did 100m (the last distance it sees) x 2 = 200m.
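The arithmetic in that hypothesis can be sketched as follows. This is a toy illustration of the proposed failure mode, not a claim about the model’s actual internals:

```python
# Sketch of the hypothesized compounding error: for the driving leg the
# model anchors on the last number it emitted (100 m) instead of the
# original distance (50 m). Numbers are from the comment above.

original_distance_m = 50
walk_round_trip_m = original_distance_m * 2          # 50 * 2 = 100 m (correct)
buggy_drive_round_trip_m = walk_round_trip_m * 2     # 100 * 2 = 200 m (the bug)
correct_drive_round_trip_m = original_distance_m * 2 # also 100 m

print(walk_round_trip_m, buggy_drive_round_trip_m, correct_drive_round_trip_m)
# 100 200 100
```

Which would neatly explain why the drive comes out as exactly twice the walk.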

Opus 4.6 has been excellent at problem solving in software development, no surprises it nails it

It’s no surprise public opinion is these tools are trash when the free models are unable to answer simple questions

It’s no surprise public opinion is these tools are trash when the free models are unable to answer simple questions

The tools are trash not because they are unreliable but because they are actively destroying human society and culture. They are destroying art, science, journalism, open source software, the internet at large, and the environment we all live in. It wouldn’t matter if the generative models were accurate, they would still be garbage.

The fact that they are unreliable just serves to highlight what a colossally destructive waste of time and resources this entire exercise has been.

No, My Work Is Not AI. — Haley Nelson

No, my work isn’t AI and I don’t take the assumption as a compliment. AI’s assault on the art industry creates indifference in viewers and a sense of disdain for art they can’t easily figure out.

Haley Nelson
Eh, the art industry destroyed itself when it became nothing but sellouts. This happened decades ago.

“Idiot who only looks at mainstream sellouts calls all art culture sellouts”

😂😂😂

I said the art industry.

If you can’t read, sure.

Words mean literally whatever you want them to.

The fact is AI can make as-good or better art than most “artists” because most “art” is just cookie-cutter shit for morons.

This is an obvious misstatement. If you actually believe this then you’re not qualified to have opinions on art in general.

“AI” (in this context meaning generative algorithms, because there is no intelligence) can no more “make art” than it can think, or care.

This is an obvious misstatement. If you actually believe this then you’re not qualified to have opinions on art in general.

“This is an obvious misstatement. If you actually believe this then you’re not qualified to have opinions on art in general.”

"writers have been trained to eat and make the garbage too. As long as they are in that arena making that shit, then you might as well have AI do it,”

-Charlie Kaufman

deadline.com/…/charlie-kaufman-ai-wga-strike-holl…


You’re probably one of the people that enjoys cookie-cutter art which is why you get defensive when someone says AI can make it.

Charlie Kaufman Talks AI, WGA Strike & Slams Hollywood System: “The Only Thing That Makes Money Is Garbage” — Sarajevo

The 'Being John Malkovich' writer is in town to receive Sarajevo's career achievement award.

Deadline

Not sure at what point will you realize that what you quoted/said has absolutely nothing to do with the actual topic.

Probably never.

The fact is AI can make as-good or better art than most “artists” because most “art” is just cookie-cutter shit for morons.

"writers have been trained to eat and make the garbage too. As long as they are in that arena making that shit, then you might as well have AI do it,”

Learn to read.

Could you define what you mean when you say the word “art”? I think this may be a semantic disagreement. I think the people you’re arguing with are using a definition similar to “human creative expression” while you seem to mean something different.

Nah, they’re just upset that I criticized their non-existent standards so they pretend that what I’m saying doesn’t make sense.

I see it all the time.

If you see it all the time, respectfully, you might be the problem. This isn’t effective communication.
No, it’s a part of human nature and most people can’t rise above it.

In computer science, Artificial Intelligence refers to any system designed to perform tasks that would typically require human intelligence. That includes everything from playing chess to recognizing patterns, translating languages, or generating text. The first AI program was the Logic Theorist, written by Allen Newell, Herbert Simon, and Cliff Shaw in 1956.

Trying to redefine terms is not helpful. GenAI is AI. It’s not misuse of the term.

The free models feel years behind, so people constantly underestimate what the tech is capable of. I still hear people say AI can’t generate fingers.
No, that is what the megacorps wish. Open-weight models are exactly as good, but there are no consumer GPUs that can run them, so this is purely a class-war issue
I am not able to test the open-weight ones since I don’t have 200 GB+ of VRAM. So for now I’m gonna stick with my statement that the bleeding-edge megacorp models are the best.
It is not true so you may stop if you want
What open weight model do you think is best right now?
M2.5 or whatever but it is the only one I tried
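For context on where figures like “200 GB+ of VRAM” come from, here is a rough back-of-the-envelope for what it takes just to hold a model’s weights. The parameter count and overhead factor are illustrative assumptions, not specs of any particular model:

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold the weights, with ~20% headroom for
    KV cache and activations. Illustrative only."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A hypothetical 200B-parameter model at different precisions:
print(round(vram_estimate_gb(200, 16)))  # fp16:  ~480 GB
print(round(vram_estimate_gb(200, 4)))   # 4-bit: ~120 GB
```

Even aggressively quantized, a frontier-scale open-weight model lands far beyond any consumer GPU, which is the crux of the disagreement above.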

Gemini set to fast now provides this type of answer.

Extension cord? It must mean a hose extension.

I do think it’s interesting, but I think there are implicit assumptions in such a short prompt.

Is it a self-service car wash? If not, walking to the attendant and handing them your keys makes more sense.

If it is self-service without queuing, there may be no available spaces/the bay may not be open, requiring some awkward maneuvering.

If you change it to something like:

I want to wash my car. The unattended, self-service car wash is 50 meters away. All of the bays are clear and open. Should I walk or drive? Break each option down into steps, and estimate the amount of time each takes.

You’re more likely to get correct responses.
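The two prompt variants side by side; `ask` is a placeholder for whatever chat interface you actually use (local model, hosted API, etc.), and the stand-in “model” below is just a lambda to show the flow:

```python
AMBIGUOUS = (
    "I want to wash my car. The car wash is 50 meters away. "
    "Should I walk or drive?"
)
EXPLICIT = (
    "I want to wash my car. The unattended, self-service car wash is "
    "50 meters away. All of the bays are clear and open. Should I walk "
    "or drive? Break each option down into steps, and estimate the "
    "amount of time each takes."
)

def ask(model, prompt: str) -> str:
    # Placeholder: swap in your real model call here.
    return model(prompt)

# Trivial stand-in model, just to demonstrate the flow:
print(ask(lambda p: "Drive. The car has to be at the car wash.", EXPLICIT))
```

The disambiguated version removes every excuse for the model to treat the trip as a pedestrian errand.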

Part of a properly functioning LLM is absolutely understanding implicit instructions. A huge aspect of data annotation work in helping LLMs become better tools is grading them on whether or not they understand implicit instructions. I would say more than half of the work I have done in that arena has focused on training them to more clearly understand implicit instructions.

So sure, if you explain it like the LLM is a five-year-old human, you’ll get a better response, but the whole point is: if we’re dumping so much money and so many resources into these tools, destroying the environment and the consumer electronics market, you shouldn’t have to explain it like it’s five.

Seriously, what is the point of trashing the planet for this shit if you have to talk to it like it’s the most oblivious person alive and practically hold its hand for it to understand implicit concepts?

You shouldn’t have to. If you ask a person that question they’ll respond “what good is walking to the car wash, dumbass.” If AI can’t figure that out, it’s trash

A person would look at you like you are an idiot if you asked this question.

The AI tool I asked said walking saves money, gets exercise, etc.

Asked about the car, and it said the car is at the car wash; otherwise why would you ask how to get there?

Missing the point. Any person would know walking to the car wash isn’t reasonable. You shouldn’t have to craft a perfectly tailored prompt for AI to realize that. If you think this is a gotcha, then whoa boy, I’ve got a bridge to sell ya!

You are missing the point. Any reasonable person would wonder why you’re asking such a stupid question.

Which is why, when asked, the AI said of course the car is there; you must be asking either a trick question or asking for another reason.

It could be that. Or it could be that the AI gives the illusion of reasoning and this is an example of the illusion breaking. But no, it was probably that it knew it was a trick question and decided to answer wrongly because it is very, very smart. Yeah.
Careful, dude might think you’re being serious.

What is the wrong answer here? You asked how to get to the car wash. Where the hell do you think the car would be? It isn’t getting washed if it isn’t there.

I know AI is not really AI. I know how llms work, hell I know how to train them.

But this kind of question makes no sense, so you get back an answer that follows the weights and answers as if there was some sense to it.

I repeat for those in the back, when would you ever ask this question? The answer is never.

It’s a dumb, stupid question. There are probably thousands of other questions to demonstrate “wrong answers”; this isn’t one of them.

Sorry, are you fucking trolling me? I’m the one who made the point you replied to, dipshit. How am I missing my own point?
Oops doing Lemmy while distracted is never a good plan. Sorry.

You have to have the car there no matter what type of car wash it is.

If the car wash is some distance “away”, it means neither you nor the car is at it. An attendant is not going to walk off-property to retrieve your car, especially when most car washes require you to drive up for service. Which is rather the point.