Mastodawn

So, you think AI is ready to write all of your code for you? Ok, start by asking it “There is a car wash 100 meters away. I need to wash my car. Should I walk or drive?”

Birgit Pauli-Haack Feb 17

@macmanx response, well argued. "You'll want to drive — the whole point is to get your car there so it can be washed! Even though 100 meters is a very short walk, your car can't wash itself at home. Drive it over, and enjoy the short trip."

James Huff Feb 17

@bph I’ve gotten Claude and Gemini to respond correctly, but GPT, Copilot, and even Gemma (Google’s local model) fail big time, suggesting that walking is the most efficient option.

Birgit Pauli-Haack Feb 17

@macmanx Lumo disappoints, too. Quite elaborate for a wrong answer 🤦‍♀️

"At just 100 m, walking will almost always beat driving. Even if you could roll out the door and zip over in a few seconds, you still have to spend time getting into the car, starting it, navigating any curb or parking spot, and then getting back out again. All that adds up to more hassle than a quick stroll.

So, lace up your shoes (or just hop on foot) and head over—it's faster, simpler, and you’ll save a bit of fuel too."

James Huff Feb 17

@bph I feel like I could write a book titled “Lumo Disappoints” at this point. 😅

Birgit Pauli-Haack Feb 17

@macmanx I am in the same boat. Got quite a few wrong answers lately. And it has a passive-aggressive tone once you point out the wrongness. And its context window is really short.

James Huff

@bph Yeah, overall Claude 4.5 is doing well for a remote model, and Gemma 3 is doing well for a locally hosted model (it shows its work quite well, so hallucinations are easier to spot).

Birgit Pauli-Haack Feb 23

@macmanx Yeah, totally agree. I don't use Lumo unless I am on my phone... Claude Opus 4.5, or soon 4.6, is the workhorse here, too.