Deepseek and Qwen
@Azuaron @knowmadd Yeah, as a second option. First option recommended:
For convenience -> Drive.
This is what is called selective reporting. Marketing departments of the pharmaceutical industry are famous for it.
My point was that Deepseek recognized that the car needs to be at the car wash in the end. That is at least a little better than the other LLMs in your test. Your alt text suggested otherwise.
I don't want to say that deepseek performed well in your test though 🤣
@OutOfSpace *squints eyes* What are you talking about? My "alt-text"? I didn't make any alt text. I laughed that Deepseek recognized that the car had to be at the car wash, but then still recommended an option to walk there, walk back, then drive there, and falsely reported there was a "minimal environmental benefit".
It was, as I said, so close.
@knowmadd for a second, I read the question as, “The car is 50 meters away, should I walk or drive?” Then I realized it said “The car wash is 50 meters away,” and I got why this would trick the AI.
LLMs use the "attention" mechanism to predict what output comes next. The model is trained to learn which parts of the sentence deserve the most focus when predicting the result and generating an answer. If the meaning of a sentence can be flipped entirely by just one short word, it is more likely to trip up an LLM.
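For anyone curious what "attention" actually computes: here is a toy sketch of scaled dot-product attention in plain NumPy (random vectors standing in for real learned embeddings, nothing from any actual model). Each token's output is a weighted mix of every token's value vector, so a single short word like "wash" can shift the weights for the whole sentence.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy attention: every query token weighs every key token,
    then mixes the value vectors by those weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

# 3 "tokens" with 4-dim embeddings (random stand-ins, not trained vectors)
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(x, x, x)
```

Each row of `w` sums to 1: that row is how much the corresponding token "attends" to every other token.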
@knowmadd this sounds like the nerd grocery shopping problem.
A: "Darling, please go shopping. Bring 2 liters of milk. If they have eggs, bring 10."
Later the nerd returns.
A: "Why did you bring so much milk?!"
B: "They had eggs. You said I should bring 10 liters of milk if they have eggs."
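The joke is the nerd resolving the elliptical "10" against the wrong antecedent. As literal code (a hypothetical function, purely for illustration), the nerd's parse looks like this:

```python
def literal_nerd_shopper(store_has_eggs: bool) -> dict:
    # Literal parse of: "Bring 2 liters of milk. If they have eggs, bring 10."
    # The bare "10" binds to the last-mentioned item -- milk -- not to eggs.
    milk_liters = 10 if store_has_eggs else 2
    return {"milk_liters": milk_liters, "eggs": 0}  # eggs were never requested

print(literal_nerd_shopper(store_has_eggs=True))   # {'milk_liters': 10, 'eggs': 0}
```

The intended reading, of course, binds "10" to eggs; the ambiguity lives in one short elided word, just like the car wash example.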
I think you should walk to the car wash, dismantle it, walk back, and rebuild it around your car. Once you've tested everything, made sure your permits are okay, etc., then start the washing.
@GerardThornley @knowmadd @MissGayle
What did we expect from an optimised translator/spell checker? Creativity? Reasoning? Ethics? Meh. It loosely strings things together, searches for combinations that appeared in some piece of text that was once ripped off, and assumes, without any reasoning, that it must be the holy truth.
Don't forget, we don't know when there's a "human in the loop".
There may or may not be some low wage workers involved in the answer.
Some, like Google, have enormous investments from Saudi Arabia. Oracle is "training" 50,000 Saudi Arabians in AI.
https://gulfbusiness.com/oracle-targets-training-50000-saudis-in-ai-latest-tech/
Or is it Lebanese?
https://today.lorientlejour.com/article/1487826/shehadi-defends-deal-with-oracle-to-train-50000-lebanese-in-ai.html
How many "answers" are really just 700 employees in India is hard to know. The AI bubble is rife with fraud.
@Npars01 @knowmadd @hook I got the right answer when I took a screenshot of ChatGPT and just asked Gemini to transcribe it. It even added the right explanation on top. I don't think this is a case of a Waymo being driven remotely.
That doesn't mean there isn't the possibility of fraud. For example, benchmarks are probably being optimised for.