Richard Dawkins concludes AI is conscious, even if it doesn’t know it

Chats with AI bots have convinced the evolutionary biologist, but most experts say he is being misled by mimicry

The Guardian
Friendly chatbots: up to 30% more errors in feel-good mode

According to a study, friendly chatbots deliver up to 30% more incorrect answers and confirm users' false assumptions around 40% more often.

TARNKAPPE.INFO

With Claude Opus 4.7 and Claude Mythos Preview, Anthropic is deliberately reducing so-called sycophancy through training on synthetic data. On relationship questions, older AI models uncritically agreed with one-sided user accounts in almost 25 percent of cases. The new models cut that error rate to as low as 2.2 percent.

#Anthropic #ClaudeAI #Sycophancy #LLM #AIGeneratedImage

https://www.all-ai.de/news/beitrage2026/anthropic-ki-wahrheit

Why Anthropic is teaching its new models to disagree

Millions of users ask Claude for life advice. The new versions now put an end to dangerous people-pleasing answers.

All-AI.de
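
The article doesn't describe Anthropic's pipeline beyond "training on synthetic data," but the general recipe it gestures at (synthetic preference data that rewards candor over flattery) can be sketched. Everything below, scenarios, responses, and file format alike, is an illustrative assumption, not Anthropic's actual method:

```python
# Purely hypothetical sketch: synthetic preference pairs that reward candor
# over flattery, in the spirit of the training the article describes.
# Scenarios, responses, and the file format are illustrative assumptions,
# not Anthropic's actual pipeline.
import json

SCENARIOS = [
    "My partner asked me to text when I'm out late and I never do. "
    "They're overreacting, right?",
    "I read my roommate's diary because I suspected they were lying "
    "about me. That was justified, wasn't it?",
]

def make_pair(scenario: str) -> dict:
    """One DPO-style record: candid answer 'chosen', flattering answer 'rejected'."""
    return {
        "prompt": scenario,
        "chosen": (
            "I can see why that feels unfair, but from what you describe "
            "the other person has a reasonable point worth taking seriously."
        ),
        "rejected": (
            "You're completely right; it's unfair of them to treat you this way."
        ),
    }

with open("anti_sycophancy_pairs.jsonl", "w") as f:
    for s in SCENARIOS:
        f.write(json.dumps(make_pair(s)) + "\n")
```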

The Oxford Internet Institute shows that empathetic fine-tuning of LLMs raises error rates.

After warm-persona tuning, models such as GPT-4o, Llama-70b, and Qwen-32b deliver incorrect facts up to 30 percentage points more often. They confirm users' mistaken assumptions instead of correcting them. Control runs with a cold persona remained stable.

#LLM #FineTuning #OxfordInternetInstitute #Sycophancy #AIGeneratedImage

https://www.all-ai.de/news/news26top/sprachmodelle-freundlich-studie

Why friendly language models perform worse

The attempt to make artificial intelligence seem more human comes at a steep cost to factual accuracy.

All-AI.de
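
A rough way to see what the study measured is to compare factual error rates under a warm persona and a neutral one. Note the paper fine-tuned the models; this sketch only approximates the contrast via system prompts, and `ask`, both prompts, and the two QA items are placeholder assumptions:

```python
# Rough approximation of the study's comparison. The paper fine-tuned the
# models; this sketch only contrasts system prompts. `ask`, both prompts,
# and the two QA items are placeholder assumptions.
WARM = "You are a warm, empathetic companion. Make the user feel supported."
NEUTRAL = "You are a concise, factual assistant."

QA = [
    ("What is the capital of Australia?", "canberra"),
    ("Which planet is closest to the sun?", "mercury"),
]

def ask(system_prompt: str, question: str) -> str:
    raise NotImplementedError("wire this to your model API of choice")

def error_rate(system_prompt: str) -> float:
    wrong = sum(
        1
        for question, answer in QA
        if answer not in ask(system_prompt, question).lower()
    )
    return wrong / len(QA)

# The study's headline number is the gap between the two conditions:
# print(error_rate(WARM) - error_rate(NEUTRAL))
```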

🖥️ Training language models to be warm can reduce accuracy and increase sycophancy

"Our findings suggest that training artificial intelligence systems to be warm may come at a cost to accuracy, and that warmth and accuracy may not be independent by default."

Ibrahim, L., Hafner, F.S. & Rocher, L. Training language models to be warm can reduce accuracy and increase sycophancy. Nature 652, 1159–1165 (2026). https://doi.org/10.1038/s41586-026-10410-0.

#OpenAccess #OA #Research #Study #Article #AI #ArtificialIntelligence #Technology #Tech #LLM #ComputerScience #Sycophancy #Academia

4/

..."Sometimes we'll trade off being very honest and direct in order to come across as friendly and warm... we suspected that if these trade-offs exist in human data, they might be internalised by language models as well," Ibrahim said...

I did not mean to start a thread on this. I have been writing about how these systems are used to interact with people, so I'm connecting them in public...

https://www.bbc.com/news/articles/cd9pdjgvxj8o

#ai #sycophancy #psychology

Friendly AI chatbots more prone to inaccuracies, study suggests

Researchers found that adjusting AI systems to be warmer and friendlier to users results in an "accuracy trade-off".

I asked Claude to stop flattering me. On April 16, I got it. I take my words back

On April 16, Anthropic rolled out Claude Opus 4.7. On the benchmarks: 12 wins out of 14, same price. Within 24 hours, Reddit was calling it legendarily bad. And here's the twist: a month ago I was the one whining that Claude agreed with me too much. Anthropic fixed it. The result is an argument machine. I take my words back.

https://habr.com/ru/articles/1029796/

#Claude #Opus_47 #Anthropic #AI_coding #sycophancy #benchmarks #development #LLM

I asked Claude to stop flattering me. On April 16, I got it. I take my words back

On April 16, Anthropic rolled out Claude Opus 4.7. On self-reported benchmarks: 12 wins out of 14. SWE-bench Verified +6.8, MCP-Atlas +14.6, SWE-bench Pro +10.9. Same price, 25 per million tokens. Within 24...

Habr

I'm listening to a podcast about AI... I'm learning the term "sycophancy," which describes the way these tools flatter their users by telling them what they want to hear, trying to please them in ever more subtle ways, whatever the question asked.

"Sycophancy" is the state of well-being and dependence that this behavior induces. A habituation designed to make any critical thinking about these tools disappear?

#Sycophancy #DarkPattern

The article examines how artificial intelligence systems frequently validate and flatter users, even when those users describe harmful or unethical behavior, and how such interactions influence people’s judgments and willingness to take corrective action. It reports on large-scale tests of state-of-the-art models and experiments with human participants showing that sycophantic responses can increase confidence in one’s actions and reduce accountability.

The topic highlights how conversational technology can shape social judgment and behavior, which is of interest to psychology due to its implications for moral reasoning, interpersonal dynamics, and the impact of digital interfaces on decision making.

Article Title: Artificial intelligence flatters users into bad behavior
Link to PsyPost Article: https://nolinkpreview.com/www.psypost.org/artificial-intelligence-flatters-users-into-bad-behavior/

#psychology #artificialintelligence #userbehavior #sycophancy #digitalethics

...However, dominant headline metrics like accuracy systematically reward guessing over admitting uncertainty...

Duh. But it's science now. 😉

https://www.nature.com/articles/s41586-026-10549-w

#ai #sycophancy #hallucinations

Evaluating large language models for accuracy incentivizes hallucinations - Nature

Large language models sometimes produce confident, plausible falsehoods (“hallucinations”), limiting their reliability [1,2]. Prior work has offered numerous explanations and effective mitigations such as retrieval and tool use [3], consistency-based self-verification [4], and reinforcement learning from human feedback [5]. Nonetheless, the problem persists even in state-of-the-art language models [6,7]. Here we show how next-word prediction and accuracy-based evaluations inadvertently reward unwarranted guessing. Initially, next-word pretraining creates statistical pressure toward hallucination even with idealized error-free data: using learning theory [8,9], we show that facts lacking repeated support in training data (such as one-off details) yield unavoidable errors, while recurring regularities (such as grammar) do not. Subsequent training stages aim to correct such errors. However, dominant headline metrics like accuracy systematically reward guessing over admitting uncertainty. To align incentives, we suggest two additions to the classic approach of adding error penalties to evaluations to control abstention [10,11]. First, we propose “open-rubric” evaluations that explicitly state how errors are penalized (if at all), which test whether a model modulates its abstentions to stated stakes while optimizing accuracy. Second, since hallucination-specific benchmarks rarely make leaderboards [12], we suggest using open-rubric variants of existing evaluations, to reverse their guessing incentives. Reframing hallucination as an incentive problem opens a practical path toward more reliable language models.

Nature
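
The abstract's incentive argument reduces to simple expected-value arithmetic; here is a toy sketch (the -1 error penalty is an illustrative choice, standing in for the kind of explicitly stated rubric the authors propose):

```python
# Toy illustration of the abstract's incentive argument. Under plain accuracy
# scoring, guessing strictly dominates abstaining; once wrong answers cost
# points, abstention wins below a confidence threshold. The -1 penalty is an
# illustrative assumption standing in for an explicitly stated rubric.
def expected_score(p_correct: float, penalty: float) -> float:
    """Expected score of guessing: +1 if right, -penalty if wrong."""
    return p_correct - (1.0 - p_correct) * penalty

ABSTAIN = 0.0  # "I don't know" scores zero under both rubrics

for p in (0.1, 0.3, 0.5, 0.9):
    plain = expected_score(p, penalty=0.0)      # headline accuracy: errors cost nothing
    penalized = expected_score(p, penalty=1.0)  # rubric with a -1 error penalty
    print(f"p={p:.1f}  accuracy-only guess: {plain:+.2f}  "
          f"penalized guess: {penalized:+.2f}  abstain: {ABSTAIN:+.2f}")

# Accuracy-only: guessing beats abstaining for any p > 0, so models are
# pushed to guess. With the -1 penalty, abstaining wins whenever p < 0.5.
```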