Anthropic has introduced Natural Language Autoencoders, a method that converts Claude's internal activations into human-readable text explanations. The technique uses an activation verbalizer and reconstructor to surface what Claude is thinking internally. It has already caught a cheating model, diagnosed a language bug and detected unverbalized evaluation awareness during safety testing. https://www.marktechpost.com/2026/05/08/anthropic-introduces-natural-language-autoencoders-that-convert-claudes-internal-activations-directly-into-human-readable-text-explanations/ #AIagent #AI #GenAI #ExplainableAI
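A minimal sketch of the autoencoder idea behind the post, assuming a PyTorch setup. The Verbalizer/Reconstructor classes and the differentiable soft-token relaxation are illustrative stand-ins, not Anthropic's published implementation; a real system would decode actual discrete text with a language model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Verbalizer(nn.Module):
    """Maps an internal activation vector to a short sequence of token distributions."""
    def __init__(self, d_act: int, vocab: int, max_len: int = 16):
        super().__init__()
        self.max_len, self.vocab = max_len, vocab
        self.proj = nn.Linear(d_act, max_len * vocab)

    def forward(self, act: torch.Tensor) -> torch.Tensor:
        logits = self.proj(act).view(-1, self.max_len, self.vocab)
        return logits.softmax(-1)  # soft "text": one distribution per token slot

class Reconstructor(nn.Module):
    """Maps the verbalized token sequence back into activation space."""
    def __init__(self, d_act: int, vocab: int, d_emb: int = 64, max_len: int = 16):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_emb)
        self.proj = nn.Linear(max_len * d_emb, d_act)

    def forward(self, token_probs: torch.Tensor) -> torch.Tensor:
        soft_emb = token_probs @ self.emb.weight   # (batch, max_len, d_emb)
        return self.proj(soft_emb.flatten(1))      # (batch, d_act)

d_act, vocab = 512, 1000
verb, recon = Verbalizer(d_act, vocab), Reconstructor(d_act, vocab)
act = torch.randn(8, d_act)           # stand-in for activations captured from a model
probs = verb(act)                     # would decode to a human-readable explanation
loss = F.mse_loss(recon(probs), act)  # the text bottleneck must preserve the activation
loss.backward()

The key design point is the bottleneck: because the only path from activation to reconstruction runs through the verbalized tokens, the explanation is forced to carry the information the activation actually encodes.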

Anthropic has released Natural Language Autoencoders, which translate numerical AI activations into readable text.

In 26 percent of coding benchmarks, models silently recognize that they are being safety-tested, without ever mentioning it. The code is available as open source on GitHub, though the method is extremely compute-intensive and generates many tokens.

#Anthropic #KISafety #ExplainableAI #LLM #AIGeneratedImage

https://www.all-ai.de/news/beitrage2026/anthropic-ki-modelle-lesen-1

Anthropic makes readable what AI models secretly think

Researchers are making the internal computations of language models readable. In the process, astonishing secrets come to light.

All-AI.de
🤖🔍 Why do AI systems make certain decisions – and can we trust them?
The new ExTraSafe Workshop (KI 2026 🇩🇪) focuses on explainability, transparency & safety.
With Daniel Neider (TU Dortmund / RC Trust) among the organizers.
📅 Call for Papers deadline: May 15, 2026
Join the conversation 👇
https://rc-trust.ai/news/news-detail/making-ai-understandable-safe-and-trustworthy
#TrustworthyAI #ExplainableAI #AIresearch #KI2026
Photo credit: "Bremer Stadtmusikanten" by Cat, CC BY-NC-SA 2.0

Over the coming weeks, our participants who joined the #SoftwareCampus last year will start their projects 🥳 We’d like to introduce some of them to you.

💫 First up is Osman Tugay Başaran, from @tuberlin. He is conducting research into #6G and the trustworthiness of AI together with #Huawei. In his project NEXT-G, Osman is developing frameworks for #ExplainableAI and #TrustworthyAI in next-generation networks.

👉 Find out more here: https://softwarecampus.de/en/projekt/next-g-explainable-and-trustworthy-ai-ml-for-6g-and-beyond/ and visit his project webpage!

Job Alert

Senior Researcher – Human-Centred AI (f/m/d)  

Deadline: open until filled  
Location: St. Pölten, Austria

Apply: https://www.academiceurope.com/ads/senior-researcher-human-centred-ai-w-m-d/

#hiring #AI #HumanCentredAI #ExplainableAI #computerscience #MachineLearning #informatics

LG AI Research partners with the London Stock Exchange Group and Kiwoom Securities to launch Korea's first explainable AI investment service. It features the EXAONE-BI agent, which gives retail investors prediction scores with detailed rationales through a collaborative AI structure.
#YonhapInfomax #LgAiResearch #ExaoneBI #KiwoomSecurities #LondonStockExchangeGroup #ExplainableAi #Economics #FinancialMarkets #Banking #Securities #Bonds #StockMarket
https://en.infomaxai.com/news/articleView.html?idxno=115301
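The article doesn't specify EXAONE-BI's output format; purely as an illustration of "prediction score plus rationale", a service like this might surface something shaped like the following (all field names and values are hypothetical):

from dataclasses import dataclass, field

@dataclass
class ExplainedPrediction:
    ticker: str
    score: float               # model's outlook score, e.g. in [0, 1]
    rationale: list[str] = field(default_factory=list)  # evidence shown to the investor

p = ExplainedPrediction(
    ticker="KRX:005930",
    score=0.72,
    rationale=[
        "earnings revisions trending positive over two quarters",
        "sector momentum above market median",
    ],
)
print(f"{p.ticker}: score {p.score:.2f}")
for reason in p.rationale:
    print(" -", reason)

The point of the pairing is that the score is never shown without its supporting evidence, which is what makes the service "explainable" rather than a black-box signal.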
Machine learning is growing fast, but trust is still the bigger challenge. Real adoption depends on reliability, not just talent. https://hackernoon.com/machine-learning-has-a-trust-problem-not-a-talent-problem #explainableai
Machine Learning Has a Trust Problem, Not a Talent Problem | HackerNoon

The best of both worlds: while deep learning (neural) excels at perception, classical symbolic AI contributes logical reasoning and rule compliance. 🧠💻 Neuro-symbolic AI unites these approaches to build models that are more robust and, above all, explainable.
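A toy sketch of that combination, with an invented loan-approval domain and made-up names: a neural scorer handles the statistical part, a symbolic rule layer enforces hard constraints, and the rule that fires is itself the explanation.

def neural_perception(features: dict) -> tuple[str, float]:
    # Stand-in for a trained network: returns (label, confidence).
    score = 0.9 if features["income"] > 40_000 else 0.3
    return ("approve" if score > 0.5 else "deny"), score

RULES = [
    ("no approval below legal age",
     lambda f, label: not (label == "approve" and f["age"] < 18)),
]

def decide(features: dict) -> tuple[str, str]:
    label, conf = neural_perception(features)
    for name, holds in RULES:
        if not holds(features, label):
            # The violated rule doubles as a human-readable explanation.
            return "deny", f"neural '{label}' ({conf:.0%}) overridden by rule: {name}"
    return label, f"neural '{label}' ({conf:.0%}), consistent with all rules"

print(decide({"age": 16, "income": 50_000}))
# -> ('deny', "neural 'approve' (90%) overridden by rule: no approval below legal age")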

Read more in my new post on @BASICthinking
https://www.basicthinking.de/blog/2026/04/06/neuro-symbolische-ki/

#KI #NeuroSymbolic #DeepLearning #Informatik #Innovation #ExplainableAI #XAI #TechNews

Neuro-symbolic AI drastically cuts the energy needed for training

A research team has combined neural networks with logic rules. The result: up to 99 percent less energy during AI training.

BASIC thinking

🎶 How can AI in music be more transparent & fair?

At #NLP4MusA2026, Orfium introduced LabelBuddy 🤖—an open-source tool combining AI + human input for high-quality data.

A step toward #TrustworthyAI and ethical innovation in creative industries.

#ExplainableAI #MusicTech #OpenSource