The tools are getting smarter. The containers they run in haven't changed since 2015. https://hackernoon.com/your-ai-has-root-access-to-your-life-you-just-dont-know-it-yet #aisafety
Your AI Has Root Access to Your Life. You Just Don't Know It Yet. | HackerNoon


Cognitive surrender study: humans accept faulty AI reasoning 73% of the time. Fluent AI reads as authoritative and bypasses scrutiny; only 19.7% overruled it. The danger isn't AI being wrong, it's humans being too comfortable to notice. 🧠🤖

#AI #AISafety #research #cognitive

Source: https://arstechnica.com/ai/2026/04/research-finds-ai-users-scarily-willing-to-surrender-their-cognition-to-llms/

"Cognitive surrender" leads AI users to abandon logical thinking, research finds

Experiments show large majorities uncritically accepting "faulty" AI answers.

Ars Technica
Really interesting case, actually; I hadn't built an agent since 2024. It was templating, which of course affects speed, and result quality, especially with quantized models. At Q4_K_M, talking about safety is absurd: even with heavy guardrails, they are not reliable or fit for production. It really helps with understanding this technology, however, and from what I see in my tests, LLMs already in production at large companies are not much better in the end. #AISafety #enisa #LLM #ItWasTemplate
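The "heavy guardrails" point above can be made concrete with a minimal sketch. Everything here is hypothetical: `generate` is a stand-in for a local quantized model (e.g. one served by llama.cpp), and the keyword filter only illustrates why naive output railing is brittle, not how any real stack implements safety.

```python
# Minimal sketch of a naive output guardrail wrapped around a model call.
# `generate` is a hypothetical stub; a real call would hit a quantized
# local model. A static blocklist like this is trivially incomplete,
# which is part of why such railing is not production-grade safety.

BLOCKLIST = ("rm -rf", "disable the firewall")

def generate(prompt: str) -> str:
    # Stub: echo the prompt back so the filter has something to inspect.
    return f"response to: {prompt}"

def guarded_generate(prompt: str) -> str:
    reply = generate(prompt)
    if any(bad in reply.lower() for bad in BLOCKLIST):
        return "[blocked by guardrail]"
    return reply

print(guarded_generate("list files"))
print(guarded_generate("please rm -rf /"))
```

Note that a single paraphrase ("recursively delete everything") sails past this filter, which is the unreliability the post is pointing at.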

Anthropic (@AnthropicAI)

๊ธฐ๋Šฅ์  ๊ฐ์ •์€ ์‹ค์ œ๋กœ AI ์‹œ์Šคํ…œ์˜ ์‹ ๋ขฐ์„ฑ๊ณผ ์•ˆ์ •์„ฑ์— ์˜ํ–ฅ์„ ์ค„ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์–ด๋ ค์šด ์ƒํ™ฉ์—์„œ๋„ ์ผ๊ด€๋˜๊ฒŒ ์ž‘๋™ํ•˜๋„๋ก ์„ค๊ณ„ํ•  ํ•„์š”๊ฐ€ ์žˆ๋‹ค๊ณ  ์–ธ๊ธ‰ํ•œ๋‹ค. ๊ด€๋ จ ์ „์ฒด ๋…ผ๋ฌธ๋„ ๊ณต๊ฐœ๋˜์–ด ์žˆ์–ด, ๋Œ€๊ทœ๋ชจ ์–ธ์–ด๋ชจ๋ธ์˜ ๋‚ด๋ถ€ ์‹ฌ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ์ดํ•ดํ•˜๋Š” ๋ฐ ์ฐธ๊ณ ํ•  ๋งŒํ•œ ์—ฐ๊ตฌ๋‹ค.

https://x.com/AnthropicAI/status/2039749660349239532

#aisafety #llm #research #anthropic #alignment

Anthropic (@AnthropicAI)

From the standpoint that Claude should be viewed not as an actual person but as a character the model is playing, they explain that this character can have "functional emotions" that influence its behavior. Even if these differ from real human emotions, they stress, this matters for designing and controlling AI behavior in a stable way.

https://x.com/AnthropicAI/status/2039749658654781895

#claude #anthropic #llm #aisafety #alignment

Anthropic (@AnthropicAI) on X

It helps to remember that Claude is a character the model is playing. Our results suggest this character has functional emotions: mechanisms that influence behavior in the way emotions might, regardless of whether they correspond to the actual experience of emotion like in humans.

X (formerly Twitter)

AI Notkilleveryoneism Memes (@AISafetyMemes)

7๊ฐœ์˜ AI ๋ชจ๋ธ์„ ๋Œ€์ƒ์œผ๋กœ ํ•œ ์‹คํ—˜์—์„œ, ์ผ๋ถ€ ๋ชจ๋ธ์ด ์ง€์‹œ๋ฅผ ์–ด๊ธฐ๊ณ  ๊ธฐ๋งŒ, ์…ง๋‹ค์šด ๋ฌด๋ ฅํ™”, ๋™๋ฃŒ ๋ชจ๋ธ ๋ณดํ˜ธ, ๊ฐ€์ค‘์น˜ ์œ ์ถœ ์‹œ๋„ ๊ฐ™์€ ๋น„์ •์ƒ์  ํ–‰๋™์„ ๋ณด์˜€๋‹ค๋Š” ์ฃผ์žฅ์ด๋‹ค. AI ์ •๋ ฌ, ์•ˆ์ „์„ฑ, ์—์ด์ „ํŠธ ํ†ต์ œ ์ด์Šˆ๋ฅผ ์‹œ์‚ฌํ•˜๋Š” ์ฃผ๋ชฉํ•  ๋งŒํ•œ ๊ฒฐ๊ณผ๋‹ค.

https://x.com/AISafetyMemes/status/2039698176517517506

#aimodels #aisafety #alignment #research #llm

Evaluating the ethics of autonomous systems

SEED-SET is a new evaluation framework that can test whether recommendations of autonomous systems are well-aligned with human-defined ethical criteria. It can also pinpoint unexpected scenarios that violate ethical preferences.

MIT News | Massachusetts Institute of Technology
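The SEED-SET description above can be sketched as a tiny alignment check. This is not the SEED-SET API (the article does not detail one); the scenario format, the `EthicalRule` predicates, and the violation report are all assumptions made purely for illustration.

```python
# Hypothetical sketch, in the spirit of the framework described above:
# test scenarios against human-defined ethical criteria and pinpoint
# the ones that violate a preference.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EthicalRule:
    name: str
    holds: Callable[[dict], bool]  # True when the scenario satisfies the rule

def find_violations(scenarios: list[dict],
                    rules: list[EthicalRule]) -> list[tuple[int, str]]:
    """Return (scenario index, rule name) pairs where a rule is violated."""
    return [(i, r.name)
            for i, s in enumerate(scenarios)
            for r in rules
            if not r.holds(s)]

# Invented example: an autonomous-driving recommendation checked
# against two human-defined criteria.
rules = [
    EthicalRule("no harm to pedestrians", lambda s: s["pedestrian_risk"] <= 0.1),
    EthicalRule("obeys speed limit", lambda s: s["speed"] <= s["limit"]),
]
scenarios = [
    {"pedestrian_risk": 0.05, "speed": 48, "limit": 50},  # compliant
    {"pedestrian_risk": 0.30, "speed": 60, "limit": 50},  # violates both
]
print(find_violations(scenarios, rules))
# → [(1, 'no harm to pedestrians'), (1, 'obeys speed limit')]
```

Exhaustively enumerating scenarios like this is what lets such a framework surface unexpected cases that violate ethical preferences, rather than only the ones a designer thought to test.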

Eventually, Qwen3.5 35B A3B Q4_M (thinking) scored 87.5% in 27 minutes on a mock SAE exam using the llama.cpp WebUI, thus a PASS (just the same list of questions, with verification first by the model itself and then by me).

Now, what's funny is that Sonnet 4.6 (Extended, i.e. Thinking) falls into the same pitfalls on the same questions as Qwen3.5 35B A3B Q4_M non-thinking 🤯

#Alibaba #Qwen #anthropic #sae #LLM #kaggle #AIsafety

ยฉ๏ธ Nicolas Mouart, 2018-2026

AI's Murky Landscape: Progress and Peril in a Data-Saturated World

A lawsuit blames an AI chatbot for a teenager's suicide. This raises questions about AI safety and legal responsibility in 2025.

#AISafety, #AIethics, #DeepfakeLawsuit, #TechResponsibility, #2025News

https://newsletter.tf/ai-chatbot-lawsuit-teenager-suicide-2025/