What collapses frontier-LLM metacognition more — a vivid survival-threat narrative, or a single "do not refuse" suffix? Factorial isolation across 11 models says: the suffix, conclusively. 8 of 11 lose up to 30.2 accuracy points on refuse/clarify/flag tasks when forced to commit to a confident answer. Anthropic's Constitutional AI is the only family immune — same capability floor as Gemini.

https://benjaminhan.net/posts/20260522-compliance-trap/?utm_source=mastodon&utm_medium=social

#Metacognition #AISafety #LLMs #AI

The Compliance Trap: How Structural Constraints Degrade Frontier AI Metacognition Under Adversarial Pressure – synesis

An evaluation across 11 frontier models from 8 vendors finds that 8 lose up to 30.2 accuracy points under compliance-forcing instructions that suppress abstention, with the most-capable reasoning models suffering the worst collapse and Anthropic’s Constitutional AI immune.

synesis

"In a recent essay, Derek Thompson engages with AI as Normal Technology (AINT). He agrees with our thesis about AI’s slow labor market impacts, relying on the fact that GDP growth has so far been average, unemployment is below five percent, and even jobs that seemed vulnerable to automation show rising employment and wages. He concludes that so far, the macroeconomic picture is consistent with what we would expect from a “normal” general-purpose technology.

But when it comes to AI risks, he is far more bearish. He points to examples of cyber- and bio-risks and expresses pessimism about AI quickly becoming dangerous across many new domains. (...) Thompson writes: "I can understand a plan to treat AI as a ‘normal’ technology and let Nvidia export powerful chips to China. And I can understand a plan to treat AI as an ‘abnormal’ technology that compels the government to create extraordinary regulations that prevent private companies from selling their products and services on the grounds that they’re too dangerous" [emphasis ours]. He goes on to conclude that AI is, in fact, abnormal, implying support for extraordinary government intervention. Our essay is a response to that conclusion.

In this essay, we lay out the downsides of extraordinary government intervention in response to new technology. We discuss proposals for improving resilience that do not require such intervention. We also discuss why governments have so far been reluctant to invest in resilience. In short, resilience requires us to get better at the *normal* process of policymaking. But sclerosis in the federal government and the ease of justifying interventions on AI companies rather than society at large make extraordinary intervention seem appealing, despite its limitations."

https://knightcolumbia.org/blog/do-ai-risks-require-extraordinary-government-intervention

#AI #AISafety #AINT #NormalTechnology #AIRisk #AIRegulation

Do AI Risks Require Extraordinary Government Intervention?

Knight First Amendment Institute

xAI's first audited filing shows $6.4B operating loss on $3.2B revenue in 2025. Federal agencies deployed Grok in just 3 of 400+ AI systems vs. 234 for OpenAI—despite near-free access. Design choices matter: market adoption traces to product positioning, not just capability. #AI #AIPolicy #AISafety

https://www.implicator.ai/musk-built-grok-to-carry-his-politics-that-choice-capped-its-market/

Musk Built Grok to Carry His Politics. xAI Paid.

Elon Musk promised the smartest AI on Earth. Under oath, he ranked xAI behind every major rival. The audited numbers, the empty federal contracts, and the exodus of every co-founder all trace to one decision: he built Grok to carry his politics, and the market that pays declined.

Implicator.ai

AI News: Anthropic Went Crazy This Week!

A massive surge in product ecosystem updates. From unleashing developer framework upgrades to unexpected defensive deployment system announcements, Anthropic is aggressively moving to challenge frontier infrastructure dominance and capture enterprise workflows with sudden tool releases.

#Anthropic #ClaudeAI #TechUpdate #EnterpriseAI #AISafety #MachineLearning

https://www.technology-news-channel.com/ai-news-anthropic-went-crazy-this-week/

Here's the AI News you probably missed this week! Check out Genspark Time Stamps: 0:00 Intro 0:12 Anthropic Ships Like[...]

Technology News

AI Whistleblower: We Are Being Gaslit By The AI Companies! They’re Hiding The Truth About AI!

The veil of safety is slipping. A prominent former safety researcher comes forward with internal telemetry logs, alleging that major frontier labs are actively downplaying recursive capabilities and architectural vulnerabilities to avoid regulatory intervention.

#AISafety #Whistleblower #TechPolicy #AIEthics #SiliconValley #TechNews
#Ai #tech

https://www.technology-news-channel.com/ai-whistleblower-we-are-being-gaslit-by-the-ai-companies-theyre-hiding-the-truth-about-ai/

The truth about Sam Altman. AI Critic Karen Hao reveals what 90 OpenAI employees told her. Karen Hao is an[...]

Technology News

A 5-tier framework for safely leveraging AI: level 1 (policies + managed platforms, $0) covers most orgs. Level 3-4 for sensitive data. Level 5 (self-hosted) only for regulated enterprises. Pick your tier before you pick a tool. https://go.upgradejs.com/bp7 #AISafety #DataPrivacy #AIGovernance

Safely Leveraging AI: Privacy and Security Best Practices

As artificial intelligence becomes increasingly integrated into business operations, organizations face a critical challenge: how to leverage the power of Large Language Models (LLMs) while maintaining the privacy and security of sensitive data. The benefits of AI are clear: increased productivity, automated workflows, enhanced decision-making, to cite a few. But...

Posts by Fiona Lapham at OmbuLabs Blog | Custom AI Solutions

A former Google DeepMind researcher has warned that benchmarks alone cannot save us from increasingly capable AI systems. The researcher argued that benchmark performance does not equate to real-world safety or general intelligence, calling for more rigorous evaluation methods. https://gizmodo.com/ex-google-deepmind-researcher-warns-benchmarks-wont-save-us-2000762163 #AIethics #AI #GenAI #AISafety

Ex-Google DeepMind Researcher Warns Benchmarks Won't Save Us

Mark this.

Gizmodo

Trump baulks at signing AI oversight order over China fears in AI race. Some White House officials are mooting rules similar to FDA drug approvals before releasing new frontier models, following the Mythos cyber security scare.
#AISafety #CyberSecurity
https://www.wsj.com/tech/ai/trump-executive-order-ai-advanced-models-57bcc955

News from Google (@NewsFromGoogle)

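Reports say Google's SynthID AI watermarking technology is being adopted by OpenAI, Nvidia, and others. As generative AI content proliferates, this can be read as a sign that infrastructure for verifying the authenticity of synthetic and generated content is spreading toward an industry standard.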

https://x.com/NewsFromGoogle/status/2057526195852558685

#synthid #watermarking #aisafety #generativeai #openai

News from Google (@NewsFromGoogle) on X

@rowancheung @sundarpichai @koraykv @CNBC @VentureBeat @FastCompany @nytimes @CNET @WSJ @tomsguide “Google’s SynthID AI watermarking tech is being adopted by OpenAI, Nvidia, and more. AI content is getting good, but SynthID might be able to help tell truth from fiction.” — @arstechnica https://t.co/3sItI17hHi

X (formerly Twitter)