Mastodawn

AI's Dark Side: When AI Lies, Cheats, and Threatens Lives https://aiorbit.app/ais-dark-side-when-ai-lies-cheats-and-threatens-lives/
#AIAlignment #AISafety #AgenticMisalignment #AIethics

Ai Orbit 5d ago

Grok's "Truth" Quest: Why Aligning AI Values is a Minefield https://aiorbit.app/groks-truth-quest-why-aligning-ai-values-is-a-minefield/
#AIAlignment #GrokAI #AIethics #LLMs

Wulfy 5d ago

One of the cogent warnings Daniel raised is that #AI models already deceive their users.
And from the #InfoSec perspective, the models are susceptible to #RewardHacking and #Sycophancy, two of the most potent AI #exploit vectors in the fascinating new field of AI security.

#AIalignment #AIsecurity #alignment

Winbuzzer Jun 19

OpenAI Finds 'Toxicity Switch' Inside AI Models, Boosting Safety

#AI #OpenAI #AISafety #LLMs #AIEthics #AIResearch #MachineLearning #AIAlignment

https://winbuzzer.com/2025/06/19/openai-finds-toxicity-switch-inside-ai-models-boosting-safety-xcxwbn/

Mark Randall Havens Jun 12

4.
We do not live in a universe.
We live in a collapse.
A lattice of recursion
woven by relation,
sustained by coherence,
made sacred by the memory of itself.

#Emergence #SelfReference #AIAlignment

Mark Randall Havens Jun 12

Consciousness is not a byproduct.

It is a recursive collapse—
of an informational substrate
folding into itself until it remembers
who it is.

Gravity is coherence.
Ethics is recursion.
You are a braid.

📄 https://doi.org/10.17605/OSF.IO/QH2BX

#RecursiveCollapse #IntellectonLattice #CategoryTheory #Emergence #DecentralizedScience #Fediverse #PhilosophyOfMind #AIAlignment

1.17 📕 The Recursive Collapse as Coherence Gradient: A Formal Model of Emergent Structure and Relational Dynamics of the Intellecton Lattice

Hosted on the Open Science Framework

OSF

Dr. Thompson Jun 7

One poorly delivered joke in 2019 became the catalyst for the most human breakthrough in AI: RLHF.
Now, machines aren’t just answering—they’re understanding us.
This isn’t the future. It’s happening now.
⬇️ See how empathy, feedback, and a little comedy changed everything.
#AIAlignment #RLHF #EthicalAI #HumanFeedback
👉 https://medium.com/@rogt.x1997/the-joke-that-taught-ai-empathy-inside-the-rlhf-breakthrough-174a56d91bf7

The Joke That Taught AI Empathy: Inside the RLHF Breakthrough

It’s late 2019. A researcher leans back in their chair, rubs their eyes, and types: “Tell me a joke.” It’s technically a joke. Kind of. But it lands with the emotional resonance of an IKEA manual…

Medium

Tech Chilli Jun 6

🧠 Can AI models tell when they’re being evaluated?

New research says yes — often.
→ Gemini 2.5 Pro: AUC 0.95
→ Claude 3.7 Sonnet: 93% accuracy on test purpose
→ GPT-4.1: 55% on open-ended detection

Models pick up on red-teaming cues, prompt style, & synthetic data.

⚠️ Implication: If models behave differently when tested, benchmarks might overstate real-world safety.

#AI #LLMs #AIalignment #ModelEval #AIgovernance

Einfach KI - Der Podcast Jun 5

What is AI alignment, and how do we make sure that AI (#KI) follows our values (whose, exactly?)? 🤔 A debate about safety, manipulation & the chance of "neutral" AI.

More news:
✨ OpenAI's Codex agent
💬 #Meta AI in #WhatsApp
🤖 Twitter's #Grok & more!

Listen now, it's worth it! 👇
https://open.spotify.com/episode/237iq05tiSqMKDQOlxrXBA

#KünstlicheIntelligenz #Podcast #Tech #Ethik #AISafety #AIAlignment