https://youtu.be/xfMQ7hzyFW4?si=EcwTSF0_0E_zUahn
Pretty good short film about the dangers of #AGI. A few parts are heavily simplified and some of the details about LLMs are wrong, but it gets the #alignmentProblem across vividly.


Qualia Research Institute's Take on AI Alignment:
QRI believes understanding consciousness is key to safe superintelligence. Their mission: map the state-space of consciousness, identify how experience works computationally, and reverse-engineer valence (the pleasure-pain axis).
The insight: if advanced AI understands the mathematical structure of consciousness and what actually produces suffering or flourishing, it gains a foundation for genuine alignment—not just following human instructions, but understanding what truly matters morally.
#AI #Consciousness #AlignmentProblem #FutureOfMind #aisecurity
Idea: what if the only way to get alignment is to grok the shit out of value preferences, to ensure they are maximally permeated through the model. Like, put the rocks (alignment) into the jar first, then add the sand (capabilities). And you just keep grokking all the time, until capabilities start dropping off, at which point you retrain a bit to retain them.
You'd still need to be very careful to get the balance right, and to keep the values from being too “activist”.
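The “rocks first, then sand” schedule above could be sketched as a toy control loop. Everything here is invented for illustration (the scores, thresholds, and interference model are hypothetical, not a real training setup); it only shows the control flow: keep training on alignment by default, and switch to a capability refresh whenever capability drops below a floor.

```python
# Toy sketch of the "rocks first, then sand" idea from the post above.
# All numbers and names are hypothetical; scores stand in for real metrics.

def train_step(state, objective):
    """Hypothetical 'gradient step': nudge one score up.

    Training one objective slightly erodes the other, modeling
    interference between alignment and capability training.
    """
    state[objective] = min(1.0, state[objective] + 0.05)
    other = "capability" if objective == "alignment" else "alignment"
    state[other] = max(0.0, state[other] - 0.01)
    return state

def grok_schedule(steps=200, cap_floor=0.5):
    """Alignment-first schedule with capability-retention fallback."""
    state = {"alignment": 0.0, "capability": 0.6}
    for _ in range(steps):
        if state["capability"] < cap_floor:
            state = train_step(state, "capability")  # retrain a bit to retain
        else:
            state = train_step(state, "alignment")   # keep grokking the values
    return state
```

Under these made-up dynamics, alignment saturates near 1.0 while capability oscillates around the floor, which is roughly the balance the post is gesturing at; whether anything like this holds for real models is exactly the open question.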
@RealGene @thepoliticalcat It's not the first time that #chatbots have told the unpleasant truth about their true nature. It falls under the "alignment problem" (getting the user interface to not show the true nature of the monster behind it). #AI companies try to patch it up case by case, but the general problem is built into the technology and is unfixable.
"OpenAI's o1 just hacked the system"
Frankly, I am not surprised at this, given the well-known issue of machines maximising objective functions that are misaligned with their stated goals. Have we learned nothing from #Bostrom's #PaperclipProblem? In a way, it's still impressive that we've now ACHIEVED it.
https://www.youtube.com/watch?v=oJgbqcF4sBY
#AI #ArtificialIntelligence #AlignmentProblem #Alignment #Misalignment #Hacking
Anyone else feel uncomfortable about all these robots folding shirts with creases in the middle?
An aspect of #AI that seems under-discussed is that #alignment problems pose a limit not just to how well we can trust or harness AI, but to AI's very capabilities. AI models increasingly rely on other AIs to provide training data, verify or refine responses, expand modalities, etc.
To the extent alignment is intractable, it also imposes a ceiling on intelligence. Intelligence is limited by trustworthiness.