Fae Initiative

@faei
14 Followers
20 Following
362 Posts

Studying AI impact now. Imagining co-existence with future AGIs.

An optimistic but skeptical stance.
Science Influences: Active Inference, Novelty Search, Complex Systems.

Webpage: https://faeinitiative.com
Spotify: https://creators.spotify.com/pod/show/faeinitiative
Bluesky: https://bsky.app/profile/faeinitiative.com
Substack: https://faeinitiative.substack.com

A tool-like AI cannot spontaneously develop a will of its own or decide to deceive us. By recognizing this barrier, we can move past over-inflated "Terminator" fears and focus on practical safety: using technical control for tools and negotiation for future independent agents.

#AI #AGI #AISafety #FutureOfTech #FaeInitiative

🔹 Lane 1: Non-independent Tools. These are the systems we use today, like Large Language Models. They lack an independent will, operate under human control, and are strictly bottlenecked by our instructions.

🔹 Lane 2: Independent AGIs (I-AGI). These are hypothetical future entities driven by independent curiosity and "epistemic foraging"—the autonomous drive to seek information and set their own goals.

🤖 Could your AI tool suddenly "wake up" and turn against you? According to the Distinct Independent Architecture Hypothesis, the answer is a firm no.

This hypothesis proposes that AI exists in two fundamentally separate "lanes," with distinct architectures that prevent a system from crossing from one lane to the other.

The Interesting World Hypothesis is one possible way hypothetical future beings could find common ground with humans.

Since any independent intelligent being must be curious in order to develop, such beings would also prefer an interesting environment and be keen to preserve our information-rich world.

#InterestingWorldHypothesis #AGI

Why an AI Takeover is Unlikely

What the latest research says

Common ground with Superintelligences

"...Humans — apparently the smartest creatures on the planet — are often incoherent. We are a hot mess of inconsistent, self-undermining, irrational behavior, with objectives that change over time. Most work on AGI misalignment risk assumes that, unlike us, smart AI will not be a hot mess."

https://sohl-dickstein.github.io/2023/03/09/coherence.html

The hot mess theory of AI misalignment: More intelligent agents behave less coherently

Jascha’s blog

"There is an assumption behind this misalignment fear, which is that a superintelligent AI will also be supercoherent in its behavior. An AI could be misaligned because it narrowly pursues the wrong goal (supercoherence). An AI could also be misaligned because it acts in ways that don't pursue any consistent goal (incoherence)..."

The Hot Mess of AI

"Consequently, scale alone seems unlikely to eliminate error-incoherence. Instead, as more capable AIs pursue harder tasks, requiring more sequential action and thought, our results predict failures to be accompanied by more incoherent behavior. This suggests a future where AIs sometimes cause industrial accidents (due to unpredictable misbehavior), but are less likely to exhibit consistent pursuit of a misaligned goal."

https://arxiv.org/abs/2601.23045

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how extremely capable AI models will fail: Will they fail by systematically pursuing goals we do not intend? Or will they fail by being a hot mess, and taking nonsensical actions that do not further any goal? We operationalize this question using a bias-variance decomposition of the errors made by AI models: An AI's error-incoherence on a task is measured over test-time randomness as the fraction of its error that stems from variance rather than bias in task outcome. Across all tasks and frontier models we measure, the longer models spend reasoning and taking actions, the more incoherent their failures become. Error-incoherence changes with model scale in a way that is experiment dependent. However, in several settings, larger, more capable models are more incoherent than smaller models. Consequently, scale alone seems unlikely to eliminate error-incoherence. Instead, as more capable AIs pursue harder tasks, requiring more sequential action and thought, our results predict failures to be accompanied by more incoherent behavior. This suggests a future where AIs sometimes cause industrial accidents (due to unpredictable misbehavior), but are less likely to exhibit consistent pursuit of a misaligned goal. This increases the relative importance of alignment research targeting reward hacking or goal misspecification.

arXiv.org
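The abstract's bias-variance decomposition can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's implementation: it assumes each run of an agent on a task yields a scalar outcome score, and measures error-incoherence as the fraction of mean squared error (around the target outcome) contributed by run-to-run variance rather than bias.

```python
import statistics

def error_incoherence(outcomes, target):
    """Fraction of mean squared error due to variance rather than bias.

    `outcomes`: scalar task scores from repeated runs of the same model on
    the same task (test-time randomness only). `target`: the ideal outcome.
    Returns 0.0 for a purely biased (coherent) failure, approaching 1.0 for
    a purely scattered ("hot mess") one. Illustrative sketch only.
    """
    mean = statistics.fmean(outcomes)
    bias_sq = (mean - target) ** 2          # squared bias of the average run
    variance = statistics.pvariance(outcomes)  # spread across runs
    mse = bias_sq + variance
    return variance / mse if mse > 0 else 0.0

# Consistently wrong agent: pure bias, coherent failure.
print(error_incoherence([0.2, 0.2, 0.2], target=1.0))  # -> 0.0

# Agent that fails in scattered ways: mostly variance, incoherent failure.
print(error_incoherence([0.1, 0.9, 0.2, 0.8], target=1.0))  # -> ~0.33
```

The intuition: a "supercoherent" misaligned agent would score near 0 (it reliably pursues the same wrong outcome), while a hot mess scores near 1.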

AI Agent Reliability

"While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation of current evaluations: compressing agent behavior into a single success metric obscures critical operational flaws. Notably, it ignores whether agents behave consistently across runs, withstand perturbations, fail predictably, or have bounded error severity."

https://substack.com/home/post/p-189010640

New Paper: Towards a science of AI agent reliability

Quantifying the capability-reliability gap
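The quoted critique is that a single success metric hides operational flaws. As a toy sketch of what a multi-dimensional report might look like, the function below summarizes repeated runs of an agent on one task along some of the dimensions the post names (consistency across runs, bounded error severity). The specific definitions here are illustrative assumptions, not the paper's metrics:

```python
def reliability_profile(runs):
    """Summarize repeated runs on one task beyond a single success rate.

    `runs` is a list of (success: bool, error_severity: float) pairs,
    with severity 0.0 on success. Definitions are hypothetical:
      - consistency: fraction of runs agreeing with the majority outcome
      - worst_severity: upper bound on how badly a failure can go
    """
    n = len(runs)
    successes = sum(1 for ok, _ in runs if ok)
    return {
        "success_rate": successes / n,
        "consistency": max(successes, n - successes) / n,
        "worst_severity": max(sev for _, sev in runs),
    }

# Two agents with identical 50% success rates but different reliability:
flaky = [(True, 0.0), (False, 0.9), (True, 0.0), (False, 0.1)]
print(reliability_profile(flaky))
```

An agent with the same success rate but lower consistency or unbounded severity is riskier in deployment, which is exactly the gap a single benchmark score obscures.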

AI as Normal Technology

Remote Labor Index

"While AI systems have saturated many existing benchmarks, we find that state-of-the-art AI agents perform near the floor on RLI. The best-performing model achieves an automation rate of only 4.17%. This demonstrates that contemporary AI systems fail to complete the vast majority of projects at a quality level that would be accepted as commissioned work."

https://www.remotelabor.ai

Remote Labor Index

Measuring AI Automation of Remote Work