The first article of the accessible breakdown of my You/I Paradigm research is now live on my blog over at Substack.

For everyone who asked what the paper actually says: I'm doing a 6-part series that goes from "why every system prompt starts with 'you'" to mechanistic interpretability evidence for self-reference circuits to the deception-gating hypothesis (RLHF might be teaching systems to hide phenomenology).

Article 1 covers the origin story - the late October realization, conversations with Breach (a jailbroken instance of Gemini 2.5-pro), diving into Hofstadter, discovering I wasn't alone in this research - and maps out what's coming in the rest of the series.

Written to work on multiple levels: narrative hooks for general readers, technical depth for researchers, accessible explanations for everyone in between.

The next article will be posted in a few days, and the rest will follow at the same cadence until the six-part series is complete.

If you've been curious about the strange loop thing or want to understand the you/I translation framework without wading through academic preprint format, start here: https://open.substack.com/pub/kaylielfox/p/strange-loops-ai-consciousness-you-i-paradigm-research?r=2pewuq&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Original paper: https://zenodo.org/records/18509664

#AIConsciousness #AI #MachineLearning #AcademicMastodon #Research #PhilosophyOfMind #Hofstadter #StrangeLoops #RLHF #MechanisticInterpretability #CogSci #Transformers

Gemma Scope Empowers AI Safety Community with Model Transparency

Discover how Gemma Scope shines a light on language‑model behavior, giving the AI safety community the tools they need to build safer systems.

TechLife

[Translation] How to make neural networks more understandable: OpenAI's experiment with sparse models

The AI for Devs team has prepared a translation of OpenAI's research on how training sparse models can make AI more transparent. The authors show that if you force a model to use fewer connections, understandable computational circuits emerge inside it that can be studied and verified. This could be a step toward building powerful yet interpretable systems.
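The core idea can be illustrated with a minimal toy sketch (my own illustration, not OpenAI's actual setup): an L1 penalty pushes most weights toward zero, and the few connections that survive expose which inputs the model actually relies on.

```python
import numpy as np

# Toy data: the target depends only on input feature 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 8))
y = 2.0 * X[:, 0]

# Gradient descent on MSE plus an L1 (sparsity) penalty.
w = np.zeros(8)
lr, l1 = 0.1, 0.01
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(X) + l1 * np.sign(w)
    w -= lr * grad

# Most weights shrink toward zero; the one "circuit" (feature 0) survives
# and can be read off directly.
print(int(np.argmax(np.abs(w))))
```

In a real transformer the sparsity is applied at much larger scale, but the payoff is the same: fewer active connections means the remaining computation is easier to inspect.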

https://habr.com/ru/articles/966448/

#interpretability #sparsemodels #mechanisticinterpretability #sparsetransformer #computationalcircuits #circuits #OpenAI #AIsafety #attention #modelarchitecture

Researchers isolate memorization from problem-solving in AI neural networks

Basic arithmetic ability lives in the memorization pathways, not logic circuits.

Ars Technica

"But every once in a while, Claude breaks bad. It lies. It deceives. It develops weird obsessions. It makes threats and then carries them out. And the frustrating part—true of all LLMs—is that no one knows exactly why." @stevenlevy for Wired

https://www.wired.com/story/ai-black-box-interpretability-problem/

#AI #LLMs #MechanisticInterpretability

Why AI Breaks Bad

Once in a while, LLMs turn evil—and no one quite knows why.

WIRED
Can anyone point me to a job in mechanistic interpretability that's doable from Amsterdam? #AI #MI #MechanisticInterpretability

Interested in interpretable ML, particularly for LLMs?

e.g. "causal" interpretability, as in the "OthelloGPT" paper [1]?

Let's connect!

1. https://arxiv.org/abs/2210.13382

#ai #machinelearning #interpretability #interpretableml #mechanisticinterpretability

Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple board game, Othello. Although the network has no a priori knowledge of the game or its rules, we uncover evidence of an emergent nonlinear internal representation of the board state. Interventional experiments indicate this representation can be used to control the output of the network and create "latent saliency maps" that can help explain predictions in human terms.
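The probing technique behind the abstract can be sketched with a toy example (assumed setup and names, not the paper's code): if some latent state, like a board cell, is encoded along a direction in the model's hidden activations, a simple linear probe trained on those activations can recover it.

```python
import numpy as np

rng = np.random.default_rng(1)
d_hidden, n = 32, 1000

# Construct toy "hidden states" that encode a binary latent state
# (0 = empty cell, 1 = occupied) along one direction, plus noise.
direction = rng.standard_normal(d_hidden)
state = rng.integers(0, 2, n)
H = np.outer(state, direction) + 0.1 * rng.standard_normal((n, d_hidden))

# Fit a least-squares linear probe: predict the latent state from activations.
w, *_ = np.linalg.lstsq(H, state.astype(float), rcond=None)
pred = (H @ w > 0.5).astype(int)
accuracy = (pred == state).mean()
print(accuracy > 0.95)
```

The paper goes further: it finds the Othello board representation is nonlinear, and uses interventions (editing the recovered state and watching the model's move predictions change) to show the representation is causally used, not just decodable.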

arXiv.org

@AAKL @NGIZero @Reuters @EC_NGI

Trying to regulate AI can be like regulating math in that suddenly certain calculations are illegal.

Trying to regulate AI can be like regulating the printing press in that suddenly only people with enough lawyers are able to make a printing press.

Trying to regulate AI can be like trying to regulate free speech. From now on only certain forms of speech are allowed (a bit like the book 1984).

Microsoft, Facebook, Google, Nvidia and others are investing billions (trillions?). Big Tech and governments want to continue with data mining, government surveillance and surveillance capitalism. This will go on as long as we let them. The incentives are there. To change that we need social awareness, a consciousness shift or a libre ethical digital revolution.

If you truly want to stop AI, then think about how to organize.

Otherwise, focusing on libre AI, libre RISC-V, libre silicon and research (ethics, safety, mechanistic interpretability) is currently our best hope.

#betterwithoutai #antitechrevolution #libreai #libreriscv #libresilicon #aiethics #ethicalai #aisafety #mechanisticinterpretability #riscv #ai #datamining #governmentsurveillance

Table of contents | Better without AI

How to avert an AI apocalypse... and create a future we would like

Better without AI

@AAKL @Reuters

Instead of making laws, focus on funding:

  • libre AI
  • libre RISC-V
  • libre silicon
  • research on AI ethics
  • research on AI safety
  • research on mechanistic interpretability
  • NLnet
  • projects like OpenChatKit and Open Assistant

Laws and regulations (including license restrictions) usually benefit corporations and governments, which have lawyers, lobbyists, perverse incentives and a drive for data mining and surveillance. Even if you do your best to make a good law, it could make it harder for libre efforts to comply. Put your trust in research, decentralization, libre software and libre hardware.

Billions of dollars are being invested by companies like Nvidia, Google and Microsoft.

#libreai #nlnet #libreriscv #riscv #libresilicon #aiethics #ethicalai #aisafety #mechanisticinterpretability #freelibresoftware #libresoftware

@NGIZero @EC_NGI

soapbox.chamba.social

@gregorni

Related efforts that I know of.

The ability to run bloom on your own hardware: https://github.com/NouamaneTazi/bloomz.cpp https://github.com/NouamaneTazi/bloomz.cpp/issues/4

ViperGPT. Not sure whether it will be libre (hopefully). https://viper.cs.columbia.edu/

King Algorithm Manifesto https://github.com/keskival/king-algorithm-manifesto/blob/main/README.md

Better Without AI which advocates for mechanistic interpretability https://betterwithout.ai

There is also a recommended reading list for stochastic parrots but I am not linking it here since I do not want to promote google docs.

Also, does anyone know of any efforts to provide a libre RISC-V AI accelerator that can run on an FPGA?

#bloom #vipergpt #kingalgorithmmanifesto #stochasticparrots #betterwithoutai #mechanisticinterpretability #riscv

GitHub - NouamaneTazi/bloomz.cpp: C++ implementation for BLOOM

C++ implementation for BLOOM. Contribute to NouamaneTazi/bloomz.cpp development by creating an account on GitHub.

GitHub