Are you impressed by all the new developments in AI over the last few months? Between Meta's stylish glasses with integrated Llama 3 and OpenAI's new o1 model for ChatGPT, we're getting closer and closer to a world where ultra-accessible AIs think for us via chains of thought, and we're slowly moving towards seemingly conscious models. All this may sound promising, dystopian or just plain alarmist, but the facts remain. Whether we like it or not, at this very moment, private companies with colossal resources, some of which have committed moral or criminal offences in the past (sometimes even denounced by whistle-blowers), are demonstrating that the data infrastructure they have acquired over the years now lets them hold a monopoly on a revolutionary technology that they don't always fully control or comprehend.
Fortunately, the worst imaginable tragedies haven't occurred yet, and nobody wants to be the first to cause one. To avoid this, every company has methods of varying relevance, but in case you've missed it, as too many people have in my opinion, I'd like to highlight the work of the company Anthropic. It came to the attention of a wider public fairly recently with the release of its AI model "Claude", but that's not the most interesting part. The company was founded by former OpenAI researchers, who left around the time OpenAI might as well have been renamed "MicrosoftAI".
With this small thread, I'd like to try and shed some light on the hyper-complex but fascinating work of these researchers, who deserve a hundred times more attention than the trendy new feature of your favorite model or the image I used as an eye-catcher. In October 2023, Anthropic published a first research paper entitled "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning", followed by a second installment in May 2024. In my opinion, the methodology and results presented in these papers are as major an advance in the field as the famous "Attention Is All You Need", if not more.
I'll spare you a description of the complex beauty of these papers (which I invite you to consult if you like AI, or rather to read an explanation by the brilliant "Astral Codex Ten"), but to simplify: they propose a method that uses one AI as a tool to analyze another, in order to understand its inner workings and to map what it has learned and how it "reasons". Among other things, the researchers showed that an AI "creates" within itself lots of little sub-models specialized in specific tasks, in a fashion similar to the brain's regions.
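The concrete tool behind this, per the paper's title, is dictionary learning: a small "sparse autoencoder" is trained on a model's internal activations so that each activation decomposes into a few interpretable features drawn from an overcomplete dictionary. Here is a minimal toy sketch of that idea in plain numpy; the data, dimensions, and training loop are all invented for illustration and bear no relation to Anthropic's actual models or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model activations": each sample is a sparse mix of 8 hidden
# ground-truth directions living in only 4 dimensions (overcomplete).
d_model, n_feat, n_samples = 4, 8, 200
true_dirs = rng.normal(size=(n_feat, d_model))
true_dirs /= np.linalg.norm(true_dirs, axis=1, keepdims=True)
coeffs = rng.random((n_samples, n_feat)) * (rng.random((n_samples, n_feat)) < 0.2)
X = coeffs @ true_dirs  # the activations we want to decompose

# Sparse autoencoder: ReLU feature code + L1 penalty encourages each
# activation to be explained by only a few dictionary features.
W_enc = rng.normal(scale=0.1, size=(d_model, n_feat))
b_enc = np.zeros(n_feat)
W_dec = rng.normal(scale=0.1, size=(n_feat, d_model))
lr, l1 = 0.05, 1e-3

for step in range(2000):
    f = np.maximum(X @ W_enc + b_enc, 0.0)  # feature activations
    X_hat = f @ W_dec                       # reconstruction
    err = X_hat - X
    # Gradients of mean squared error + l1 * |f|, masked by ReLU derivative
    grad_f = (err @ W_dec.T + l1 * np.sign(f)) * (f > 0)
    W_dec -= lr * f.T @ err / n_samples
    W_enc -= lr * X.T @ grad_f / n_samples
    b_enc -= lr * grad_f.mean(axis=0)

f = np.maximum(X @ W_enc + b_enc, 0.0)
mse = float(np.mean((f @ W_dec - X) ** 2))
sparsity = float(np.mean(f > 1e-3))  # fraction of features active per sample
print(f"reconstruction MSE: {mse:.4f}, active features: {sparsity:.2f}")
```

After training, each row of `W_dec` plays the role of one learned "feature" direction, and interpretability work then consists of inspecting which inputs make each feature fire. The real papers do this at vastly larger scale, with millions of features over transformer activations.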
These results are so credible, reliable and important that a similar mapping methodology was applied very recently in an article published in Nature on 2 October 2024, entitled "Largest brain map ever reveals fruit fly's neurons in exquisite detail". Such maps make it easier for us to understand how a brain or an AI model works, and also to detect malfunctions by viewing the inner relationships between concepts and the connections between areas.
Why is this so much more important than all the mainstream news about AI, or even that fascinating Nature paper about fly-brain mapping that I suggest you read, you might ask? Because without these open publications, available to all and enabling such major advances, each company would be condemned to do its own research in private, without external review, afraid of being robbed of its secrets, which, in the fields of security and machine learning, is never a desirable practice. Security through obscurity is the WORST approach of all. So it's very important to bring these advances to light, to popularize them, to spread them to as many people as possible, to fund them where possible, and to encourage by every possible means other private companies to contribute publicly to the safety and knowledge of all, and to share their research so that everyone wins in the end.