We found an undocumented bug in the Apollo 11 guidance computer code

https://www.juxt.pro/blog/a-bug-on-the-dark-side-of-the-moon/

JUXT Blog: A bug on the dark side of the Moon

How a specification found what fifty-seven years of scrutiny missed.

Super interesting. I wish this article wasn’t written by an LLM though. It feels soulless and plastic.
I'm starting to develop a physiological response when I recognize AI prose. Just like an overwhelming frustration, as if I'm hearing nails on chalkboard silently inside of my head.

I feel ya.... and i have to admit in the past i tried it for one article in my own blog thinking it might help me to express... tho when i read that post now i dont even like it myself its just not my tone.

therefor decided not gonne use any llm for blogging again and even tho it takes alot more time without (im not a very motivated writer) i prefer to release something that i did rather some llm stuff that i wouldnt read myself.

For what it’s worth, Pangram thinks this article is fully human-written: https://www.pangram.com/history/f5f68ce9-70ac-4c2b-b0c3-0ca8...
A bug on the dark side of the | Pangram Labs

Does "A bug on the dark side of the " contain AI-generated text? Pangram finds that this document is We believe that this document is fully human-written

Then pangram isn't very good, because that article is full of Claude-isms.

Is it possible for a tool to know if something is AI written with high confidence at all? LLMs can be tuned/instructed to write in an infinite number of styles.

Don't understand how these tools exist.

The WikiEDU project has some thoughts on this. They found Pangram good enough to detect LLM usage while teaching editors to make their first Wikipedia edits, at least enough to intervene and nudge the student. They didn’t use it punatively or expect authoritative results however. https://wikiedu.org/blog/2026/01/29/generative-ai-and-wikipe...

They found that Pangram suffers from false positives in non-prose contexts like bibliographies, outlines, formatting, etc. The article does not touch on Pangram’s false negatives.

I personally think it’s an intractable problem, but I do feel pangram gives some useful signal, albeit not reliably.

Generative AI and Wikipedia editing: What we learned in 2025

Like many organizations, Wiki Education has grappled with generative AI, its impacts, opportunities, and threats, for several years. As an organization that runs large-scale programs to bring new e…

Wiki Education

It has Claude-isms, but it doesn't feel very Claude-written to me, at least not entirely.

What's making it even more difficult to tell now is people who use AI a lot seem to be actively picking up some of its vocab and writing style quirks.

Pangram doesn't reliably detect individual LLM-generated phrases or paragraphs among human written text.

It seems to look at sections of ~300 words. And for one section at least it has low confidence.

I tested it by getting ChatGPT to add a paragraph to one of my sister comments. Result is "100% human" when in fact it's only 75% human.

Pangram test result: https://www.pangram.com/history/1ee3ce96-6ae5-4de7-9d91-5846...

ChatGPT session where it added a paragraph that Pangram misses: https://chatgpt.com/share/69d4faff-1e18-8329-84fa-6c86fc8258...

These are just some of the goo | Pangram Labs

Does "These are just some of the goo" contain AI-generated text? Pangram finds that this document is We believe that this document is fully human-written

This is useful, thanks! TIL
So you're saying Pangram isn't worth much?

The AI writing detectors are very unreliable. This is important to mention because they can trigger in the opposite direction (reporting human written text as AI generated) which can result in false accusations.

It’s becoming a problem in schools as teachers start accusing students of cheating based on these detectors or ignore obvious signs of AI use because the detectors don’t trigger on it.

It's not setting off any LLM alarm bells to me. It just reads like any other scientific article, which is very often soulless
"Written by an LLM" based on what data or symptom?

Has someone verified this was an actual bug?

One of AI’s strengths is definitely exploration, f.e. in finding bugs, but it still has a high false positive rate. Depending on context that matters or it wont.

Also one has to be aware that there are a lot of bugs that AI won’t find but humans would

I don’t have the expertise to verify this bug actually happened, but I’m curious.

It's not even clear if AI was used to find the bug: they mention modeling the software with an "ai native" language, whatever that means. What is not clear is how they found themselves modeling the gyros software of the apollo code to begin with.

But, I do think their explanation of the lock acquisition and the failure scenario is quite clear and compelling.

> It's not even clear if AI was used to find the bug

The intro says “We used Claude and Allium”. Allium looks like a tool they’ve built for Claude.

So the article is about how they used their AI tooling and workflow to find the bug.

> It's not even clear if AI was used to find the bug: they mention modeling the software with an "ai native" language, whatever that means.

Could the "AI native language" they used be Apache Drools?
The "when" syntax reminded me of it...

https://kie.apache.org/docs/10.0.x/drools/drools/language-re...

(Apache Drools is an open source rule language and interpreter to declaratively formulate and execute rule-based specifications; it easily integrates with Java code.)

Rule Language Reference (Traditional) :: Drools Documentation

>It's not even clear if AI was used to find the bug

It's not even clear you read the article

How did you pick out AI native and miss the rest of the SAME sentence?

> We found this defect by distilling a behavioural specification of the IMU subsystem using Allium, an AI-native behavioural specification language.

Software that ran on 4KB of memory and got humans to the moon still has undiscovered bugs in it. That says something about the complexity hiding in even the smallest codebases.

My guess is that in such low memory regimes, program length is very loosely correlated with bug rate.

If anything, if you try to cram a ton of complexity into a few kb of memory, the likelihood of introducing bugs becomes very high.

For anyone who liked this, I highly suggest you take a look at the CuriousMarc youtube channel, where he chronicles lots of efforts to preserve and understand several parts of the Apollo AGC, with a team of really technically competent and passionate collaborators.

One of the more interesting things they have been working on, is a potential re-interpretation of the infamous 1202 alarm. It is, as of current writing, popularly described as something related to nonsensical readings of a sensor which could (and were) safely ignored in the actual moon landing. However, if I remember correctly, some of their investigation revealed that actually there were many conditions which would cause that error to have been extremely critical and would've likely doomed the astronauts. It is super fascinating.

And that's why it's harder (or easier?) to make the same landing again -- we taking way less chances. Today we know of way more failure modes than back then.