Regarding Leo’s post, it was in poor taste but no poorer than anyone who knows him has come to expect. My comment is that I bet Lean is inconsistent too. I don’t know exactly where and how, all I am saying is that I don’t doubt it. These are very sophisticated pieces of software.

In fact, I don’t doubt that those working in the trenches of the FRO could point to some dark corners and examples of unpublicised sources of inconsistency.

This isn’t to dunk on Lean. Bugs can happen, and having a small kernel is a good way to limit their scope! But there are many ways for assurances to fail and this can happen inside and outside the kernel.

I am also not too impressed by the existence of independent checkers in the context of a language that lacks decidable type checking. Because something has to give, these checkers will necessarily be incomplete in one way or another, most likely each in different ways. This is because Lean's failure of completeness of conversion almost certainly manifests in heuristic ways that are not predictable or replicable…

For a fail-safe checker to provide confidence, it needs to be complete. This is not possible with Lean's core language as designed, and it will not be possible for Lean itself until they find new leadership.

I am very glad the “light of Claude” is shining on Rocq. I hope the Rocq team is learning some lessons and it sounds like they are.

Agda is safe because it has not aimed to be consistent in many years, if ever. ;-)

For a system without decidable type checking, the de Bruijn criterion is somewhat useless, and really the only viable option for the kinds of assurances wanted is an LCF-style kernel.

Unfortunately, LCF-style kernels are not so easy to do with dependent type theory. You can get some of the way (I did the easy part for Pterodactyl just for fun), but there are still invariants that are not "automatic" from the abstraction. This is in contrast to systems like HOL where the situation is considerably easier.

The de Bruijn criterion remains a very good idea for systems that are decidable, or for undecidable systems that produce full derivations. The latter is not practical for a full type theory, so nobody does it. So in practice, de Bruijn is useful only for decidable systems. Not Lean.
@jonmsterling Would you please give a link to Leo's post?
Who Watches the Provers? — Leonardo de Moura

@jonmsterling In his post he explicitly says "The checking is deterministic: run it again, get the same answer". Is there something I can read on why this is false? If it is false, why is he saying the opposite?
@RanaldClouston It is deterministic as far as I know. The issue is that if you are implementing a checker for a broken/incomplete system, there are a million different places where you can put the incompleteness, because syntactic invariants are like squeezing a balloon. When one is broken, you can sometimes make it hold but then another thing breaks. Because there is so much room for movement here, it becomes very difficult to even specify what it would mean to have an independent checker. Even if this is fully specified, which would be possible to do, the result would be of limited value in my opinion.
@jonmsterling AFAIU, the situation is even weirder: all the independent checkers essentially reimplement the exact same algorithm, with the exact same incompleteness, by following the original code. At least that was the case for Lean4Lean. So it's independent only in a rather weak sense of the term.
@mevenlennonbertrand @jonmsterling
AFAIK Lean4Lean and Nanoda differ from the core kernel at least in terms of universe equality checking, but defeq/infer/whnf and, more importantly, the checks on inductive types and the construction of recursors are certain to be very similar between all systems, which makes them susceptible to the same logic errors the core kernel may have.
@mevenlennonbertrand @jonmsterling I believe they all also assume unique typing for Lean's type theory, but the proof of this property in Mario Carneiro's Master's thesis is incorrect and it remains open. For true diversity in type checker implementation, it would be nice to have a checker that doesn't itself rely on unproven conjectures about the type theory.
@zwarich @jonmsterling Claim: this is extremely hard. My main takeaway from verifying kernels is that every single thing you do implicitly relies on invariants which, to be established, need difficult meta-theoretic properties. At the very least injectivity of type constructors, which is wildly open for a type theory like Lean's. This is baked in everywhere, from the fact that your evaluator does not need to re-infer and compare the type of sub-expressions every time it does a reduction step (for an example of what happens if you want to do this safely, you can look at Anja Petković Komel's PhD thesis), to the fact that you don't re-check the context every time you reach a leaf in a derivation tree. I don't think you could re-check mathlib with a type-checker of this kind, at least not without serious performance engineering, which would complicate your kernel and kind of defeat the point.
@mevenlennonbertrand @zwarich It would be really good to clarify in public just how badly EVERYTHING breaks if you don't have these basic syntactic properties. It's not just some pointy-headed thing... it's one of the important pre-requisites for getting to the level of assurances that Leo is recklessly claiming about Lean.
@jonmsterling @zwarich If I had a dollar every time someone does something that is only safe if type constructors are injective… But yeah I don't know what to do to make this more visible at this point.
@mevenlennonbertrand @zwarich It's so frustrating. I feel we are chafing under the pre-scientific nature of type theory at this time... It remains, even after the revolutions of the past decade, really difficult to speak about these things in generally applicable terms that aren't specific to a particular system.
@jonmsterling @mevenlennonbertrand @zwarich I want to plug SOGATs here, and in particular @rafaelbocquet's work on synthetic approaches to metatheorems with SOGATs https://rafaelbocquet.gitlab.io/pdfs/thesis.pdf
@olynch @mevenlennonbertrand @zwarich @rafaelbocquet I mean, I know that stuff well and have made many contributions to it over the years and my entire PhD thesis was about scaling those metatheorems, but my point is that IN SPITE of these tools we still can’t really talk about these syntactic problems in a clear and general purpose way because semantic properties tend to splinter into a zillion interrelated properties of syntax-in-the-computer and the algorithms surrounding it. The whole discussion of “subject reduction” is a good example of how something non-semantic can actually be related in complex ways to semantic things (including both injectivity of type constructors and decidability).
@mevenlennonbertrand @jonmsterling @zwarich concrete examples might help
@ncf Of things that go wrong without injectivity of type constructors?
@mevenlennonbertrand yeah. the only examples i'm aware of (but i know there must be more) are for the purposes of elaborating Π-types without annotations, so if someone is mostly concerned with having a sound kernel (so that they can be confident that any accepted proof is valid) and not much else, they might not see the point (note that this is not my perspective)
@ncf @mevenlennonbertrand I mean, you would need far more annotations than any system uses.
@jonmsterling @mevenlennonbertrand no i mean that such a person might be fine with the elaborator not being complete or having unique solutions or whatever actually breaks if you assume injectivity of Π-types when it doesn't hold.
@ncf @mevenlennonbertrand I don't want to rehash the several times this point has been answered in the various threads...
@jonmsterling @mevenlennonbertrand ok, after reading up on the de Bruijn criterion i see that this really needs complete typechecking.

@ncf Two examples.

Soundly reducing a well-typed beta redex requires the domain that was used to type the abstraction's body to match the type of the argument, and the return type of the abstraction to match the type of the entire term. You get this for free only if Pi is injective; otherwise you need to do it by hand. The classic example is nat -> nat = nat -> bool, which is a sound (definitional) equation (it's true in the cardinal model, for instance). It lets you type (fun x => 0) 0 : bool, but you need to prevent this from reducing by noticing that bool <> nat.
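
To make the shape of those by-hand checks concrete, here is a toy sketch in Python (a hypothetical mini-language, nothing to do with Lean's actual kernel; all names are invented). The reducer fires a beta redex only after checking the argument's type against the domain annotation and the annotated codomain against the type the whole term is used at:

```python
# Toy terms: ("lit", n), ("blit", b), ("var", i),
# ("lam", dom, cod, body) with explicit domain/codomain annotations,
# ("app", fun, arg). Types are just "nat" and "bool".

def infer(t):
    """Infer the type of a literal (all we need for this fragment)."""
    if t[0] == "lit":
        return "nat"
    if t[0] == "blit":
        return "bool"
    return None

def subst0(a, t):
    """Substitute a for de Bruijn variable 0 (this toy fragment has no
    nested binders, so no shifting is needed)."""
    if t[0] == "var" and t[1] == 0:
        return a
    if t[0] == "app":
        return ("app", subst0(a, t[1]), subst0(a, t[2]))
    return t

def step_at(expected, t):
    """Reduce a beta redex *at an expected type*: fire only if the
    argument's type matches the domain annotation and the codomain
    annotation matches the type the whole redex is being used at."""
    if t[0] == "app" and t[1][0] == "lam":
        _, fun, arg = t
        _, dom, cod, body = fun
        if infer(arg) == dom and cod == expected:
            return subst0(arg, body)
    return None  # blocked: the facts Pi-injectivity would give are not free
```

In this rendering, (fun x => 0) 0 used at bool is `step_at("bool", ("app", ("lam", "nat", "nat", ("lit", 0)), ("lit", 0)))`, which is blocked, while the same redex used at nat reduces to `("lit", 0)`.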

More frightening (and less known): congruence of application!! The issue here is more subtle, and depends on how exactly you set things up. But the idea is that the reasonable invariant for neutral comparison is to only assume that both sides are well-typed, but _not necessarily at the same type_ (assuming they have the same type cannot be preserved recursively, because e.g. knowing that f u and f' u' have the same type does not tell you f and f' do). Your post-condition will then assert the neutrals are defeq at a given type. Now if you again think about comparing f u = f' u', you assume both applications are well-typed, thus also f and f' are, recursively compare them, learn they are defeq at some (x : A) -> B. Now you want to know you can compare u to u'. But these are normal forms, so now you must provide a type at which to do that comparison. Surely enough, A should do? Well now you have to establish u' : A knowing only f' u' is well-typed and f' : (x : A) -> B. Without some sort of Pi injectivity (+ maybe an argument on uniqueness of types, which typically follows from it), you're toast.
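
For concreteness, a simply-typed Python sketch of that precondition/postcondition discipline (invented names, standing in for no real kernel): `conv_ne` assumes both neutrals are well-typed, not necessarily at the same type, and returns the type at which they turned out to be defeq. The commented step is exactly where Pi injectivity (or expensive re-checking) sneaks in:

```python
# Types: "nat", "bool", ("pi", a, b). Neutrals: ("var", i), ("app", ne, nf).
# Normal forms: ("zero",), ("tt",), ("ne", ne). A context is a list of types.

def lookup_ctx(g, i):
    return g[i] if 0 <= i < len(g) else None

def conv_ne(g, n, m):
    """Precondition: n and m are well-typed in g, *not necessarily at the
    same type*. Postcondition: returns a type at which they are defeq,
    or None if the comparison fails."""
    if n[0] == "var" and m[0] == "var":
        return lookup_ctx(g, n[1]) if n[1] == m[1] else None
    if n[0] == "app" and m[0] == "app":
        f_ty = conv_ne(g, n[1], m[1])  # learn f = f' at some type...
        if not (isinstance(f_ty, tuple) and f_ty[0] == "pi"):
            return None                # ...read off as a Pi (no-confusion)
        _, a, b = f_ty
        # Compare the arguments at a. This silently assumes u' : a, which
        # is exactly where some form of Pi injectivity (or re-checking)
        # is needed.
        if conv_tm(g, a, n[2], m[2]):
            return b  # simply typed, so no substitution into b
    return None

def conv_tm(g, a, t, u):
    """Compare normal forms at a given type a."""
    if a == "nat" and t == ("zero",) and u == ("zero",):
        return True
    if a == "bool" and t == ("tt",) and u == ("tt",):
        return True
    if t[0] == "ne" and u[0] == "ne":
        return conv_ne(g, t[1], u[1]) == a
    return False
```

For instance, with `g = [("pi", "nat", "bool"), "nat"]`, comparing the neutral f x against itself returns `"bool"`.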

@ncf This sort of subtle argumentation is really all over the place when you seriously try to verify things and have to think hard about invariants. And it's really easy to do something that relies on some inversion principle without realising.

I mentioned it in another thread, but the one example I know which grapples seriously with this is the implementation of Andromeda 2, and particularly the conversion-checking algorithm described in Anja Petković Komel's PhD. This works on rather generic theories, relying only on simple properties that one can prove generically (things like weakening or presupposition/boundary lemmas).

The ingredients are: heavily annotated syntax (essentially a GAT) but, more importantly, _actually constantly checking the defeq of these annotations_, which cannot be taken for granted. The two issues of the previous toot dissolve if you have fully annotated abstraction and application nodes and you check that the types on the application node coincide with those on the abstraction node, or that the type annotations on both application nodes coincide.
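
A minimal sketch of how this dissolves the first issue, in Python (hypothetical syntax; this is not Andromeda's actual code). Both the abstraction node and the application node carry domain/codomain annotations, and the beta rule checks, rather than assumes, that they agree; `conv` stands in for a real defeq test, which in general is recursive and, without decidability, need not terminate:

```python
# Fully annotated terms: ("lam", dom, cod, body) and
# ("app", dom, cod, fun, arg); also ("lit", n) and ("var", i).

def conv(a, b):
    """Stand-in for a genuine defeq check on type annotations."""
    return a == b

def subst0(a, t):
    """Substitute a for de Bruijn variable 0 (no nested binders here)."""
    if t[0] == "var" and t[1] == 0:
        return a
    if t[0] == "app":
        _, d, c, f, u = t
        return ("app", d, c, subst0(a, f), subst0(a, u))
    return t

def beta(t):
    """Fire a beta redex only when the application node's annotations are
    convertible with the abstraction node's annotations."""
    if t[0] == "app" and t[3][0] == "lam":
        _, d, c, fun, arg = t
        _, d2, c2, body = fun
        if conv(d, d2) and conv(c, c2):
            return subst0(arg, body)
    return None  # annotations do not line up: the redex stays blocked
```

A redex whose application node claims a different codomain than its abstraction node simply stays stuck instead of producing an ill-typed result.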

@mevenlennonbertrand @ncf And, by the way, if defeq isn’t decidable, then if your plan is to actually implement the sound reducer, you’re toast — because you have to check all these equations that you can’t actually check. That is why in this scenario there are only two options: checking full derivations, or LCF. Lean does neither, of course…
@jonmsterling @ncf Yep, Anja's implementation has this, you can end up looping/wrongly failing on the simplest beta-redex because it triggered a hardcore defeq problem.

@mevenlennonbertrand @ncf Yeah... It's a seriously hard problem, and Anja did a very good job grasping all the difficulties involved.

It is kind of funny how this is "day one" stuff for us, but we are just dismissed as pesky type theorists who are getting in the way of progress or whatever. Bozos rule the whole world, so I guess it makes sense that they should also be lording it over small things like proof assistants.

@mevenlennonbertrand @jonmsterling @ncf AFAIK Nils Anders Danielsson wrote a sound type checker that supports equality reflection, that's probably somewhere here: https://github.com/graded-type-theory/graded-type-theory @ncf do you know where exactly the code is?
graded-type-theory/Definition/Typed/Decidable/Internal.agda at master · graded-type-theory/graded-type-theory
@ncf @AndrasKovacs @mevenlennonbertrand Can someone give me the tl;dr on what this actually does with equality reflection in relation to type checking?
@jonmsterling @AndrasKovacs @mevenlennonbertrand I think the gist was that you don't have injective Π-types, so instead whenever you have Π A B ≡ Π C D you check recursively that A ≡ C and B ≡ D (and presumably fail if they aren't).
@ncf @AndrasKovacs @mevenlennonbertrand But I am having trouble understanding what it means to have a type checker in the presence of equality reflection. Sorry if I misunderstood something
@jonmsterling @AndrasKovacs @mevenlennonbertrand I don't think it's meant to be complete, just something to avoid writing long typing derivations in Agda in those cases where they can be inferred

@ncf @jonmsterling @AndrasKovacs @mevenlennonbertrand
I'm doing something that seems very similar in a normalization proof that I am working on: some parts of the normalization model depend on the injectivity of Π. As a way around this, it is possible to add new neutral terms that correspond to "computations blocked due to the lack of injectivity".

The new neutral constructors look like this:
castΠΠ : (p : Π A₁ B₁ = Π A₂ B₂) ∧ ¬ (nf(A₁) = nf(A₂) ∧ nf(B₁) = nf(B₂)) → Nf (Π A₁ B₁) f → Ne (Π A₂ B₂) f
(the second occurrence of f is transported over p)
(this is simplified, the negative condition is actually described using a frontier of instability)

(There are similar constructors for "computations blocked due to the lack of disjointness").

@rafaelbocquet @ncf @AndrasKovacs @mevenlennonbertrand Very cool idea... I'm curious, why does injectivity not hold in your system?

@jonmsterling @ncf @AndrasKovacs @mevenlennonbertrand

Injectivity holds as a consequence of normalization, but we don't know this yet when defining the normalization model.

I'm proving normalization for a type theory with a non-linear equation, meaning that normalization has to be interleaved with decidability of equality. For example, the equality "transport(b,x,x,p) = b", seen as a conditional reduction "transport(b,x,y,p) = b when x and y are convertible". I discussed the difficulties with that in section 6.5.3 of my thesis, and finally found a promising approach recently.

The main obstacle was that at some point, we have two elements x', y' of (proof-relevant) logical predicates over terms x,y, and we can decide the equality of (the normal forms of) x and y, but we don't know how x' and y' are related when (x = y).

The solution is to define an operation "cast", similar to the cast of observational type theory, but at the level of logical predicates. As in OTT, it should compute by double recursion on the normalization structure of types (this requires a closed universe of normalization structures), and needs some form of injectivity and disjointness of type formers.

@rafaelbocquet @ncf @jonmsterling @AndrasKovacs Yes!

I feel that the "transport hell" that many people complain about when working with QIITs is merely surfacing these sorts of issues, which are hidden in the standard presentations where the use of defeq is invisible, but which one should think about.

@ncf Case in point: I smuggled a use of no-confusion and uniqueness of types into the proof for congruence of application. Can you spot it?
@mevenlennonbertrand "learn they are defeq at some (x : A) → B"? what you learn is that they are defeq at some type E with (x : A) → B ≡ E ≡ (x : C) → D or something.
@ncf Indeed! Depending on how exactly you structure things, you might be able to learn that E is/has a normal form, in which case by no-confusion it can only be a Π. If not, then you need some sort of weak-head normalisation on top (but if you are careful you can arrange for this to only show up in completeness arguments; at least I managed to in the FSCD '25 paper).
@mevenlennonbertrand @ncf an exercise for myself that I'm gonna write down as a reply here in case others are interested as well: how does your first example relate to what is being proven in Cockx et al's Taming of the Rew (https://hal.science/hal-02901011v2/document)

@mevenlennonbertrand @ncf

Is "congruence" of application the right word here? This sounds more like an inversion principle to me

@maxsnew I don't think so? Here this is really about using, in the conversion checker, the rule that says that if f = f' and u = u' then f u = f' u'.

@mevenlennonbertrand

Maybe I misunderstood, I was talking about this sentence:

> Well now you have to establish u' : A knowing only f' u' is well-typed and f' : (x : A) -> B.

Which sounds like an inversion principle to me

@maxsnew Ah, right! So indeed the point is that to be able to apply the rule you need a bunch of things, which you can obtain either by inversion principles or by re-checking a number of things. The stronger the inversion principles, the less things you need to re-check.

(To be precise, in the sentence you quote, the typing of f' u' is coming from the precondition, and that of f' from the postcondition of the first recursive call and the "boundary lemma", which says that if t = t' : A then t' : A; this typically only needs very weak forms of inversion, what the PTS people call "generation".)

@mevenlennonbertrand @ncf For an example that deduces false from a consistent context, consider a context with two equations
(Nat × (Nat × Nat)) × (Nat × Nat) = ((Nat × Nat) × (Nat × Nat)) × Nat
forall x y z w v. ((x, (y, z)), (w, v)) = (((x, y), (z, w)), v)
We can validate it by a Set model where the product is strictly associative (there is a Math StackExchange question about this). But now snd (snd (fst p)) returns different results if we don't check types before reducing.
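
The mismatch is easy to see by running the two readings of p, here in a toy Python rendering (plain tuples standing in for the pairs; purely illustrative):

```python
# One inhabitant for each side of the (consistent) equation:
lhs = ((1, (2, 3)), (4, 5))   # ((x, (y, z)), (w, v))
rhs = (((1, 2), (3, 4)), 5)   # (((x, y), (z, w)), v)

def fst(t):
    """Untyped first projection; stuck (identity) on non-pairs."""
    return t[0] if isinstance(t, tuple) else t

def snd(t):
    """Untyped second projection; stuck (identity) on non-pairs."""
    return t[1] if isinstance(t, tuple) else t

def proj(p):
    # snd (snd (fst p)): picks out z on the left reading, w on the right
    return snd(snd(fst(p)))
```

An untyped reducer computes `proj(lhs) == 3` but `proj(rhs) == 4`, even though the context's equations identify lhs and rhs; so reducing without checking types yields 3 = 4, hence false.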
@jonmsterling Why is completeness required for providing confidence? It seems to me that it'd be enough if all checkers agree on accepting/rejecting the code of interest, which afaik always remains on the decidable fragment of type checking.
Public view of Lean | Zulip team chat
@jaycech3n Amazing. And from the comment thread on Leo’s PRs it looks like the issue wasn’t entirely fixed…