Regarding Leo’s post, it was in poor taste but no poorer than anyone who knows him has come to expect. My comment is that I bet Lean is inconsistent too. I don’t know exactly where and how, all I am saying is that I don’t doubt it. These are very sophisticated pieces of software.

In fact, I don’t doubt that those working in the trenches of the FRO could point to some dark corners and examples of unpublicised sources of inconsistency.

This isn’t to dunk on Lean. Bugs can happen, and having a small kernel is a good way to limit their scope! But there are many ways for assurances to fail and this can happen inside and outside the kernel.

I am also not too impressed by the existence of independent checkers in the context of a language that lacks decidable type checking. Because something has to give, these checkers will necessarily be incomplete in one way or another, most likely each in different ways. This is because Lean's failure of completeness of conversion almost certainly manifests in heuristic ways that are not predictable or replicable…

For a fail-safe checker to provide confidence, it needs to be complete. This is not possible with Lean's core language as designed, and it will not be possible for Lean itself until they find new leadership.

I am very glad the “light of Claude” is shining on Rocq. I hope the Rocq team is learning some lessons and it sounds like they are.

Agda is safe because it has not aimed to be consistent in many years, if ever. ;-)

For a system without decidable type checking, the de Bruijn criterion is somewhat useless, and really the only viable option for the kinds of assurances wanted is an LCF-style kernel.

Unfortunately, LCF-style kernels are not so easy to do with dependent type theory. You can get some of the way (I did the easy part for Pterodactyl just for fun), but there are still invariants that are not "automatic" from the abstraction. This is in contrast to systems like HOL where the situation is considerably easier.

The de Bruijn criterion remains a very good idea for systems that are decidable, or for undecidable systems that produce full derivations. The latter is not practical for a full type theory, so nobody does it. So in practice, de Bruijn is useful only for decidable systems. Not Lean.
@jonmsterling Would you please give a link to Leo's post?
Who Watches the Provers? — Leonardo de Moura

@jonmsterling In his post he explicitly says "The checking is deterministic: run it again, get the same answer". Is there something I can read on why this is false? If it is false, why is he saying the opposite?
@RanaldClouston It is deterministic as far as I know. The issue is that if you are implementing a checker for a broken/incomplete system, there are a million different places where you can put the incompleteness, because syntactic invariants are like squeezing a balloon. When one is broken, you can sometimes make it hold but then another thing breaks. Because there is so much room for movement here, it becomes very difficult to even specify what it would mean to have an independent checker. Even if this is fully specified, which would be possible to do, the result would be of limited value in my opinion.
@jonmsterling AFAIU, the situation is even weirder: all the independent checkers essentially reimplement the exact same algorithm, with the exact same incompleteness, by following the original code. At least that was the case for Lean4Lean. So it's independent only in a rather weak sense of the term.
@mevenlennonbertrand @jonmsterling
AFAIK Lean4Lean and Nanoda differ from the core kernel at least in terms of universe equality checking, but defeq/infer/whnf and, more importantly, the checks on inductive types and the construction of recursors are certain to be very similar between all systems, which makes them susceptible to the same logic errors the core kernel may have.
@mevenlennonbertrand @jonmsterling I believe they all also assume unique typing for Lean's type theory, but the proof of this property in Mario Carneiro's Master's thesis is incorrect and it remains open. For true diversity in type checker implementation, it would be nice to have a checker that doesn't itself rely on unproven conjectures about the type theory.
@zwarich @jonmsterling Claim: this is extremely hard. My main takeaway from verifying kernels is that every single thing you do implicitly relies on invariants which, to be established, need difficult meta-theoretic properties. At the very least injectivity of type constructors, which is wildly open for a type theory like Lean's. This is baked in everywhere, from the fact that your evaluator does not need to re-infer and compare the type of sub-expressions every time it does a reduction step (for an example of what happens if you want to do this safely, you can look at Anja Petković Komel's PhD thesis), to the fact that you don't re-check the context every time you reach a leaf in a derivation tree. I don't think you could re-check mathlib with a type-checker of this kind, at least not without serious performance engineering, which would complicate your kernel and kind of defeat the point.
@mevenlennonbertrand @zwarich It would be really good to clarify in public just how badly EVERYTHING breaks if you don't have these basic syntactic properties. It's not just some pointy-headed thing... it's one of the important pre-requisites for getting to the level of assurances that Leo is recklessly claiming about Lean.
@jonmsterling @zwarich If I had a dollar every time someone does something that is only safe if type constructors are injective… But yeah I don't know what to do to make this more visible at this point.
@mevenlennonbertrand @zwarich It's so frustrating. I feel we are chafing under the pre-scientific nature of type theory at this time... It remains, even after the revolutions of the past decade, really difficult to speak about these things in generally applicable terms that aren't specific to a particular system.
@jonmsterling @mevenlennonbertrand @zwarich I want to plug SOGATs here, and in particular @rafaelbocquet's work on synthetic approaches to metatheorems with SOGATs https://rafaelbocquet.gitlab.io/pdfs/thesis.pdf
@olynch @mevenlennonbertrand @zwarich @rafaelbocquet I mean, I know that stuff well and have made many contributions to it over the years and my entire PhD thesis was about scaling those metatheorems, but my point is that IN SPITE of these tools we still can’t really talk about these syntactic problems in a clear and general purpose way because semantic properties tend to splinter into a zillion interrelated properties of syntax-in-the-computer and the algorithms surrounding it. The whole discussion of “subject reduction” is a good example of how something non-semantic can actually be related in complex ways to semantic things (including both injectivity of type constructors and decidability).
@mevenlennonbertrand @jonmsterling @zwarich concrete examples might help
@ncf Of things that go wrong without injectivity of type constructors?
@mevenlennonbertrand yeah. the only examples i'm aware of (but i know there must be more) are for the purposes of elaborating Π-types without annotations, so if someone is mostly concerned with having a sound kernel (so that they can be confident that any accepted proof is valid) and not much else, they might not see the point (note that this is not my perspective)
@ncf @mevenlennonbertrand I mean, you would need far more annotations than any system uses.
@jonmsterling @mevenlennonbertrand no i mean that such a person might be fine with the elaborator not being complete or having unique solutions or whatever actually breaks if you assume injectivity of Π-types when it doesn't hold.
@ncf @mevenlennonbertrand I don't want to rehash the several times this point has been answered in the various threads...
@jonmsterling @mevenlennonbertrand ok, after reading up on the de Bruijn criterion i see that this really needs complete typechecking.

@ncf Two examples.

Soundly reducing a well-typed beta redex requires the domain that was used to type the abstraction's body to match the type of the argument, and the return type of the abstraction to match the type of the entire term. You get this for free only if Pi is injective; otherwise you need to do it by hand. The classic example is nat -> nat = nat -> bool, which is a sound (definitional) equation (it's true in the cardinal model, for instance). It lets you type (fun x => 0) 0 : bool, but you need to prevent this from reducing by noticing that bool <> nat.
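
To make the shape of those by-hand checks concrete, here is a toy sketch in Python (a hypothetical mini-language, nothing to do with Lean's actual kernel; all names are invented). The reducer fires a beta redex only after checking the argument's type against the domain annotation and the annotated codomain against the type the whole term is used at:

```python
# Toy terms: ("lit", n), ("blit", b), ("var", i),
# ("lam", dom, cod, body) with explicit domain/codomain annotations,
# ("app", fun, arg). Types are just "nat" and "bool".

def infer(t):
    """Infer the type of a literal (all we need for this fragment)."""
    if t[0] == "lit":
        return "nat"
    if t[0] == "blit":
        return "bool"
    return None

def subst0(a, t):
    """Substitute a for de Bruijn variable 0 (this toy fragment has no
    nested binders, so no shifting is needed)."""
    if t[0] == "var" and t[1] == 0:
        return a
    if t[0] == "app":
        return ("app", subst0(a, t[1]), subst0(a, t[2]))
    return t

def step_at(expected, t):
    """Reduce a beta redex *at an expected type*: fire only if the
    argument's type matches the domain annotation and the codomain
    annotation matches the type the whole redex is being used at."""
    if t[0] == "app" and t[1][0] == "lam":
        _, fun, arg = t
        _, dom, cod, body = fun
        if infer(arg) == dom and cod == expected:
            return subst0(arg, body)
    return None  # blocked: the facts Pi-injectivity would give are not free
```

In this rendering, (fun x => 0) 0 used at bool is `step_at("bool", ("app", ("lam", "nat", "nat", ("lit", 0)), ("lit", 0)))`, which is blocked, while the same redex used at nat reduces to `("lit", 0)`.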

More frightening (and less known): congruence of application!! The issue here is more subtle, and depends on how exactly you set things up. But the idea is that the reasonable invariant for neutral comparison is to only assume that both sides are well-typed, but _not necessarily at the same type_ (assuming they have the same type cannot be preserved recursively, because e.g. knowing that f u and f' u' have the same type does not tell you f and f' do). Your post-condition will then assert the neutrals are defeq at a given type. Now if you again think about comparing f u = f' u', you assume both applications are well-typed, thus also f and f' are, recursively compare them, learn they are defeq at some (x : A) -> B. Now you want to know you can compare u to u'. But these are normal forms, so now you must provide a type at which to do that comparison. Surely enough, A should do? Well now you have to establish u' : A knowing only f' u' is well-typed and f' : (x : A) -> B. Without some sort of Pi injectivity (+ maybe an argument on uniqueness of types, which typically follows from it), you're toast.
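
For concreteness, a simply-typed Python sketch of that precondition/postcondition discipline (invented names, standing in for no real kernel): `conv_ne` assumes both neutrals are well-typed, not necessarily at the same type, and returns the type at which they turned out to be defeq. The commented step is exactly where Pi injectivity (or expensive re-checking) sneaks in:

```python
# Types: "nat", "bool", ("pi", a, b). Neutrals: ("var", i), ("app", ne, nf).
# Normal forms: ("zero",), ("tt",), ("ne", ne). A context is a list of types.

def lookup_ctx(g, i):
    return g[i] if 0 <= i < len(g) else None

def conv_ne(g, n, m):
    """Precondition: n and m are well-typed in g, *not necessarily at the
    same type*. Postcondition: returns a type at which they are defeq,
    or None if the comparison fails."""
    if n[0] == "var" and m[0] == "var":
        return lookup_ctx(g, n[1]) if n[1] == m[1] else None
    if n[0] == "app" and m[0] == "app":
        f_ty = conv_ne(g, n[1], m[1])  # learn f = f' at some type...
        if not (isinstance(f_ty, tuple) and f_ty[0] == "pi"):
            return None                # ...read off as a Pi (no-confusion)
        _, a, b = f_ty
        # Compare the arguments at a. This silently assumes u' : a, which
        # is exactly where some form of Pi injectivity (or re-checking)
        # is needed.
        if conv_tm(g, a, n[2], m[2]):
            return b  # simply typed, so no substitution into b
    return None

def conv_tm(g, a, t, u):
    """Compare normal forms at a given type a."""
    if a == "nat" and t == ("zero",) and u == ("zero",):
        return True
    if a == "bool" and t == ("tt",) and u == ("tt",):
        return True
    if t[0] == "ne" and u[0] == "ne":
        return conv_ne(g, t[1], u[1]) == a
    return False
```

For instance, with `g = [("pi", "nat", "bool"), "nat"]`, comparing the neutral f x against itself returns `"bool"`.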

@ncf This sort of subtle argumentation is really all over the place when you seriously try to verify things and have to think hard about invariants. And it's really easy to do something that relies on some inversion principle without realising.

I mentioned it in another thread, but the one example I know which grapples seriously with this is the implementation of Andromeda 2, and particularly the conversion-checking algorithm described in Anja Petković Komel's PhD. This works on rather generic theories, relying only on simple properties that one can prove generically (things like weakening or presupposition/boundary lemmas).

The ingredients are: heavily annotated syntax (essentially a GAT) but, more importantly, _actually constantly checking the defeq of these annotations_, which cannot be taken for granted. The two issues of the previous toot dissolve if you have fully annotated abstraction and application nodes and you check that the types on the application node coincide with those on the abstraction node, or that the type annotations on both application nodes coincide.
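
A minimal sketch of how this dissolves the first issue, in Python (hypothetical syntax; this is not Andromeda's actual code). Both the abstraction node and the application node carry domain/codomain annotations, and the beta rule checks, rather than assumes, that they agree; `conv` stands in for a real defeq test, which in general is recursive and, without decidability, need not terminate:

```python
# Fully annotated terms: ("lam", dom, cod, body) and
# ("app", dom, cod, fun, arg); also ("lit", n) and ("var", i).

def conv(a, b):
    """Stand-in for a genuine defeq check on type annotations."""
    return a == b

def subst0(a, t):
    """Substitute a for de Bruijn variable 0 (no nested binders here)."""
    if t[0] == "var" and t[1] == 0:
        return a
    if t[0] == "app":
        _, d, c, f, u = t
        return ("app", d, c, subst0(a, f), subst0(a, u))
    return t

def beta(t):
    """Fire a beta redex only when the application node's annotations are
    convertible with the abstraction node's annotations."""
    if t[0] == "app" and t[3][0] == "lam":
        _, d, c, fun, arg = t
        _, d2, c2, body = fun
        if conv(d, d2) and conv(c, c2):
            return subst0(arg, body)
    return None  # annotations do not line up: the redex stays blocked
```

A redex whose application node claims a different codomain than its abstraction node simply stays stuck instead of producing an ill-typed result.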

@mevenlennonbertrand @ncf And, by the way, if defeq isn’t decidable, then if your plan is to actually implement the sound reducer, you’re toast — because you have to check all these equations that you can’t actually check. That is why in this scenario there are only two options: checking full derivations, or LCF. Lean does neither, of course…
@jonmsterling @ncf Yep, Anja's implementation has this, you can end up looping/wrongly failing on the simplest beta-redex because it triggered a hardcore defeq problem.

@mevenlennonbertrand @ncf Yeah... It's a seriously hard problem, and Anja did a very good job grasping all the difficulties involved.

It is kind of funny how this is "day one" stuff for us, but we are just dismissed as pesky type theorists who are getting in the way of progress or whatever. Bozos rule the whole world, so I guess it makes sense that they should also be lording it over small things like proof assistants.

@mevenlennonbertrand @jonmsterling @ncf AFAIK Nils Anders Danielsson wrote a sound type checker that supports equality reflection, that's probably somewhere here: https://github.com/graded-type-theory/graded-type-theory @ncf do you know where exactly the code is?
graded-type-theory/Definition/Typed/Decidable/Internal.agda at master · graded-type-theory/graded-type-theory
@ncf @AndrasKovacs @mevenlennonbertrand Can someone give me the tl;dr on what this actually does with equality reflection in relation to type checking?
@jonmsterling @AndrasKovacs @mevenlennonbertrand I think the gist was that you don't have injective Π-types, so instead whenever you have Π A B ≡ Π C D you check recursively that A ≡ C and B ≡ D (and presumably fail if they aren't).
@ncf @AndrasKovacs @mevenlennonbertrand But I am having trouble understanding what it means to have a type checker in the presence of equality reflection. Sorry if I misunderstood something
@jonmsterling @AndrasKovacs @mevenlennonbertrand I don't think it's meant to be complete, just something to avoid writing long typing derivations in Agda in those cases where they can be inferred

@ncf @jonmsterling @AndrasKovacs @mevenlennonbertrand
I'm doing something that seems very similar in a normalization proof that I am working on: some parts of the normalization model depend on the injectivity of Π. As a way around this, it is possible to add new neutral terms that correspond to "computations blocked due to the lack of injectivity".

The new neutral constructors look like this:
castΠΠ : (p : Π A₁ B₁ = Π A₂ B₂) ∧ ¬ (nf(A₁) = nf(A₂) ∧ nf(B₁) = nf(B₂)) → Nf (Π A₁ B₁) f → Ne (Π A₂ B₂) f
(the second occurrence of f is transported over p)
(this is simplified, the negative condition is actually described using a frontier of instability)

(There are similar constructors for "computations blocked due to the lack of disjointness").

@rafaelbocquet @ncf @AndrasKovacs @mevenlennonbertrand Very cool idea... I'm curious, why does injectivity not hold in your system?

@jonmsterling @ncf @AndrasKovacs @mevenlennonbertrand

Injectivity holds as a consequence of normalization, but we don't know this yet when defining the normalization model.

I'm proving normalization for a type theory with a non-linear equation, meaning that normalization has to be interleaved with decidability of equality. For example, the equality "transport(b,x,x,p) = b", seen as a conditional reduction "transport(b,x,y,p) = b when x and y are convertible". I discussed the difficulties with that in section 6.5.3 of my thesis, and finally found a promising approach recently.

The main obstacle was that at some point, we have two elements x', y' of (proof-relevant) logical predicates over terms x,y, and we can decide the equality of (the normal forms of) x and y, but we don't know how x' and y' are related when (x = y).

The solution is to define an operation "cast", similar to the cast of observational type theory, but at the level of logical predicates. As in OTT, it should compute by double recursion on the normalization structure of types (this requires a closed universe of normalization structures), and needs some form of injectivity and disjointness of type formers.

@rafaelbocquet @ncf @jonmsterling @AndrasKovacs Yes!

I feel that the "transport hell" that many people complain about when working with QIITs is merely surfacing these sorts of issues, which are hidden in the standard presentations where the use of defeq is invisible, but which one should think about.

@ncf Case in point: I smuggled a use of no-confusion and uniqueness of types into the proof for congruence of application. Can you spot it?
@mevenlennonbertrand "learn they are defeq at some (x : A) → B"? what you learn is that they are defeq at some type E with (x : A) → B ≡ E ≡ (x : C) → D or something.
@ncf Indeed! Depending on how exactly you structure things, you might be able to learn that E is/has a normal form, in which case by no-confusion it can only be a Π. If not, then you need some sort of weak-head normalisation on top (but if you are careful you can arrange for this to only show up in completeness arguments; at least I managed to in the FSCD '25 paper).
@mevenlennonbertrand @ncf an exercise for myself that I'm gonna write down as a reply here in case others are interested as well: how does your first example relate to what is being proven in Cockx et al's Taming of the Rew (https://hal.science/hal-02901011v2/document)

@mevenlennonbertrand @ncf

Is "congruence" of application the right word here? This sounds more like an inversion principle to me

@maxsnew I don't think so? Here this is really about using, in the conversion checker, the rule that says that if f = f' and u = u' then f u = f' u'.

@mevenlennonbertrand

Maybe I misunderstood, I was talking about this sentence:

> Well now you have to establish u' : A knowing only f' u' is well-typed and f' : (x : A) -> B.

Which sounds like an inversion principle to me

@maxsnew Ah, right! So indeed the point is that to be able to apply the rule you need a bunch of things, which you can obtain either by inversion principles or by re-checking a number of things. The stronger the inversion principles, the less things you need to re-check.

(To be precise, in the sentence you quote, the typing of f' u' is coming from the precondition, and that of f' from the postcondition of the first recursive call and the "boundary lemma", which says that if t = t' : A then t' : A; this typically only needs very weak forms of inversion, what the PTS people call "generation".)

@mevenlennonbertrand @ncf For an example that deduces false from a consistent context, consider a context with two equations
(Nat × (Nat × Nat)) × (Nat × Nat) = ((Nat × Nat) × (Nat × Nat)) × Nat
forall x y z w v. ((x, (y, z)), (w, v)) = (((x, y), (z, w)), v)
We can validate it by a Set model where the product is strictly associative (there is a Math StackExchange question about this). But now snd (snd (fst p)) returns different results if we don't check types before reducing.
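
The mismatch is easy to see by running the two readings of p, here in a toy Python rendering (plain tuples standing in for the pairs; purely illustrative):

```python
# One inhabitant for each side of the (consistent) equation:
lhs = ((1, (2, 3)), (4, 5))   # ((x, (y, z)), (w, v))
rhs = (((1, 2), (3, 4)), 5)   # (((x, y), (z, w)), v)

def fst(t):
    """Untyped first projection; stuck (identity) on non-pairs."""
    return t[0] if isinstance(t, tuple) else t

def snd(t):
    """Untyped second projection; stuck (identity) on non-pairs."""
    return t[1] if isinstance(t, tuple) else t

def proj(p):
    # snd (snd (fst p)): picks out z on the left reading, w on the right
    return snd(snd(fst(p)))
```

An untyped reducer computes `proj(lhs) == 3` but `proj(rhs) == 4`, even though the context's equations identify lhs and rhs; so reducing without checking types yields 3 = 4, hence false.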
@jonmsterling Why is completeness required for providing confidence? It seems to me that it'd be enough if all checkers agree on accepting/rejecting the code of interest, which afaik always remains on the decidable fragment of type checking.
Public view of Lean | Zulip team chat
@jaycech3n Amazing. And from the comment thread on Leo’s PRs it looks like the issue wasn’t entirely fixed…