Hypothesis, Antithesis, Synthesis

https://antithesis.com/blog/2026/hegel/

Introducing Hegel, our new family of property-based testing libraries.

Not that it matters at this point, but the Hegelian dialectic is not thesis, antithesis, and synthesis. The triad is usually attributed to Hegel, but as I understand it he actually pushed back on this mechanical view of it all, and his views on these transitory states were much more nuanced.

Conversation with Will (Antithesis CEO) a couple months ago, heavily paraphrased:

Will: "Apparently Hegel actually hated the whole Hegelian dialectic and it's falsely attributed to him."

Me: "Oh, hm. But the name is funny and I'm attached to it now. How much of a problem is that?"

Will: "Well someone will definitely complain about it on hacker news."

Me: "That's true. Is that a problem?"

Will: "No, probably not."

(Which is to say: You're entirely right. But we thought the name was funny so we kept it. Sorry for the philosophical inaccuracy)

If I had been wearing my fiendish CEO hat at the time, I might have even said something like: "somebody pointing this out will be a great way to jumpstart discussion in the comments."

One of the evilest tricks in marketing to developers is to ensure your post contains one small inaccuracy so somebody gets nerdsniped... not that I have ever done that.

Seth Godin made the case that it's more important for people to make remarks than to be favorable (https://en.wikipedia.org/wiki/Purple_Cow:_Transform_Your_Bus...)

Trump did this a lot with the legacy media in his first term. He would make inaccurate statements to the media on the topic he wanted in the spotlight, and the media would jump to "fact check" him. Guess what: now everyone is talking about illegal immigration, tariffs, or whatever subject Trump thought was to his advantage.

"No such thing as bad publicity" is a very old idea. That quote is usually attributed to P.T. Barnum, but the idea is much older than him.

If that's not motivation enough for you to rename it, well, JavaScript already has a static type checker called Hegel. https://hegel.js.org/ (It's a stronger type system than TypeScript's.)

We looked at it and given that the repo was archived nearly two years ago decided it wasn't a problem.

I think it's more that Hegel was fine with "dialectics" but that the antithesis/synthesis stuff is not actually what's going on in his dialectic. It's a bit of a popular misconception about the role of negation and "movement" in Hegel.

I believe (unless my memory is broken) they get into this a bunch in Ep 15 of my favourite podcast "What's Left Of Philosophy": https://podcasts.apple.com/gb/podcast/15-what-is-dialectics-...

Also if you're not being complained about on HN, are you even really nerd-ing?

15 | What is Dialectics? Part II: We Need to Talk about Hegel

"Not that it matters ...", What? Of course it matters! I only come to HN for extended arguments on the meaning of the Dialectic.

I gave you one in a sibling ;)

From what I understand, it's a proof technique (other techniques include Kant's Transcendental Deduction or Descartes's pure doubt) that generates new concepts via internal contradiction and shows that you are led, necessarily, from one category to the next.

The necessity is the big thing: why unfold in this way and not some other way? Because the premises with which you set up your argument can lead to extreme distortions, even if you think you're being "charitable" or whatever. Descartes introduced mind-body dualism with the method of pure doubt, which at first glance seems a legitimate angle of attack.

Unfortunately that's about as nuanced as my understanding gets. Importantly, this rules out a wide swath of "any conflict that ends in a resolution validates Hegel" sophistry.

>other techniques include Kant's Transcendental Deduction or Descartes's pure doubt

This is not quite accurate. Kant says very explicitly in the (rarely studied) Transcendental Doctrine of Method (Ch 1 Section 4, A789/B817) that this kind of proof method (he calls it "apagogic") is unsuitable to transcendental proofs.

You might be thinking of the much more widely studied Antinomies of Pure Reason, in which he uses this kind of proof negatively (which is to say, to circumscribe the limits of reason) as part of his argument against the way metaphysical claims about the nature of the cosmos were posed by the philosophers of his time (whose use of reason he called "dogmatic").

The method he used in his Deduction is a transcendental argument, which is typically expressed using two things, X and Y. X is problematic (can be true but not necessarily so), and Y is dependent on X. So then if Y is true, then X must necessarily be true as well.

Sorry, I meant "proof method" as more like "this was this guy's angle of attack", not that they would've thought each other's angles were valid at all, or that they're commensurable with, say, 20th-century formal logic (or Aristotelian logic, for example). Descartes and Leibniz were squarely the rationalists that Kant wanted to abolish, and Hegel rejected Kant's distinction between noumena and phenomena entirely, so they're already starting from very different places.

I guess it would be more accurate to state Kant's actual premises here as the distinction between appearance and thing-in-itself rather than the deduction, but the deduction technique itself was fascinating when I first learned it, so that's what I associate most with Kant lol.

I guess I have not thought critically about why we couldn't use a transcendental argument to support Descartes. I just treated it as a vague category error (to be fair, I don't actually know Descartes's philosophy that well, even less than I know Kant's lol). Could be a fun exercise when I have time.

>I guess I have not thought critically why we couldn't use a Transcendental argument to support Descartes.

The previous section within the Transcendental Dialectic, which focuses on the nature of the soul, goes into a refutation of Descartes's statement. Kant basically finds "I think, therefore I am" to be a tautology that only works by equivocating on the "I" in each clause. "I think" pretends that the "I" there is an object in the world, which it then equates with the "I am", an existence claim about an object in the world. Kant argues that "I think" does not actually demonstrate an "I" that is an object, but rather a redundant qualification of thinking.

I am being a bit imprecise, so here is SEP's summary:

>For in each case, Kant thinks that a feature of self-consciousness (the essentially subjectival, unitary and identical nature of the “I” of apperception) gets transmuted into a metaphysics of a self (as an object) that is ostensibly “known” through reason alone to be substantial, simple, identical, etc. This slide from the “I” of apperception to the constitution of an object (the soul) has received considerable attention in the secondary literature, and has fueled a great deal of attention to the Kantian theory of mind and mental activity.

>The claim that the ‘I’ of apperception yields no object of knowledge (for it is not itself an object, but only the “vehicle” for any representation of objectivity as such) is fundamental to Kant’s critique of rational psychology.

[1] https://plato.stanford.edu/entries/kant-metaphysics/#SouRatP...

I remember first learning about Hegel when playing Fallout NV. Caesar made it seem so simple.

This is 100% true and a major pet peeve of mine.

> property-based testing is going to be a huge part of how we make AI-agent-based software development not go terribly.

There's no doubt, I think, that testing will remain important and possibly become more important with more AI use, so better testing is helpful, PBT included. But the problem remains verifying that the tests actually test what they're supposed to. Mutation tests can allow agents to get good coverage with little human intervention, and PBT can make tests better and more readable. But still, people have to read them and understand them, and I suspect that many people who claim to generate thousands of LOC per day don't.

And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. Not only were the tests good, it took humans years to write the tests by hand, and the agents still failed to converge.

I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.

I actually think there's another angle here where PBT helps, which wasn't explored in the blog post.

That angle is legibility. How do you know your AI-written slop software is doing the right thing? One would normally read all the code. Bad news: that's not much less labor-intensive than not using AI at all.

But if one has comprehensive property-based tests, one can instead read only those tests to be convinced the software is doing the right thing.

By analogy: one doesn't need to see the machine-checked proof to know the claim is correct. One only needs to check the theorem statement is saying the right thing.
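To make the analogy concrete, here is a minimal hand-rolled sketch in Python (not Hegel's or Hypothesis's API; `roundtrip` and `random_value` are made-up names). The one-line property is the "theorem statement" a reviewer reads; the generator and the driving loop are machinery they can largely ignore.

```python
import json
import random
import string

def roundtrip(value):
    """The property under test: decoding an encoded value gives it back."""
    return json.loads(json.dumps(value)) == value

def random_value(depth=0):
    """Generate a random JSON-compatible value (a hypothetical generator)."""
    choices = [
        lambda: random.randint(-10**6, 10**6),
        lambda: ''.join(random.choices(string.ascii_letters,
                                       k=random.randint(0, 20))),
        lambda: None,
        lambda: random.choice([True, False]),
    ]
    if depth < 2:  # bound recursion so generation terminates
        choices.append(lambda: [random_value(depth + 1)
                                for _ in range(random.randint(0, 3))])
    return random.choice(choices)()

# A reviewer only needs to trust the one-line property, not the code it exercises.
for _ in range(1000):
    v = random_value()
    assert roundtrip(v), f"roundtrip failed for {v!r}"
```

(Floats are deliberately left out of the generator, since NaN and precision issues would complicate the equality check.)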

> But the problem remains verifying that the tests actually test what they're supposed to.

Definitely. It's a lot harder to fake this with PBT than with example-based testing, but you can still write bad property-based tests and agents are pretty good at doing so.

I have generally found that agents with property-based tests are much better at not lying to themselves about it than agents with just example-based testing, but I still spend a lot of time yelling at Claude.
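As a tiny illustration of how a property test can be "bad" while staying green, here is a hand-rolled Python sketch (all names are made up, not any library's API): a weak property that only checks ordering will happily pass for a sort that silently drops elements, while a stronger permutation check catches the bug.

```python
import random

def buggy_sort(xs):
    # Deliberately broken "sort": converting to a set drops duplicates.
    return sorted(set(xs))

def weak_property(xs):
    """Bad property: only checks that the output is ordered."""
    out = buggy_sort(xs)
    return all(a <= b for a, b in zip(out, out[1:]))

def strong_property(xs):
    """Better property: output is ordered AND a permutation of the input."""
    out = buggy_sort(xs)
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and sorted(xs) == out

random.seed(0)
cases = [[random.randint(0, 5) for _ in range(random.randint(0, 8))]
         for _ in range(200)]
cases.append([2, 2, 1])  # an input with a duplicate, guaranteed to expose the bug

assert all(weak_property(xs) for xs in cases)        # the weak test lies
assert not all(strong_property(xs) for xs in cases)  # the strong one catches it
```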

> So "a huge part" - possibly, but there are other huge parts still missing.

No argument here. We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.

> We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.

Yeah, I know. Just an opportunity to talk about some of the delusions we're hearing from the "CEO class". Keep up the good work!

> I have generally found that agents with property-based tests are much better at not lying to themselves

I've also observed the cheating. I recently tried a specific optimization on a big, complex function. I wrote a PBT checking that the original function returns the same values as the optimized function on all inputs, and I tracked the runtime to confirm that performance improved. Then I let Claude loose. The PBT was great at spotting edge cases, but eventually Claude always started cheating: it modified the test, it modified the original function, it implemented other (easier) optimizations, ...
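For what it's worth, this differential setup needs no framework at all. Here is a minimal Python sketch of the shape of such a test (the `popcount_*` functions are hypothetical stand-ins, not the actual code from the comment): generate random inputs, assert the two implementations agree.

```python
import random

def popcount_reference(n):
    """Original, obviously-correct version: count set bits one at a time."""
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count

def popcount_optimized(n):
    """Optimized version (Kernighan's trick): n &= n - 1 clears the lowest
    set bit, so the loop runs once per set bit rather than once per bit."""
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count

# Differential property: both implementations agree on every sampled input.
random.seed(1)
for _ in range(10_000):
    n = random.getrandbits(64)
    assert popcount_reference(n) == popcount_optimized(n), n
```

Keeping the reference implementation and the test read-only for the agent (e.g. in a directory it cannot edit) is one blunt way to block the test-modification escape hatch.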

Ouch. Classic Claude. It does tend to cheat when it gets stuck, and I've had some success with stricter harnesses, reflection prompts and getting it to redo work when it notices it's cheated, but it's definitely not a solved problem.

My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out.

> There's no doubt, I think, testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included.

Given the Curry–Howard isomorphism, couldn't we ask the AI to directly prove the property of the binary executable, under the assumptions of the hardware model, instead of running PBTs?

By no means do I want to dismiss PBTs - but it seems that this could be both faster and more reliable.
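For a flavor of what the proof route looks like, here is a sketch in Lean 4 (assuming `List.reverse_reverse` from the standard library): the theorem quantifies over every input, where a property test only ever samples finitely many.

```lean
-- Under Curry–Howard, the proposition is a type and the proof is a program
-- inhabiting it. One term covers all inputs at once.
theorem reverse_involutive (xs : List Nat) :
    xs.reverse.reverse = xs :=
  List.reverse_reverse xs
```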

And how do you know if it has proven the property you want, instead of something that's just complicated looking but evaluates to true?

Proofs are a form of static analysis. Static analysis can find interesting bugs, but how a system behaves isn't purely a property of source code. It won't tell you whether the code will run acceptably in a given environment.

For example, if memory use isn't modelled, it won't tell you how big the input can be before the system runs out of memory. Similarly, if your database isn't modelled then you need to test with a real database. Web apps need to test with a real web browser sometimes, rather than a simplified model of one. Databases and web browsers are too complicated to build a full-fidelity mathematical model for.

When testing with real systems there's often the issue that the user's system is different from the one you use to test. You can test with recent versions of Chrome and Firefox, etc, which helps a lot, but what about extensions?

Nothing covers everything, but property tests and fuzzers actually run the code in some test environment. That's going to find different issues than proofs will.

> It took humans years to write the tests by hand, and the agents still failed to converge.

I think there is some hazard in assuming that what agents fail at today they will continue to fail on in the future.

What I mean is: if we take the optimistic view that agents will continue improving on their current trajectory for another year or two, then it is worthwhile to consider what tools and infrastructure we will need for them. Companies that start building that now, for the future they assume is coming, will be better positioned than people who wake up to a new reality in two years.

This is the first time in my HN membership where I was excited to read about the dialectic, only to be disappointed upon finding out the article is about Rust.

PBT is for sure the future - which is apparently now? 10 years ago when I was talking about QuickCheck [0] all the JS and Ruby programmers in my city just looked at me like I had two heads.

[0] https://github.com/ryandv/chesskell/blob/master/test/Test/Ch...

chesskell/test/Test/Chess/BitboardSpec.hs at master · ryandv/chesskell

TBF PBT has been the present in Python for a while now.

10 years ago might have been a little early (Hypothesis 1.0 came out 11 years ago this coming Thursday), but we had pretty wide adoption by year two and it's only been growing. It's just that the other languages have all lagged behind.

It's by no means universally adopted, but it's not a weird rare thing that nobody has heard of.

It isn't used by anyone besides me, but I wrote a property-testing library for Deno [1] that has a form of "sometimes" assertions (inspired by Antithesis) and uses "internal shrinking" (inspired by Hypothesis).

But it's still a "blind" fuzzer and it would be nice to write one that gets feedback from code coverage somehow. Instead, you have to run code coverage yourself and figure out how to change test data generation to improve it.

[1] https://jsr.io/@skybrian/repeat-test

Hi David, congratulations on the release! I'm excited to play around with Hypothesis's bitstream-based shrinking. As you're aware, prop_flat_map is a pain to deal with, and I'd love to replace some of my proptest-based tests with Hegel.

I spent a little time looking at Hegel last week and it wasn't quite clear to me how I'd go about having something like a canonical generator for a type (similar to proptest's Arbitrary). I've found that to be very helpful while generating large structures to test something like serialization roundtripping against — in particular, the test-strategy library has derive macros that work very well for business logic types with, say, 10-15 enum variants each of which may have 0-10 subfields. I'm curious if that is supported today, or if you have plans to support this kind of composition in the future.

edit: oh I completely missed the macro to derive DefaultGenerator! Whoops

Yep, `#[derive(DefaultGenerator)]` and `generators::default<T>()` are the right tools here.

This is one of the areas we've dogfooded the least, so we'd definitely be happy to get feedback on any sharp corners here!

I think `from_type` is one of Hypothesis's most powerful and ergonomic strategies, and while we probably can't get quite to that level in Rust, we can still get something that's pretty great.