@sebastian Very important problem indeed. What I couldn't tell from your article is this: did you try using AI to verify the implementation?
I can think of several ways tools like Claude Code can help you gain confidence in the implementation, or find flaws in it.
(1) Ask it for a written proof (whatever kind of "proof" it generates, you may be able to work with it).
(2) Ask it for a written walkthrough. @simon built a tool for that: https://simonwillison.net/guides/agentic-engineering-patterns/linear-walkthroughs/
(3) Go interactive: start a conversation with your agent, guide it through your algorithm, and let it show you step by step how the implementation matches it.
This kind of problem will be very important in the future, so I'm looking forward to any insights here!