@whitequark pro tip: do not try to be smart with comments (natural language). hard-code some consistent rules and make the user responsible for them after aggressive reformats. (or you'll spend months on it)
@whitequark i tried to do something like this at my old job! checking equivalence post facto is really brittle, the longer the file the bigger the chance for a single bad sample to fuck it up. either restrict next-token sampling using the checker, or generate semantic-preserving edit actions instead of text tokens.
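a minimal sketch of what "restrict next-token sampling using the checker" could look like, assuming a toy setup: the checker here is a placeholder prefix-validity test (balanced parens), and `propose_scores` stands in for real model logits. all names are hypothetical, this is just the shape of the idea, not anyone's actual implementation.

```python
import random

def is_valid_prefix(text):
    # toy "checker": is this a valid prefix of a balanced-paren string?
    # (placeholder for a real formatter / equivalence checker)
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return True

VOCAB = ["(", ")", "a", "b"]

def propose_scores(prefix):
    # stand-in for model logits: one score per candidate token
    return {tok: random.random() for tok in VOCAB}

def constrained_sample(steps, seed=0):
    random.seed(seed)
    out = ""
    for _ in range(steps):
        scores = propose_scores(out)
        # mask tokens the checker rejects, instead of validating post facto:
        # a single bad sample can never enter the output
        allowed = {t: s for t, s in scores.items() if is_valid_prefix(out + t)}
        if not allowed:
            break
        out += max(allowed, key=allowed.get)
    return out

result = constrained_sample(12)
assert is_valid_prefix(result)  # valid by construction, regardless of length
```

the point of masking at each step (vs checking the whole file afterwards) is exactly the failure mode above: per-file validity decays with length if any single token can break it, but the masked sampler stays valid no matter how long the output gets.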