Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals Satisfiability Modulo Theories (SMT) based autoformalization's domain-specific impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), with known UQ techniques like the entropy of token probabilities failing to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find uncertainty signals are task-dependent (e.g., grammar entropy for logic, AUROC>0.93). Finally, a lightweight fusion of these signals enables selective verification, drastically reducing errors (14-100%) with minimal abstention, transforming LLM-driven formalization into a reliable engineering discipline.

arXiv.org
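The grammar-entropy idea from the abstract can be illustrated with a toy computation. Everything below is hypothetical: the rule probabilities, the nonterminal names, and the simple per-nonterminal aggregate are a sketch of the concept, not the paper's actual uncertainty measure.

```python
import math

def rule_entropy(probs):
    """Shannon entropy (bits) of one nonterminal's production distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical PCFG over SMT-LIB-like LLM outputs: each nonterminal maps to
# the probabilities of choosing each of its productions, as might be
# estimated from many sampled generations.
pcfg = {
    "Assert": [0.7, 0.3],          # e.g. (assert ...) vs (assert (not ...))
    "Term":   [0.5, 0.25, 0.25],   # variable vs constant vs application
}

# One simple aggregate signal: total per-nonterminal choice entropy.
# Higher values mean the model is less consistent about which rules it uses.
total = sum(rule_entropy(p) for p in pcfg.values())
print(round(total, 3))  # → 2.381
```

A near-zero total would mean the model derives essentially the same structure every time; the abstract's claim is that signals of this structural kind, unlike raw token-probability entropy, track real errors on logical tasks.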
What do you get if you combine #grammars, #constraints, #evolutionary algorithms, and #Python in one? A mighty fuzzer! Check out our latest #FANDANGO work, to appear at #ISSTA2025:
https://publications.cispa.de/articles/standard/FANDANGO_Evolving_Language-Based_Testing/28769252?file=53591066
To try out Fandango yourself, check out its home page: https://fandango-fuzzer.github.io/
FANDANGO: Evolving Language-Based Testing

Language-based fuzzers leverage formal input specifications (languages) to generate arbitrarily large and diverse sets of valid inputs for a program under test. Modern language-based test generators combine grammars and constraints to satisfy syntactic and semantic input constraints. ISLa, the leading input generator in that space, uses symbolic constraint solving to solve input constraints. Using solvers places ISLa among the most precise fuzzers but also makes it slow. In this paper, we explore search-based testing as an alternative to symbolic constraint solving. We employ a genetic algorithm that iteratively generates candidate inputs from an input specification, evaluates them against defined constraints, evolving a population of inputs through syntactically valid mutations and retaining those with superior fitness until the semantic input constraints are met. This evolutionary procedure, analogous to natural genetic evolution, leads to progressively improved inputs that cover both semantics and syntax. This change boosts the efficiency of language-based testing: In our experiments, compared to ISLa, our search-based FANDANGO prototype is faster by one to three orders of magnitude without sacrificing precision. The search-based approach no longer restricts constraints to constraint solvers' (miniature) languages. In FANDANGO, constraints can use the whole Python language and library. This expressiveness gives testers unprecedented flexibility in shaping test inputs. It allows them to state arbitrary goals for test generation: "Please produce 1,000 valid test inputs where the field follows a Gaussian distribution but never exceeds 20 mV."

figshare
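The evolutionary loop the abstract describes can be sketched in a few lines of plain Python. This is a toy illustration, not FANDANGO's implementation: the "grammar" just produces four-digit strings, and the hypothetical semantic constraint is that the digits sum to 10.

```python
import random

random.seed(0)  # deterministic toy run

def generate():
    """Derive a syntactically valid input (here: a list of four digits)."""
    return [random.randint(0, 9) for _ in range(4)]

def fitness(ind, target=10):
    """Distance from the semantic constraint; 0 means the constraint holds."""
    return abs(sum(ind) - target)

def mutate(ind):
    """Syntactically valid mutation: replace one digit with another digit."""
    out = ind[:]
    out[random.randrange(len(out))] = random.randint(0, 9)
    return out

population = [generate() for _ in range(20)]
for _ in range(100):
    population.sort(key=fitness)
    if fitness(population[0]) == 0:
        break
    # Keep the fitter half, refill with mutated copies of survivors.
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = min(population, key=fitness)
print(best)
```

The point of the design, as in the abstract, is that `fitness` can be arbitrary Python rather than a formula a constraint solver understands, which is what buys the expressiveness (Gaussian-distributed fields, value caps, and so on).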

#lispyGopherClimate #podcast #archive https://archives.anonradio.net/202501150000_screwtape.mp3 0UTC Wed
#interview ing @shizamura author of the popular #scifi https://sarilho.net/en/ #comic

#climateCrisis #haiku by @kentpitman

#liveChat in #lambdaMOO as always
telnet lambda.moo.mud.org 8888
co guest
@join screwtape

#Questions for @shizamura please!

#lisp, #fortran, #grammars, oh my.

#Murderbots #bookstodon @nosrednayduj @dougmerritt @ratxue @mdhughes I finally read 90% of it (no spoilers please)

LL and LR in Context: Why Parsing Tools Are Hard

In my last blog entry, LL and LR Parsing Demystified, we explored LL and LR parsers from a black-box perspective. We arrived at a model for these parsers whereb...

The best explanation 👌🏽 of LL & LR #parsing techniques for non-theorists that I’ve seen so far:

“LL and LR Parsing Demystified” [2013], Josh Haberman (https://blog.reverberate.org/2013/07/ll-and-lr-parsing-demystified.html).

On HN [2016]: https://news.ycombinator.com/item?id=12552298

#Parsers #Compilers #Grammars #ComputerScience #DFA

LL and LR Parsing Demystified

My first adventures in parsing theory came when I was doing an independent study of programming languages in college. When I got to the part about algorithms s...
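For a taste of the top-down (LL) half of the pair the article demystifies, here is a minimal hand-written recursive-descent recognizer. It is a generic textbook sketch, not code from the post, for the grammar S → '(' S ')' S | ε.

```python
# A recursive-descent (LL) recognizer for balanced parentheses:
#   S -> '(' S ')' S | ε
# It picks a production by peeking at the next character — the one-token
# lookahead that defines LL(1) parsing.
def parse(s):
    pos = 0
    def S():
        nonlocal pos
        if pos < len(s) and s[pos] == '(':
            pos += 1          # consume '('
            S()               # inner S
            if pos >= len(s) or s[pos] != ')':
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1          # consume ')'
            S()               # trailing S
        # else: take the ε production and consume nothing
    S()
    if pos != len(s):
        raise SyntaxError(f"unexpected input at position {pos}")
    return True

print(parse("(()())"))  # → True
```

An LR parser would recognize the same language bottom-up, reducing completed `( S )` groups after seeing them, which is exactly the black-box contrast the blog post develops.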

What are some underappreciated superpowers that #Perl and/or #rakulang has EXCLUDING #regex and #grammars?

#programming #compsci

Latest addition to Grammar Watch: Terhart's A Grammar of Paunaka, a revised version of a dissertation that won the annual Research Award at the Europa-Universität Flensburg, now published by @langscipress (ht @haspelmath ) https://linguistic-typology.org/grammarwatch/ #linguistics #grammars

@dpom Great article! Thank you for sharing. ❤️

If you want to go _really_ deep into #parsing and #grammars (without being Rust-specific), I can highly recommend Strumenta

https://tomassetti.me/

It is such a treasure trove of articles deeply explaining all there is to parsing and grammars. I absolutely love it!💖

See e.g. this one

A #Guide to Parsing: Algorithms and Terminology

https://tomassetti.me/guide-parsing-algorithms-terminology/

Urgh, it is just so good!🥰

#Parser #Algorithms #Grammar

Our articles - Strumenta

Strumenta
Algorithmic Pattern residents announcement

Introducing our two selected Algorithmic Pattern residents, Anuradha Reddy and Geraldine Jones.

Then Try This

This week on #bewilderment I wrote about a landscape I love, and about how I feel drawn to #landscape painting, and drawn to refuse landscape painting, and about what being among white pines did to my #grammars of seeing and being and saying. First published in Arnoldia, the quarterly journal of the #ArnoldArboretum at #Harvard, in 2022:

https://bewilderment.substack.com/p/among-white-pines-at-the-foot-of

among white pines, at the foot of them

how else could I see where I am?

Bewilderment