This is my first Mastodon post! Very happy to share Reify.

Recently, we've been working on a new paradigm for random program generation (RPG), which we call *semantic reification*.

Unlike typical syntax-first RPG, semantic reification is semantics first.

We consider program semantics at two dimensions:

- at compile time, a CFG describes what a program *can* do;

- at runtime, an execution path determines what it *actually* does.

Given any CFG and a selected execution path on top of it, semantic reification constructs

- a CFG-matching program,
- an input, and
- an expected output,

such that execution

- follows that path,
- avoids UB,
- terminates, and
- produces the known output.

This brings new possibilities due to its natural support for *arbitrary* control flow:

The CFG can be in any shape (eg nested unbounded loops or irreducible regions), but we only reify one terminating well-defined execution path.

Unexecuted blocks can have any legitimate code.

Muiltiple techniques exist to implement semantic reification.

Our implementation Reify separates leaf function generation from whole program generation, via a lightweight symbolic execution and EMI-based peephole rewrite.

Till now, Reify found more than 60 bugs in GCC & LLVM.

For more details, please refer to our:

PLDI26 paper: https://connglli.github.io/pdfs/reify_pldi26.pdf

GitHub repository: https://github.com/connglli/Reify

RefractIR (by-product): https://github.com/connglli/RefractIR

With Kavya Chopra, Thodoris Sotiropoulos, and Zhendong Su.
@connglli nice thread, I need to read the paper in detail! It sounds related to how pypy is fuzzing its tracing jit: it's easier because we only generate traces, ie linear code. But the traces could still have ub or be internally inconsistent, eg contain something like guard(x == 1) ... guard(x == 2). To prevent these problems, we generate the traces operation by operations, together with example values. the trace generator stores an example value per SSA variable while it generates the trace
@connglli in this way we can make sure that the trace is consistent with itself and ub free, at least for the example values. If the example value of a variable is 0, we don't use the variable in a division. When we generate a guard, the condition is generated to be consistent with the example value. In this way, we always know the results of the trace so have something to compare against, and also the trace is consistent and ub free.
@connglli Welcome! Your Semantic Reification paper was a great read :) I'm excited to read further posts of yours