Dan Wallach

537 Followers
839 Following
414 Posts
Program Manager, DARPA Information Innovation Office (on leave from Rice University)
Twitter: https://twitter.com/danwallach
Github: https://github.com/danwallach
Homepage: https://www.cs.rice.edu/~dwallach/
Medium: https://medium.com/@dwallach

I've recently been playing around with vibe coding some basic tree-like data structures (treaps, red-black trees, AVL trees, and hash-array mapped tries) in Rust, and then twisting the arm of the LLM to do an optimization from Sarnak and Tarjan (1986) that lets you keep a version history without paying O(log n) path copying costs. This is the sort of thing that, in the old days, might have made for a useful undergraduate senior thesis that they'd crank on for a semester.
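To make the baseline concrete, here's a minimal sketch (my own illustration, not the actual vibe-coded crate) of naive path copying for a persistent BST: insert returns a new root that shares every untouched subtree with the old version via Rc. It's this O(log n) copying per update that the Sarnak-Tarjan node-copying technique avoids.

```rust
use std::rc::Rc;

// Naive persistent BST: each insert copies only the root-to-leaf path it
// touches; everything else is shared between versions through Rc.
enum Tree {
    Leaf,
    Node(Rc<Tree>, i32, Rc<Tree>),
}

fn insert(t: &Rc<Tree>, key: i32) -> Rc<Tree> {
    match &**t {
        Tree::Leaf => Rc::new(Tree::Node(Rc::new(Tree::Leaf), key, Rc::new(Tree::Leaf))),
        Tree::Node(l, k, r) if key < *k => {
            // Copy only this node; the right subtree is shared with the old version.
            Rc::new(Tree::Node(insert(l, key), *k, Rc::clone(r)))
        }
        Tree::Node(l, k, r) if key > *k => {
            Rc::new(Tree::Node(Rc::clone(l), *k, insert(r, key)))
        }
        Tree::Node(..) => Rc::clone(t), // key already present; share everything
    }
}

fn contains(t: &Tree, key: i32) -> bool {
    match t {
        Tree::Leaf => false,
        Tree::Node(l, k, _) if key < *k => contains(l, key),
        Tree::Node(_, k, r) if key > *k => contains(r, key),
        Tree::Node(..) => true,
    }
}
```

The payoff is that every old root stays valid: insert into version 1 yields version 2, and version 1 still answers queries exactly as before.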

I'm at a point where I now have modest confidence in the correctness of my vibe code (e.g., it's got property-based tests that check the invariants, and also doesn't crash under load, despite lots of internal calls to Option::expect()), but I'm not confident enough to share it. It's not bad but not great.

At some point, I'll write up something useful about what I've learned about how to vibe code (in short, write vicious unit tests or you're doomed), but meanwhile I thought I'd skip straight to the data.
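The "vicious tests" I lean on are mostly model-based: drive the structure under test and a trusted reference with the same pseudo-random operation stream and demand they agree at every step. Here's a hand-rolled sketch of that idea (in practice you'd reach for a property-testing crate like proptest; here a plain LCG generates the ops, and std's BTreeMap stands in for the vibe-coded structure):

```rust
use std::collections::{BTreeMap, HashMap};

// Simple linear congruential generator for reproducible "random" operations.
fn lcg(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

// Apply the same inserts/removes/lookups to the candidate and to a trusted
// model (HashMap); any divergence means the candidate has a bug.
fn model_test(seed: u64, ops: usize) -> bool {
    let mut rng = seed;
    let mut candidate: BTreeMap<u32, u32> = BTreeMap::new();
    let mut model: HashMap<u32, u32> = HashMap::new();
    for _ in 0..ops {
        let key = (lcg(&mut rng) % 64) as u32; // small key space forces collisions
        match lcg(&mut rng) % 3 {
            0 => {
                let val = lcg(&mut rng) as u32;
                candidate.insert(key, val);
                model.insert(key, val);
            }
            1 => {
                candidate.remove(&key);
                model.remove(&key);
            }
            _ => {
                if candidate.get(&key) != model.get(&key) {
                    return false; // divergence found
                }
            }
        }
    }
    candidate.len() == model.len()
}
```

The small key space is deliberate: it makes inserts, deletes, and lookups collide on the same keys constantly, which is where tree rebalancing and version-sharing bugs hide.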

For comparison, I also included Rust's "im_rc" crate, which has a human-written HAMT by @bodil; I'm using the faster "Rc" version, since that's how I vibe-coded all the others.

Punchline 1: @bodil wins. Their work is the "IM HashMap" in the graph. Higher is better. X-axis is problem size and Y-axis is throughput. The benchmark was 80% reads and 20% a mix of inserts, deletions, and updates. Every "version" is saved, creating (hopefully) real memory pressure.
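For the shape of that workload, here's my reconstruction as a sketch (not the actual Criterion harness): 80% reads, 20% writes split between inserts/updates and deletes, with every post-write version retained. Cloning a std HashMap per write is a deliberately naive stand-in for a persistent map's structural sharing, but it produces the same kind of memory pressure.

```rust
use std::collections::HashMap;

// Run a mixed read/write workload, keeping every version ever produced.
fn run_workload(ops: u64) -> Vec<HashMap<u64, u64>> {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut versions = vec![HashMap::new()];
    for i in 0..ops {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let r = state >> 33;
        let key = r % 1024;
        if r % 10 < 8 {
            // 80% of operations: read from the newest version
            let _ = versions.last().unwrap().get(&key);
        } else {
            // 20% of operations: write, saving the resulting version
            let mut next = versions.last().unwrap().clone();
            if r % 2 == 0 {
                next.insert(key, i); // insert or update
            } else {
                next.remove(&key); // delete
            }
            versions.push(next);
        }
    }
    versions
}
```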

Punchline 2:
The Sarnak-Tarjan optimization definitely helps for the various binary trees, but my attempt to do it for HAMT ended up costing a factor of two in perf. Yikes.

Graph below generated by Criterion.rs. Yes the colors are horrible. Measured on my M1 MacBook Air, because why not.

I gave a keynote talk last week at NDSS. I spent the front half talking about memory safety and how we can, once and for all, eliminate things like buffer overflows. In the back half, I talked about how DARPA works, saying all the things that I wish I'd known about DARPA when I was starting my own academic career. I also wore my favorite vintage aloha shirt.
A useful explainer about why static type systems, in general, and Rust, in particular, are super useful for debugging code.
https://blog.daniel-beskin.com/2025-12-22-the-compiler-is-your-best-friend-stop-lying-to-it
The Compiler Is Your Best Friend, Stop Lying to It - Daniel Beskin's Blog

@mattblaze @SteveBellovin @cstross @Migueldeicaza

DARPA's TRACTOR (for which I'm the program manager) is focused on C to Rust, not C++. The Microsoft effort is unrelated to our effort.

TRACTOR performer teams have been rolling for about six months, and their first engagement with our test & evaluation team is going on now. As soon as it's ready, we'll push everything out for public release.

There are many challenges with code translation: correctness, idiomaticity, performance. And there are many approaches. By the time TRACTOR is done, which will take several years, we'll hopefully have good answers and good tools.

(I could spend hours just on the topic of "C programmers do the darndest things", where it's sometimes unclear why something even compiles, much less what it's supposed to mean.)

Why not C++? First, we need to show we can do C, since (approximately) every valid C program is also a valid C++ program.

This is great: https://blog.trailofbits.com/2025/11/25/constant-time-support-lands-in-llvm-protecting-cryptographic-code-at-the-compiler-level/

LLVM 22 (and presumably all subsequent versions) now has a constant-time select intrinsic that lets cryptographic algorithms tell the compiler exactly what they need (i.e., evaluate both sides of a conditional expression, then select the one you want), replacing gross bit-hacking expressions that newer optimizers would unravel.
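For flavor, this is the kind of bit-hack constant-time select the intrinsic replaces (a generic sketch in Rust, not code from the LLVM patch): evaluate both inputs, then pick one with a mask instead of a branch. The trouble is that sufficiently clever optimizers can recognize this pattern and rewrite it back into a branch, which is exactly what an explicit intrinsic prevents.

```rust
// Branchless select: returns a when cond is true, b otherwise, without a
// data-dependent branch. The mask is all-ones or all-zeros.
fn ct_select(cond: bool, a: u32, b: u32) -> u32 {
    let mask = (cond as u32).wrapping_neg(); // 1 -> 0xFFFF_FFFF, 0 -> 0
    (a & mask) | (b & !mask)
}
```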

https://blog.trailofbits.com/2025/12/02/introducing-constant-time-support-for-llvm-to-protect-cryptographic-code/

@david_chisnall Your scenario 4 is intriguing. For it to be true, I'm thinking you need the following assumptions:

- Something resembling Moore's Law continues to be true (ergo, it gets cheaper over time to do both training and inference). Alternatively or additionally, the algorithms will get more efficient over time, letting you go faster with the same hardware.

- The size of these models, and the complexity of training and inference, stays about the same. If there's no benefit from going bigger, or simply no more data to train on, then that says today's workloads are it.

If both of those hold, then you eventually get a proliferation of cheap models, tuned to specific use cases, that can run anywhere.

A related question follows: what happens to these enormous gigawatt datacenters after a hypothetical AI crash? If you can buy them for pennies on the dollar, that starts looking like a cheap way to compete for general purpose cloud computing cycles. Of course, the way you build a general purpose datacenter and the way you build an AI datacenter are not the same, but for plenty of workloads, I'll bet they can do a fine job.

@allanfriedman My go-to is usually some variation on sambal oelek (i.e., fermented pepper mash, without so much vinegar, but maybe adding garlic). Can be quite hot if you lay it on thick, or gives a pleasant mild burn when you mix it in with something like tzatziki.
@adrian I don't know a name for it, but it would be fun to discuss in the context of the Rust type system, where those would have to be mutable borrows, and therefore guaranteed not to alias, versus C or C++ where you don't have a priori knowledge.
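A sketch of that no-aliasing guarantee (the function name is mine): two &mut parameters are guaranteed disjoint, so the compiler may optimize as if C's `restrict` were in force, except the property is checked rather than merely promised.

```rust
// Because dst and src are both &mut, the compiler knows they cannot point at
// the same i32; it may cache values in registers with no aliasing reloads.
fn add_into(dst: &mut i32, src: &mut i32) -> i32 {
    *dst += *src;
    *dst // no need to re-read src: it cannot alias dst
}
```

Calling `add_into(&mut x, &mut x)` is rejected at compile time ("cannot borrow `x` as mutable more than once"), whereas C and C++ would happily accept the aliasing call and the programmer just has to get it right.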
@dan I can't see any possible way this could go wrong.
Security conference talks fall into two categories
* we designed a distributed entropy siphon to perform a black-box hypervisor side channel escape and chain-load a persistent rootkit into the CPU cache
* we looked behind the sofa and found an entire industry of products/services that have made no attempt at security at all and are therefore vulnerable to the most basic issues that we've been finding in everything for the past 30 years, and no-one else had bothered to look.