Dan Wallach

537 Followers
839 Following
414 Posts
Program Manager, DARPA Information Innovation Office (on leave from Rice University)
Twitter: https://twitter.com/danwallach
GitHub: https://github.com/danwallach
Homepage: https://www.cs.rice.edu/~dwallach/
Medium: https://medium.com/@dwallach

I've recently been playing around with vibe coding some basic tree-like data structures (treaps, red-black trees, AVL trees, and hash-array mapped tries) in Rust, and then twisting the arm of the LLM to do an optimization from Sarnak and Tarjan (1986) that lets you keep a version history without paying O(log n) path copying costs. This is the sort of thing that, in the old days, might have made for a useful undergraduate senior thesis that they'd crank on for a semester.
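For readers who haven't seen it, naive path copying — the O(log n)-copies-per-update baseline that the Sarnak-Tarjan technique improves on — looks roughly like this in Rust. This is a sketch for illustration, not the actual project code:

```rust
use std::rc::Rc;

// A minimal persistent BST using naive path copying: each insert clones
// only the nodes on the root-to-leaf path and shares everything else.
// Every old version stays valid, at O(log n) new nodes per update.
enum Tree {
    Leaf,
    Node(Rc<Tree>, i32, Rc<Tree>),
}

fn insert(t: &Rc<Tree>, key: i32) -> Rc<Tree> {
    match &**t {
        Tree::Leaf => Rc::new(Tree::Node(Rc::new(Tree::Leaf), key, Rc::new(Tree::Leaf))),
        Tree::Node(l, k, r) => {
            if key < *k {
                // Copy this node; share the untouched right subtree.
                Rc::new(Tree::Node(insert(l, key), *k, Rc::clone(r)))
            } else if key > *k {
                Rc::new(Tree::Node(Rc::clone(l), *k, insert(r, key)))
            } else {
                Rc::clone(t) // key already present: share the whole tree
            }
        }
    }
}

fn contains(t: &Tree, key: i32) -> bool {
    match t {
        Tree::Leaf => false,
        Tree::Node(l, k, r) => {
            if key < *k { contains(l, key) }
            else if key > *k { contains(r, key) }
            else { true }
        }
    }
}
```

Each insert returns a new root; older roots still answer queries against their own snapshot, which is what makes "every version is saved" cheap to express but expensive in copies — exactly the cost the Sarnak-Tarjan optimization attacks.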

I'm at a point where I now have modest confidence in the correctness of my vibe code (e.g., it's got property-based tests that check the invariants, and also doesn't crash under load, despite lots of internal calls to Option::expect()), but I'm not confident enough to share it. It's not bad but not great.
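The property-based idea, sketched dependency-free: hammer the structure with random operations, compare it against a trusted model (std's BTreeSet), and check the invariant after every step. The sorted-vec set here is a stand-in for the real trees, and the tiny LCG avoids pulling in an RNG crate:

```rust
use std::collections::BTreeSet;

// Tiny linear congruential generator for deterministic pseudo-random ops.
struct Lcg(u64);
impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0
    }
}

// Structure under test: a toy sorted-vec set (stand-in for a treap/AVL/etc.).
struct SortedVecSet(Vec<i32>);
impl SortedVecSet {
    fn new() -> Self { SortedVecSet(Vec::new()) }
    fn insert(&mut self, x: i32) {
        if let Err(i) = self.0.binary_search(&x) { self.0.insert(i, x); }
    }
    fn remove(&mut self, x: i32) {
        if let Ok(i) = self.0.binary_search(&x) { self.0.remove(i); }
    }
    fn contains(&self, x: i32) -> bool { self.0.binary_search(&x).is_ok() }
    // Invariant: elements strictly increasing (sorted, no duplicates).
    fn check_invariant(&self) -> bool { self.0.windows(2).all(|w| w[0] < w[1]) }
}

// Run `rounds` random ops against both the structure and the model,
// returning false on the first disagreement or invariant violation.
fn fuzz(rounds: u64) -> bool {
    let mut rng = Lcg(42);
    let mut sut = SortedVecSet::new();
    let mut model = BTreeSet::new();
    for _ in 0..rounds {
        let x = (rng.next() % 50) as i32;
        match rng.next() % 3 {
            0 => { sut.insert(x); model.insert(x); }
            1 => { sut.remove(x); model.remove(&x); }
            _ => { if sut.contains(x) != model.contains(&x) { return false; } }
        }
        if !sut.check_invariant() { return false; }
    }
    true
}
```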

At some point, I'll write up something useful about what I've learned about how to vibe code (in short, write vicious unit tests or you're doomed), but meanwhile I thought I'd skip straight to the data.

For comparison, I also included Rust's "im_rc" crate, which includes a human-written HAMT by @bodil; I'm using the faster "Rc" version since that's how I vibe-coded all those others.

Punchline 1: @bodil wins. Their work is the "IM HashMap" in the graph. Higher is better. X-axis is problem size and Y-axis is throughput. The benchmark was 80% reads and 20% a mix of inserts, deletions, and updates. Every "version" is saved, creating (hopefully) real memory pressure.

Punchline 2: The Sarnak-Tarjan optimization definitely helps for the various binary trees, but my attempt to do it for HAMT ended up costing a factor of two in perf. Yikes.

Graph below generated by Criterion.rs. Yes, the colors are horrible. Measured on my M1 MacBook Air, because why not.

I gave a keynote talk last week at NDSS. I spent the front half talking about memory safety and how we can, once and for all, eliminate things like buffer overflows. In the back half, I talked about how DARPA works, saying all the things that I wish I'd known about DARPA when I was starting my own academic career. I also wore my favorite vintage aloha shirt.
A useful explainer about why static type systems, in general, and Rust, in particular, are super useful for debugging code.
https://blog.daniel-beskin.com/2025-12-22-the-compiler-is-your-best-friend-stop-lying-to-it
The Compiler Is Your Best Friend, Stop Lying to It - Daniel Beskin's Blog

The compiler is a powerful tool, yet many developers have a painful relationship with it. Can we do better?

This is great: https://blog.trailofbits.com/2025/11/25/constant-time-support-lands-in-llvm-protecting-cryptographic-code-at-the-compiler-level/

LLVM 22 (and presumably all subsequent versions) now has a constant-time select intrinsic that lets cryptographic algorithms tell the compiler exactly what they need (i.e., evaluate both sides of a conditional expression, then select the one you want), replacing the gross bit-hacking expressions that newer optimizers would unravel.

https://blog.trailofbits.com/2025/12/02/introducing-constant-time-support-for-llvm-to-protect-cryptographic-code/
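The bit-hacking pattern being replaced is the classic mask-and-blend select. A sketch in Rust (the function name is mine, not the LLVM intrinsic):

```rust
// Branchless select: build an all-ones or all-zeros mask from the
// condition, then blend the two values. Both sides are always computed,
// so execution time doesn't depend on the (possibly secret) condition --
// unless a clever optimizer recognizes the idiom and turns it back into
// a branch, which is exactly why a compiler-level intrinsic is needed.
fn ct_select_u64(cond: bool, if_true: u64, if_false: u64) -> u64 {
    let mask = (cond as u64).wrapping_neg(); // 0xFFFF_FFFF_FFFF_FFFF if cond, else 0
    (if_true & mask) | (if_false & !mask)
}
```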

Security conference talks fall into two categories
* we designed a distributed entropy siphon to perform a black-box hypervisor side channel escape and chain-load a persistent rootkit into the CPU cache
* we looked behind the sofa and found an entire industry of products/services that have made no attempt at security at all and are therefore vulnerable to the most basic issues that we've been finding in everything for the past 30 years, and no one else had bothered to look.

I think this needs to be repeated, since I tend to be quite negative about all of the 'AI' hype:

I am not opposed to machine learning. I used machine learning in my PhD and it was great. I built a system for predicting the next elements you'd want to fetch from disk or a remote server; it didn't require knowledge of the algorithm you were using for traversal, it just learned the access patterns. It performed as well as a prefetcher that did have detailed knowledge of the algorithm that defined the access path. Modern branch predictors use neural networks. Machine learning is amazing if:

  • The problem is too hard to write a rule-based system for or the requirements change sufficiently quickly that it isn't worth writing such a thing and,
  • The value of a correct answer is much higher than the cost of an incorrect answer.

The second of these is really important. Most machine-learning systems will have errors (the exceptions are those where ML is really used for compression[1]). For prefetching, branch prediction, and so on, the cost of a wrong answer is very low (you just do a small amount of wasted work), but the benefit of a correct answer is huge: you don't sit idle for a long period. These are basically perfect use cases.
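The asymmetry is easy to make concrete with a toy expected-value model (the numbers below are hypothetical, purely for illustration):

```rust
// Expected net saving per prediction: a predictor that is right with
// probability `p` saves `benefit` cycles on a hit and wastes `cost`
// cycles on a miss. When benefit >> cost, even a mediocre predictor
// is a clear win; when cost >> benefit, even a good one can lose.
fn expected_saving(p: f64, benefit: f64, cost: f64) -> f64 {
    p * benefit - (1.0 - p) * cost
}
```

With a (hypothetical) 200-cycle saving on a hit and a 10-cycle penalty on a miss, a prefetcher that's right only 20% of the time still nets 0.2 × 200 − 0.8 × 10 = 32 cycles per prediction; flip the costs and the same accuracy is a big loss.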

Similarly, face detection in a camera is great. If you can find faces and adjust the focal depth automatically to keep them in focus, you improve photos, and if you do it wrong then the person can tap on the bit of the photo they want to be in focus to adjust it, so even if you're right only 50% of the time, you're better than the baseline of right 0% of the time.

In some cases, you can bias the results. Maybe a false positive is very bad, but a false negative is fine. Spam filters (which have used machine learning for decades) fit here. Marking a real message as spam can be problematic because the recipient may miss something important; letting the occasional spam message through merely wastes a few seconds. Blocking a hundred spam messages a day is a huge productivity win. You can tune the probabilities to hit this kind of threshold. And you can't easily write a rule-based algorithm for spotting spam because spammers will adapt their behaviour.

Translating a menu is probably fine; the worst that can happen is that you get to eat something unexpected. Unless you have a specific food allergy, in which case you might die from a translation error.

And that's where I start to get really annoyed by a lot of the LLM hype. It's pushing machine-learning approaches into places where sometimes giving the wrong answer causes significant harm. And it's doing so while trying to outsource the liability to the customers who are using these machines in the ways in which they are advertised as working. It's great for translation! Unless a mistranslated word could kill a business deal or start a war. It's great for summarisation! Unless missing a key point could cost you a load of money. It's great for writing code! Unless a security vulnerability costs you revenue, or a copyright-infringement lawsuit (from having accidentally put something from the training set directly into your codebase, in contravention of its license) kills your business. And so on. Lots of risks that are outsourced and liabilities that are passed directly to the user.

And that's ignoring all of the societal harms.

[1] My favourite of these is actually very old. The hyphenation algorithm in TeX trains short Markov chains on a corpus of words with ground truth for correct hyphenation. The result is a Markov chain that is correct on most words in the corpus and is much smaller than the corpus. The next step uses it to predict the correct breaking points in all of the words in the corpus and records the outliers. This gives you a generic algorithm that works across a load of languages and is guaranteed to be correct for all words in the training corpus and is mostly correct for others. English and American have completely different hyphenation rules for mostly the same set of words, and both end up with around 70 outliers that need to be in the special-case list in this approach. Writing a rule-based system for American is moderately easy, but for English is very hard. American breaks on syllable boundaries, which are fairly well defined, but English breaks on root words and some of those depend on which language we stole the word from.
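The predict-then-record-outliers structure described above can be sketched like this. The "predictor" here is a placeholder heuristic standing in for the trained model — real hyphenation uses learned patterns, which are far more involved:

```rust
use std::collections::HashMap;

// Hybrid predictor: a cheap generic predictor handles most words, and the
// few corpus words it gets wrong go into a small exception list that
// overrides it. By construction, the result is exactly right on every
// word in the training corpus, and "mostly right" elsewhere.
struct Hyphenator {
    exceptions: HashMap<String, Vec<usize>>, // word -> break positions
}

impl Hyphenator {
    // Placeholder generic predictor: break after every third letter.
    // (A stand-in for the trained model described in the footnote.)
    fn predict(word: &str) -> Vec<usize> {
        (3..word.len()).step_by(3).collect()
    }

    // "Training": run the predictor over the corpus and record only the
    // outliers, i.e. the words where it disagrees with ground truth.
    fn train(corpus: &[(&str, Vec<usize>)]) -> Hyphenator {
        let mut exceptions = HashMap::new();
        for (word, truth) in corpus {
            if Self::predict(word) != *truth {
                exceptions.insert(word.to_string(), truth.clone());
            }
        }
        Hyphenator { exceptions }
    }

    // Lookup: exception list wins; otherwise fall back to the predictor.
    fn hyphenate(&self, word: &str) -> Vec<usize> {
        self.exceptions
            .get(word)
            .cloned()
            .unwrap_or_else(|| Self::predict(word))
    }
}
```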

at some point in my life i had to explain who donald knuth was to a table of not-cs type people

"he wrote a book about programs, then a program to write books, and then wrote a book about the program, in the program to write books"

🥈 We won second place in DARPA's AI Cyber Challenge with Buttercup!

After competing against 7 top teams to build autonomous AI systems, we're excited to announce that Buttercup, our Cyber Reasoning System that automatically discovers and patches vulnerabilities, is now open source.

Learn more about Buttercup: https://blog.trailofbits.com/2025/08/08/buttercup-is-now-open-source/

Please report any account that sends you a private message telling you that you need to verify your #Mastodon account to continue using it. It is a scam. We do not require identity verification. Real staff accounts either have a special role badge on their profile or are verified through the joinmastodon.org domain, and they will typically never reach out through private messages.

Here's a thoughtful essay about what AI is doing to college education. In short, it's taking away from students the process of struggling to learn. That process, where you figure it out and skill up, just doesn't happen to AI-dependent students, and wow will they hit a wall when they're actually called upon to think on their feet.

Counterpoint: I wanted to have a tabular summary of software vulnerabilities, organized by category, for C and C++ JSON parsers. (Useful to make an introductory point for a talk I'm writing about parser security.) Normally, that would mean days of searching and slogging through a ton of different websites. Instead, I threw the challenge at Google Gemini (Flash 2.0 "Deep Research") and, in not quite an hour, it delivered me a 20 page essay with several nice tables, and a ton of hyperlinks to its sources.

While previously I might have just waved my hands in the broad direction ("everybody knows this is a problem"), now I've got some fun slides to page through, illuminating the severity of the problem.

I'm entirely unsure how I feel about this, because for the first time ever, I got something genuinely useful out of an LLM.

https://nymag.com/intelligencer/article/openai-chatgpt-ai-cheating-education-college-students-school.html

Rampant AI Cheating Is Ruining Education Alarmingly Fast

In only two years, ChatGPT and the surge of AI-generated cheating from college students it has created have unraveled the entire academic project.

Intelligencer