@krismicinski @shriramk @jfdm @csgordon @lindsey @jeremysiek really?! That's depressing.

I find these things _maddening_ to use. It feels like trying to neatly typeset your ideas by dragging wet toilet paper dipped in ink across a piece of sandpaper.

Are they capable of some impressive things? Yes. Do I think they're a good tool for augmenting a sophisticated user so they can go faster? Honestly, not really. The NLP aspect is neat; the multiple round-trips through English to <whatever it does internally> and back to English are excruciatingly slow, expensive, and inefficient. It's not a good use of my, or frankly the machine's, time, let alone electrical power or water.

Case in point: some colleagues the other day were saying something like, "I just can't get it to use `jq` instead of writing little Python scripts to process JSON....Here's what I put in my CLAUDE.md file: <some sentence along the lines of, 'prefer jq for working with json'>." I couldn't help but feel like this is exactly the sort of thing where you want the concise precision of a small DSL for assigning weights to tools (and providing templates for those tools' use) to drive how the agent uses them. But you can't do that, because the agent only trades in text.
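To make that concrete (a purely hypothetical sketch — no such config format exists today), I mean something like weights plus usage templates instead of a sentence of prose:

```toml
# Hypothetical tool-preference DSL; every key and value here is invented.
[tool.jq]
weight   = 0.9                     # strongly prefer for this task class
task     = "json-transform"
template = "jq '<filter>' <file>"  # canonical invocation shape

[tool.python]
weight   = 0.1                     # discourage ad-hoc scripts for JSON munging
task     = "json-transform"
```

Something this small is both easier to write than prose and unambiguous to the tool consuming it — which is exactly what a one-sentence hint in a CLAUDE.md file is not.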

Like I said, there's clearly a "there" there. But setting aside the moral and ethical issues for a moment, that doesn't mean that the present model of interaction is _good_, let alone that it can't be substantially _better_.

@cross @krismicinski @shriramk @jfdm @csgordon @jeremysiek It seems like folks sooner or later notice that this whole "the agent only trades in text" thing is Not Great and proceed to reinvent programming languages on top of it. So, you know, when that happens, we PL educators are here to try to help them not accidentally implement dynamic scope or whatever.
@lindsey @cross @[email protected] @jfdm @csgordon @jeremysiek
It is absolutely an open research question what the new "source" and "intermediate" languages will be. I think we'll have a much better shot at the latter (richly typed, semantic specifications as part of code, etc.); for the former, I think we'll build good ones, but the trick will be getting people to use them.

@shriramk @lindsey @cross @krismicinski @jfdm @csgordon @jeremysiek Do you think that the "no one will look at the generated code anymore" future is inevitable? Given how often the industry has tried to generate programs directly from English-like specs before and failed, I'm quite skeptical, even if we have notably different tech this time around.

Internally at Google, many folks (including high-level ones) are making this claim without evidence, as if it's obvious on its face, and I'm surprised how few people push back on it or at least ask for more proof.

@shriramk @lindsey @cross @krismicinski @jfdm @csgordon @jeremysiek LLM-based code generation reminds me of some of Bret Victor's talks: there are some cool ideas and convincing demos, but also a lot more work to do before one can say "we've solved all of the problems; everyone should be doing this all the time now".
@shriramk @lindsey @cross @krismicinski @jfdm @csgordon @jeremysiek To be fair: The LLM tooling is certainly more capable overall than the Bret Victor stuff. But I'm not yet convinced coding is 100% solved.
@jschuster @lindsey @cross @[email protected] @jfdm @csgordon @jeremysiek
Of course I don't think we'll never need to look at generated code again; that would be a foolish position. The interesting question is how much people will need to, and relative to what? If you have an amazing test suite or rich verified properties, for instance, how much do you need to review the code? Most people aren't writing Dan Cross-level code. (The Bret Victor analogy is good.)

@shriramk @jschuster @lindsey @krismicinski @jfdm @csgordon @jeremysiek why is it a foolish position, though? (Serious question: I'm not trying to be either obtuse or confrontational). Most people don't look at the object code in an executable binary, and even most programmers have no working knowledge of an assembly language or machine's instruction set at this point.

This is something I was just thinking about while walking over to the corner store. Take my ham radio example: I consider that a pretty impressive accomplishment, despite all the deficiencies I mentioned earlier. And while I don't think it's possible right now, it is _conceivable_ that at some point in the future, I may be able to pop up one of these things and say, "rewrite the Linux kernel in Rust," and it would make a passable go at doing so.

But then it struck me: what would be the point? If the premise is that the machine is going to be able to write better code than I can, perhaps not now, but at some point in the future, then what does it matter what notation it uses to do so? The value-adds of Rust, the borrow checker and so on, are there to help human programmers. But if the machine gets good enough to work around our inability to reason about thorny concurrency issues, memory and type safety, UB, etc., in C, and it does a better job than I do _anyway_, then there's no reason to convert it to a language that's optimized for humans. Maybe that's just a big matrix of numbers; meaningful to it, but not to most of the rest of us, just as object code is not meaningful to most of us.

Surely the best course of action is to settle on something that's optimized _for the LLM_. Almost all extant programming languages seem to be optimized for the _programmer_ (not you, APL).

Perhaps this is similar to what you were referring to when you mentioned the source and intermediate languages?

@cross @jschuster @lindsey @[email protected] @jfdm @csgordon @jeremysiek
I'm absolutely confident (let's say, betting) that the main languages generated by agents will be new ones, likely very rich with types and other specifications that the RL can work with. Long, detailed error messages, designed for machines to ingest rather than lazy and inattentive humans. There is absolutely no reason to think The One True Language (/s), Python, is the right intermediary. ↵
@cross @jschuster @lindsey @[email protected] @jfdm @csgordon @jeremysiek
But for me an interesting research question is figuring out whether we got the *spec* right. And that *may* very well require us to find some intermediary that is good for both the human and the machine. That may not be the executable code, only the spec part of the language. ↵
@cross @jschuster @lindsey @[email protected] @jfdm @csgordon @jeremysiek
Of course, I'm not *convinced* about that. I'm thinking about other approaches too. The best special case of it is this project:
https://blog.brownplt.org/2025/12/11/pick-regex.html
LLMs ⭢ Regular Expressions, Responsibly!

@shriramk @jschuster @lindsey @krismicinski @jfdm @csgordon @jeremysiek yeah. Bryan Cantrill and I were discussing this last week. A trend I've observed is that the LLM does much better when given very precise instructions, sometimes totaling thousands of lines of written text. It reminds me of waterfall development, which never really worked. But why didn't waterfall work? Because the distance between phases was too large; by the time you were in development, it was too late to go back to requirements and iterate through design again. The LLM sort of short-circuits that process, allowing rapid feedback; does this mean a return to waterfall development styles?

@cross @shriramk @lindsey @krismicinski @jfdm @csgordon @jeremysiek My understanding is that Waterfall in the most extreme sense means you can't go back to a previous phase (hence the name: it only flows in one direction). So in a technical sense, that feedback cycle isn't taking us back to waterfall.

But will LLMs encourage a doc/spec-heavy style of development? Most people I know seem to think yes. In particular, the software architecture folks I know are excited about having a space to talk about architecture again. But what exactly those "specs" look like and how similar they will be to our current docs and specs is very much up for debate.

@jschuster @cross @lindsey @[email protected] @jfdm @csgordon @jeremysiek
Yeah, the waterfall analogy doesn't work for me at all. If we really want to go back to textbook software development methods, spiral is more like it.

@shriramk @jschuster @lindsey @krismicinski @jfdm @csgordon @jeremysiek it's not meant to be exact. The point being, one develops a very thorough spec before ever attempting to generate a line of code, and spec writing becomes a major focus of the process, with a spec as one of the primary artifacts produced. I never tried Spiral (for that matter, I was never _really_ subjected to Waterfall in its full glory, either), but if you feel that fits better, good to go.

In a chat with some colleagues yesterday, most of them were saying that, when using LLMs, they're spending most of their time writing very precise specifications in the form of "prompts." I find that interesting, and very different from the Agile world of "yolo just write some code, amirite, bruh? lmao."

@cross @jschuster @lindsey @[email protected] @jfdm @csgordon @jeremysiek
Yes, I think this is actually much better than "hard mode" agile, TDD, etc.

There's a picture I like to draw that I attribute to Michael Jackson; he doesn't remember drawing it, but I'll credit him for the thinking anyway. We have a world; we try to formalize it into a spec; we turn the spec into a program; but then the program itself becomes an object in the world. ↵

@cross @jschuster @lindsey @[email protected] @jfdm @csgordon @jeremysiek
Oftentimes, it is only by interacting with this program that we realize what spec we really meant. (It's the only time I got to quote TS Eliot in a CS research paper: "That is not what I meant at all. That is not it, at all.") Which is why it's a *loop*.

And until recently, it took ages to go from ideation to working program (for mortals like me, anyway). Without the loop, we can't write particularly good specs. ↵

@cross @jschuster @lindsey @[email protected] @jfdm @csgordon @jeremysiek
That's why in SE there has been a long-standing dream of "executable specifications" (set aside the ontological questions): to tighten up this loop. So I view this as being part of a long chain of what we've always known/wanted. But also, when you make something 10x or 100x more efficient, it turns into a different thing. And that's part of what we're seeing, too.

@shriramk @cross @lindsey @krismicinski @jfdm @csgordon @jeremysiek yeah I would have sworn I saw him make a diagram like that too, but I don't see it now in the World and the Machine paper.

Thinking back to that era: maybe we'll see a resurgence of parts of UML? Perhaps this new loop is the right feedback loop to make some of those ideas work.

@jschuster @cross @lindsey @[email protected] @jfdm @csgordon @jeremysiek
That's what I wonder too. In particular, the parts of UML that are *not* class diagrams. UML's biggest problem is that people equated it with the least interesting part. (Though even that part was valuable, as Doug Orleans once schooled me on.)
@shriramk @cross @lindsey @krismicinski @jfdm @csgordon @jeremysiek Yeah, particularly the parts of UML about structure. The more behavioral parts always seemed like a clumsy PL to me, but the parts that constrain the architecture seem like a good starting point if we want to (semi) formalize the requirements we give to LLMs.
@jschuster @shriramk @cross @lindsey @krismicinski @csgordon @jeremysiek well, some of us like behavioural (& other substructural) types, and activity and interaction diagrams (really models) provide a nice end-user view of what the types should look like.
@jfdm @shriramk @cross @lindsey @krismicinski @csgordon @jeremysiek Fair! I like those types also (Mailbox Types is what I wish my dissertation had been). My experience has been that source code->UML diagram works well in some cases, but not the other way around. But happy to be proven wrong.

@jschuster @jfdm @cross @lindsey @[email protected] @csgordon @jeremysiek
I think you're thinking about this incorrectly.

Agentic coding seems to work best when it can get feedback. So you don't need to generate code from the diagrams; you just need to check the generated code against the behavioral spec.

I conjecture that tools for AI will actually generate BIG error messages, because people won't read them but the AI will, and the more info it can give as to what went wrong, the quicker the AI can fix the code to pass the check.
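A toy sketch of what I mean (a hypothetical checker, not any real tool — the function names and spec are invented for illustration): check the generated code against a behavioral spec, and on failure, dump every diagnostic you can compute, since the reader is a machine, not a human who'll skim past a wall of text.

```python
# Hypothetical sketch: a spec checker that reports *everything* it
# knows about a failure, on the theory that the consumer is an AI
# trying to repair the code, not a human reading a terse message.

def check_sort_spec(fn, cases):
    """Check fn against the behavioral spec: output is sorted and is
    a permutation of the input. Return a verbose failure report."""
    failures = []
    for xs in cases:
        got = fn(list(xs))
        want = sorted(xs)
        if got != want:
            failures.append({
                "input": xs,
                "got": got,
                "expected": want,
                # Partial diagnoses: which clause of the spec broke?
                "output_is_sorted": got == sorted(got),
                "output_is_permutation": sorted(got) == want,
                "length_delta": len(got) - len(xs),
            })
    return failures  # empty list == spec satisfied


def buggy_sort(xs):
    return sorted(xs)[:-1]  # drops the largest element


print(check_sort_spec(sorted, [[3, 1, 2]]))      # [] -- spec satisfied
print(check_sort_spec(buggy_sort, [[3, 1, 2]]))  # one rich diagnostic record
```

The point of the structured, redundant fields is that the repairing agent can see at a glance *which* clause of the spec failed (here: the output is sorted but not a permutation, and it's one element short), rather than just "test failed."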

@shriramk @jfdm @cross @lindsey @krismicinski @csgordon @jeremysiek ah, I see. I was thinking more like expressing structural constraints in UML and passing those as extra constraints to the LLM, along with the rest of the prompt/spec.

100% idle speculation for me, I haven't dived too deep into this yet.

@jfdm @jschuster @cross @lindsey @[email protected] @csgordon @jeremysiek
Yeah, I was actually thinking of the behavioral, not the structural, parts (-:.