My job as a senior developer with a team of juniors is to figure out what to write, sketch a PoC as guidance, and then delegate the actual implementation to them. I then review what they produce, explain misunderstandings or poor style choices, and guide them into implementing something that meets our standards.

I don't think LLMs can do my job yet. But I think we're getting shockingly close to them being able to do the juniors' part. And I'm worried about how we're going to get more senior developers.

@mjg59 every mid-to-large FOSS project is seeing its "Good First Issue"s getting sniped by 20 LLM bots. Those issues exist to feed new contributors in and turn them into dedicated ones. If you cut the bottom rungs off the ladder, how is anyone going to be able to get to the top?
@greg yeah, exactly. I've helped people turn into senior devs, I don't know how to turn an LLM into one - embodying good taste is a different problem to generating code that meets a functional description

@mjg59 @greg I agree wholeheartedly with the junior pipeline problem, though I suspect that we end up with junior devs who are good at piloting the models, and learn to debug even hard problems within that context.

We didn't stop being able to compute when people stopped learning assembly or C, I hope we have a similar outcome here.

@PaulM @mjg59 @greg But we could stop learning assembly because compilers are deterministic and don't hallucinate. That's not the case with LLMs.
@chris_evelyn @mjg59 @greg to be pedantic, computers are only sorta kinda mostly deterministic if you squint at them just right. From the perspective of any given program executing in a modern operating system, there's a whole lot happening around it which is completely opaque, even if execution mostly proceeds in an apparently sequential fashion.

@PaulM @mjg59 @greg That argument is bullshit and I'm getting fucking tired of it.

How often did you have to check assembly output lately because a compiler did something different from what you expressed in your code?

@chris_evelyn @PaulM @greg I'm a kernel developer, this happens to me more than you'd think

@mjg59 @PaulM @greg See my other answer, I forgot that I'm replying to professional edge case handlers in this thread so had to dial it back to "normal" programming.

Out of curiosity: Do LLMs work well for kernel dev?

@chris_evelyn @PaulM @greg massively depends, a *lot* of the kernel is super boilerplate and it's largely fine at that, and then you reach the point where you're dealing with CPU errata and you're going to have a bad time. I wouldn't say no to it in general (and we know chunks of Linux are already LLM developed), but I'd have several concerns around its use in more specialised areas

@mjg59 @PaulM @greg Thanks!

> and we know chunks of Linux are already LLM developed

Do you have a pointer to some examples handy? I'd be interested in the process and discussions around that.

Supporting kernel development with large language models (LWN.net): "Kernel development and machine learning seem like vastly different areas of endeavor; there are [...]"
@mjg59 @chris_evelyn @PaulM @greg Any examples of CPU errata being relevant, other than the obvious security holes?
@alwayscurious @chris_evelyn @PaulM @greg "You must ensure that certain things have weird alignment otherwise the CPU will fault or return garbage" is a surprisingly common thing for CPUs to insist on and also typically not present outside kernels, so there's very little training data that embodies it
@mjg59 @chris_evelyn @PaulM @greg Is this found on the big CPUs or mostly limited to embedded?
@alwayscurious @chris_evelyn @PaulM @greg Less common on big CPUs these days, but it's the kind of thing that early Ultrasparc and 90s MIPS had a bunch of
@alwayscurious @mjg59 @chris_evelyn @PaulM @greg
There's also performance-related errata, like https://www.intel.com/content/www/us/en/support/articles/000055650/processors.html though that needs to be worked around in the compiler/assembler, and in the kernel it mostly only affects things that manipulate code (live patching, JIT, etc.).
Provides you with information about the Jump Conditional Code Erratum and how to obtain the MCU.

@mjg59 @chris_evelyn @PaulM @greg Can the boilerplate be replaced with a (non-LLM) code generator?
@alwayscurious @chris_evelyn @PaulM @greg there are huge piles of "what does driver initialisation look like" code that could be replaced with macros, except that would reduce readability
@mjg59 @chris_evelyn @PaulM @greg Could it be replaced by a YAML file or similar?
@alwayscurious @chris_evelyn @PaulM @greg in theory? But that's not really what the kernel community likes
@mjg59 @chris_evelyn @PaulM @greg They finally did that for Netlink parsing.
@chris_evelyn @mjg59 @greg I've written a damn lot of code to accommodate, or exploit, or defend against computers being observably nondeterministic, mostly significantly later along in the process than the compiler producing assembly.
@PaulM @mjg59 @greg When have you last had to check the assembly output of one of your programs to make sure it did what you told it?
@chris_evelyn @mjg59 @greg in my experience, that particularly narrowly focused part almost never goes wrong. It's the parts before and afterwards that get squishy - did you compile the right bits? At scale, the computers that bring the bits together over the network to do that tend to fail in weird and unpredictable ways.

Working in supercomputing, if 75% of the hardware is not OBVIOUSLY broken when a vendor first hands it over to us, that's a pretty great outcome! The maintenance techs for the computers I work with badge into a datahall at the start of their shift, and even baseline reseat/replace/recable tasks keep them busy the whole shift - and they're not even doing complex repairs that involve opening the lid.
@chris_evelyn @mjg59 @greg Or on a different topic, when I was studying bitflips at scale based on network traffic, I was able to identify bad individual devices (e.g. that particular consumer ISP DNS cache was going bad - I actually called the ISP and they replaced it), bad batches of devices, and even discern where the flips were happening (almost all in device ram, almost no corruption during the UDP DNS lookup even though that only has a 16 bit checksum).

@PaulM @mjg59 @greg Yeah, I'm talking about programs that process everyday transactions or do the equivalent of counting the 'r's in the word "strawberry".

I never went much deeper than having a rummage through the C code in the PHP bytecode compiler. And at that level, nondeterminism is a bug that gets reported upstream and fixed.

When I started working with message queues and distributed stuff, things got a lot more interesting but (so far) never because a compiler/interpreter "misunderstood" my code or produced semantically wrong output.

@chris_evelyn @mjg59 @greg I think "LLMs are like a compiler" isn't the right analogy. A better way to put it is that "LLMs move median user attention [and by extension, deep understanding] elsewhere in the stack, just like interpreted languages did to compiled languages, and compiled languages did to hand-written assembler".

My point about computers being nondeterministic wasn't that the bytecode typically gets executed out of order (as you say, we fix those bugs when we find them) but that the environment within which a program executes very quickly DOES become non-deterministic - message queues, anything involving networking, scheduling, anything involving changing wall clocks, etc. The bytecode still executes one instruction after the next, but will the filesystem still be there? Will 30 minutes of wall clock time have passed between instructions? (the latter event completely hosed a cloud once for me).

Anybody claiming LLMs produce the same output every time either has the temperature set to zero (and then they're still mistaken), or is only asking for trivial work products. They're a tool, but they certainly don't replace compilers or (relatively) deterministic code execution.

@chris_evelyn @mjg59 @greg and to extend that - if you're using the LLM itself as a replacement for "an everyday program to do something deterministic like count R's" you're holding it wrong. Using it as a message bus or a calculator is stupidly expensive, and as you noticed, doesn't return predictable results. They're not equivalent, and trying to say they are is a bit silly, even if you can in fact use three dozen calculators as a very tiny ladder.

@PaulM @mjg59 @greg But it's what people do, in a roundabout way.

Many LLM uses I see from people around me are for things I've solved with templates for a long time now: boilerplate. And those templates aren't even that complex.

(I'm lazy and never learned to type very well, so I got good with templates, snippets, code generators, what have you. The type of laziness that makes you learn M4.)

When I show those, I'm often met with surprise. "You can do that?"

Maybe that's why I'm a little more skeptical and exasperated about LLMs than a lot of developers.

@chris_evelyn @mjg59 @greg oh I'll be the very first person to agree that a lot of people are doing incredibly dumb things with it. It's weird, and new, and there are a ton of very enthusiastic, extremely careless users smearing it all over the place. A ton of that is hugely negative.