Lately I've seen a lot of people talking about the practice of code review in a way that I consider unrealistic. So here are my thoughts on what code review is—and isn't—for: https://blog.glyph.im/2026/03/what-is-code-review-for.html
What Is Code Review For?

Code review is not for catching bugs.

Sorry for the jump scare in the back half, but this is my attempt to (A) write about The Problem without comprehensively making everything *about* The Problem all the time, and (B) hack off a bit of useful philosophizing and writing from my Enormous Never-Going-To-Be-Finished Omnibus Post.

@glyph I like point 2.

> Second — and this is actually its more important purpose — code review is a tool for acculturation.

And your points about automatic checks being the primary filters on code properties are well taken and worth repeating.

@jmeowmeow @glyph

Code review is a social process, and you should treat it as such.

This is why we also like to include a "distant voice" in code review, someone within the big org who is less intimately familiar with the code/code base, just to notice things the local folk have perhaps become blind to, and to socialize more broadly coding practices and organizational affordances. (Also catches integration bugs and interface confusion early, which is a plus.)

@glyph It worries me that I see people talking about using LLMs to perform code review.

@glyph To check the "I use AI" box at work, I've started using Copilot to "review" the code.

I kick it off at the very end: either before approving the PR when I'm a reviewer, or before requesting a review for a peer.

The goal is to keep my reviewing skills sharp by making sure that I find all the issues and the Copilot run is clean.

Is it useful? I'm not sure. I definitely don't think AI reviews are worth burning the planet for.

Copilot is very good at correcting natural English. This is neat, but such issues should be easily caught by a human anyway.

Copilot might be a mix of static code analysis tools combined with LLMs (isn't that basically what agents are?), because it found some inconsistencies.

I would rather configure dedicated tools to do this and use them as I'm writing the code.

@glyph this is great! Now I can send this to my team without writing it from scratch! Thank you!
@glyph My only question is with your comment on deterministic testing. I would think randomized property testing and other forms of fuzzing are especially suitable in this case because it's harder to "code to the test" in the way you can for a deterministic test suite. And that's aside from the general advantages of property testing. (I nodded vigorously to everything you said about code review.)

@pervognsen
Why not both?

Deterministic for single-variable testing, e.g. making sure specific code paths are executed and covered.

Then you can use fuzzing as a type of pairwise testing, that is, making sure to cover combinations of paths that might not be exercised by normal deterministic testing.
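For example, a minimal sketch of combining the two (the clamp function, ranges, and iteration count here are purely illustrative, not from anything in this thread):

```python
import random

def clamp(x, lo, hi):
    """Clamp x into the inclusive range [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

# Deterministic tests: one assertion per branch, so path coverage is explicit.
assert clamp(-5, 0, 10) == 0    # below-range branch
assert clamp(15, 0, 10) == 10   # above-range branch
assert clamp(5, 0, 10) == 5     # in-range branch

# Randomized (fuzz-style) property test: many input combinations, checking an
# invariant rather than exact outputs. Seeded so any failure is reproducible.
rng = random.Random(42)
for _ in range(1000):
    lo, hi = sorted(rng.randint(-100, 100) for _ in range(2))
    x = rng.randint(-200, 200)
    result = clamp(x, lo, hi)
    assert lo <= result <= hi                   # always lands in range
    assert result == x or result in (lo, hi)    # unchanged, or pinned to an edge
```

The deterministic assertions document exactly which branches exist; the randomized loop then sweeps combinations a hand-written suite would likely miss.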

@glyph

@encthenet @pervognsen sorry yes, fuzzing is a fine way to find bugs. while it is stochastic, it has a much more bounded error range on its behaviors. but it’s also less universally applicable than regression testing, and this was a very broad post. not all software has interesting broad ranges of inputs where surprises can happen; some has a quite constrained and simple set of business process rules that it follows

@glyph The second half is exactly why I dismiss any library where author claims tests to be written by an LLM.

That’s not what tests are for. No, you cannot version prompts and expect output that is even equivalent, let alone identical, every time.

@glyph absolutely all of this, yes. When I'm reviewing code my primary focus is: will someone be able to correctly decipher this in future under the pressure of breakage? I really don't want to be in the business of verifying code is correct and secure, there are (deterministic) tools that do a much better job of that.
@glyph
shit, I've been doing it wrong for the last 10 years...

@glyph
but also, I think you're missing one or two more purposes of code review (or you've implicitly folded it into the other three):

1. Ensuring the code is readable.

Source code is a way to communicate with other humans. So it's important that people other than the author understand it. If the code is too complex, difficult to follow, or relies on assumptions that are not commonly known, code review will catch that.

2. Making others aware of the changes.

@glyph To me the most interesting part of this is actually the bit about how humans can't maintain enough concentration to review more than about 400 lines at a time. If that's true, then it has implications for the development process. I guess it means that, when a feature is too big to be practically broken into <400-line pieces, it has to be reviewed in multiple sessions with breaks in between. This is at odds with the pressure to review big features quickly to avoid being the bottleneck.

@matt @glyph

I think it's hard for people in commercial software to experience what it's like to actually solve a problem rather than to plug a gap.

It takes far longer up front but then you get to close that problem and hardly EVER think about it again.

Identifying useful things that can be done correctly and being able to get it done in a large collaborative environment only comes with experience. Chatbots aren't gonna do that.

I think de-Googling/de-MSFTing in the next year or so makes sense just if you want to have functioning software; the guys running these shops no longer have any idea how fucking hard and contingent it is to run something like Google.

Skilled people need to come up with ingenious shit on a daily basis to keep big services like that running and legal.

One of these days a major Google service is gonna break and they're not gonna know how to bring it up.

@matt @glyph
Mostly, the human attention span means that features do need to be split up into <400-line pieces; that's one of the key skills in software engineering

Reviewing in multiple sessions with breaks doesn't help - if the review can be split up into pieces, then so can the implementation; if it's legitimately one piece that can't be split up, then the review can't be split up either

I don't know if there are features that can't be split up into smaller pieces; if there are, we can't engineer them

@sabik @matt @glyph isn't that a problem that is typically solved by abstraction?
@matt I stole the citation from the Cryptography project's developer docs, which have a soft cap on PR size for this reason. IIRC if you need more than 400 lines at a time, you do the moral equivalent of feature-flagging it off (i.e. don't expose the API until the last commit)
@glyph have you read the naur paper I've recently been ranting about yet?
@chrisjrn I don’t think so?
@chrisjrn oh wait, “programming as theory building”. yes, I have muddled through a bit of it
@glyph you're making a lot of points that echo it: your "enculturation" idea in particular is very similar, but it's holding some unstated priors (e.g. that having multiple people understand the code is useful); Naur starts from the point of view of long-term maintenance, at which point, understanding seems essential
@chrisjrn it’s definitely ideologically aligned. I didn’t read it too deeply (partially because I have been just generally distracted, but also) because it echoed another one of my faves, https://blog.nelhage.com/post/computers-can-be-understood/ particularly “building mental models”.

@glyph That piece is an epistemological argument in the same way that yours is, which I feel would not be compelling to people who haven't developed priors that justify a want of understanding. Indeed, that piece acknowledges that and implies (and almost explicitly says) that understanding as the end goal is counterproductive.

Which is not to say that the joy of understanding isn't a valid one (I care about it a lot), it's just that if that doesn't motivate people, I don't think your point holds up.

@chrisjrn I have struggled my entire life to relate to incurious people and one of the biggest challenges of my life remains understanding those who do not seek to understand
@glyph this is definitely a mood

@chrisjrn @glyph

"Everything is interesting if you go into it deeply enough."

I try to imagine that computing is to many folks as taxes and regulation are to me. But then I realize that I think that these topics are interesting, if approached in the right context, and I've lost my ability to empathize again...

@sirosen
There is, at least, a reasonably immediate and consequential impact to ignoring legal obligations. The long-term impact of not understanding a computing technology sufficiently is… rarely concretely explained, and the accountability aspect is rarely directly felt by the people who err.

@glyph

@glyph @chrisjrn
One of the most surprising things I ever learned, in my young teen years, is that there are people who are reasonably literate, but don't particularly like reading.
I didn't, and still don't, understand how that is possible.

@brouhaha @glyph

I do not find reading, as an abstract concept, enjoyable: I enjoy/have enjoyed reading specific things, and I enjoy learning things which sometimes occurs by way of reading.

@chrisjrn @glyph
Sure, I'm selective as well.
I'm referring to people who have _nothing_ that they like to read. They only read when they must.
Perhaps my inability to understand that is a failure of my imagination.
@glyph @chrisjrn
Ok, there are people who don’t want to understand things. There’s probably a reason for that. What do I care?
@magnus @chrisjrn I see what you did there.

@glyph @chrisjrn

And there's those that are happy to ride the lightning, because there's no stopping it. Might as well be on the capitalizing team.

https://social.vivaldi.net/@cmthiede/115928554446277618

@chrisjrn @glyph One of my favourite ever PyCon AU Education Seminar talks was "Running Python on your Brain Computer", with part of the gist being that understanding is an essential part of debugging (as without it, you won't understand what any error messages are trying to tell you, and you won't know where to poke the system to get it to give you more relevant information).

@glyph @chrisjrn love this article just from reading the first couple headings. Yes! Software can be understood! Computers are not magic!

I cite the Naur paper regularly and will add this one to my list.

@glyph @chrisjrn Might add "Do the easy thing first" to my mottos as a variant of "Check the obvious things first" which has saved me so many times and maybe the other wording will remind me to do it when I would otherwise fail to check the obvious thing!
@glyph @chrisjrn ooh I've been banging that drum for ages, great paper, one of my personal lodestones
@glyph @benjamineskola I think a third thing (and the way I use them) is that code reviews are for teaching and steering. Especially on a legacy code base that’s being slowly modernized, I’ll leave tips and hints on ways to improve the code instead of following the old anti-patterns around it. Most of these are non-blocking comments, but a lot of the time, the other devs take the recommendations to heart, make the changes, and then I see them following similar practices on future PRs.
@ramsey @benjamineskola yep, a critical perspective as well, but difficult to fit into a short piece on this. also dovetails with genAI problems, because given the current state of most tools I am assuming your code goes in the context window and not the training data. but once we have models that are trained or fine-tuned on your org’s codebase, that is a very specific and more immediate risk of calcification and repetition of historical mistakes

@glyph @benjamineskola Part snark, part honest question:

Does it really matter what the code looks like or how it’s organized when machines are the ones reading and writing the code?

@ramsey @benjamineskola serious answer I guess: “machines” is too vague a term. but while LLMs may do things that could be described as “reading” or “writing” code, it’s unambiguous that humans are the only ones *understanding* the code. Hal Abelson famously observed in the first edition of SICP, “Programs must be written for people to read, and only incidentally for machines to execute.” That’s still true today.

@ramsey @glyph @benjamineskola
if we had AGI it wouldn't matter, but we don't, and LLMs are a dead end towards that goal.

A lot of pro-LLM arguments hinge on what I tend to call "weaponized science fiction", a dangerous trap to fall into.

@glyph This is a very well put stance.
thanks!

@glyph underappreciated empirical study of the effect of code review on software quality: https://rebels.cs.uwaterloo.ca/papers/emse2016_mcintosh.pdf

I don't think this gets talked about often in systems that use Gen AI! Humans are pretty ineffectual at preventing errors through code review. What effect exists seems to disappear, as you point out, after reading so much code in a short time frame.

We're _accountable_ for the output of these systems! Code change is expensive because we're supposed to be protecting the company from the liability of shipping code changes that harm businesses, lose money, etc.

Code review, I agree, is a social process. If you like Peter Naur's theory building stuff... it's a great way to build collective understanding of the theories.

Thanks for sharing!

@glyph Just here to agree with all the commenters, great explanation of the purpose of code review.
@davebauerart appreciated nonetheless!
@glyph IMO, one of the best uses of a language model is to review your own writing or code, since it has a good idea of what the "norm" is, can keep its attention indefinitely, and can explain reasonably well what the norm is. It also fits well within the paradigm of "are you working for the machine, or is the machine working for you?". But that doesn't mean that people shouldn't be a part of the review, just that people should be doing the fun stuff, not the dry, hard-to-focus-on stuff.
@glyph it's also much easier for a lot of us to take a critique from a machine than from a person!
@glyph
Like the blog post! Thanks.
I always loved code review as long as it was with people I trusted. Like show and tell with a dose of everybody learns something.

At my last job it was telling, though maybe anecdotal, that the lead programmer was a vibe coder and hmmmm, we stopped having code reviews. (We had the GitHub PR stuff, but that was mostly policing code.)