Starting with the very specific: I do not think it was an accident that the xz backdoor's exploit chain started with a modified version of a third party .m4 file to be compiled into xz's configure script.
It's possible to write incomprehensible, underhanded code in any programming language. There's competitions for it, even. But when you have a programming language, or perhaps a mashup of two languages, that everyone *expects* not to be able to understand — no matter how careful the author is — well, then you have what we might call an attractive nuisance. And when blobs of code in that language are passed around in copy-and-paste fashion without much review or testing or version control, that makes it an even easier target.
So, in my capacity as one of the last few people still keeping autoconf limping along, I'm thinking pretty hard about what could be done to replace its implementation language, and concurrently what could be done to improve development practice for both autoconf and its extensions (the macro archive, gnulib, etc.).
On the subject of implementation language, I have one half-baked idea and one castle in the air.
The half-baked idea is: Suppose ./configure continues to be a shell script, but it ceases to be a *generated* shell script. No more M4. Similarly, the Makefile continues to be a Makefile but it ceases to be generated from Makefile.am. Instead, there is a large library of shell functions and a somewhat smaller library of Make rules that you include and then use.
For ./configure I'm fairly confident it would be possible to do this and remain compatible with POSIX.1-2001 "shell and utilities". (Little known fact: for a long time now, autoconf scripts *do* use shell functions! Internally, wrapped in multiple layers of M4 goo, but still — we haven't insisted on backcompat all the way to System V sh in a long, long time.) For Makefiles I believe it would be necessary to insist on GNU Make.
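A minimal sketch of what "a library of shell functions that you include and then use" might look like, under stated assumptions: every `cf_*` name below is invented for illustration and is not an autoconf interface, and the functions are defined inline only so the example is self-contained (in the idea as described they would live in a shared file that ./configure sources).

```sh
#!/bin/sh
# Hypothetical sketch, not autoconf code: all cf_* names are made up.
# Nothing here is generated; each check is an ordinary shell function.

# cf_find_prog NAME
# Print how the shell resolves NAME; fail if it cannot be found.
cf_find_prog() {
    command -v "$1"
}

# cf_check_header COMPILER HEADER
# Succeed if COMPILER can compile a tiny program that includes HEADER.
# The predictable temp file name is a simplification; a real library
# would create a private scratch directory first.
cf_check_header() {
    cf_src="${TMPDIR:-/tmp}/conftest$$.c"
    printf '#include <%s>\nint main(void) { return 0; }\n' "$2" > "$cf_src" || return 1
    "$1" -c "$cf_src" -o "$cf_src.o" >/dev/null 2>&1
    cf_status=$?
    rm -f "$cf_src" "$cf_src.o"
    return "$cf_status"
}

# cf_define NAME VALUE
# Print one preprocessor definition, suitable for appending to config.h.
cf_define() {
    printf '#define %s %s\n' "$1" "$2"
}
```

A hand-written ./configure would then call these directly, for example `cf_check_header "$CC" stdio.h && cf_define HAVE_STDIO_H 1 >> config.h`, so a reviewer reads the same text that actually runs.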
This would definitely be an improvement on the status quo, but would it be *enough* of one? And would it be less work than migration to something else? (It would be a compatibility break and it would *not* be possible to automate the conversion. Lots of work for everyone no matter what.)
Suppose that's not good enough. Bourne shell is still a shitty programming language, and in particular it is really dang hard to read, especially if you're worried about malicious insiders. Which we are.
Now we have another problem. The #1 selling point for autotools vs all other build orchestrators is "no build dependencies if you're working from tarballs," and the only reason that works is you can count on /bin/sh to exist on anything that purports to be Unix. If we want to stop using /bin/sh, we're going to have to make people install something else first, and that something else needs to be a small and stable Twinkie. Python need not apply (sorry, Meson).
What's small and stable enough? Lua is already too large, and at the same time, too limited.
There's one language that's famous for being tiny, flexible, and pleasantly readable once you wrap your head around it: Forth.
If I had investments to live off, I would be sorely tempted to take the next year or so and write my own Forth that was also a shell language and a build orchestrator, and then have a look at rewriting Autoconf in *that.* This is the castle in the air.
Side bar 2: Let's table the whole "shouldn't everyone build from git nowadays?" discussion. I'm quite sure the xz insider could've found a way to hide the stage 0 exploit in a checked-in file. If you care about ways to make the output of "make dist" verifiable and reproducible, and to facilitate building from VCS checkout for those who want that, we're actually having a productive discussion about that on one of the autotools mailing lists right now.
(Not sure which list — I sort them all into one mailbox — and I have to warn you that several other less helpful conversations are happening under the same subject line.)
Moving to the more general.
I said this over on the autoconf lists earlier today: just as I think it is a mistake to focus on the stage 0 exploit having been concealed by not checking it into the VCS, I also think it is a mistake to focus on the next few stages having been concealed in a binary file. There are binary files that are naturally editable and auditable as themselves (raster images, for instance) and there are text files that nobody wants to look at at all (ever tried to fix a merge conflict in an SVG image?)
A more interesting line to draw, IMO, is between code and tests. I feel quite confident in saying that the files written to $prefix by "make install" should never need to have any sort of dependence on the project's test suite, and that is something that ought to be possible to detect mechanically (the biggest challenge is determining what files of the source repo are exclusively part of the test suite).
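One way the mechanical check could be sketched, assuming two things the post does not supply: that some tracing tool has already produced a list of every file the regular build read (one path per line), and that the test suite lives under a single known directory. The function name `find_test_reads` and the example paths are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: flag build inputs that come from the test suite.
# Input: paths read during the non-test build, one per line, on stdin.

# find_test_reads TESTDIR
# Print every input path that lies inside TESTDIR.
# Exit status is 0 if there were none and 1 otherwise.
find_test_reads() {
    ftr_dir=${1%/}          # tolerate a trailing slash on TESTDIR
    ftr_hits=0
    while IFS= read -r ftr_path; do
        case $ftr_path in
            "$ftr_dir"/*)
                printf '%s\n' "$ftr_path"
                ftr_hits=1
                ;;
        esac
    done
    [ "$ftr_hits" -eq 0 ]
}
```

Usage would be along the lines of `find_test_reads tests < build-reads.txt || echo "build depends on test files"`. This covers only the easy half; as noted above, the harder part is deciding which files belong exclusively to the test suite in the first place.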
Last bit. Community, sustainability, and trust.
The early free software movement (1983–1994 give or take) was, as I've heard the tales, consciously revolutionary, and, as revolutions often do, it ran on the spare time of relatively young people with time and energy to spare.
I came on the scene in 1997, right about the time it became reasonably possible to run Linux as your only desktop OS if you knew what you were doing — or, to put it another way, right about the time the original goal of the GNU Project had been achieved.
Like many other revolutions, GNU had no answer, and still doesn't, to the question: now what?
This is not the only reason the young, energetic revolutionaries of 1997 are now the exhausted maintainers of an archipelago of individual "projects" that sort of add up to a computing environment that one might fairly describe as "the worst (except for all the others)". But I think it's an important reason.
Side bar 3: In the middle 1990s someone (either Eric Raymond or Guy Steele) wrote, as part of the "Portrait of J. Random Hacker" appendix to the Jargon File:
> [Among hackers] racial and ethnic prejudice is notably uncommon and tends to be met with freezing contempt.
This was not true even at the time, and twenty years later ESR was cheerfully making common cause with Vox Day and the Sad Puppies.
I'm a white guy (albeit some of my grandparents weren't). I already knew how to program when I got to college. If I'd made different choices in the early 2000s, I could very well now be sitting on enough investment income to take a sabbatical and invent a new shell language.
When we look around and say "where do we find the helping hands we so desperately need?" we must recognize that part of the problem is that hacking was never as inclusive a club as we claimed.
(This sidebar is not *only* a response to the commenters who saw a name like "Jia Tan" and immediately started hating on China as a whole.)
Who needs the maintenance? The devs?
Do we need better platforms for collecting money for developers, or even forks? The subscription culture on Twitch shows that people are willing to pay if they are used to it.
But then only a few may get most of the money. And the hacker scene has already had enough problems with its stars.
We don't need more competition. It leads to "an archipelago of individual 'projects'".
Maybe the question is:
How can we federate open source software?
Federate as in: "I am able to do this. You are able to do that. This one has that thing. If we bring everything together and work on it, we will all have what we wanted. To work together successfully, we need to agree upon a standard."
No competition. No price. No market.
But we still need people who know and can do stuff, and people who have stuff. But without competition.
(This doesn't sound very convincing or conclusive, but I am still sending this post.)
@zwol
I'm not smart enough to hang here so just gonna add
> toasting in an epic bread
and if you start a patreon to fund your forth project I'm in for ten bucks
@zwol I'm hoping that a general push towards realizing that less complex code is easier for new people to contribute to helps this, but we've got a long way to go.
along with curriculums and corporate hiring practices making people think they need to know a lot more than they do to get involved with open source...
>making people think they need to know a lot more than they do
I had this exact experience the first time I made a contribution: it was so easy, very much against my expectations
@LikesCalendars @NireBryce Several times while I was teaching at CMU, people who'd taken my class (sophomore level "introduction to computer systems") told me that they went in expecting to hate it and find it incomprehensible, but then they really enjoyed the experience and now they were planning to take more systems courses.
I don't know if I'll ever teach again, but if I could find a less exhausting way to convey that one piece of enlightenment—that the machine is not magic and you *can* understand it—to the general public...
@zwol What's left is taxes; computers are to a large extent the public communications infrastructure, and ought to be public in the full sense.
(And even if not in the full sense, substantially. Because having it work at all depends on dreary stuff you have to pay people to do, just like clearing clogged culverts.)
@zwol I'm not sure I'm quite prepared to accept Forth as the standard build language, but I certainly wouldn't be sad about it if that turned out to be the answer everyone went with.
That makes me curious, though, how well WebAssembly might do as a compromise. Its text representation can be reasonably clear, at least as far as stack languages go, and it's nicely explicit about what non-computational (I/O) capabilities you're asking for. I can think of various ways that could work that seem nice to me, but this is already a thought experiment on top of a thought experiment so I'll stop there.
@josh @jamey Yeah, sandboxes in general I like, except for sandboxes that are themselves a security disaster (have you ever looked at the code for firejail?)
For clarity, I agree with "tests shouldn't write to the source directory" (indeed, *nothing* should write to the source directory) but that's not what I was saying. What I was saying was "the regular build should never need to *read* bits of the testsuite."
@zwol Nah, Forth is too big, too.
But we can cheat. We can compile the build system to a binary. Or bring its own interpreter. Embrace the reality. Build systems are a programming language. So let's properly build one.
@zwol In #bootstrapping circles, we have GNU Mes and Gash (the combination of which is good enough to run ./configure scripts).
The Racket folks have switched to Zuo as their build system, also based on a minimal Scheme implementation.
Maybe not a universal option, but I can imagine a build system based on Mes/Zuo, at least in the circles I care about.
@civodul Hum, the attack would be a bit more sophisticated for GNU Mes and Gash as implemented in Guix. But still…
Instead of targeting plain Bash, one needs to target the Guix package ’guile-bootstrap’. This package depends on tar, bash, mkdir and xz, which adds some attack surface.
Otherwise, it would also be possible to exploit the non-deterministic Gash compilation to hide stuff.
https://simon.tournier.info/posts/2023-10-01-bootstrapping.html
The attack would be much more complicated, I guess.
@zimoun @civodul I thought about it some more and absolute size is not the most important issue here; the most important issues are (1) how difficult is it to install the thing, and (2) how much more readable than sh(+m4)+make do you get for the effort. That said, size does matter in that someone might want to audit the language they're being asked to install, on top of everything else. And the big popular interpreted languages tend to have large dependency graphs, which makes their true size even bigger, makes them harder to install, and makes problems for bootstrapping.
Python and Perl are very large (current releases are ~1.2M lines of code each according to SLOCCount), nontrivial to install from source, and problematic at the lowest levels of the bootstrap chain.
A mostly complete implementation of POSIX shell and utils, namely busybox, fits into 200,000 lines. bash+coreutils is missing important pieces (grep, sed, awk, find, diff are the ones I know about) and is about twice as big.
mes+gash+gash-utils is ~70,000 lines. Lua is ~20,000. Neither Scheme nor Lua feels like *enough* of a readability improvement over sh to be worth the switching costs.
I would say that 20,000 lines of C is about the upper limit for what I'd feel comfortable demanding people install before they can build the thing they actually wanted to build.
Furthermore, any such component cannot require a complex configure+build process itself lest we have a circular dependency.
@zwol @zimoun To be fair, Mes includes a C library, a C compiler with 4 backends, etc. The part that would matter here is the interpreter, which is ~6K lines of C under src/.
Zuo has an interpreter with ~8K lines of C and ~5K lines of Zuo (Scheme).
This should be compared with the line counts of Perl + Auto{conf,make} + Make or CMake + Make/Ninja.
@zimoun @zwol Speaking of build systems: in 2008, Tom Tromey wrote Quagmire, a proof-of-concept replacement of Autoconf + Automake, mostly compatible with the latter, implemented in GNU Make (~1K lines).
https://tromey.com/blog/?cat=16
https://github.com/tromey/quagmire
It’s appealing because GNU Make is ubiquitous and ‘Quagmire’ files looked very much like ‘Makefile.am’.
The downside is that it’s hard to debug and work with (lots of ‘eval’ tricks…). Less appealing than Zuo or similar to me.
@ArneBab Hum, not really from my understanding.
There is a chicken-or-the-egg problem for the "driver" (currently guile-bootstrap). From my understanding, the only option for removing tar, bash, mkdir and xz is to have an implementation directly in binary (hex).
Well, that’s what I detail in the sections:
« Analysing guile-bootstrap derivation »
« Opinionated next steps »
from: https://simon.tournier.info/posts/2023-10-01-bootstrapping.html
@zwol @civodul @janneke
@zimoun I think I understand what you mean: this still needs something to run it.
Maybe avoiding compression for tarballs could make xz unnecessary. Would it then suffice to have the Scheme from Mes capable of running as the driver?
mes bootstraps from binary, IIRC, so this could alleviate more problems? https://www.gnu.org/software/mes/manual/mes.html#Full-Source-Bootstrap
@ArneBab « Would it then suffice to have the scheme from mes capable of running as driver? »
See Fig.1 https://simon.tournier.info/posts/2023-10-01-bootstrapping.html
The question is how to run ’bootstrap-seeds’, which is stage0 and M2-Planet. We need guile-bootstrap (driver) which relies on helpers (bash, tar, etc.)
Chicken-or-the-egg problem. We need a binary driver to get Mes. IMHO, the only option is a binary driver written by hand, somehow.
@civodul @zwol In this context, maybe it’s worth mentioning that Racket and Chez Scheme use Zuo to replace make (keeping a stub makefile): they specifically don’t try to replace configure. Racket uses Autoconf; Chez Scheme uses a handwritten shell script.
The Zuo language certainly could be used to write a configure script. I mention this just to reaffirm that implementing ./configure does have specific challenges!