https://stackoverflow.com/questions/33051108/how-to-get-around-the-linux-too-many-arguments-limit/33278482

> I have to pass 256Kb of text as an argument to the "aws sqs"

what, uhhh, what

> MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
> The default page size is 4 KB so you cannot pass arguments longer than 128 KB.
> I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel and now your code produces
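For reference, the arithmetic behind that limit, as a minimal sketch (assuming the common 4 KiB page size):

```python
# MAX_ARG_STRLEN is PAGE_SIZE * 32 in linux/include/uapi/linux/binfmts.h,
# and it bounds each *individual* argv/envp string, not the whole list.
PAGE_SIZE = 4096                      # assumption: the usual 4 KiB page size
MAX_ARG_STRLEN = PAGE_SIZE * 32       # 131072 bytes = 128 KiB

assert MAX_ARG_STRLEN == 128 * 1024
assert 256 * 1024 > MAX_ARG_STRLEN    # so a 256 KB argv[1] cannot fit
```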

casually patching the kernel to send a quarter megabyte as a *single* argument oh my god i'm laughing hard
@navi well in the early Rust for Linux days we hit this limit with passing kconfig options to rustc. Fun times
@kloenk as a single argument? this isn't the whole argument list, this is *just* argv[1]
@navi ah oh. Then we hit the other limit. Many many arguments. Way too many
@navi the kernel existed somewhere near the limit, and then I broke it by just adding O=build to my make flags (I like a separate build dir for the kernel)

@kloenk @navi Back when 128 kB was the limit for argv+envp, Google was hitting it too because they passed all the configuration for their whole software stack on the command line as --long-option=value switches.

Their solution? Compress the command line. So every binary started by ungzipping argv[1] and parsing it to get the configuration.

The person explaining this to me saw my horrified face, and said with the perfect Hide The Pain Harold smile: "a series of individually completely rational and reasonable decisions led to this." and I have been thinking a lot about it since.

@ska @kloenk @navi nah what the fuck 😭😭😭
@kitten @ska @navi if google does it we all are fine :p
@kloenk @kitten @navi tbh that was 11 years ago and I have no idea if they're still doing it. I suspect some Googlers were behind the push for Linux to drop the limit, and the whole tech staff breathed a collective sigh of relief when it happened.

@ska @kloenk @kitten @navi specifically, the arguments were compressed with gzip and then base64 encoded so that they could be passed with reasonable levels of escaping through ssh. This started about 20 years ago, and the kernels at the time were already modified to allow longer command lines.

Frankly, it's just another example of scaling: there's always a bottleneck, sometimes in surprising places.

The line about individual choices is perfect, too: there are always trade-offs to be made.
./.
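A hypothetical Python sketch of the scheme described above (nothing here is Google's actual code; the flag names and newline separator are made up for illustration):

```python
import base64
import gzip

# Made-up configuration switches standing in for a real stack's flags.
options = [f"--flag{i}=value{i}" for i in range(2000)]

# Sender side: join, gzip, base64. The base64 alphabet is plain ASCII,
# so the blob survives shell quoting and ssh without further escaping.
plain = "\n".join(options).encode()
blob = base64.b64encode(gzip.compress(plain, mtime=0)).decode("ascii")

# Receiver side: every binary starts by undoing the two layers.
recovered = gzip.decompress(base64.b64decode(blob)).decode().split("\n")
assert recovered == options
assert len(blob) < len(plain)   # compression outweighs base64's 4/3 overhead
```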

@ska @kloenk @kitten @navi The whole thing also illustrates neatly why "Google does X" or "Amazon does Y" or, especially, "the SRE book says Z" is useless. You aren't Google, and even Google isn't that particular segment of Google at that particular point in time (and infrastructure and history and scaling and pressures).

Make your own choices, and then revisit them when they no longer work for you

(And definitely tell the rest of us, so we can listen in mute horror... :))

@gabe @kloenk @kitten @navi Precisely, I disagree that it's just another example of scaling. Usually, when hitting a limit, Google *reconsiders its approach* and develops a new solution that scales better. Here, the opposite happened: a hack was added to the original solution in order to accommodate the limit. At some point the limit was going to be hit again, even with compressed arguments.

Maybe it was just a way to buy time while waiting for the real solution, i.e. Linux dropping the limit. But it's definitely not the example I think about when mentioning "Google scale" 😅

@ska @kloenk @kitten @navi Google has some odd traditions it still can't quite let go, "put it all on the command line!" is one of them. It does invent new stuff to fix issues until the next step function of scale, yes, but (even at Google) there's often just a bigger box.
@ska @navi @kitten @kloenk All to avoid a long-needed refactoring.
@ska @kloenk this broke me laughing what the fuck
@navi @ska same, can’t boost this enough
@ska @kloenk @navi What the hell did I just read oh my god, that is TERRIFYING. Yet that is also so ingenious that I don't know how to feel about it

-James
@thecatcollective @navi @kloenk "Brilliant and cursed" applies to way too much software, and I want the exact opposite of that - I want things that work dumbly, simply, elegantly, and that can be understood by mere mortals.
@ska @thecatcollective @navi isn’t an OS technically not writable in standard C, since the C spec defines some required things as UB? So yes, that would be nice, but I sometimes fear we might need new abstractions for some of those types of software we have
@kloenk Excuse the beginner question, but: If operating systems are not to be written in C due to the C spec defining some stuff as UB, how do kernels get away with it?

As far as I know, something being defined as UB means it may work, it may not work, it may do unintended stuff, but then how do they get around this?

-James
@thecatcollective I don’t remember anymore. I think it was something like the C spec not defining some forms of casting or something. The spec is only a document that says "please do it like this". But if all compilers just decide to do it the same way even when the spec does not define it that way, it works out in practice

@kloenk @thecatcollective even if only the compiler used for the specific OS kernel does it in a way the source of that kernel expects, it's fine(-ish) - no "all compilers agree" needed

UB really just gives compilers the freedom to do whatever the fuck they want

And while I don't have anything specific to say, from doing kernel dev myself, I'm pretty sure there is some UB I depend on (and lots where I could probably just write better code, but that's a different topic xD)

@thecatcollective @kloenk for the longest time the linux kernel was only buildable with gcc because they relied on undefined behaviour that is defined in gcc (plus obviously gcc extensions to the c language)

it is possible to write an OS with only standard, defined, C though -- but what you do is pull all the ultra low level logic that can't be done in a defined manner (very few things are like that actually) and write that in platform specific assembly

often said logic needs to be assembly anyway so it's all good
@navi @thecatcollective there are still some configs that don’t build with clang (still didn’t get around to sending an issue report). Enabling the CONFIG_MATOM option results (in my case at least) in clang exiting with some weird "too many registers" error

@thecatcollective @kloenk operating systems commonly use features that are provided by the specific compiler(s) that they're developed with, but that are not part of the C language standard. OSes also commonly have small pieces of low level functionality implemented directly in assembly language for their target platforms.

See, for example, https://maskray.me/blog/2024-05-12-exploring-gnu-extensions-in-linux-kernel

@jpab @kloenk We will take a look at this, thank you!

-James

@kloenk @thecatcollective @navi I don't think the permissiveness of C has anything to do with the beatitude (in the NetHack sense) of a piece of code. C is underspecified, yes, because it's old and used for a lot of various things including kernels and drivers and stuff where it's essentially used as a glorified assembly language.
OS coding in C is generally very pedestrian, not anything brilliant at all, and not especially cursed either; it just... is.

No, I am referring to high-brained solutions to problems you would never have had if your design wasn't made by and FOR high-brained programmers.

@kloenk @navi @thecatcollective @ska You essentially require that unless you start implementing a minimal runtime in microcode like some Lisp Machines and Java Machines did.

Regardless of the language, hardware-specific details will have to be handled as compiler intrinsics (or assembly or machine code) if the hardware isn't literally made to just run the language with no further setup.
@ska @navi And I guess null bytes in gzipped form must have been funny to handle

@lanodan @navi I don't think that's necessarily a problem. argv[1] doesn't have to be a string, it's a character array. Null is used as a separator when the kernel puts the whole argv on the stack, yes, but argv[1] is still just a pointer and if you know you're expecting a blob and have a way to know where the blob ends, it should work, I think.

Or they could have been base64-encoding the gzip for all I know, it's probably still smaller than the uncompressed argv.

(Edit: typo)

@ska @lanodan @navi Nope, it's a string. execve will stop processing it at the first null byte.

The execve syscall essentially acts as a scatter-gather (in this case, gather) operation, running over the argv and environ pointer arrays in user memory and performing a string-copy-from-user operation for each one to build the object that will be prepopulated into the new process image.
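This is easy to check: raw gzip output routinely contains NUL bytes (with mtime=0, the four-byte MTIME field in the gzip header alone is all zeros), so it would be truncated at the first NUL if passed raw in argv, while base64 output is NUL-free. A small sketch:

```python
import base64
import gzip

# A raw gzip blob contains NUL bytes; execve() copies each argv string
# only up to its first NUL, so the blob would be silently truncated.
raw = gzip.compress(b"--some-flag=value " * 100, mtime=0)
assert b"\x00" in raw

# base64's alphabet contains no NUL, so the encoded blob passes intact.
encoded = base64.b64encode(raw)
assert b"\x00" not in encoded
```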

@ska @lanodan @navi And the since-abolished 128k limit was a very good thing because it put a bound on the burden the kernel could be asked to do on behalf of userspace in one uninterruptible go, and on spraying attacks you could do to suid binaries.

It was probably removed because someone doing the utterly stupid thing Google did here demanded it.

@dalias @ska @lanodan @navi

One of those someones was Rob Pike in 2004.

https://interviews.slashdot.org/story/04/10/18/1153211/rob-pike-responds

And the underlying kernel change happened in 2007.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b6a2fea39318e43fee84fa7b0b90d68bed92d2ba

Interestingly, modern #FreeBSD has a sysctl() limit (kern.ps_arg_cache_limit) on how large processes can resize their argument block and environment string block and still have it show up in the ps command.

I just raised mine to 768 bytes, coincidentally. I could have got away with just 512, I think.

#Linux


@JdeBP @dalias @ska @lanodan @navi Pike's diplomatic non-answer of "Comparing patents to nuclear weapons is a bit extreme" gave me a good chuckle, lol

@dalias @lanodan @navi Oh? It's a shame, then, that there isn't an execae() primitive that takes a char array as argv and a char array as envp and splits them following null bytes.

Because I have to do that all the time in execline and I hate to do it just to follow the API when the kernel is going to do the exact same right afterwards.

@ska @lanodan @navi Having them packed one-after-another in a single array is an implementation detail of the ELF entry point. It's not a programming interface, so it makes sense that there's no syscall to do that. Even if there were, the kernel side would have to do validation and building the pointer arrays, so it really wouldn't help make anything more efficient.
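For illustration, the split such a hypothetical execae() (or execline itself) has to perform is trivial; a sketch, with a made-up helper name:

```python
# Split one NUL-packed block (the one-after-another layout described
# above) into the usual list of argument strings. Hypothetical helper,
# mirroring what execline does in C before calling execve().
def split_packed(block: bytes) -> list[bytes]:
    parts = block.split(b"\x00")
    # Each string is NUL-terminated, so a well-formed block ends with a
    # trailing NUL that yields one empty tail element; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts

packed = b"prog\x00--verbose\x00input.txt\x00"
assert split_packed(packed) == [b"prog", b"--verbose", b"input.txt"]
```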
@ska @navi @lanodan @dalias Why not just pass the data in through stdin?
Laurent Bercot (@[email protected])

@[email protected] @[email protected] @[email protected] Sending the configuration to stdin is more difficult than storing it in a config file, because you have to have a process writing to the daemon's stdin. It's easier for the cluster manager to scp the config and give "-f configfile" to the daemon's command line. The point is that they didn't even want to scp a config file. The agent was just reading and running a command line and they didn't want to modify it. That, I think, is the more questionable design decision.

Treehouse Mastodon
@ska @lanodan @navi I mean, you don't need to go all the way to base64, COBS would suffice
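For the curious, a minimal COBS (Consistent Overhead Byte Stuffing) sketch: it removes NUL bytes with at most one byte of overhead per 254 bytes of payload, versus base64's one byte per three.

```python
def cobs_encode(data: bytes) -> bytes:
    # Replace each run of up to 254 non-zero bytes with a length prefix;
    # zeros are implied at block boundaries, so the output has no NULs.
    out = bytearray()
    i, n = 0, len(data)
    while True:
        j = i
        while j < n and j - i < 254 and data[j] != 0:
            j += 1
        out.append(j - i + 1)          # code byte: block length + 1
        out += data[i:j]
        if j >= n:
            break
        if data[j] == 0:
            j += 1                     # consume the zero we just encoded
            if j >= n:                 # trailing zero needs an empty block
                out.append(1)
                break
        i = j
    return bytes(out)

def cobs_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        out += data[i + 1:i + code]
        i += code
        if code < 255 and i < len(data):
            out.append(0)              # a short block implies a zero
    return bytes(out)

blob = b"\x1f\x8b\x08\x00\x00\x00\x00\x00gzip-ish"
assert b"\x00" not in cobs_encode(blob)
assert cobs_decode(cobs_encode(blob)) == blob
```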

@lanodan @ska @navi

Even on Linux-based operating systems, one can get strnvisx() and strunvis(), which solves that problem, should one choose to have it in the first place.

https://libbsd.freedesktop.org/wiki/

#Linux #BSD #vis #unvis #CommandLines

@ska @kloenk @navi Narrator: they were not individually completely rational and reasonable decisions.
@dalias @ska @kloenk @navi individually locally rational decisions according to incentives may not be globally rational
@ska @kloenk @navi have had to work with googlers on build tooling in depth maybe 5 years ago and this explains some things about how they work lmao

@hipsterelectron @kloenk @navi The main insight I've acquired about how Googlers think is that they're used to working at Google scale, which is only relevant in FAANG companies, but when you're at Google everything is designed to make you forget that the outside world exists and is important; so your mind gets used to thinking *always larger*.

If you're designing software for a single machine, or a rack of servers, Googlers won't really understand you and you'll talk right past each other. If it doesn't scale to thousands of machines (or even millions), it has no value to them.

@ska @hipsterelectron @navi depends. I know some of the kernel devs at google (e.g. working on binder). They also have mobile “sized” software
@kloenk @hipsterelectron @navi Well the Android team is different, that's for sure. I haven't had any interactions with them.
@ska @kloenk @hipsterelectron @navi I worked at GOOG for 14 years. One of my projects was writing code for an 8051 with 2K of EEPROM and 128 bytes of RAM. #NotAllGooglers 🤣
@davidlsparks @kloenk @hipsterelectron @navi I envy you, grats for landing that project! My domain expertise was a bright red neon sign flashing "Put this guy on Borg or on some embedded stuff", but they chose to put me in Web search instead. 🤷 At least I had a very formative and educational year surrounded with incredible people.
@ska @kloenk @hipsterelectron @navi I hear you. I was fortunate to be hired as an embedded with low power expertise. "There are dozens of us." 🤣
@ska @kloenk @navi
I love one of the first rational decisions here: command-line arguments in scripts should be long-form to minimize reader confusion. Things go off the rails well before you hit 128kB of args though. You need to throw that in a config file or something, folks.

@c0dec0dec0de @kloenk @navi Actually, *that* particular decision made sense: when you have a huge software stack with configuration switches, you have to use long options because you just don't have enough characters for short options. And when you have a cluster manager running a command line on thousands of machines, you don't want to have to copy a config file, it's good to have the config on the command line.

The questionable decisions were upstream (is it good to have a whole software stack with configuration switches in every binary? hmmm) and downstream (what to do if we hit the command line limit), but *that one* was sound. 😅

@ska @c0dec0dec0de @kloenk

i would honestly take the configuration from stdin at that point, and it can even look similar to the bazillion flags in a script by using here-doc

wouldn't work if they need stdin for something else, but i kinda doubt that a program that has this many flags actually uses stdin directly

@navi @kloenk @c0dec0dec0de Sending the configuration to stdin is more difficult than storing it in a config file, because you have to have a process writing to the daemon's stdin. It's easier for the cluster manager to scp the config and give "-f configfile" to the daemon's command line.

The point is that they didn't even want to scp a config file. The agent was just reading and running a command line and they didn't want to modify it. That, I think, is the more questionable design decision.

@ska @navi @kloenk @c0dec0dec0de also Google's build system caches process executions by argv. It does also have checksummed file inputs, but it's more effort to provide a config file, since it needs to be paired with the command line every time, and their tool's API is not as good as Pants (horrible to use). So this is more difficult because their systems are not as good as they pretend, basically. That's in the case of build processes at least
@ska @navi @kloenk @c0dec0dec0de Putting the config in the command line *is* copying a config file everywhere, they just already implemented a protocol to do that.
@wollman @navi @kloenk @c0dec0dec0de Yup. The thing is, the way files are managed on Google servers is... peculiar, like everything they do, because everything needs to scale so immensely; I don't remember if servers even had local storage space. Can't give many more details without risking 1. being inaccurate and 2. going into NDA-protected territory, but it's likely that "accessing a file in the filesystem before having initialized the part of the software stack that does files the Google way" wasn't a trivial task at all. Whereas even they could not, despite their best efforts, make the command line more complicated than it is, so it was a good bootstrapping medium.
@ska @wollman @navi @kloenk @c0dec0dec0de before borg (circa Pentium II), it was common for web index servers to have a HDD failure, but keep running, because once you've loaded everything into memory, you don't need a disk.
@trouble @wollman @navi @kloenk @c0dec0dec0de even after Borg, IIRC, all the index servers had incredible amounts of RAM because serving from cache is so much faster than anything else.
@ska @navi @kloenk @c0dec0dec0de
"The agent" here would be the one writing to the daemon's stdin, no?