https://stackoverflow.com/questions/33051108/how-to-get-around-the-linux-too-many-arguments-limit/33278482

> I have to pass 256Kb of text as an argument to the "aws sqs"

what, uhhh, what

> MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
> The default page size is 4 KB so you cannot pass arguments longer than 128 KB.
> I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel and now your code produces

casually patching the kernel to send a quarter megabyte as a *single* argument oh my god i'm laughing hard
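For the curious: the limit the answer is patching around is easy to reproduce without recompiling anything. A minimal Python sketch, assuming a stock Linux kernel (the 128 KiB figure holds on 4 KiB-page systems):

```python
import errno
import os
import subprocess

# In the kernel, MAX_ARG_STRLEN = 32 * PAGE_SIZE, so with 4 KiB pages
# a single argument tops out at 128 KiB.
page_size = os.sysconf("SC_PAGE_SIZE")
print(32 * page_size)  # 131072 on 4 KiB-page systems

# A single 256 KiB argument (the size from the question) fails at exec time.
try:
    subprocess.run(["/bin/true", "x" * (256 * 1024)])
except OSError as e:
    # E2BIG: "Argument list too long"
    print(e.errno == errno.E2BIG)
```

Note this is a per-argument limit, separate from the overall `ARG_MAX` budget for argv plus the environment.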
@navi well, in the early Rust for Linux days we hit this limit passing kconfig options to rustc. Fun times

@kloenk @navi Back when 128 kB was the limit for argv+envp, Google was hitting it too because they passed all the configuration for their whole software stack on the command line as --long-option=value switches.

Their solution? Compress the command line. So every binary started by ungzipping argv[1] and parsing it to get the configuration.

The person explaining this to me saw my horrified face, and said with the perfect Hide The Pain Harold smile: "a series of individually completely rational and reasonable decisions led to this." and I have been thinking a lot about it since.
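The trick itself is easy to picture. Here's a toy reconstruction in Python; the real flag names, serialization, and transport are unknown (and presumably NDA-land), so JSON, gzip, and base64 here are stand-ins:

```python
import base64
import gzip
import json

def pack_config(flags: dict) -> str:
    """Compress a flag dict into one command-line-safe token."""
    raw = json.dumps(flags).encode()
    return base64.urlsafe_b64encode(gzip.compress(raw)).decode()

def unpack_config(token: str) -> dict:
    """What every binary would do first: un-gzip argv[1], parse the config."""
    return json.loads(gzip.decompress(base64.urlsafe_b64decode(token)))

# 2000 made-up long options, the kind of repetitive text gzip loves.
flags = {f"--some-long-option-{i}": "value" for i in range(2000)}
token = pack_config(flags)
assert unpack_config(token) == flags
print(len(json.dumps(flags)), len(token))  # token is far shorter
```

The base64 step is what makes the compressed blob safe to carry as a single argv string; highly repetitive `--long-option=value` text compresses extremely well, which is presumably why the hack bought them so much headroom.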

@ska @kloenk @navi
I love one of the first rational decisions here: command-line arguments in scripts should be long-form to minimize reader confusion. Things go off the rails well before you hit 128kB of args though. You need to throw that in a config file or something, folks.

@c0dec0dec0de @kloenk @navi Actually, *that* particular decision made sense: when you have a huge software stack with configuration switches, you have to use long options because you just don't have enough characters for short options. And when you have a cluster manager running a command line on thousands of machines, you don't want to have to copy a config file, it's good to have the config on the command line.

The questionable decisions were upstream (is it good to have a whole software stack with configuration switches in every binary? hmmm) and downstream (what to do if we hit the command line limit), but *that one* was sound. 😅

@ska @c0dec0dec0de @kloenk

i would honestly take the configuration from stdin at that point, and it can even look similar to the bazillion flags in a script by using a here-doc

wouldn't work if they need stdin for something else, but i kinda doubt that a program that has this many flags actually uses stdin directly
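A sketch of that stdin route in Python (a shell here-doc would feed the child's stdin the same way; the flag names are made up):

```python
import subprocess
import sys

# 10,000 flag-like lines: far past any argv limit, but stdin doesn't care.
config = "\n".join(f"--made-up-option-{i}=value" for i in range(10000))

# The child parses its "command line" from stdin instead of sys.argv.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"],
    input=config, capture_output=True, text=True)
print(child.stdout.strip())  # 10000
```

Since stdin is a stream rather than a kernel-copied buffer, no exec-time limit applies; the trade-off, as noted, is that stdin is then unavailable for anything else.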

@navi @kloenk @c0dec0dec0de Sending the configuration to stdin is more difficult than storing it in a config file, because you have to have a process writing to the daemon's stdin. It's easier for the cluster manager to scp the config and give "-f configfile" to the daemon's command line.

The point is that they didn't even want to scp a config file. The agent was just reading and running a command line and they didn't want to modify it. That, I think, is the more questionable design decision.

@ska @navi @kloenk @c0dec0dec0de also, Google's build system caches process executions by argv. It does have checksummed file inputs too, but providing a config file is more effort, since it needs to be paired with the command line every time, and their tool's API is not as good as Pants (horrible to use). So this is more difficult because their systems are not as good as they pretend, basically. That's the case for build processes, at least.
@ska @navi @kloenk @c0dec0dec0de Putting the config in the command line *is* copying a config file everywhere, they just already implemented a protocol to do that.
@wollman @navi @kloenk @c0dec0dec0de Yup. The thing is, the way files are managed on Google servers is... peculiar, like everything they do, because everything needs to scale so immensely; I don't remember if servers even had local storage space. Can't give many more details without risking 1. being inaccurate and 2. going into NDA-protected territory, but it's likely that "accessing a file in the filesystem before having initialized the part of the software stack that does files the Google way" wasn't a trivial task at all. Whereas even they could not, despite their best efforts, make the command line more complicated than it is, so it was a good bootstrapping medium.
@ska @wollman @navi @kloenk @c0dec0dec0de before borg (circa Pentium II), it was common for web index servers to have a HDD failure, but keep running, because once you've loaded everything into memory, you don't need a disk.
@trouble @wollman @navi @kloenk @c0dec0dec0de even after Borg, IIRC, all the index servers had incredible amounts of RAM because serving from cache is so much faster than anything else.
@ska @navi @kloenk @c0dec0dec0de
"The agent" here would be the one writing to the daemon's stdin, no?
@leeloo @navi @kloenk @c0dec0dec0de It could, but my point is that involving the agent for this is strictly more complex than letting the daemon open and read a file.

@ska @navi @kloenk @c0dec0dec0de

Copying a config file has a number of additional failure modes, mostly around disk/filesystem health (disk full, corruption, file cleanup, etc.), which are avoided if the execution is simply an exec from ssh. Of course, that justification was already old when I joined Google, so it may have been post-hoc rather than designed.

If I recall correctly (and it has been many years), the SSH command actually called a local script which would restart the binary with the appropriate command line flags. This meant that there were actually two levels of supervision of the process: one local, and then a remote one for updating command line flags or if the local one failed for some unexpected (reboot, missing file, new server) reason. The first zip version was actually motivated by the local (python?) supervisor script being too long, and resulted in zipping the flags to the application as one of the parameters to the supervisor.

@evana @navi @kloenk @c0dec0dec0de I swear I practically begged them to let me work on babysitter, given that local process supervision was my domain of expertise, but they wouldn't let me 😅

@navi @kloenk @c0dec0dec0de @ska

Considering that a lot of commands parse arguments in order (and sometimes don't even need to store anything from previous arguments) streaming them could be more efficient.

And `exec foo "$@"` would not need to store the whole argument list just to pass it on again.

@sertonix @kloenk @c0dec0dec0de @ska `exec foo "$@"` doesn't help though, since the issue is too many args, so i'm confused a bit

@navi @kloenk @c0dec0dec0de @ska

I mean if all arguments (including some $@ equivalent) were done via pipes on the OS level. A bit off topic

@sertonix @navi @kloenk @c0dec0dec0de @ska unlike many uses of pipes, you generally want to know when the command line is done, because you want that configuration complete before you start initializing the rest of the system (and I recall chasing down demons in one of those initialization subsystems in a place where we were somewhat misusing the system -- again, best of intentions, but threads and fork+exec were not good friends at that time).
@[email protected] @c0dec0dec0de @ska @kloenk if it needs stdin also and "compressing argv with gzip that the binary begins by uncompressing" is a reasonable option, wouldn't it also be equally reasonable to just pass the config in an extra file descriptor, since you're already patching the binary to accept arcane configuration hacks?
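Passing the config on a spare descriptor is only a few lines on the spawning side. A Python sketch; here the fd number is handed over in argv, which is an illustration choice (and the flags are invented), not anyone's actual protocol:

```python
import os
import subprocess
import sys

# Write the (made-up) config into a pipe before spawning the child.
r, w = os.pipe()
os.write(w, b"--verbose --retries=5")
os.close(w)

# pass_fds keeps the read end open in the child with the same number;
# argv stays tiny, carrying only the descriptor, not the config itself.
child = subprocess.run(
    [sys.executable, "-c",
     "import os, sys; print(os.read(int(sys.argv[1]), 65536).decode())",
     str(r)],
    pass_fds=(r,), capture_output=True, text=True)
os.close(r)
print(child.stdout.strip())  # --verbose --retries=5
```

This works fine for one process on one machine; as the next post points out, the hard part in the cluster setting is getting the bytes to the agent in the first place, which the extra descriptor does nothing to solve.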

@sodiboo @navi @c0dec0dec0de @kloenk Remember the context: it's a cluster manager that needs to execute the same program with the same config on N machines, so it sends a command line to local agents.

Anything that sounds reasonable *locally*, such as reading from a file descriptor, omits the fact that now you need to transfer the config to the agent, and the agent has to pass the file descriptor to the program, etc.

I understand why the whole system was designed the way it was. As the person said, every step was rational and reasonable (more or less). The issue is that, contrary to their habits, they did not reconsider their design when they found it could not scale.

@ska @sodiboo @navi @kloenk and this necessarily predates k8s mechanisms like ConfigMaps that become a file in the local context before the local program actually runs
@c0dec0dec0de @sodiboo @navi @kloenk Absolutely. And it is likely that when k8s came out, Googlers looked at it and what it was supposed to do and at what scale, and thought: "Haha. Cute."
@ska @c0dec0dec0de @sodiboo @navi @kloenk uh what? K8s originated at Google. It was written by former Borg devs

@fraggle @c0dec0dec0de @sodiboo @navi @kloenk Oh was it? Good to know, thanks.

It would have been great if these former Borg devs had learned a bit more from the experience about the value of simplicity 😜