witch_t *navi Jul 24, 2025

https://stackoverflow.com/questions/33051108/how-to-get-around-the-linux-too-many-arguments-limit/33278482

> I have to pass 256Kb of text as an argument to the "aws sqs"

what, uhhh, what

> MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
> The default page size is 4 KB so you cannot pass arguments longer than 128 KB.
> I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel and now your code produces

casually patching the kernel to send a quarter megabyte as a *single* argument oh my god i'm laughing hard
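For reference, the arithmetic in the quoted answer can be sketched like this. The multiplier of 32 comes from the quote; the function names and the `pages` parameter are made up for illustration, and the off-by-one for the terminating NUL is my understanding of how the kernel counts, not something the answer states.

```python
import os

# Per the quoted answer, MAX_ARG_STRLEN is 32 pages. It is a compile-time
# kernel constant, so this only mirrors the arithmetic, it does not query it.
DEFAULT_PAGES_PER_ARG = 32


def max_arg_strlen(page_size: int, pages: int = DEFAULT_PAGES_PER_ARG) -> int:
    """Upper bound in bytes for a single argv or envp string."""
    return page_size * pages


def fits_in_one_arg(payload: bytes, page_size: int,
                    pages: int = DEFAULT_PAGES_PER_ARG) -> bool:
    # Assumption: the terminating NUL byte counts against the limit.
    return len(payload) + 1 <= max_arg_strlen(page_size, pages)


if __name__ == "__main__":
    page = os.sysconf("SC_PAGE_SIZE")  # page size of the running system
    print(page, max_arg_strlen(page))
```

With 4 KB pages the stock limit works out to 131072 bytes, and the patched multiplier of 64 gives 262144.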

kloenk Jul 24, 2025

@navi well, in the early Rust for Linux days we hit this limit when passing kconfig options to rustc. Fun times

Laurent Bercot Jul 24, 2025

@kloenk @navi Back when 128 kB was the limit for argv+envp, Google was hitting it too because they passed all the configuration for their whole software stack on the command line as --long-option=value switches.

Their solution? Compress the command line. So every binary started by ungzipping argv[1] and parsing it to get the configuration.

The person explaining this to me saw my horrified face, and said with the perfect Hide The Pain Harold smile: "a series of individually completely rational and reasonable decisions led to this." and I have been thinking a lot about it since.
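A rough sketch of what "compress the command line" could look like, purely as an illustration. The post only says argv[1] was gzipped; the base64 step is my assumption (argv strings cannot contain NUL bytes, so raw gzip output would not survive), and `pack_flags` / `unpack_flags` are hypothetical names.

```python
import base64
import gzip
import shlex


def pack_flags(flags: list[str]) -> str:
    """Squash many --long-option=value switches into one argv-safe string."""
    joined = shlex.join(flags)                      # shell-style quoting
    compressed = gzip.compress(joined.encode("utf-8"))
    # base64 keeps the result free of NUL bytes (an assumption, see above)
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def unpack_flags(blob: str) -> list[str]:
    """What each binary would do with argv[1] at startup."""
    raw = gzip.decompress(base64.urlsafe_b64decode(blob))
    return shlex.split(raw.decode("utf-8"))


if __name__ == "__main__":
    flags = [f"--service-{i}-timeout={i}" for i in range(1000)]
    blob = pack_flags(flags)
    print(len(shlex.join(flags)), "->", len(blob))
    assert unpack_flags(blob) == flags
```

Repetitive option names compress well, which is presumably why this bought them headroom, but the limit is still there once the compressed blob grows past it.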

0xC0DEC0DE07EA Jul 24, 2025

@ska @kloenk @navi
I love one of the first rational decisions here: command-line arguments in scripts should be long-form to minimize reader confusion. Things go off the rails well before you hit 128kB of args though. You need to throw that in a config file or something, folks.

Laurent Bercot Jul 24, 2025

@c0dec0dec0de @kloenk @navi Actually, *that* particular decision made sense: when you have a huge software stack with configuration switches, you have to use long options because you just don't have enough characters for short options. And when you have a cluster manager running a command line on thousands of machines, you don't want to have to copy a config file, it's good to have the config on the command line.

The questionable decisions were upstream (is it good to have a whole software stack with configuration switches in every binary? hmmm) and downstream (what to do if we hit the command line limit), but *that one* was sound. 😅

witch_t *navi Jul 24, 2025

@ska @c0dec0dec0de @kloenk

i would honestly take the configuration from stdin at that point, and it can even look similar to the bazillion flags in a script by using here-doc

wouldn't work if they need stdin for something else, but i kinda doubt that a program that has this many flags actually uses stdin directly
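The stdin idea above could look roughly like this. It is a minimal sketch: the program name `daemon` and the two options are invented for the example, and the flags are simply tokenized the way a shell would split them.

```python
import argparse
import shlex
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program and options, just for the example.
    parser = argparse.ArgumentParser(prog="daemon")
    parser.add_argument("--listen-port", type=int, default=8080)
    parser.add_argument("--log-level", default="info")
    return parser


def parse_flags_from_text(text: str) -> argparse.Namespace:
    # Newlines count as whitespace for shlex, so one flag per line works;
    # comments=True lets the here-doc carry '#' comments.
    tokens = shlex.split(text, comments=True)
    return build_parser().parse_args(tokens)


if __name__ == "__main__":
    # Flags arrive on stdin instead of argv, so argv size limits don't apply.
    print(parse_flags_from_text(sys.stdin.read()))
```

In a script this would be run as `python daemon.py <<'EOF'`, followed by one flag per line and a closing `EOF`, so it still reads like a long list of switches.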

Laurent Bercot Jul 24, 2025

@navi @kloenk @c0dec0dec0de Sending the configuration to stdin is more difficult than storing it in a config file, because you have to have a process writing to the daemon's stdin. It's easier for the cluster manager to scp the config and give "-f configfile" to the daemon's command line.

The point is that they didn't even want to scp a config file. The agent was just reading and running a command line and they didn't want to modify it. That, I think, is the more questionable design decision.

d@nny disc@ mc²

@ska @navi @kloenk @c0dec0dec0de also, google's build system caches process executions by argv. it does also have checksummed file inputs, but it's more effort to provide a config file since it needs to be paired with the command line every time, and their tool's API is not as good as pants (horrible to use). so this is more difficult because their systems are basically not as good as they pretend. that's in the case of build processes, at least