https://stackoverflow.com/questions/33051108/how-to-get-around-the-linux-too-many-arguments-limit/33278482

> I have to pass 256Kb of text as an argument to the "aws sqs"

what, uhhh, what

> MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
> The default page size is 4 KB so you cannot pass arguments longer than 128 KB.
> I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel and now your code produces
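The arithmetic in the quoted answer can be checked from user space. This is a minimal sketch, assuming a Linux system with a stock kernel; the multiplier is not exposed at runtime, so it is hard-coded here as an assumption, and `max_arg_strlen` is a made-up helper name.

```python
import os

# Assumption: stock kernel, where MAX_ARG_STRLEN is PAGE_SIZE * 32.
PAGES_PER_ARG = 32

def max_arg_strlen(page_size: int) -> int:
    """Largest single argv/envp string, in bytes, for a given page size."""
    return page_size * PAGES_PER_ARG

page_size = os.sysconf("SC_PAGE_SIZE")
print(f"page size: {page_size} bytes")
print(f"per-argument limit: {max_arg_strlen(page_size) // 1024} KiB")
```

With 4 KB pages this gives the 128 KB figure from the quote, and doubling the multiplier to 64 gives 256 KB.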

casually patching the kernel to send a quarter megabyte as a *single* argument oh my god i'm laughing hard

@navi well, in the early Rust for Linux days we hit this limit when passing kconfig options to rustc. Fun times

@kloenk @navi Back when 128 kB was the limit for argv+envp, Google was hitting it too because they passed all the configuration for their whole software stack on the command line as --long-option=value switches.

Their solution? Compress the command line. So every binary started by ungzipping argv[1] and parsing it to get the configuration.

The person explaining this to me saw my horrified face, and said with the perfect Hide The Pain Harold smile: "a series of individually completely rational and reasonable decisions led to this." and I have been thinking a lot about it since.

@ska @navi And I guess null bytes in gzipped form must have been funny to handle

@lanodan @navi I don't think that's necessarily a problem. argv[1] doesn't have to be a string, it's a character array. Null is used as a separator when the kernel puts the whole argv on the stack, yes, but argv[1] is still just a pointer and if you know you're expecting a blob and have a way to know where the blob ends, it should work, I think.

Or they could have been base64-encoding the gzip for all I know, it's probably still smaller than the uncompressed argv.
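A rough sketch of the gzip-plus-base64 variant, to show why it would still come out smaller. Nothing here is taken from the actual implementation being described: `pack_args`, `unpack_args`, and the option strings are all made up for illustration.

```python
import base64
import gzip

def pack_args(args: list[str]) -> str:
    """Join args with NUL, gzip them, and base64-encode into one NUL-free string."""
    raw = "\0".join(args).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def unpack_args(blob: str) -> list[str]:
    """Reverse of pack_args: what a binary would do with argv[1] at startup."""
    raw = gzip.decompress(base64.b64decode(blob))
    return raw.decode("utf-8").split("\0")

# Made-up options, for illustration only.
args = [f"--service-{i}-endpoint=host{i}.example.internal:8080" for i in range(2000)]
blob = pack_args(args)
print(len(" ".join(args)), "bytes uncompressed ->", len(blob), "bytes packed")
```

Base64 adds about a third to the size of the compressed data, but repetitive `--long-option=value` text compresses well enough that the packed string is still much shorter, and it contains no null bytes.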

(Edit: typo)

@ska @lanodan @navi Nope, it's a string. execve will stop processing it at the first null byte.

The execve syscall essentially acts as a scatter-gather (in this case gather) operation running over the argv and environ pointer arrays in user memory and performing a string-copy-from-user operation for each one to build the object that will be prepopulated into the new process image.
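A toy model of that gather step, to make the truncation concrete. This is not kernel code: `gather_argv` is a made-up function that only mimics the "copy each string up to its first NUL" behaviour described above.

```python
def gather_argv(buffers: list[bytes]) -> bytes:
    """Model of the copy: each entry is read up to its first NUL, then NUL-terminated."""
    out = bytearray()
    for buf in buffers:
        out += buf.split(b"\0", 1)[0] + b"\0"
    return bytes(out)

# Start of a gzip stream: magic bytes, compression method, then a flags byte
# that is commonly zero. Everything after that NUL is lost in the copy.
blob = b"\x1f\x8b\x08\x00rest-of-gzip-stream"
print(gather_argv([b"prog", blob]))
```

Under this model a raw gzip blob in argv[1] would be cut off after its third byte, which is why an encoding without null bytes would be needed.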

@dalias @lanodan @navi Oh? It's a shame, then, that there isn't an execae() primitive that takes a char array as argv and a char array as envp and splits them following null bytes.

Because I have to do that all the time in execline and I hate to do it just to follow the API when the kernel is going to do the exact same right afterwards.
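For illustration, the user-space splitting being complained about looks roughly like this. `execae()` does not exist; `split_packed` and `exec_packed` are hypothetical names, and the sketch assumes every string in the packed block is NUL-terminated.

```python
import os

def split_packed(packed: bytes) -> list[bytes]:
    """Split a block of back-to-back NUL-terminated strings into separate strings."""
    if not packed:
        return []
    if not packed.endswith(b"\0"):
        raise ValueError("packed block must end with a NUL terminator")
    return packed[:-1].split(b"\0")

def exec_packed(path: str, packed_argv: bytes, packed_envp: bytes) -> None:
    """Hypothetical execae()-like wrapper: split in user space, then call execve."""
    argv = split_packed(packed_argv)
    # Assumes every environment entry has the usual NAME=value shape.
    env = dict(entry.split(b"=", 1) for entry in split_packed(packed_envp))
    os.execve(path, argv, env)

print(split_packed(b"ls\0-l\0/tmp\0"))
```

The wrapper only rebuilds the separate strings that execve expects, after which the kernel packs them together again.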

@ska @lanodan @navi Having them packed one-after-another in a single array is an implementation detail of the ELF entry point. It's not a programming interface, so it makes sense that there's no syscall to do that. Even if there were, the kernel side would have to do validation and building the pointer arrays, so it really wouldn't help make anything more efficient.

@ska @navi @lanodan @dalias Why not just pass the data in through stdin?
Laurent Bercot:

Sending the configuration to stdin is more difficult than storing it in a config file, because you have to have a process writing to the daemon's stdin. It's easier for the cluster manager to scp the config and give "-f configfile" to the daemon's command line. The point is that they didn't even want to scp a config file. The agent was just reading and running a command line and they didn't want to modify it. That, I think, is the more questionable design decision.
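For reference, the stdin hand-off being discussed looks something like this. It is a minimal sketch with a made-up payload and a stand-in child process, not anyone's real daemon. Note that it needs a parent process doing the writing, which is exactly the cost described above.

```python
import subprocess
import sys

# Made-up payload, well past the 128 KB single-argument limit.
config = "--long-option=value\n" * 20000

# Stand-in for the daemon: reads all of stdin and reports how much it got.
child_code = "import sys; print(len(sys.stdin.read()))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input=config,          # written to the child's stdin by this parent process
    capture_output=True,
    text=True,
    check=True,
)
print("child received", result.stdout.strip(), "characters")
```

The payload never goes through argv, so its size is not bounded by the execve limits.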
