This bug is a masterpiece and you owe it to yourself to read this. So much effort for such a situational bug, it's heartbreakingly beautiful.

https://www.qualys.com/2023/07/19/cve-2023-38408/rce-openssh-forwarded-ssh-agent.txt

The key to it: OpenSSH will open and close dylibs in response to the agent protocol, because that's how it implements smart cards.

Then: ELF lets you mark functions in your dylib that get invoked on open (`__attribute__((constructor))`) and close (`__attribute__((destructor))`).
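
A minimal sketch of those two attributes, assuming GCC or Clang on Linux/ELF (the compiler records the functions in the `.init_array` and `.fini_array` sections). Built into a shared object with `-shared -fPIC`, `on_load` would run inside `dlopen()` and `on_unload` inside `dlclose()`. Compiled straight into a program, as here, they run before `main()` and at exit instead. The function names and the counter are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical state touched by the hooks, so their effect is observable. */
static int load_count = 0;

/* Runs when the object is loaded: before main() in an executable,
   or during dlopen() when built into a shared library. */
__attribute__((constructor))
static void on_load(void)
{
    load_count++;
}

/* Runs when the object is unloaded: at exit in an executable,
   or during dlclose() if the library is actually unmapped. */
__attribute__((destructor))
static void on_unload(void)
{
    fprintf(stderr, "on_unload: load_count=%d\n", load_count);
}

/* Accessor so callers can check that the constructor already ran. */
int times_loaded(void)
{
    return load_count;
}
```

Neither hook has to be called or even exported: merely loading the object runs `on_load`, which is why a `dlopen()` of an arbitrary library is never side-effect free.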

Finally: lots of libraries in /usr/lib have constructors with side effects (like registering signal handlers that never get unregistered, because the library is never expected to be unloaded, or triggering signal handlers by crashing when you randomly load them). So: UAF primitive.
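
To make the dangling-handler idea concrete, here is a sketch of a hypothetical library whose constructor installs a process-wide signal handler (SIGUSR1 is just a stand-in for whatever signal a real library hooks). It shows the well-behaved version, with a destructor that restores the previous action; the problem described above is libraries that leave that step out, so the registered handler outlives the code it points to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical library code: its constructor installs a process-wide
   signal handler. SIGUSR1 stands in for whatever signal a real library
   might hook. */

typedef void (*signal_handler_fn)(int);

static volatile sig_atomic_t lib_signal_seen = 0;
static struct sigaction previous_action;

/* This function's machine code lives in the library's mapped text. */
void lib_handler(int sig)
{
    (void)sig;
    lib_signal_seen = 1;
}

__attribute__((constructor))
static void lib_init(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = lib_handler;
    sigemptyset(&sa.sa_mask);
    /* Remember the previous action so it can be restored on unload. */
    sigaction(SIGUSR1, &sa, &previous_action);
}

/* Well-behaved version: restore the old action on unload. Libraries that
   skip this step leave the kernel holding lib_handler's old address after
   dlclose() unmaps them, so it points at unmapped memory or at whatever
   gets mapped there next. */
__attribute__((destructor))
static void lib_fini(void)
{
    sigaction(SIGUSR1, &previous_action, NULL);
}

/* Helper for checking which handler is currently registered. */
signal_handler_fn current_sigusr1_handler(void)
{
    struct sigaction cur;
    sigaction(SIGUSR1, NULL, &cur);
    return cur.sa_handler;
}
```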

There's like several thousand words of exposition about how they found the right sequence of opens and closes to set up a signal handler, groom the address space, set the stack executable (another dlopen side effect!), and trigger the signal.

But what makes it :art: is this bit:

"As a last and extreme example of a remote attack against ssh-agent forwarding, we noticed that one shared library's constructor function (which can be invoked by a remote attacker via an ssh-agent forwarding) starts a server thread that listens on a TCP port, and we discovered a remotely exploitable vulnerability (a heap-based buffer overflow) in this server's implementation."

@tqbf On SCW y'all joke about stunt cryptography exploitation; this was truly a beautiful example of stunt vulnerability work by Qualys. :cook: :kiss:

The WTF for me was the whole "the PKCS#11 API consists of dlopen() and pray, with no 'here's how you recognize a legit and safe provider' mechanism" thing. Where did this '90s-level trust leak through from? Was it NSS/NSPR? Or a Solaris thing?

@guenther It sort of does figure out real PKCS#11 providers --- it dlsym's the PKCS#11 entrypoint function after it opens, and closes the library if it's not there.
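
Roughly the pattern described here, as a sketch rather than OpenSSH's actual code: `C_GetFunctionList` is the standard PKCS#11 module entry point, and the function name `probe_pkcs11_provider` is made up. The important detail is the ordering: `dlopen()` has already run the library's constructors before `dlsym()` gets a chance to reject it.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Open the candidate provider, look up the PKCS#11 entry point, and
   close it again if the symbol is missing.
   Returns 1 if it looks like a provider, 0 otherwise. */
int probe_pkcs11_provider(const char *path)
{
    /* Any constructors in `path` (and its dependencies) have already
       run by the time dlopen() returns. */
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return 0;
    }

    /* C_GetFunctionList is the entry point PKCS#11 modules export. */
    void *entry = dlsym(handle, "C_GetFunctionList");
    if (entry == NULL) {
        fprintf(stderr, "%s: no C_GetFunctionList, unloading\n", path);
        /* Destructors (if any) run here, if the library is actually unmapped. */
        dlclose(handle);
        return 0;
    }

    /* A real caller would keep `handle` open and call the entry point;
       this sketch only reports the result and closes the library again. */
    dlclose(handle);
    return 1;
}
```

On glibc older than 2.34 this needs `-ldl` at link time.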

Not only that, but they found actual PKCS#11 handler libraries that had these side effects, so even parsing ELF and doing the symbol lookup outside of dlsym wouldn't have completely killed this bug class! It's wild!

@tqbf Sorry, I meant no "side-effect-free" recognition mechanism. I think ELF had DT_INIT from the start, but maybe it wasn't used enough back then to seem like a risk.

The "your legit PKCS#11 handler library contains an exploitable vulnerability" problem feels more like the generic "all software sucks and we must continue to fix it" problem, and not something that, well, OpenSSH or other users of PKCS#11 can really do anything about. I mean, the scope of a PKCS#11 handler is not something you can easily bound by limiting privileges, is it?

@guenther @tqbf IMO the real recognition mechanism should be "libraries registered with config files" (like invent some $PREFIX/etc/pkcs.whatev/*.json registry like there is for Vulkan layers) or at least "libraries placed under $PREFIX/lib/pkcs.thingy". Just an allow-list consisting of /usr/lib and /usr/local/lib seems ridiculously permissive.
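
A sketch of the narrower "dedicated directory" allow-list, assuming a hypothetical `/usr/lib/pkcs11-providers` directory (not a real OpenSSH or distro default). It canonicalizes the requested path with `realpath()` so `../` tricks and symlinks can't escape, then accepts only files sitting directly in that directory, instead of anything under `/usr/lib`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dedicated provider directory, in the spirit of
   "$PREFIX/lib/pkcs.thingy". Not a real OpenSSH or distro path. */
#define PROVIDER_DIR "/usr/lib/pkcs11-providers"

/* Returns 1 if `path` resolves to a file directly inside PROVIDER_DIR. */
int provider_path_allowed(const char *path)
{
    char resolved[PATH_MAX];

    /* Canonicalize first so "../" and symlinks can't escape the directory.
       realpath() returns NULL if the path doesn't exist. */
    if (realpath(path, resolved) == NULL)
        return 0;

    size_t dir_len = strlen(PROVIDER_DIR);
    /* Must start with the directory name followed by a slash... */
    if (strncmp(resolved, PROVIDER_DIR, dir_len) != 0 || resolved[dir_len] != '/')
        return 0;

    /* ...and contain no further slash, i.e. nothing in subdirectories. */
    return strchr(resolved + dir_len + 1, '/') == NULL;
}
```

This only narrows *which* libraries can be loaded; the ones that do get in still run their constructors on load.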

Also: why doesn't OpenSSH just throw the whole PKCS#11 helper process away instead of doing a dlclose()?

@guenther @tqbf Yeah, the need for dlopen-ing anything still isn't obvious to me, and it just looks like unnecessary ambient authority.
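
One way to read the "throw the helper away" suggestion, sketched with a hypothetical `probe_in_child()`: do the `dlopen()`/`dlsym()` in a short-lived forked child, report the result through the exit status, and let process exit replace `dlclose()`. Whatever the library's constructors set up (signal handlers, threads, listening sockets) then dies with the child rather than lingering in a long-lived process. This illustrates the idea; it is not how ssh-agent is actually structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child found C_GetFunctionList, 0 if not, -1 on error. */
int probe_in_child(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: constructors run here, not in the parent. */
        void *handle = dlopen(path, RTLD_NOW);
        int found = handle != NULL && dlsym(handle, "C_GetFunctionList") != NULL;
        /* _exit() skips atexit handlers and library destructors and
           terminates all threads; process teardown replaces dlclose(). */
        _exit(found ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status))
        return -1; /* e.g. the library's constructor crashed the child */
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A real helper would keep the library loaded for as long as it uses the provider and exit afterwards; the point is that unloading happens by ending the process, not by `dlclose()`. Link with `-ldl` on glibc older than 2.34.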

@tqbf

Let me put my Plan 9 hat on... done.

Dynamic linking was an error. 😇

@Shamar @tqbf What did Plan 9 do instead that provided a similar reduction in resource usage?

I am indeed assuming it did something because I'd just be disappointed if the answer was just "lol suck it up".

Its lack would also make dynamic FFI profoundly annoying to implement, since it'd require the generation of (throwaway?) stub programs and IPC (with predictable performance implications).

@tqbf Oh my gosh this really is a thing of beauty. I'm in awe.

@nelhage It's so great. I figure you'd dig it. :)

@tqbf No such UAF primitive in #musl 😎

@tqbf dlopen is such a cursed weird machine

@tqbf the lolbin-style chain reminds me of payloads for exploiting unsafe deserialisation bugs in .NET

@tqbf TIL. From the referenced Project Zero bug: "The OpenSSH agent permits its clients to load PKCS11 providers ... if OpenSSH was compiled with the ENABLE_PKCS11 flag (normally enabled) and the agent isn't locked. For these commands, the client has to specify a provider name. The agent passes this provider name to a subprocess... and the subprocess receives it and passes it to dlopen()..." When using agent forwarding the client may run in an attacker-controlled environment.

@tqbf

I suspect that the real problem is in glibc.

@tqbf
> *While browsing through ssh-agent's source code, we noticed that a remote attacker, who has access to the remote server where Alice's ssh-agent is forwarded to, can load (dlopen()) and immediately unload (dlclose()) any shared library in `/usr/lib*` on Alice's workstation (via her forwarded ssh-agent, if it is compiled with ENABLE_PKCS11, which is the default).*

Oh no, this will not end well

@tqbf @glyph The 1960s called: even *they* don’t want their text formatting engine back.

@tqbf @glyph That was great indeed. I don’t know about most of what they talk about, but I was still able to follow well enough for it to be satisfying. Thanks for the post.

@tqbf I’m working in IT (for less than 2 years, admittedly) and this reads almost like a foreign language to me. Nonetheless, I had fun trying to understand the issue and reading the replies.

@balssh That's what's so great about vulnerability research; there is _always_ something at that difficulty level to grind through, and you come out knowing random new things.