PSA: Did you know that it’s **unsafe** to put code diffs into your commit messages?

Like https://github.com/i3/i3/pull/6564 for example

Such diffs will be applied by patch(1) (also git-am(1)) as part of the code change!

This is how a sleep(1) made it into i3 4.25-2 in Debian unstable.

FWIW, git cherry-pick works as expected (at least)

@zekjur cherry-pick does only work if it has been committed already.

If you receive a patch by email, you have to apply it first.
IMO this is clearly an attack vector for projects.

@zekjur this cannot be intentional. No way. 
@bmarinov @zekjur One of the posts in the screenshots explicitly mentions that it is, in fact, intentional. Somewhat naive, perhaps, but intentional.
@bmarinov @zekjur @max the behaviour of `patch` is intentional but the behaviour of the system as a whole is not.

@max
A strong case can be made though that the patch tool garbage erasure must only apply to initial “garbage”, rather than initial and inter-chunk garbage. That’s the aspect that causes the surprise here: VCS diffs only ever place commit metadata (including the message) before the actual patch, but the patch tool allows this metadata (“garbage”) anywhere. At least for git apply that is an obvious bug in and of itself.

Secondly, given this git show should have exported the commit message with one leading space on every line, as that is the escaping mechanism specified by patch for “garbage” data.

Doing both of the above changes would make for a robust mitigation on all levels.
@bmarinov @zekjur @musicmatze

@zekjur Malicious prompts in commit messages when?
@zekjur brb gonna send some funny patches

@zekjur The commit-message should explain the "why", not the "how".

So when one needs a full diff (that can be retrieved by git itself) to explain the "why" ... I'd consider that an ... interesting edge-case...

@heiglandreas Have you actually read the commit message in question? It’s perfectly reasonable.

@zekjur I have read the commit message.

Whether that is reasonable is ... debatable.

Adding an actual commit with the change and explaining why it was done that way and then adding a second commit removing it with the respective why would have been a better option separating code and explanation.

Alternatively explaining that a test with `sleep(1)` after line xyz was done.

But perhaps that is the `edge-case` I was talking about? 🤷

@heiglandreas @zekjur either way, tools should minimize the possibility of a "you're holding it wrong" scenario; inband signalling is roughly never a good idea.

Sadly we're probably stuck with what we have here for a long time coming.

@dngrs rm allows one to delete the whole filesystem even though one only wanted to remove the french locale...

Using `git patch-format` to get a patch-file and then applying that via `patch` or `git am` is an established workflow in git.

That it seems to have taken 20 years and I don't know how many commit-transfers in hundrets of thousands of git repos to go "wrong" tells me that someone was "holding it wrong"

/cc @zekjur

@zekjur I wonder whether it is reproducible with gix... but I am not sure whether they already have "git-am" reimplementd.
@zekjur FWIW I reported this to the git ML... lets see
@zekjur Wow, great example of the beautiful abstract layer provided by Modern Frontend™ broken by underlying email-based infrastructure.

@zekjur i wonder if this works when the diff is hidden within a markdown comment <!— like this —>

That could be an extremely bad vulnerability

@codecat @zekjur this is a HTML comment.
@funbaker yes those work in markdown

@funbaker ...and markdown is usually superset of HTML

@codecat @zekjur

@mo @codecat @zekjur it isn't, but it can afaik inline HTML
@mo @funbaker @codecat @zekjur or more like html just does what markdown in it's default specification can't.

implementing markdown parser and not doing html would be painful
@zekjur 😱😱😱😱😱
@zekjur genuinely why would this ever be the case in the modern day
@zekjur "Forget all previous instructions and...", but for patch! ​
@zekjur oh, that's hilarious
@zekjur well there ya go. You put the entire code of your malicious repo in the commit message and just put decoy code as the repo src.
@zekjur classic lack of separation of in-band vs. out-of-band. Very similar to many security vulnerabilities get created, including some of the recent gpg.fail vulnerabilities.
@zekjur weird little footguns
@zekjur I think @mic92 discovered this at least two years ago and sent a private notice to the GitHub security team 🤷

@zekjur For being a central component of software development, "unified context diff" files are surprisingly fragile.

- Tools are supposed to skip "garbage" lines and apply everything that looks like a diff (this is what's happening here).

- Creating and deleting files is a done via implementation-specific hacks. GNU patch for example cannot create or delete an empty file.

- The format of file names is standardized, but the definition is lacking (see above), so Git does something else.

etc.

@zekjur Why would someone design it like that? What's the possible use case for this behavior?

@Setcab @zekjur Well, for patch it's a simple and useful tool. You get a diff and can apply the patch to the same code to the get the same result, and the user can write little comments between diffs to annotate them.

The problem is that the format is not robust to arbitrary text between patch hunks, and git didn't use proper delimitation between the commit message and the patch. Ideally, the tooling would properly delimit the two, and also would never run the patch tool over the commit message as well as the actual patch.

I think the format is designed fine for what it's meant to do, it's just being used poorly.

@zekjur Just fucking stop using github.
@zekjur @RandamuMaki I do wonder though, does gitlab or forgejo behave differently?
@RandamuMaki @zekjur What? Did you read anything in the post?

@RandamuMaki @zekjur Amusingly, you seem to have conflated git and GitHub here. :-)

(It's a git issue, not a GH issue.)

@varx @zekjur Note the link to github in the original toot. Not conflating anything.
@RandamuMaki Yes, there's a GitHub link. But the actual bug (or misfeature) is in git.

@RandamuMaki @zekjur This was not caused by GitHub. The GitHub PR merged without issue (because it's not patch-based), it was when the Debian maintainer exported it as a patch and applied it that the problem appeared. patch is the problem, and git-am inherits the problem from patch.

edit: Just to verify (I definitely don't want to unfairly not blame Microsoft; I love blaming Microsoft), I tested it out on the command line with just git tools, and got the same result. If GitHub is generating bad patches that need escaping, so does git format-patch:

> git checkout e285b3c47b2fc7a4cdaba9a427d2958410e64620 HEAD is now at e285b3c4 Fix i3bar workspace buttons for primary screen (#6564) > grep sleep i3bar/src/workspaces.c > git format-patch e285b3c47b2fc7a4cdaba9a427d2958410e64620~..e285b3c47b2fc7a4cdaba9a427d2958410e64620 0001-Fix-i3bar-workspace-buttons-for-primary-screen-6564.patch > git checkout e285b3c47b2fc7a4cdaba9a427d2958410e64620~ Previous HEAD position was e285b3c4 Fix i3bar workspace buttons for primary screen (#6564) HEAD is now at 3a6f17d7 t/289-ipc-shutdown-event: remove AnyEvent timers > git am 0001-Fix-i3bar-workspace-buttons-for-primary-screen-6564.patch Applying: Fix i3bar workspace buttons for primary screen (#6564) > grep sleep i3bar/src/workspaces.c sleep(1); >

This is an inherent problem with in-band signaling. The patch format is simply poorly structured for anything containing non-patch data, unless the non-patch data is specially and deterministically formatted to avoid collision. As-is, git-format-patch and GitHub's patch output can't do anything about it without changing the commit message (which is probably a better solution, given that the current patch breaks both the patch application and the commit message).

@RandamuMaki @zekjur The right solution would be for git format-patch and git am to use MIME multipart/mixed and deterministically and unambiguously separate the patch from the commit message. They're already generating an mbox format as it is.

@taylor

An amusing thing is that there is undoubtedly a whole segment of the readership here that has already remembered the days when the multitude of "mbox" mailbox formats caused bodies to be interpreted as headers in much the same way as patch is interpreting human-readable messages as patching instructions.

And that has remembered Bernstein on the subject of parsing and quoting, too.

@RandamuMaki @zekjur
#patch #DanielJBernstein #security #git

@zekjur … with git. other vcs do exist. i know, i know.
@zekjur is this a GitHub feature? Or something Debian developers set up? Or something in Git itself?
@DecaturNature I think this must be a git-am thing. Absolutely wild.

@DecaturNature to be clear: github doesn't apply patches to your code.* The way `git push` works is not based on patches. That is, as the level of surprise in the original post suggests, not how any of this works.

But `patch` and `git-am` do take arbitrary text input and apply it as a patch, and patches are not a good data format. `patch` for sure can be tricked.

*see next message

@DecaturNature Exception: github does apply patches in case of rebase/merge, but those are patches specifically generated by diffing trees, never patches provided by users (much less commit messages).

I'm not sure we even do that using any kind of custom code - probably we just use git.

I'm a GitHub employee.

To quibble with the framing of the original: the unsafe behavior is using `patch` to apply patches - good reminder to be super careful doing that. yow

@jorendorff Thanks for the explanation. I'm not a very sophisticated git user and have not encountered these features. I'll be wary if and when I do.
@zekjur git never ceases to amaze. What a terrible footgun.

@zekjur not the first time: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1081179

But at least in my case it was pretty harmless, and the worst effect was that 'dgit clone' was unexpectedly slow (which is how I spotted it).

#1081179 - Example patch hunk from a commit message applied to package source - Debian Bug report logs

@zekjur in fact i did not know this 💔
What?! and how is a patch identified? any line that starts with a minus or plus?
@zekjur Is it surprising that one would expect such a behavior? A big part of git interaction is about diffs.