Imagine the following situation: your company receives a ZIP file with an invoice, and you're the person responsible for checking that all the details are correct before sending it off to the payment department. You open the archive, and there's a single PDF inside. You view it, and all the details match: your company's details, the seller's company's details, the items and the total amount are what's expected, and even the bank account number is the same as on previous invoices from this company. As everything looks good, you forward the ZIP with the invoice to the payment team, and move on to reviewing other incoming invoices.

A few days later you receive the same invoice again, but you already have it in the system. Just in case, you ask the payment department whether it's been paid, and they confirm it has. Great, no action required.

Another month passes by, and you get a "payment due" reminder. What's this? You remember it being paid already, so what gives? You ask the payment team, and they again confirm the invoice was settled. You phone the seller about this, but they say they received nothing. So you head down the hall to the payment department, open the invoice on your laptop, and start going through the details with them. But what's this? The destination account number and amount in the wire transfer and the invoice don't match! The payment team manager's face gets a bit red. Seems like it was their mistake? But no! They show you the invoice, and the amount and account number match the actual payment... but they don't match what you see on your screen! How can this be?

Both of you re-download the ZIP archive from the email you've forwarded and open the PDF inside. And there it is—you see two different invoices. What in the world is happening?

Immediately you report it up the chain, and your boss's boss gets a pair of IT forensics consultants on the job. They investigate, and later you learn that your company has been scammed with a pair of different invoices hidden inside a schizophrenic ZIP file. This means that you—on your work laptop running a certain software stack—saw and approved the correct invoice. But the payment team—running a different software stack—saw the fake invoice inside the ZIP, which they thought was what you had approved. Even later on you find out that the seller's company has been partially compromised and a lot of their customers got fake invoices. But that's water under the bridge at that point, and the money your company transferred is long gone.

Technical details → https://hackarcana.com/article/yet-another-zip-trick

@gynvael I didn't think I'd see TOCTOU used against human checks and human use

@gynvael wait, so how many different ways of reading zipfiles do we know so far?

- from the start, only LFHs
- from the end, find last* EOCD, use CDH size
- from the end, find last* EOCD, use CDH offset
- from EOF-65557, find first EOCD, use CDH size
- from EOF-65557, find first EOCD, use CDH offset
- from start, find first CDH

but any of the 5 that deal with CDH also have to deal with redundancy between CDH and LFH, so x2

that'd be 11?

*could be inside a comment
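
To make the list above concrete, here is a minimal Python sketch of three of those strategies: walking LFHs from the start, and finding the last EOCD and then trusting either its central directory offset or its central directory size. The function names are made up for illustration, and the sketch assumes no ZIP64, no data descriptors and no encryption, so it is not a description of how any particular archiver behaves.

```python
import struct

LFH_SIG = b"PK\x03\x04"   # local file header
CDH_SIG = b"PK\x01\x02"   # central directory header
EOCD_SIG = b"PK\x05\x06"  # end of central directory record


def names_from_lfh_scan(data: bytes) -> list[str]:
    """Walk local file headers from offset 0, ignoring the central directory."""
    names, pos = [], 0
    while pos + 30 <= len(data) and data[pos:pos + 4] == LFH_SIG:
        # Compressed size sits at offset 18; name/extra lengths at 26 and 28.
        comp_size, = struct.unpack_from("<I", data, pos + 18)
        name_len, extra_len = struct.unpack_from("<HH", data, pos + 26)
        names.append(data[pos + 30:pos + 30 + name_len].decode("cp437"))
        pos += 30 + name_len + extra_len + comp_size
    return names


def names_from_eocd(data: bytes, use_size: bool = False) -> list[str]:
    """Find the last EOCD, then locate the central directory by offset or by size."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no EOCD record found")
    # Total entry count at offset 10, directory size at 12, directory offset at 16.
    count, cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    pos = eocd - cd_size if use_size else cd_offset
    names = []
    for _ in range(count):
        if data[pos:pos + 4] != CDH_SIG:
            break  # whatever this reader was pointed at is not a directory entry
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        names.append(data[pos + 46:pos + 46 + name_len].decode("cp437"))
        pos += 46 + name_len + extra_len + comment_len
    return names
```

On a well-formed archive all three return the same names. Prepending some junk bytes to the file is already enough to make the offset-based and size-based readers disagree, since only the size-based one still lands on the real central directory.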

@wolf480pl @gynvael not sure if I'd call this TOCTOU; it's more like a parser differential

(yeah I'm probably overthinking your joke...)

@mei @gynvael no i mean

If instead of parsing it once, checking the parsed form, and then immediately using it, or, alternatively, serializing it into a fresh file that contains only verified data and can be used later

If instead of doing that you parse the input file at two different times, once to check and once to use

That's called TOCTOU, right?

@wolf480pl @gynvael usually TOCTOU means that the attacker has the capability to change the thing you're checking. here it doesn't change, you're just unknowingly looking at it differently

@mei @gynvael
hmm ok, i guess I broadened the definition in my headcanon too much.

But still, you defend from both the same way, right? You only read/parse the input once?

@mei @gynvael
and in this case, you'd extract the zip, open the invoice, check it, screenshot it, and send the screenshot to the payments department?
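
A rough Python sketch of the "parse once, then hand over a fresh file" idea discussed above, staying at the ZIP level instead of screenshots. The function name `reserialize_verified` and the `approved_names` parameter are hypothetical, and this only removes ambiguity in the archive container; it does nothing about an ambiguous PDF inside it.

```python
import io
import zipfile


def reserialize_verified(zip_bytes: bytes, approved_names: set[str]) -> bytes:
    """Parse the archive once and write a fresh ZIP holding only approved entries."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src:
        names = [info.filename for info in src.infolist()]
        if len(names) != len(set(names)):
            # Duplicate names are themselves a source of reader disagreement.
            raise ValueError("archive contains duplicate entry names")
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename in approved_names:
                    # The bytes written are exactly the bytes this parser extracted.
                    dst.writestr(info.filename, src.read(info))
    return out.getvalue()
```

The verifier would review the files as extracted by this same parse and forward only the returned archive, so any bytes this reader never saw are not passed on.
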
@wolf480pl @gynvael @mei Given email supports multiple attachments, simply banning the use of zip files for transfers of a few files (<=10 attachments is more than reasonable) or any number of invoice files would be the proper step (a policy change, instead of technical).

I'd considered technical changes to prevent this (without spec changes) but that assumes everyone uses non-broken software (lol) that does verify all redundant fragments & fields (requiring new program flags to handle files from unreliable storage/transfer) and still leaves the issue of update/deletion/hiding being a valid operation by spec (there is no explicit user-visible versioning).
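
As an illustration of what "verify all redundant fragments & fields" could look like, here is a small Python sketch that cross-checks each central directory entry against the local file header it points to. The function name `cdh_lfh_mismatches` is hypothetical, the central directory is located by Python's own `zipfile` module (so this inherits that module's choice of EOCD), and ZIP64 archives are assumed not to occur.

```python
import io
import struct
import zipfile

LFH_SIG = b"PK\x03\x04"  # local file header signature


def cdh_lfh_mismatches(zip_bytes: bytes) -> list[str]:
    """Report disagreements between central directory entries and their local headers."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            off = info.header_offset
            if off + 30 > len(zip_bytes) or zip_bytes[off:off + 4] != LFH_SIG:
                problems.append(f"{info.orig_filename}: no local header at offset {off}")
                continue
            flags, method = struct.unpack_from("<HH", zip_bytes, off + 6)
            comp_size, uncomp_size, name_len = struct.unpack_from("<IIH", zip_bytes, off + 18)
            raw_name = zip_bytes[off + 30:off + 30 + name_len]
            # Bit 11 of the flags marks UTF-8 names; otherwise assume CP437.
            lfh_name = raw_name.decode("utf-8" if flags & 0x800 else "cp437", errors="replace")
            if lfh_name != info.orig_filename:
                problems.append(f"{info.orig_filename}: local header names it {lfh_name!r}")
            if method != info.compress_type:
                problems.append(f"{info.orig_filename}: compression method differs")
            # With a data descriptor (bit 3) the local header sizes may be zero.
            if not flags & 0x08 and (comp_size, uncomp_size) != (info.compress_size, info.file_size):
                problems.append(f"{info.orig_filename}: sizes differ")
    return problems
```

An empty result only means this one reader found the two copies consistent; it says nothing about data hidden outside the entries that reader chose to look at.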

@lispi314 @mei @gynvael
I think you're taking this too practically.

My proposed solution wasn't meant to be a serious proposal, but an example of applying the general principle of TOCTOU-resistant systems to the system of two humans who deal with invoices.

Banning zip files only protects from ambiguous zip files. One day someone may come up with a way to make an ambiguous PDF that displays differently in pdf.js than in Adobe Reader, and then what?

@wolf480pl @gynvael @mei PDF is indeed a problematic format for a number of reasons. Its specification has some major issues and it should probably be deprecated.

djvu could somewhat substitute in the meantime, though something else meant to actually fill in PDF's role but doing it right would be preferable.
@mei @gynvael @wolf480pl > My proposed solution wasn't meant to be a serious proposal, but an example of applying the general principle of TOCTOU-resistant systems to the system of two humans who deal with invoices.

Yeah, I did notice the cached-rendering/resolution. For static data that is appropriate, though it ultimately relies on the existence of data with a non-broken parser.

The re-creation of the data, I think, also potentially moves some of the liability onto the converting user.
@lispi314 @mei @gynvael the whole point is that the re-creating user is the one with the knowledge and authority to tell which data is good. The goal is that the second person will only receive data that the verifier person understood, so any hidden bits that the attacker can manipulate without the verifier's knowledge don't get transmitted
@wolf480pl @gynvael @mei I suppose that is one way of viewing it.

I simply prefer the option of a more static & reliable format where, if it is corrupt, liability and blame for the mistake (or malicious action) falls unambiguously to the external provider.

In this case though, why not have the re-creator just file the data in some internal form/filing system (submitting said form to payments)?
@lispi314 @mei @gynvael yeah, that's a more faithful implementation of that idea
@gynvael Has this scam actually been pulled off? I guess it would require the scammer to know which software stack the approver is using and which one the payment department is using.
@matt @gynvael would DNS dumpster (SaaS) and LinkedIn profiles be enough?
@gynvael This is really cool, thanks for making a write-up like this!

it reminds me of a similar attack I've seen used (mom fell for it) that isn't nearly as technical:

You, end consumer or owner of a very small business, make a purchase that will be paid with an invoice (in Brazil terms, a "Boleto Bancario") in your email. You open the PDF on your browser, since it is where you check email and browsers do that, and it takes a second to load (or may even flash), but all the readable data is correct, so you scan the barcode or use the numbered code and make the payment. Days later you receive an email that the payment never happened and the purchase was cancelled.

What happened was a rogue browser extension that would identify when a PDF of an invoice was being opened, find the parts of the code that identify the bank account, overwrite them with the attacker's bank account, and then generate a new barcode from the new code. That would happen to any and all invoices opened in the browser, but if the PDF was downloaded and opened in an independent program, the PDF would be correct.

I only diagnosed it because I saw it flashing, and eventually found the extension (it was a decade ago or so, if this is still around it will have a completely different name). Since end consumers aren't likely to think to check, or maybe even know how to check, the bank account that things are going to, this can be quite a hard thing to spot.
@gynvael at first I thought this is a metaphor for the ZUGFeRD format :-)
@gynvael Not a fan of the usage of the word "schizophrenic" to describe something being done maliciously. I think "two-faced" or "double agent" would work better in this specific context.
@miru @gynvael seconding this

@whitequark @miru @gynvael it refers to the ZIP having (at least) two aspects or 'personalities', and they agree it is not ideal:

A schizophrenic file is a file that is interpreted in at least two different ways by two different parsers. The name, as Wikipedia likes to put it (in an unrelated article), is a "metaphor with the public confusion of dissociative identity disorder with the psychiatric diagnosis of schizophrenia", so personally I prefer to refer to it as a multiple personality files or multiple personality disorder file, but I think the original name stuck.

Schizophrenia (object-oriented programming) - Wikipedia

@NotThatDeep @gynvael @miru I am aware of what it refers to, and I think all of these names are equally terrible
@NotThatDeep @gynvael @miru like the fundamental issue is that it paints a (unjustifiedly maligned) psychiatric condition as being inherently linked to deception or malice. _which_ condition you pick doesn't matter here. if you called it "bipolar ZIP files" it would be just as bad

@whitequark
In the previous century, this would have been simply labeled a trojan

We don't really need a bunch of complicated (and inaccurate) names

What next? Is the ghost of Linnaeus going to appear and create a taxonomy of exploits?
@NotThatDeep @gynvael @miru

@whitequark @gynvael @miru i didn't mean to endorse anything, just wanted to point out the author doesn't endorse the name

@gynvael Ehm I'd say this is a deeply flawed process and a proper process would have prevented this attack vector entirely.
Also this reminds me of typical TOCTOU vulnerabilities.

In most companies the payment information gets entered into the accounting software (at most, an "ISO 32000-1" compliant PDF/A version of the initial bill is attached). Then it is sent for approval, and the thing you approve is the info in the system, not in the PDF. Once confirmed, what is in the system gets booked.

@gynvael Fun, tested the zip file with WinRAR 7.12 (sees size_of_central_directory.txt) and Windows 11 24H2 Explorer (sees offset_of_start.txt), and neither complain about any problems.

Windows 11 24H2 tar (bsdtar 3.7.7 - libarchive 3.7.7 zlib/1.2.13.1-motley liblzma/5.4.3 bz2lib/1.0.8 libzstd/1.5.5) complains though:

E:\Temp>tar tvvf offset_or_size.zip
tar.exe: Damaged Zip archive
Archive Format: ZIP, Compression: none
tar.exe: Error exit delayed from previous errors.

Midnight Commander on Linux first shows an error: EXTFS virtual file system: uzip (list): /usr/bin/unzip failed - non-zero exit status (1), but then opens the archive and shows size_of_central_directory.txt (running unzip -l shows warning [offset_or_size.zip]: 89 extra bytes at beginning or within zipfile and lists size_of_central_directory.txt).

@gynvael I mean the attack vector is the zip here, but it's utterly insane to me that people treat pdfs as static documents in the first place.

@gynvael

What is this 1998?
Any time I get a ZIP "financial" document it goes straight to the bin.
Even if it's sealed with the word of god.

@gynvael Many issues here
1) Scammer needs to have access to the original correct invoice, which means that the bona fide vendor was compromised.
2) Payment department should only pay to the bank account in the system, not the one on the invoice, or should query if they differ, regardless of whether the invoice was authorised.
3) Adobe is a dodgy company that has a hold on PDF; that needs to change.

Overall, lack of controls in payer company allowed this invoice to be paid to wrong bank account.

@gynvael the zipped invoice should have been a gigantic blocking red flag. Compressed archives have long been known for RCE and other shit.
Thanks. That's a new one on me.
@gynvael Using medical terms like "schizophrenic" or "multiple personality" figuratively (outside of medical contexts) is insensitive because it trivialises the challenges those people face.