-13-
Splitting the Email Atom: Exploiting Parsers to Bypass Access Controls - Gareth Heyes
Understanding email addresses is hard. Why? The first emails were sent in the '70s, yet email remains a wildly popular way to communicate, and the format of email addresses has been extended repeatedly over the years.
This means that today, valid addresses are defined across half a dozen different RFCs. Usually, that doesn't bother the average programmer, as servers implementing SMTP (the most common email protocol today) will parse the address properly and route it to the correct destination.
The problem arises with the (relatively recent) rise of Single Sign-On (SSO). With this technology, you can sign into Website A using your account on Website B (you've probably seen a "Sign in with Google/Apple/Microsoft 365/..." option in the past).
In many cases, sites like Website A need to know which organization a user belongs to, so that the user is attached to the right organization and can access its resources.
The solution to this? Extract the domain part of the email address! This makes sense, as the domain is a strong indicator of the user's organization.
However, now we might have a problem: two separate systems parse the email address. The SMTP server routes the address according to the specifications, while a developer parses the domain out of the email (probably with a copy-pasted regex and code).
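To make the two-parser problem concrete, here is a minimal sketch of how two simple extraction strategies can disagree on the same address. The function names and the `evil.example` / `victim.example` domains are hypothetical, and this is not one of the payloads from the talk; it only relies on the fact that a quoted local part may legally contain an "@".

```python
def first_at_domain(address: str) -> str:
    """Hypothetical developer-side parser: take the text after the first '@'."""
    return address.split("@")[1].lower()


def last_at_domain(address: str) -> str:
    """Hypothetical second parser: take everything after the last '@'."""
    return address.rsplit("@", 1)[1].lower()


# A quoted local part may contain "@", so the two parsers see different domains.
address = '"user@evil.example"@victim.example'

print(first_at_domain(address))  # evil.example"
print(last_at_domain(address))   # victim.example
```

Whenever the access-control check and the mail router sit on opposite sides of a disagreement like this, one address can mean two different organizations.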
Discrepancies between the two could easily lead to cases where a new email address is associated with a victim organization while being routed to an attacker-controlled domain — and this is what Gareth exploited. In the talk, he covered three discrepancies:
Unicode overflows: Adding a Unicode character to the email address whose code point has, as its least significant byte, the ASCII character you'd like to smuggle. Some implementations truncate the code point to that single byte, leading to validation bypasses.
Encoded-word: An interesting feature of email addresses that allows parts to be included in an encoded form with a declared charset. It even supports Base64 encoding and the UTF-7 charset!
Punycode: An encoding that lets DNS support domain names with special characters (even emojis). Decoding it moves and inserts characters at specific positions, a sure way to confuse parsers.
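The three discrepancies can be sketched with Python's standard library. This is an illustration under assumptions, not Gareth's actual payloads: `truncate_to_byte` and `decode_encoded_word` are hypothetical helpers, `truncate_to_byte` imitates a buggy conversion rather than any specific product, and `evil.example` is a placeholder domain.

```python
import base64
from email.header import decode_header


def truncate_to_byte(text: str) -> str:
    """Hypothetical buggy conversion: keep only the low byte of each code point."""
    return "".join(chr(ord(ch) & 0xFF) for ch in text)


def decode_encoded_word(word: str) -> str:
    """Decode a single encoded-word of the form =?charset?encoding?text?=."""
    raw, charset = decode_header(word)[0]
    return raw.decode(charset or "ascii")


# 1. Unicode overflow: U+0140 has the low byte 0x40, which is "@".
print(truncate_to_byte("attacker\u0140evil.example"))

# 2. Encoded-word: Q-encoding ("=40") and Base64 can both hide an "@".
q_word = "=?utf-8?q?attacker=40evil.example?="
b_word = "=?utf-8?b?" + base64.b64encode(b"attacker@evil.example").decode("ascii") + "?="
print(decode_encoded_word(q_word))
print(decode_encoded_word(b_word))

# 3. Punycode: the suffix after the last "-" encodes which character
#    to insert and at which position in the ASCII part.
print("münchen".encode("punycode"))
print(b"mnchen-3ya".decode("punycode"))
print("münchen.example".encode("idna"))
```

In each case the string a validator inspects differs from the string a later component acts on, which is the gap the techniques above rely on.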
Using these three techniques along with some other niche features of email addresses, Gareth was able to access private GitLab Enterprise servers and Zendesk organizations, bypass email verification on cloud-hosted GitHub, get into domain-protected Cloudflare instances, and even steal CSRF tokens from Joomla by embedding an XSS payload in registered users' email addresses!
To tie it up, Gareth focuses on methodology. Bypassing parsers you can't see is not simple, especially with so many possible combinations of potentially vulnerable features.
The methodology applies well to research in general; I recommend watching the talk and borrowing from it.