RFC 9839 and Bad Unicode

ongoing by Tim Bray
Ah, RFC 9839: The quest for the "less bad" #Unicode #subset 🤦‍♂️. Because, clearly, specifying which Unicode characters to avoid is humanity's greatest achievement since sliced bread. Tim Bray and Paul Hoffman bravely venture into the Unicode abyss, emerging with a groundbreaking revelation: not all characters are created equal. Who knew? 🙄
https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839 #RFC9839 #LessBad #TimBray #PaulHoffman #UnicodeAbyss #HackerNews #ngated

«Unicode is good. If you’re designing a data structure or protocol that has text fields, they should contain #Unicode characters encoded in #UTF8. There’s another question, though: “Which Unicode characters?” The answer is “Not all of them, please exclude some.”

This issue keeps coming up, so [ @paulehoffman and @timbray ] put together an individual-submission draft to the IETF and now (where by “now” I mean “two years later”) it’s been published as #RFC9839. It explains which characters are bad, and why, then offers three plausible less-bad subsets that you might want to use.»

https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839 by @timbray
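The excerpt does not list the "bad" characters, so here is a rough sketch of what a check might look like. It assumes the problematic categories are surrogates, legacy control characters (all C0 controls except tab, line feed and carriage return, plus DEL and the C1 controls), and noncharacters. The function names `is_problematic` and `find_problematic` are made up for this illustration, and the ranges should be checked against the RFC text before anyone relies on them.

```python
def is_problematic(cp: int) -> bool:
    """Return True if the code point falls in one of the assumed 'bad' categories."""
    if 0xD800 <= cp <= 0xDFFF:
        return True  # surrogates: not valid characters on their own
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):
        return True  # C0 controls other than tab, LF, CR (assumed to be the useful ones)
    if 0x7F <= cp <= 0x9F:
        return True  # DEL and the C1 controls
    if 0xFDD0 <= cp <= 0xFDEF:
        return True  # noncharacter block
    if (cp & 0xFFFE) == 0xFFFE:
        return True  # last two code points of every plane (U+FFFE, U+FFFF, U+1FFFE, ...)
    return False


def find_problematic(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every problematic code point found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_problematic(ord(ch))
    ]
```

A protocol or data-structure validator could call `find_problematic` on each text field and reject or flag the input when the list is non-empty. Whether to reject, replace, or merely log is a design choice this sketch leaves open.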

#programming #CharacterEncoding #LML
