Three small announcements:
1. RFC 9839, a guide to which Unicode characters you should never use: https://www.rfc-editor.org/rfc/rfc9839.html
2. Blog piece with background and context, “RFC 9839 and Bad Unicode”: https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839
3. A little Go library that implements 9839’s exclusion subsets: https://github.com/timbray/RFC9839

#Unicode

RFC 9839: Unicode Character Repertoire Subsets

This document discusses subsets of the Unicode character repertoire for use in protocols and data formats and specifies three subsets recommended for use in IETF specifications.
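
As a rough illustration of what the three subsets look like in code, here is a minimal Go sketch. The ranges are transcribed from the RFC's ABNF for Unicode Scalars, XML Characters, and Unicode Assignables as best understood here, so check them against the RFC before relying on them. The names `runeRange`, `inSubset`, and `unicodeAssignables` are hypothetical and are not the API of the timbray/RFC9839 library.

```go
package main

import "fmt"

// runeRange is an inclusive range of code points (hypothetical helper type).
type runeRange struct{ lo, hi rune }

// Unicode Scalars: every code point except the surrogates.
var unicodeScalars = []runeRange{{0x0, 0xD7FF}, {0xE000, 0x10FFFF}}

// XML Characters: mirrors the XML 1.0 Char production.
var xmlCharacters = []runeRange{
	{0x9, 0x9}, {0xA, 0xA}, {0xD, 0xD},
	{0x20, 0xD7FF}, {0xE000, 0xFFFD}, {0x10000, 0x10FFFF},
}

// unicodeAssignables builds the ranges that exclude surrogates, legacy
// controls (C0 other than tab, LF, CR; DEL; C1), and noncharacters.
// Assumption: ranges follow the RFC 9839 ABNF; verify before use.
func unicodeAssignables() []runeRange {
	rs := []runeRange{
		{0x9, 0x9}, {0xA, 0xA}, {0xD, 0xD},
		{0x20, 0x7E}, {0xA0, 0xD7FF},
		{0xE000, 0xFDCF}, {0xFDF0, 0xFFFD},
	}
	// Planes 1 through 16, each minus its last two code points,
	// which are noncharacters (U+nFFFE and U+nFFFF).
	for plane := rune(1); plane <= 16; plane++ {
		base := plane << 16
		rs = append(rs, runeRange{base, base + 0xFFFD})
	}
	return rs
}

// inSubset reports whether r falls in any range of the subset.
func inSubset(r rune, subset []runeRange) bool {
	for _, rr := range subset {
		if r >= rr.lo && r <= rr.hi {
			return true
		}
	}
	return false
}

func main() {
	assignables := unicodeAssignables()
	samples := []rune{'A', 0x0, 0x7F, 0xD800, 0xFDD0, 0xFFFE, 0x1F600}
	for _, r := range samples {
		fmt.Printf("U+%04X scalar=%v xml=%v assignable=%v\n",
			r,
			inSubset(r, unicodeScalars),
			inSubset(r, xmlCharacters),
			inSubset(r, assignables))
	}
}
```

A linear scan over a short range table keeps the sketch easy to compare against the ABNF; a real implementation might prefer a switch statement or binary search.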

@timbray Great work! You say the library is well tested; are there test vectors available somewhere? I would like to see support for identifying Bad Unicode, as you specify it, added to https://www.gnu.org/software/libunistring/ and if I were to start working on that, I would want good test cases.

@jas Check out the unit tests in the repo, in unichars_test.go. They're probably not in the right format for you, but the coverage is pretty exhaustive.