Fun bug in #ZBar discovered while debugging a #SegNo (#Python #QRCode generator library) test failure on #Gentoo with #musl libc.
SegNo defaults to attempting to encode strings as ISO-8859-1 if possible. ZBar defaults to trying to decode them as Big5 first. Most of the time everything works fine.
Let's take a test string from ZBar: "Märchenbücher". When we encode it as ISO-8859-1, we're going to get two high-byte, low-byte sequences: E4 72 for "är" and FC 63 for "üc". The latter sequence maps to a "user defined" character in Big5, and therefore glibc refuses to convert it. However, musl converts it just fine. As a result, ZBar decodes the string as Big5, to "M酺chenb𡡷her".
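To see those bytes for yourself, here is a minimal Python sketch. It only uses the standard library; the helper name `high_low_pairs` is made up for illustration and is not part of SegNo or ZBar.

```python
# Encode the ZBar test string the way SegNo would by default (ISO-8859-1).
text = "Märchenbücher"
raw = text.encode("iso-8859-1")


def high_low_pairs(data):
    """Collect every high byte (>= 0x80) together with the byte that follows it.

    This mimics how a double-byte decoder such as Big5 would pair them up.
    Hypothetical helper, for illustration only.
    """
    pairs = []
    i = 0
    while i < len(data) - 1:
        if data[i] >= 0x80:
            pairs.append(data[i:i + 2].hex(" ").upper())
            i += 2  # the trail byte is consumed as part of the pair
        else:
            i += 1
    return pairs


print(raw.hex(" ").upper())   # 4D E4 72 63 68 65 6E 62 FC 63 68 65 72
print(high_low_pairs(raw))    # ['E4 72', 'FC 63']
```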
You could argue that musl behaves incorrectly. However, note that the former sequence (E4 72) is valid in Big5. So if you shorten the string to just "Märchen", glibc would happily decode its ISO-8859-1 #encoding as Big5, giving you "M酺chen". And yes, if I put that test string into SegNo, I get a QRCode that reproduces the problem on a glibc system.
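A rough way to reproduce the effect without ZBar: Python's built-in `big5` codec rejects the FC 63 pair but accepts E4 72, so I use it here as a stand-in for a strict Big5 converter. This is an assumption for illustration; it is not the iconv code path ZBar actually takes, and `try_decode` is a made-up helper.

```python
def try_decode(data, codec):
    """Return the decoded string, or None if the codec rejects the bytes.

    Hypothetical helper; roughly models "try this charset, fall back if it fails".
    """
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


full = "Märchenbücher".encode("iso-8859-1")
short = "Märchen".encode("iso-8859-1")

# Full string: the FC 63 pair is not plain Big5, so a strict decoder gives up
# and a caller could fall back to another charset.
print(try_decode(full, "big5"))

# Shortened string: E4 72 is a valid Big5 pair, so decoding "succeeds"
# and silently yields the wrong text.
print(try_decode(short, "big5"))

# What the data was actually meant to be.
print(try_decode(short, "iso-8859-1"))
```

The point is that "conversion succeeded" says nothing about whether Big5 was the right guess, regardless of which libc is doing the converting.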
Does ZBar behave incorrectly here? Or perhaps SegNo should avoid ISO-8859-1 altogether, and use the safer UTF-8 encoding?
https://bugs.gentoo.org/923233
https://github.com/heuer/segno/issues/134
https://github.com/mchehab/zbar/issues/281
