Well this is FUCKING annoying!
On the top one, the ä is a plain a (U+0061) followed by a combining diaeresis (U+0308). On the bottom one, it's the single precomposed character U+00E4.
Trying to paste one into where the other gets written doesn't seem to fix it.
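For anyone following along, here is a minimal Python sketch of why the two spellings refuse to match. It assumes the top form is the decomposed sequence U+0061 U+0308 and the bottom form is the precomposed U+00E4; the helper name `same_after_nfc` is made up for illustration.

```python
import unicodedata

# Assumed forms: decomposed (a + combining diaeresis) vs. precomposed ä.
decomposed = "a\u0308"
precomposed = "\u00e4"

def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# They render identically but are different code point sequences.
print(decomposed == precomposed)                 # False
print(len(decomposed), len(precomposed))         # 2 1
print(same_after_nfc(decomposed, precomposed))   # True
```

Comparing after normalization (rather than pasting one form over the other) is the part that holds up, since editors, clipboards, and filesystems may each quietly convert between the two forms.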
@twipped Huh, that's extremely disappointing.
N.b. that NFC is required for identifiers in C++23 and later, so people writing source code with dubious input methods are going to have a bad time.
Filesystems are sometimes a crapshoot. If you're worried, avoid bytes >= 0x80.
@krans I'm honestly surprised it hasn't bitten me before. I considered latinizing the URLs when the first translations came in (especially the Mandarin), but it _seemed_ to be working fine with CloudFront, so I didn't worry about it much. Most likely either the browser or the ingress was normalizing the characters.
Now I'm rewriting the static site generator and ran face-first into it, because the URLs in the layout JSON didn't match the file names.
@twipped I ask because some scripts are just not feasible to Latinize in the way you suggest. Arabic and Devanagari are examples. I honestly would use a build script that checks for NFC-ness and instructs the developer to obtain a working filesystem, but that's just me.
Thank you for caring about these details, by the way 💚
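The build-time NFC check suggested above could look roughly like this in Python. This is only a sketch under assumptions: the function names `non_nfc` and `find_non_nfc_names` are hypothetical, it only inspects file and directory names (not URLs inside the layout JSON), and `unicodedata.is_normalized` needs Python 3.8 or later.

```python
import os
import unicodedata

def non_nfc(names):
    """Return the names that are not already in NFC form."""
    return [n for n in names if not unicodedata.is_normalized("NFC", n)]

def find_non_nfc_names(root):
    """Walk root and collect paths whose final component is not NFC."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in non_nfc(dirnames + filenames):
            offenders.append(os.path.join(dirpath, name))
    return offenders

def main(root="."):
    """Print offending paths; return 1 if any were found, else 0."""
    bad = find_non_nfc_names(root)
    for path in bad:
        print(f"not NFC: {path!r}")
    return 1 if bad else 0
```

A build script could call `sys.exit(main("path/to/site"))` so the build fails when a file name has drifted out of NFC. Note that what `os.walk` reports depends on the filesystem, since some filesystems normalize names on their own.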