Well this is FUCKING annoying!

On the top one the ä is U+0061 (a plain "a") followed by a combining diaeresis (U+0308). On the bottom one, it's the single precomposed character U+00E4.
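
For anyone following along, here's a rough Python sketch of the mismatch. It assumes the top name is the decomposed form ("a" plus combining diaeresis U+0308) and the bottom one is the precomposed U+00E4; the variable and function names are made up for illustration.

```python
import unicodedata

# Assumption: these stand in for the two file names being compared.
decomposed = "a\u0308"   # 'a' (U+0061) + combining diaeresis (U+0308)
precomposed = "\u00e4"   # the same letter as one code point (U+00E4)


def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# They render identically but are different strings.
print(decomposed == precomposed)                # False
print([hex(ord(c)) for c in decomposed])        # ['0x61', '0x308']
print([hex(ord(c)) for c in precomposed])       # ['0xe4']

# Normalizing both sides to NFC makes them compare equal.
print(same_after_nfc(decomposed, precomposed))  # True
```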

Trying to paste one into where the other gets written doesn't seem to fix it.

#unicode #unicodeFail

Fuck it, I'm latinizing the urls on the GDB translations. This is just too unreliable between operating systems. #unicode #unicodeFail
@twipped Why not just normalize them to NFC rather than Latinizing them?
@krans These are file names in a git repo, and I just don't trust all the various operating systems and input methods that contributors may be using.

@twipped Huh, that's extremely disappointing.

N.b. that NFC is required for identifiers in C++23 and later, so people writing source code with dubious input methods are going to have a bad time.

Filesystems are sometimes a crapshoot. If you're worried, avoid bytes >= 0x80.

@krans I'm honestly surprised it hasn't bit me before. I considered latinizing the urls when the first translations came in (especially the Mandarin), but it _seemed_ to be working fine with Cloudfront, so I didn't worry about it much. Most likely either the browser or the ingress was normalizing the characters.

Now I'm rewriting the static site generator and ran face first into it, because the urls in the layout json didn't match the file names.

@twipped Is it possible to make the translations use exactly the same file names as the English files they are translated from?
@krans oh absolutely, but that feels even worse

@twipped I ask because some scripts are just not feasible to Latinize in the way you suggest. Arabic and Devanagari are examples. I honestly would use a build script that checks for NFC-ness and instructs the developer to obtain a working filesystem, but that's just me.
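
A minimal sketch of the kind of check I mean, in Python since I don't know what the site generator is written in. The function names are hypothetical, and skipping `.git` is my own assumption about the repo layout.

```python
import os
import unicodedata


def is_nfc(name):
    """True if the name is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", name) == name


def non_nfc_paths(root):
    """Return a sorted list of paths under root whose file or directory name is not NFC."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: git internals are not interesting here, so do not descend into them.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in dirnames + filenames:
            if not is_nfc(name):
                offenders.append(os.path.join(dirpath, name))
    return sorted(offenders)
```

A build step could call `non_nfc_paths(".")` and fail with a message listing the offenders if the list is non-empty. Note that this only sees the names as the local filesystem reports them, so it is best run in CI on a filesystem that preserves bytes as written.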

Thank you for caring about these details, by the way 💚