Mastodawn

Jocelynephiliac

Jun 15, 2024

Well this is FUCKING annoying!

On the top one, the ä is unicode 0x61 (a plain "a") followed by a combining diaeresis (U+0308). On the bottom one, it's the single precomposed character 0xE4.

Trying to paste one into where the other gets written doesn't seem to fix it.

#unicode #unicodeFail
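
A minimal sketch of the mismatch described above, assuming the top string is the decomposed form (U+0061 plus U+0308) and the bottom one is the precomposed U+00E4. The helper name `code_points` is made up for illustration.

```python
import unicodedata

# Two strings that render identically but differ at the code point level.
decomposed = "a\u0308"   # U+0061 LATIN SMALL LETTER A + U+0308 COMBINING DIAERESIS
precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS


def code_points(text: str) -> list[str]:
    """Return the code points of text in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(decomposed))    # ['U+0061', 'U+0308']
print(code_points(precomposed))   # ['U+00E4']
print(decomposed == precomposed)  # False: a plain string comparison fails

# Normalizing both sides to NFC makes the comparison succeed.
same = unicodedata.normalize("NFC", decomposed) == unicodedata.normalize("NFC", precomposed)
print(same)                       # True
```

This is why pasting one form over the other may not help: an editor, clipboard, or filesystem along the way can convert it back again.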

Jocelynephiliac

Jun 15, 2024

Fuck it, I'm latinizing the URLs on the GDB translations. This is just too unreliable between operating systems. #unicode #unicodeFail

Peter Brett Jun 15, 2024

@twipped Why not just normalize them to NFC rather than Latinizing them?

Jocelynephiliac

Jun 15, 2024

@krans These are file names in a git repo, and I just don't trust all the various operating systems and input methods that contributors may be using.

Peter Brett Jun 15, 2024

@twipped Huh, that's extremely disappointing.

N.b. that NFC is required for identifiers in C++23 and later, so people writing source code with dubious input methods are going to have a bad time.

Filesystems are sometimes a crapshoot. If you're worried, avoid bytes >= 0x80.
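
A rough sketch of the "avoid bytes >= 0x80" rule as a check, assuming file names are handled as Python strings and encoded as UTF-8. The function name `is_ascii_only` is hypothetical.

```python
def is_ascii_only(name: str) -> bool:
    """Return True if the UTF-8 encoding of name contains no byte >= 0x80."""
    # surrogateescape keeps undecodable bytes from os.listdir() round-trippable,
    # so they show up here as bytes >= 0x80 instead of raising an error.
    return all(byte < 0x80 for byte in name.encode("utf-8", "surrogateescape"))


print(is_ascii_only("index.html"))        # True
print(is_ascii_only("\u00fcber.html"))    # False: ü encodes as 0xC3 0xBC
```

A name that passes this check has only one possible encoding, so normalization differences between systems cannot affect it.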

Jocelynephiliac

Jun 15, 2024

@krans I'm honestly surprised it hasn't bitten me before. I considered latinizing the URLs when the first translations came in (especially the Mandarin), but it _seemed_ to be working fine with CloudFront, so I didn't worry about it much. Most likely either the browser or the ingress was normalizing the characters.

Now I'm rewriting the static site generator and ran face-first into it, because the URLs in the layout JSON didn't match the file names.

Peter Brett Jun 16, 2024

@twipped Is it possible to make the translations use exactly the same file names as the English files they are translated from?

Jocelynephiliac

Jun 16, 2024

@krans oh absolutely, but that feels even worse

Peter Brett

@twipped I ask because some scripts are just not feasible to Latinize in the way you suggest. Arabic and Devanagari are examples. I honestly would use a build script that checks for NFC-ness and instructs the developer to obtain a working filesystem, but that's just me.

Thank you for caring about these details, by the way 💚
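
One way the NFC build check suggested above could look, as a sketch only. It assumes Python 3.8+ (for `unicodedata.is_normalized`) and that the check walks a checkout directory. The names `non_nfc_paths` and `main` are hypothetical, not part of any existing tool.

```python
import sys
import unicodedata
from pathlib import Path


def non_nfc_paths(root):
    """Return relative paths under root whose names are not in NFC form."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        rel = path.relative_to(root).as_posix()
        if not unicodedata.is_normalized("NFC", rel):
            bad.append(rel)
    return bad


def main(argv):
    """Print offending paths to stderr; return 1 if any were found, else 0."""
    root = argv[1] if len(argv) > 1 else "."
    bad = non_nfc_paths(root)
    for rel in bad:
        fixed = unicodedata.normalize("NFC", rel)
        print(f"not NFC: {rel!r} (expected {fixed!r})", file=sys.stderr)
    return 1 if bad else 0
```

Calling `sys.exit(main(sys.argv))` from a build step would fail the build on any non-NFC name. One caveat: some filesystems report names in a different normalization form than the one they were created with, so a check like this reflects what the local filesystem returns rather than what git has stored.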