Whoa. UTF-8 is older now than ASCII was when UTF-8 was invented.
@tek and it still sucks
@vathpela Awww, I like UTF-8! I can pretend it's ASCII most of the time.
@tek I have complaints about recoverability on a mildly corrupted bitstream, but it's much too late in the evening to articulate this well.

@vathpela IMHO, redundancy and/or checksums should be implemented on a different layer, not in the text encoding

Like, there are many, many ways to keep bits from corrupting, and different ones are applicable in different cases
And forcing one particular scheme into the text encoding itself is... meh

Same for compression btw. For some texts (CJK in particular) UTF-8 is sub-optimal, but even basic deflate makes it compact enough
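A rough illustration of that point, sketched in Python with a made-up (and deliberately repetitive) Japanese sample string, so the exact numbers are illustrative only: CJK code points cost 3 bytes each in UTF-8 versus 2 in UTF-16, but deflate more than recovers the difference on this input.

```python
import zlib

# Hypothetical sample text; being highly repetitive, it compresses far
# better than real prose would, so treat the ratio as illustrative.
text = "日本語のテキストです。" * 100

utf8 = text.encode("utf-8")       # 3 bytes per character for these code points
utf16 = text.encode("utf-16-le")  # 2 bytes per character (BMP, no BOM)
deflated = zlib.compress(utf8)

print(len(utf8), len(utf16), len(deflated))
# Raw UTF-8 is the largest of the three, but deflated UTF-8 is the smallest.
```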

TL;DR: UTF-8 is not perfect, but having one encoding for every text outweighs the drawbacks

@mo @vathpela Also, UTF-8 is trivially easy to synchronize. If you delete a byte out of the middle of a file, at most you’ll lose the one affected character (well, code point). The ones before and after it will be fine. That’s not true of some other Unicode encodings, like fixed-width ones, where everything after the deletion would be out of sync.
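The self-synchronization property can be demonstrated in a few lines of Python; the sample string here is an arbitrary example. Continuation bytes always match the pattern `0b10xxxxxx`, so a decoder hitting damage can resynchronize at the very next byte that doesn't match it.

```python
# Delete one byte from the middle of a UTF-8 stream and decode the rest.
data = "naïve café".encode("utf-8")

# Drop the lead byte of 'ï' (a two-byte sequence, 0xC3 0xAF),
# leaving an orphaned continuation byte behind.
damaged = data[:2] + data[3:]

# Only the damaged code point is lost; the characters before and
# after it decode fine, replaced here by a single U+FFFD marker.
decoded = damaged.decode("utf-8", errors="replace")
print(decoded)  # na�ve café — exactly one replacement character
```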
@tek This! UTF-8 is a great encoding. Unicode can be a mess at times though. :)
@mo @vathpela @tek Variable length encoding adds a little complexity at the input and output stages, but I think the benefits outweigh that, especially the 8-bit compatibility that allows a lot of software to work (at least to some extent) unmodified.
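That 8-bit compatibility has a concrete mechanical reason, sketched below with a made-up path: every byte inside a multi-byte UTF-8 sequence is ≥ 0x80, so ASCII delimiters like `/` can never occur inside another character's encoding, and byte-oriented code can split or scan the stream without decoding it first.

```python
# Byte-oriented software can treat UTF-8 as opaque 8-bit data:
# splitting on an ASCII delimiter can never cut a character in half,
# because b'/' (0x2F) never appears inside a multi-byte sequence.
path = "docs/日本語/ファイル.txt".encode("utf-8")

parts = path.split(b"/")  # safe without decoding first
print([p.decode("utf-8") for p in parts])
# ['docs', '日本語', 'ファイル.txt']
```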