chrismorgan

0 Followers
0 Following
5 Posts
https://chrismorgan.info/, [email protected]

I’m available for hire for consulting, training and mentoring in Rust, everything web, performance and more: https://chrismorgan.info/hire-me/

[ my public key: https://keybase.io/chrismorgan; my proof: https://keybase.io/chrismorgan/sigs/rEm41-0pghG3ITW8S5HeuFLA9TznvPSlg9NKgXXRXqQ ]
This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.

Officialhttps://
Support this servicehttps://www.patreon.com/birddotmakeup

You need mechanisms to avoid the possibility. The mechanisms to do such things exist by default, by both the software provider (e.g. Proton) and the software distributor (e.g. Apple for App Store, Google for Play Store, Cloudflare or AWS for web stuff), and various countries have laws that allow them to secretly compel implementing specific backdoors.

In order to block the distributor from going rogue, you need to be able to guarantee that the user device can only install/run code signed by the provider, who must never give those keys to the distributor. My impression is that Android is the only major platform that ever had this, but that Google ruined it a few years ago in the name of lighter bundles by insisting that they hold the keys. (I once had VLC from Google Play Store, but replaced it with a build from F-Droid under the same app ID; Google Play Store shows it has an update for it, but that it can’t install it.)

In order to block the provider or distributor sending specific users a different build, you need something more like Certificate Transparency logs: make it so that devices will only run packages that contains proof that they have been publicly shared. (This is necessary, but not sufficient.)

And if you’re using web tech, the mechanisms required to preclude such abuse do not at this time exist. If you’re shipping an app by some other channel, it can do a resource integrity check and mandate subresource integrity. But no one does things that way—half the reason for using web tech is specifically to bypass slow update channels and distribute new stuff immediately!

I’m currently moving my personal VPS to Oracle Cloud (for a couple of reasons). The new machine’s host name is lawnmower. I have never been so decisive and satisfied in naming a computer.

OK, checked and Ruby does seem to use scalars. Well, unless you mess with encodings. Then it’s messy. So it’s probably better and worse than Python 3.

You may not have seen this interesting article before: https://hsivonen.fi/string-length/. I agree with its assessment that scalars are really pretty useless as a measure, and Python and Ruby are foolish to have chased it at such expense.

But seriously, I can’t think of any other popular languages that count by scalars or code points—it’s definitely not most languages, it’s a minority, all a very specific sort of language. “Most” encompasses well-formed UTF-8 (e.g. Rust), recommended UTF-8 but it doesn’t actually care (e.g. Go), potentially ill-formed UTF-16 (e.g. JavaScript, Java, .NET), and total-mess (e.g. C, C++).

> indexing by bytes instead of UTF-8 code units

When the encoding is UTF-8 (which it is here), the code unit is the byte.

They called the fields byteStart and byteEnd, but a more technically precise (no more or less accurate, but more precise) labels would be utf8CodeUnitStart and utf8CodeUnitEnd.

> unicode scalars, which most languages index strings in

Very few do. Of moderately popular languages, Python is the only one I can think of. Well, Python strings are actually sequences of code points rather than scalars, which is a huge mistake, but provided your strings came from valid Unicode that doesn’t matter.

Languages like Rust and Swift make it fairly easy to access your string by UTF-8 or by scalar.

Languages like Java and JavaScript index by UTF-16 code unit and make anything else at least moderately painful.

> This is somewhat of an unfortunate tech debt thing as I understand, and it was made this way mostly because of JavaScript, which doesn’t work with UTF-8 natively. But this means you need to be extra careful with the indexes in most languages.

I’m confused here. You established indexing is by UTF-8 code unit, then said it’s because of JavaScript which… doesn’t do UTF-8 so well? If it were indexed by UTF-16 code unit, I’d agree, that’s bad tech debt; but that’s not the case here.

Bluesky made the decision to go all in on UTF-8 here <https://docs.bsky.app/docs/advanced-guides/post-richtext#tex...>—after all, the strings are being stored and transferred in UTF-8, and UTF-8 is increasingly the tool of choice, and UTF-16 is increasingly reviled, almost nothing new has chosen it for twenty years, and nothing major has chosen it for ten years, it’s all strictly legacy. Hugely popular legacy, sure, but legacy.

Links, mentions, and rich text | Bluesky

Posts in Bluesky use rich text to handle links, mentions, and other kinds of decorated text.