I'm conflicted about how I should implement UTF-8 support for strings in my language, #ArkScript
There seem to be two viable options (plus a third I'd rather avoid):
1. Every string is UTF-8, so every access to a character becomes O(n) instead of O(1) (the codepoints have to be decoded to count them), and length is O(n) too. That pretty much pessimizes all strings, even pure-ASCII ones, but it makes working with UTF-8 codepoints easier
2. Every string stays a plain series of bytes, as it is right now, and (@ string index) returns a single byte that may not be a valid character on its own. Indexing and length stay O(1), but we need a function to extract the codepoints, like (string:codepoints str) or (string:graphemes str) or something else (there's a rough sketch of this below)
3. A third, hidden option that I want to avoid and that doesn't really count: introduce another string type, distinct from normal strings. That's bad because the C++ API would be impacted, and the internals would need to handle all the different string types
At first I thought option 1 was better because it makes everything easy, which fits a high-level language. But now I lean toward option 2: UTF-8 support then doesn't hurt the performance of programs that don't need it, and working at the codepoint level should be an intentional, explicit choice
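To make option 2 concrete, here's a minimal C++ sketch of what a builtin like (string:codepoints str) could do under the hood. This is not ArkScript's actual internals, and decode_codepoints is a hypothetical name: the point is just that the string stays a raw byte sequence, and the O(n) decoding only happens when the program explicitly asks for codepoints.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into codepoints on demand.
// Simplified sketch: invalid or truncated sequences become U+FFFD, and
// overlong encodings are not rejected.
std::vector<uint32_t> decode_codepoints(const std::string& bytes)
{
    std::vector<uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        uint32_t cp = 0;
        std::size_t len = 1;

        if (b < 0x80)                { cp = b; }                  // 1 byte (ASCII)
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }  // 2-byte sequence
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }  // 3-byte sequence
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }  // 4-byte sequence
        else                         { cp = 0xFFFD; }             // invalid lead byte

        // Consume continuation bytes, falling back to U+FFFD on malformed input.
        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char next = (i + k < bytes.size())
                ? static_cast<unsigned char>(bytes[i + k]) : 0;
            if ((next & 0xC0) != 0x80)
            {
                cp = 0xFFFD;
                len = k;
                break;
            }
            cp = (cp << 6) | (next & 0x3F);
        }

        out.push_back(cp);
        i += len;
    }
    return out;
}

int main()
{
    // "héllo" spelled out as bytes: 6 bytes, but 5 codepoints.
    std::string s = "h\xC3\xA9llo";
    std::cout << "bytes: " << s.size()
              << ", codepoints: " << decode_codepoints(s).size() << '\n';
}
```

With this approach, length and (@ string index) keep their current byte-level meaning and cost, and anyone who actually cares about codepoints (or graphemes, which would need a separate, heavier pass) pays for the decoding only when they call for it.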

