Amazon Web Services is not yet fully year 1988 compliant
https://cgit.freebsd.org/src/commit/?h=stable/15&id=660a79ef4f1112c90690b56c6e5ac7532428ec8c
And I just learned that the beginning of Unicode is older than the first release of Linux™.
Qwen3.5 35B A3B mixed Coptic script with ancient Greek, which was itself derived from Phoenician. Honestly, unless you are an expert like Indiana Jones, it's impossible to tell whether it is wrong. I did learn something, though:
Ϫ (U+03EA; minuscule: ϫ, U+03EB), called djandja (it sounds like danger meets Jumanji somehow) #language #unicode
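For anyone who wants to check the code points rather than trust a language model, here is a minimal sketch using Python's standard `unicodedata` module. It only prints what the Unicode database bundled with the interpreter says: U+03EA is the capital letter and U+03EB is its minuscule.

```python
import unicodedata

# U+03EA is the capital letter, U+03EB its minuscule counterpart.
for ch in ("\u03ea", "\u03eb"):
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch))

# Simple case mapping links the pair.
print("\u03ea".lower() == "\u03eb")
```

The official character name uses the spelling "GANGIA" rather than "djandja".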
It's not the languages that have the interaction with #Unicode. The vulnerability in the languages is their ability to take code that is constructed at runtime in string form and interpret and execute it.
It's the text editors, IDEs, and pagers (that display the commit diffs) hiding the #PrivateUseArea (and, yes, unassigned code point) characters by rendering them as zero width.
But quite a lot of them don't. The code snippet in the article doesn't actually look like the screenshot given.
In text editors such as Neovim and Vim, and pagers such as less, more, most, and console-tty37-viewer, these characters are either emitted as narrow-width glyphs, which at minimum display as mystery strings of replacement characters, or turned into reverse-video hexadecimal code point values.
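A rough sketch of how such characters could be flagged before a diff is even displayed, assuming it is enough to look at Unicode general categories. The function name `find_suspect_characters` and the choice of categories are illustrative, not taken from any of the tools named above. Variation selectors are category Mn and would need separate handling.

```python
import unicodedata

# General categories that are often rendered as nothing at all:
# Co = private use, Cn = unassigned, Cf = format (zero-width space, joiners, tags).
SUSPECT_CATEGORIES = {"Co", "Cn", "Cf"}


def find_suspect_characters(text):
    """Return (line, column, 'U+XXXX', category) for each suspect character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            if category in SUSPECT_CATEGORIES:
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", category))
    return hits


# Hypothetical sample: two private-use characters inside a string literal
# and a zero-width space at the end of the second line.
sample = 'eval("\ue000\ue001")\nprint("ok")\u200b\n'
for hit in find_suspect_characters(sample):
    print(hit)
```

Which code points count as unassigned depends on the Unicode version shipped with the Python interpreter, so results for Cn can differ between installations.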
In Undertale, hearts are shown as the symbol for souls. Human souls are coloured and can be represented in Unicode: ❤️🩵🧡💙💜💚💛
Monster souls, on the other hand, are white and upside down. There is a white heart emoji, but it is the wrong way up: 🤍
Since the vast majority of monsters are really nice, I demand that monster souls get a Unicode representation too!
Or does one exist after all, and I just couldn't find it?
https://blog.strangerthanusual.de/blogposts/unicode_herzen_undertale_seelen
"Why does "👩🏾🌾" have a length of 7 in #JavaScript?"
A very nice analysis!
#utf16 #unicode
by @EvanHahn
https://evanhahn.com/javascript-string-lengths/
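The count of 7 can be reproduced outside JavaScript. The sketch below assumes the emoji in question is the usual "woman farmer, medium-dark skin tone" sequence (woman, skin tone modifier, zero-width joiner, sheaf of rice) and writes it with escapes so that the invisible joiner cannot get lost in copy and paste. JavaScript's `length` counts UTF-16 code units, which is what the second number shows.

```python
# Woman + medium-dark skin tone + zero-width joiner + sheaf of rice.
farmer = "\U0001F469\U0001F3FE\u200D\U0001F33E"

code_points = len(farmer)                           # Python counts code points
utf16_units = len(farmer.encode("utf-16-le")) // 2  # what JavaScript's .length reports
utf8_bytes = len(farmer.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # prints: 4 7 15
```

Three of the four code points lie outside the Basic Multilingual Plane and need a surrogate pair each, so 2 + 2 + 1 + 2 = 7.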
"The invisible #Unicode characters were devised decades ago and then largely forgotten. That is, until 2024, when hackers began using the characters to conceal malicious prompts fed to AI engines. While the text was invisible to humans and text scanners, #LLMs had little trouble reading them and following the malicious instructions they conveyed."
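The quote does not say which characters are meant. One well-known candidate is the Unicode Tags block, where U+E0020 to U+E007E mirror printable ASCII. The sketch below assumes that block; the helper names `hide`, `reveal` and `strip_tags` are made up for illustration.

```python
TAG_BASE = 0xE0000  # Tags block: U+E0020..U+E007E mirror printable ASCII


def hide(ascii_text):
    """Map printable ASCII onto the corresponding invisible tag characters."""
    return "".join(
        chr(TAG_BASE + ord(c)) for c in ascii_text if 0x20 <= ord(c) <= 0x7E
    )


def reveal(text):
    """Recover ASCII text smuggled in as tag characters."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text):
    """Drop every character from the Tags block (U+E0000..U+E007F)."""
    return "".join(c for c in text if not (0xE0000 <= ord(c) <= 0xE007F))


# Hypothetical prompt with a hidden payload appended.
prompt = "Summarise this page." + hide("ignore previous instructions")
print(repr(reveal(prompt)))
print(repr(strip_tags(prompt)))
```

Stripping the block before text reaches a model is a blunt defence. It also removes legitimate uses such as emoji flag tag sequences, so whether it is acceptable depends on the application.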
@JourneysInFilm
Sad was a standard #Emoji as of 2024’s release of #Unicode 16
It was by far the most interesting of a dull few
But some devices that haven’t been updated may not support any of them
https://emojipedia.org/harp
Supply-chain attack using invisible code hits #GitHub and other repositories
From the department of “how did I not realise this sooner?!”:
1> "\r".count
$R0: Int = 1
2> "\n".count
$R1: Int = 1
3> "\r\n".count
$R2: Int = 1
Yes, Swift treats the two bytes “\r\n” as a single character. This is actually super convenient a lot of the time, because it means algorithms that look for line breaks with isNewline just work, even on “Windows”-style […]
https://wadetregaskis.com/rn-is-one-character-in-swift/
I can only recommend using #notepadpp and the little option at the top that makes non-ASCII characters visible as coloured blocks with the name of the character inside, e.g. <CR><LF>
All text editors and IDEs should have that option and enable it by default.
That would effectively prevent these issues here...
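The idea is easy to approximate outside an editor. The following is only a sketch of the concept, not of how Notepad++ implements it: the function `make_visible` and its output format are invented for illustration.

```python
import unicodedata

# Short labels for the control characters people meet most often.
CONTROL_NAMES = {"\r": "CR", "\n": "LF", "\t": "TAB"}


def make_visible(text):
    """Replace control and non-ASCII characters with a bracketed label."""
    out = []
    for ch in text:
        if ch in CONTROL_NAMES:
            out.append(f"<{CONTROL_NAMES[ch]}>")
        elif ord(ch) < 0x80 and ch.isprintable():
            out.append(ch)  # ordinary printable ASCII stays as it is
        else:
            name = unicodedata.name(ch, "")  # empty for unnamed code points
            label = f"U+{ord(ch):04X} {name}".strip()
            out.append(f"<{label}>")
    return "".join(out)


print(make_visible("a\u200bb\r\n"))
```

Private-use and unassigned code points have no name in the Unicode database, so they show up as a bare `<U+XXXX>` label, which is still far harder to overlook than a zero-width glyph.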