📰 I tried decoding the garbled omikuji I drew at Gikusai (👍 29)

🇬🇧 Decoding a garbled fortune slip from Geek Festival 2026 - a fun character encoding adventure
🇰🇷 A fun account of decoding an omikuji (fortune slip) full of garbled characters received at a tech festival

🔗 https://zenn.dev/toramutton/articles/garbled-omikuji
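The post only links to the article, so the steps below are not the author's method. As a hedged sketch of the general technique: mojibake usually comes from bytes written in one encoding and read back in another. Re-encoding the garbled text with the wrong codec and then decoding it with the right one can therefore recover the original. The candidate list, the `guess_repairs` helper and the "大吉" ("great blessing") example are illustrative assumptions and are not taken from the article.

```python
# Hypothetical helper: brute-force "encoded as X, mis-read as Y" repairs.
CANDIDATES = ["latin-1", "cp1252", "shift_jis", "euc_jp", "utf-8"]

def guess_repairs(garbled: str) -> list[tuple[str, str, str]]:
    """Return (wrongly_used_codec, intended_codec, repaired_text) candidates."""
    results = []
    for wrong in CANDIDATES:
        try:
            # Undo the bad decode: get back the original bytes.
            raw = garbled.encode(wrong)
        except UnicodeEncodeError:
            continue  # this codec cannot have produced the garbled text
        for right in CANDIDATES:
            if right == wrong:
                continue
            try:
                results.append((wrong, right, raw.decode(right)))
            except UnicodeDecodeError:
                pass  # bytes are not valid in this codec
    return results

# Simulate mojibake: UTF-8 bytes mistakenly decoded as Latin-1.
garbled = "大吉".encode("utf-8").decode("latin-1")
for wrong, right, text in guess_repairs(garbled):
    print(wrong, "->", right, repr(text))
```

Several candidates may decode without error. A human, or a dictionary check, still has to pick the one that reads as real text.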

#CharacterEncoding #Debugging #Tech #Zenn

I tried decoding the garbled omikuji I drew at Gikusai

Zenn

I'm on a Mac, where the standard filesystems require UTF-8 filenames. I want to clone a #git repo whose ISO-8859 filenames are not valid UTF-8: https://github.com/IanDarwin/OpenLookCDROM. Is there any way to do that so that filenames are translated back and forth on the fly when I `git pull`?

I worked around it by creating a #ZFS dataset with `utf8only=off`, cloning onto that, and manually renaming the two problematic files, but that obviously leaves my copy different from origin so I can't cleanly pull.
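Here is a minimal sketch for finding which paths in the repository are not valid UTF-8. It reads raw path bytes from Git's tree objects with `git ls-tree -r -z --name-only`; with `-z`, Git prints paths unquoted and NUL-terminated. It assumes the offending names are ISO-8859-1 (Latin-1), although the post only says "ISO-8859". The function names are hypothetical. The sketch reports and converts names only; it does not make `git pull` translate anything.

```python
import subprocess

def split_nul(out: bytes) -> list[bytes]:
    """Split NUL-terminated output (as printed with -z) into raw paths."""
    return [p for p in out.split(b"\0") if p]

def tree_paths(repo: str = ".", rev: str = "HEAD") -> list[bytes]:
    """Return every path in `rev` as raw bytes, read from Git's tree objects
    rather than the working tree (so a checkout is not required)."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", "--name-only", rev],
        check=True, capture_output=True,
    ).stdout
    return split_nul(out)

def non_utf8(paths: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Pair each path that is not valid UTF-8 with a UTF-8 re-encoding,
    ASSUMING the original bytes are ISO-8859-1 (Latin-1)."""
    bad = []
    for raw in paths:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # Latin-1 maps every byte to a code point, so this never fails,
            # but the result is only correct if the name really is Latin-1.
            bad.append((raw, raw.decode("latin-1").encode("utf-8")))
    return bad
```

For example, `non_utf8(tree_paths("OpenLookCDROM"))` could be run against a `git clone --no-checkout` of the repository to list the offending names before deciding how to handle them.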

#CharacterEncoding

GitHub - IanDarwin/OpenLookCDROM: Final resting place for an archive of the historic artifact "OPEN LOOK and XView CD-ROM"

GitHub

Even though digital topics are rather marginalized at #DOT2026, I'm pleased that my CLIO Guide on digitizing the cultural heritage of the societies of the Global South, long stuck in the publication process, went online today. It is for anyone who wants to learn about the representational power of monolingual infrastructures, character encodings, transliterations, catalogs as historical sources, shadow libraries, etc. etc., all illustrated with Arabic periodicals: https://doi.org/10.60693/p46s-8j72

#multilingualDH #epistemicViolence #characterEncoding #الصحافة_العربية

«Unicode is good. If you’re designing a data structure or protocol that has text fields, they should contain #Unicode characters encoded in #UTF8. There’s another question, though: “Which Unicode characters?” The answer is “Not all of them, please exclude some.”

This issue keeps coming up, so [ @paulehoffman and @timbray ] put together an individual-submission draft to the IETF and now (where by “now” I mean “two years later”) it’s been published as #RFC9839. It explains which characters are bad, and why, then offers three plausible less-bad subsets that you might want to use.»

https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839 by @timbray
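As a rough illustration of the kinds of code points the quoted post calls "bad", here is a sketch that flags surrogates, noncharacters and legacy control characters. It does not reproduce the text of RFC 9839. Which categories to flag, and treating TAB, LF and CR as acceptable, are assumptions made here. `is_xml_char` implements the XML 1.0 `Char` production as one familiar example of a restricted subset.

```python
def problems(cp: int) -> list[str]:
    """Return the (assumed) problem categories for one code point."""
    kinds = []
    if 0xD800 <= cp <= 0xDFFF:
        kinds.append("surrogate")
    # Noncharacters: U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        kinds.append("noncharacter")
    # C0 controls other than TAB, LF, CR; then DEL and the C1 controls.
    if (cp < 0x20 and cp not in (0x09, 0x0A, 0x0D)) or 0x7F <= cp <= 0x9F:
        kinds.append("control")
    return kinds

def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production."""
    return (cp in (0x09, 0x0A, 0x0D)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def scan(text: str) -> list[tuple[int, str, list[str]]]:
    """Report (index, U+XXXX, categories) for every flagged character."""
    return [(i, f"U+{ord(ch):04X}", problems(ord(ch)))
            for i, ch in enumerate(text) if problems(ord(ch))]

print(scan("a\x00b\ufffe"))
```

Python strings can hold lone surrogates such as `"\ud800"`, which is one reason a check like this is useful before text is serialized to UTF-8.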

#programming #CharacterEncoding #LML

RFC 9839 and Bad Unicode

ongoing by Tim Bray

Why is there a "small house" in IBM's Code Page 437?

There's a small house in the middle of IBM's Code Page 437. Why?

GlyphDrawing.Club -blog
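The house sits at byte 0x7F, where the IBM PC displayed ⌂ (U+2302 HOUSE). One practical consequence is that Python's built-in `cp437` codec maps 0x00 to 0x7F straight to ASCII, so 0x7F decodes to the DEL control character rather than the house. The sketch below assumes you want the glyph the hardware showed. The override table is hand-written here, and the `cp437_glyph` helper is hypothetical; it covers only this one position.

```python
import unicodedata

# Hand-written override: the glyph the IBM PC font showed at 0x7F,
# where Python's "cp437" codec instead yields the DEL control character.
ROM_GLYPHS = {0x7F: "\u2302"}

def cp437_glyph(byte: int) -> str:
    """Return the character displayed for `byte` on original CP437 hardware."""
    if byte in ROM_GLYPHS:
        return ROM_GLYPHS[byte]
    return bytes([byte]).decode("cp437")

print(repr(bytes([0x7F]).decode("cp437")))   # '\x7f' (DEL, per the codec)
print(unicodedata.name(cp437_glyph(0x7F)))   # HOUSE
```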

Like other computing and network systems developed at Xerox, Interlisp-D supported XCCS (Xerox Character Code Standard), a 16-bit character encoding released in the 1980s. XCCS predated and influenced Unicode.

This is version 2.0 of the standard:

https://github.com/Interlisp/medley/blob/master/unicode/xerox/Xerox%20Character%20Code%20Standard%20Version%202.0%201990.pdf

#CharacterEncoding #xerox #retrocomputing

medley/unicode/xerox/Xerox Character Code Standard Version 2.0 1990.pdf at master · Interlisp/medley

The main repo for the Medley Interlisp project. Wiki, Issues are here. Other repositories include maiko (the VM implementation) and Interlisp.github.io (web site sources) - Interlisp/medley

GitHub

https://lbcone.hkust.edu.hk/booktalk/?p=211
Our next talk at #HKUSTLibrary. Really looking forward to it and excited to meet Professor Lu Qin! #cjk #CharacterEncoding #asiancharacters

Chinese Character Encoding beyond Borders: A Story of Challenges and Solutions | HKUST Library Book Talk

I really love @dylanbeattie's talks.

I've seen the previous version of this that he references at the start, but watched this anyway, because it's a great talk.

Life as a sysadmin has taught me a lot of the lessons in here, but there's SO MUCH more background covered than I ever knew. So, still very useful.

https://youtu.be/gd5uJ7Nlvvo

#UTF #PlainText #CharacterEncoding #PikeMatchbox

Plain Text - Dylan Beattie - NDC Copenhagen 2022

YouTube

If you have been spared #characterencoding hell, consider yourself fortunate. Every time I start to dig into it, I marvel at how all this mess could have been avoided with just a little foresight. As soon as ASCII stopped being the norm, someone could have created a container format for text files that worked like any other media container: a file header that says, for example, "this is ISO-8859-1, CP1252, UTF-8, or whatever." That would have removed all ambiguity.
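The container idea can be sketched in a few lines. The `TXTC` magic bytes and the "encoding name, newline, payload" layout are invented for illustration and do not correspond to any existing file format. The sketch shows only how a declared header removes the guessing step.

```python
import codecs

MAGIC = b"TXTC"  # hypothetical magic number, not an existing standard

def wrap(text: str, encoding: str) -> bytes:
    """Prefix the encoded text with a header naming its encoding."""
    name = codecs.lookup(encoding).name  # normalize, e.g. "UTF8" -> "utf-8"
    return MAGIC + name.encode("ascii") + b"\n" + text.encode(encoding)

def unwrap(blob: bytes) -> str:
    """Decode using the declared encoding instead of guessing."""
    if not blob.startswith(MAGIC):
        raise ValueError("missing header: encoding unknown")
    header, sep, payload = blob[len(MAGIC):].partition(b"\n")
    if not sep:
        raise ValueError("truncated header")
    return payload.decode(header.decode("ascii"))

blob = wrap("café", "cp1252")
print(blob)          # b'TXTCcp1252\ncaf\xe9'
print(unwrap(blob))  # café
```

Only the first newline ends the header, so the payload itself may contain any number of newlines.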