I'm on a Mac, where all filesystems are UTF-8. I want to clone a #git repo which has ISO-8859 filenames which are not valid UTF-8 - https://github.com/IanDarwin/OpenLookCDROM. Is there any way of doing that which will translate filenames back and forth on the fly when I `git pull`?

I worked around it by creating a #ZFS dataset with `utf8only=off`, cloning onto that, and manually renaming the two problematic files, but that obviously leaves my copy different from origin so I can't cleanly pull.

#CharacterEncoding
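For what it's worth, git stores pathnames as raw bytes, so you can at least enumerate the offending entries mechanically: list the tree NUL-separated (which also disables git's quoting of non-ASCII names) and check each path for UTF-8 validity. A minimal sketch in Python, assuming you run it inside a clone on a filesystem that tolerated the names:

```python
import subprocess

def is_utf8(raw: bytes) -> bool:
    """True if the byte string is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def non_utf8_paths(tree: str = "HEAD") -> list[bytes]:
    """Return the raw byte paths in `tree` that are not valid UTF-8."""
    out = subprocess.run(
        # -z: NUL-separated output, no quoting of non-ASCII filenames
        ["git", "ls-tree", "-r", "-z", "--name-only", tree],
        capture_output=True, check=True,
    ).stdout
    return [raw for raw in out.split(b"\0") if raw and not is_utf8(raw)]
```

This only finds the problem files; it doesn't solve the round-trip renaming, which git itself has no hook for on checkout.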


Even though digitality is rather marginalized at #DOT2026, I'm delighted that my CLIO Guide on digitizing the cultural heritage of Global South societies, long stuck in the publication process, went online today. If you want to learn about the representative power of monolingual infrastructures, character encodings, transliterations, catalogues as historical sources, shadow libraries, and much more, all illustrated with Arabic periodicals: https://doi.org/10.60693/p46s-8j72

#multilingualDH #epistemicViolence #characterEncoding #الصحافة_العربية

«Unicode is good. If you’re designing a data structure or protocol that has text fields, they should contain #Unicode characters encoded in #UTF8. There’s another question, though: “Which Unicode characters?” The answer is “Not all of them, please exclude some.”

This issue keeps coming up, so [ @paulehoffman and @timbray ] put together an individual-submission draft to the IETF and now (where by “now” I mean “two years later”) it’s been published as #RFC9839. It explains which characters are bad, and why, then offers three plausible less-bad subsets that you might want to use.»

https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839 by @timbray

#programming #CharacterEncoding #LML
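From memory, the problematic categories the RFC calls out are surrogates, legacy control characters (other than tab, LF, and CR), and noncharacters. A rough Python check along those lines, as my paraphrase of those categories rather than the RFC's exact subsets:

```python
def problematic(cp: int) -> bool:
    """True for code points in the broadly 'bad' categories:
    surrogates, legacy controls, and noncharacters."""
    if 0xD800 <= cp <= 0xDFFF:                       # surrogates
        return True
    if cp < 0x20 and cp not in (0x09, 0x0A, 0x0D):   # C0 controls, minus tab/LF/CR
        return True
    if cp == 0x7F or 0x80 <= cp <= 0x9F:             # DEL and C1 controls
        return True
    if 0xFDD0 <= cp <= 0xFDEF:                       # noncharacter block
        return True
    if cp & 0xFFFE == 0xFFFE and cp <= 0x10FFFF:     # U+xxFFFE / U+xxFFFF
        return True
    return False

def clean(text: str) -> bool:
    """True if no code point in `text` is problematic."""
    return not any(problematic(ord(c)) for c in text)
```

For the real subset definitions (and which one to pick), read the RFC itself.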

Why is there a "small house" in IBM's Code Page 437?

There's a small house in the middle of IBM's Code Page 437. Why?

GlyphDrawing.Club blog

Like other computing and network systems developed at Xerox, Interlisp-D supported XCCS (Xerox Character Code Standard), a 16-bit character encoding released in the 1980s. XCCS predated and influenced Unicode.

This is version 2.0 of the standard:

https://github.com/Interlisp/medley/blob/master/unicode/xerox/Xerox%20Character%20Code%20Standard%20Version%202.0%201990.pdf

#CharacterEncoding #xerox #retrocomputing

https://lbcone.hkust.edu.hk/booktalk/?p=211
Our next talk at #HKUSTLibrary. Fully looking forward to it and excited to meet Professor Lu Qin. #cjk #CharacterEncoding #asiancharacters
Chinese Character Encoding beyond Borders: A Story of Challenges and Solutions | HKUST Library Book Talk

I really love @dylanbeattie's talks.

I've seen the previous version of this that he references at the start, but watched this anyway, because it's a great talk.

Life as a sysadmin has taught me a lot of the lessons in here, but there's SO MUCH more background covered than I ever knew. So, still very useful.

https://youtu.be/gd5uJ7Nlvvo

#UTF #PlainText #CharacterEncoding #PikeMatchbox

Plain Text - Dylan Beattie - NDC Copenhagen 2022

If you have been spared #characterencoding hell, consider yourself fortunate. Every time I start to dig into it, I marvel at how all this mess could have been avoided with just a little foresight. As soon as ASCII stopped being the only norm, we could have created a container format for text files that works like any other media container: a file header that says, for example, "this is ISO-8859-1, CP-1252, or UTF-8." That would have removed all ambiguity.
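As a toy illustration of that idea, here is a minimal sketch of such a container in Python. The magic number and layout are made up for the example: a 4-byte magic, a one-byte length, the ASCII encoding name, then the payload bytes.

```python
MAGIC = b"TXTC"  # hypothetical magic number for this toy container

def pack(text: str, encoding: str = "utf-8") -> bytes:
    """Prefix the encoded text with a header naming its encoding."""
    name = encoding.encode("ascii")
    return MAGIC + bytes([len(name)]) + name + text.encode(encoding)

def unpack(blob: bytes) -> str:
    """Read the header, then decode the payload unambiguously."""
    if blob[:4] != MAGIC:
        raise ValueError("not a TXTC container")
    n = blob[4]
    encoding = blob[5:5 + n].decode("ascii")
    return blob[5 + n:].decode(encoding)
```

The reader never has to guess: whatever bytes follow the header, the header says how to decode them.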

...
While part of me wants to find out why this odd character-encoding situation crashes #perl, another part of me knows that #characterencoding work is a big pit of misery, suffering, and wasted time that you will never get back, so I'm just treating that crash as another way to debug tags in Vorbis files.

The oddest mystery is that someone managed to get a string without a COMMENT-type container into a #vorbis metadata block. That's impressive; you really have to try to do that!