We have a CI job that spots unwanted UTF-8 letters in #curl PRs, because we noticed that GitHub will gladly show, for example, the (identical-looking) Cyrillic version of a letter next to the Latin version in a diff, and it is, yes, entirely impossible for a human to spot the difference. I mean, the diff is shown, but its significance is not.

Changing just a single letter like that in a URL hostname opens up a world of grief.
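
A check like this can be sketched in a few lines of Python. This is not curl's actual CI script, only a minimal illustration; the function name `find_non_ascii` and the example URLs are made up for this sketch. It flags every character outside ASCII and reports its code point and Unicode name, which is exactly the information the rendered diff hides.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, code point, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


# Two URLs that render alike in many fonts (example hostnames only).
latin = "https://curl.se/"
spoofed = "https://\u0441url.se/"  # first hostname letter is Cyrillic U+0441

for hit in find_non_ascii(spoofed):
    print(hit)
```

A real job would run something like this over the added lines of a PR and fail on any hit outside an allowlist of files where non-ASCII text is expected.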

@bagder That means that somebody actually sat down and browsed all the fonts to find the 2 characters which look exactly alike?

Just imagine these people would spend their time doing something productive for a change...

@Brokar there are actually lots of tools that do exactly that. Here's one: https://util.unicode.org/UnicodeJsps/confusables.jsp

@Brokar you don't need to browse all the fonts when you have a Cyrillic layout. They're literally on the same key, so you can even swap them by accident!

@bagder

That means that somebody actually sat down and browsed all the fonts to find the 2 characters which look exactly alike?
Рrеttу surе thеrе аrе tооls fоr thаt, аlsо it usuаllу is еnоugh just tо knоw thе оthеr sсriрt, nо nееd tо sсоur fоnts ;)

CC: @bagder@mastodon.social

@Brokar @bagder

Unicode itself addresses this problem: https://www.unicode.org/reports/tr39/#Confusable_Detection

Confusables become readily apparent to readers of a non-English script when they learn English. Not much work is involved in finding them.

@Brokar @bagder well, it's not only about the fonts themselves, but also about different Unicode entries being visually equivalent.

Rendering Latin H exactly the same as Greek Η is yet another problem at hand.
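
To make the point concrete: the two letters are distinct code points, whatever the font does. The sketch below is a rough heuristic, not the confusable-detection algorithm from UTS #39. It assumes the first word of a character's Unicode name ("LATIN", "GREEK", "CYRILLIC") is a good enough stand-in for its script, and the helper names `scripts_used` and `is_mixed_script` are made up for this example.

```python
import unicodedata


def scripts_used(word):
    """Collect the leading word of each letter's Unicode name as a rough script label."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts


def is_mixed_script(word):
    """True if the letters in `word` appear to come from more than one script."""
    return len(scripts_used(word)) > 1


latin_h = "H"       # U+0048 LATIN CAPITAL LETTER H
greek_eta = "\u0397"  # U+0397 GREEK CAPITAL LETTER ETA

print(latin_h == greek_eta)           # the strings differ even if the glyphs match
print(scripts_used(latin_h), scripts_used(greek_eta))
print(is_mixed_script("p\u0430ypal"))  # Latin letters plus one Cyrillic U+0430
```

A word that mixes scripts is not always an attack, so a check like this is better used to flag things for a human than to reject them outright.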