On this day seventeen years ago I learned something about locales and Turkish that has scarred me for life: I is not always uppercase i!

https://daniel.haxx.se/blog/2008/10/15/strcasecmp-in-turkish/

strcasecmp in Turkish

A friendly user submitted the (lib)curl bug report #2154627 which identified a problem with our URL parser. It doesn't treat "file://" as a known protocol if the locale in use is Turkish. This was the beginning of a minor world-moving revelation for me. Of course this is already known to mankind and I'm just behind, …

@bagder I happened to see this post just yesterday, detailing how Kotlin was bitten by this multiple times! https://sam-cooper.medium.com/the-country-that-broke-kotlin-84bdd0afb237
The Country That Broke Kotlin

Logic vs language: how a Turkish alphabet bug played a years-long game of hide-and-seek inside the Kotlin compiler

@Tenzer it probably won't be the last team chasing this either, it's such a weird thing!
@bagder @Tenzer IIRC one of the OpenSSL 3.x releases was also affected by this. We carry two versions of OpenSSL's casecmp symbol because of this.
@bagder There was a great talk at Pycon UK this year about similar issues people don't know when dealing with strings in different languages (notably Norwegian): https://www.youtube.com/watch?v=3MmfY5UIquk
PYCON UK 2025: Why len 4 and other weird things you should know about strings in Python

@jake The Norway problem(s) keeps getting more involved 😔
@bagder
couldn't you just temporarily set the locale to POSIX, do the comparison, and then set it ba... oh, wait, threads exist and libc's locale is a global per-process state :(
@wolf480pl and not least: that would probably be way slower
@wolf480pl @bagder best would probably be strcasecmp_l(), which has been in POSIX since… oh wow, 2008 too.
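
For anyone wondering what the strcasecmp_l() route might look like, here is a minimal sketch, assuming a POSIX.1-2008 libc such as glibc (on some platforms the declaration lives in a different header, e.g. <xlocale.h>). The helper name casecmp_c_locale is made up for illustration. It builds a private "C" locale object and compares against that, so the process-wide locale is never touched and other threads are unaffected.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <strings.h>

/* Hypothetical helper (not from curl or any library): compare two strings
 * case-insensitively using an explicit "C" locale object instead of the
 * global process locale.
 * Returns 0 on success and stores the strcasecmp-style result in *out;
 * returns -1 if the locale object could not be created. */
int casecmp_c_locale(const char *a, const char *b, int *out)
{
    /* newlocale() creates a standalone locale object; setlocale() is
     * never called, so no global state changes. */
    locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0)
        return -1;

    *out = strcasecmp_l(a, b, c_loc);

    freelocale(c_loc);
    return 0;
}
```

Creating and freeing the locale object on every call is only for brevity; real code would more likely create it once and reuse it, which also addresses the speed concern raised above.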

@bagder Also featured in this brilliant talk from Dylan Beattie

https://www.youtube.com/watch?v=ajfb5LSbQVM

There's No Such Thing As Plain Text • Dylan Beattie • YOW! 2023

@bagder @adrianco oh, this always resurfaces, invariably as something really obscure that you can't replicate.
Once you've fielded one escalation from it you'll know to never use the system locale for case conversion or comparison, and to be strict about that in code reviews.
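
One way to follow that rule for protocol-style identifiers such as "file" is to skip the locale machinery entirely and fold only the ASCII letters by hand. The sketch below assumes an ASCII-compatible encoding; the names ascii_lower and ascii_casecmp are hypothetical and this is not necessarily how curl or any other project solved it.

```c
/* Fold only ASCII 'A'..'Z' to lowercase; every other byte is returned
 * unchanged, whatever the current locale says. */
unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Locale-independent, ASCII-only case-insensitive comparison.
 * Returns 0 if equal, otherwise the difference between the first pair
 * of differing (folded) bytes, like strcasecmp(). */
int ascii_casecmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    /* Advance while both strings agree after folding and a has not ended. */
    while (*pa != '\0' && ascii_lower(*pa) == ascii_lower(*pb)) {
        pa++;
        pb++;
    }
    return (int)ascii_lower(*pa) - (int)ascii_lower(*pb);
}
```

With this, ascii_casecmp("file", "FILE") is 0 in every locale, while non-ASCII bytes (such as a UTF-8 encoded dotless ı) only ever match themselves. That is the right behaviour for protocol names and header keywords, but not for user-visible text that really is Turkish.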

@bagder
I don't know if it was from you, but a few years ago I read an interesting article on false assumptions regarding languages (e.g. that in every culture people have a first name and a surname, that people's names cannot contain non-letter symbols, etc.) where this inconsistent lowercase-to-uppercase matching was listed.

I cannot find it anymore but it was enlightening.

@bagder As a linguistic typologist and language technologist, informing people about this was part of my work for six years.