#PostgreSQL has relied on the OS to handle text sorting for most of its history. When glibc 2.28 shipped in 2018 with a major Unicode collation overhaul, text indexes built under the old rules could become invalid overnight, without any warnings or errors.
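The failure mode can be sketched in a few lines of Python under simplifying assumptions. A sorted list searched with `bisect` stands in for a B-tree. The "old" and "new" collation rules are two hypothetical key functions, loosely inspired by punctuation-skipping locale rules; they are not glibc's actual rules. The point is that a binary search trusts the stored order, so a changed comparator returns wrong answers instead of raising errors.

```python
import bisect


# Hypothetical "old" collation rule: hyphens are ignored when comparing.
# Loosely inspired by locales that skip punctuation; NOT glibc's real rules.
def old_key(s: str) -> str:
    return s.replace("-", "")


# Hypothetical "new" collation rule: plain code point order, hyphens count.
def new_key(s: str) -> str:
    return s


def build_index(values, key):
    """Sort values under `key`, standing in for an index built once."""
    return sorted(values, key=key)


def lookup(index, target, key):
    """Binary search that trusts `index` is already ordered under `key`.

    Like a B-tree descent, it never re-checks the ordering, so a
    mismatched rule yields a wrong answer rather than an error.
    """
    keys = [key(v) for v in index]
    i = bisect.bisect_left(keys, key(target))
    return i < len(index) and index[i] == target


def is_ordered(index, key):
    """Full ordering check (hypothetical helper, similar in spirit to amcheck)."""
    keys = [key(v) for v in index]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    index = build_index(["a-c", "ab", "ad"], old_key)
    print(index)                          # ['ab', 'a-c', 'ad']
    print(lookup(index, "a-c", old_key))  # True
    print(lookup(index, "a-c", new_key))  # False: stored, but not found
    print(is_ordered(index, new_key))     # False
```

In this model, rebuilding with `build_index(values, new_key)`, which is the analogue of a `REINDEX`, makes lookups correct again.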

#Postgres 17 added a builtin locale provider that removes the external dependency entirely:

initdb --locale-provider=builtin --locale=C.UTF-8
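The trade-off behind C.UTF-8 can be sketched in Python. This assumes, based on our reading of the PostgreSQL 17 documentation, that the builtin provider's C.UTF-8 collation orders text by Unicode code point. Python's `str` comparison also uses code point order. UTF-8 was designed so that byte-wise order matches code point order, so the result depends on no OS locale tables. The word list is made up for illustration.

```python
# Sketch of code point ordering, which we assume matches how the builtin
# provider's C.UTF-8 collation sorts text (no OS locale data involved).


def codepoint_sort(values):
    """Python compares str values code point by code point."""
    return sorted(values)


def utf8_byte_sort(values):
    """Compare the raw UTF-8 bytes instead, as a memcmp-style sort would."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


if __name__ == "__main__":
    # Made-up sample data.
    words = ["Zebra", "apple", "Äpfel", "a-c", "ab"]
    print(codepoint_sort(words))  # ['Zebra', 'a-c', 'ab', 'apple', 'Äpfel']
    # UTF-8 preserves code point order, so both sorts agree.
    print(codepoint_sort(words) == utf8_byte_sort(words))  # True
```

The order never depends on the installed glibc, but it is not dictionary order. Uppercase sorts before lowercase, and `Äpfel` lands after every ASCII word. That is the price of stability.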

More details in this week's PG Phriday: https://www.pgedge.com/blog/what-is-a-collation-and-why-is-my-data-corrupt

#programming #data #dev #sql #code

What is a Collation, and Why is My Data Corrupt?

The GNU C Library (glibc) version 2.28 entered the world on August 1st, 2018, and Postgres hasn't been the same since. Among its many changes was a massive update to locale collation data, bringing it in line with the 2016 Edition 4 release of the ISO 14651 standard and Unicode 9.0.0. This was not a subtle tweak. It was the culmination of roughly 18 years of accumulated locale modifications, all merged in a single release.