Today I learned that there is a specific #unicode "record separator" symbol, formally known as "U+001E Information Separator Two".

https://codepoints.net/U+001E

It is meant to be used to indicate a separation between two units of information. An example of where this could be used is in a separated-value file, e.g. a CSV, but using this symbol instead of a comma.

This is interesting because the record separator symbol almost never appears in ordinary text, whereas commas appear all the time. Using this symbol instead of a comma (or a semicolon, or an exclamation point, or any one of the usual separators) could make some data hygiene scenarios much more straightforward.
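
To make the idea concrete, here is a minimal Python sketch of a one-line "separated-value" record that uses U+001E where a CSV would use a comma. The helper names `join_fields` and `split_fields` are made up for this illustration, and the sketch assumes the fields themselves never contain U+001E; it refuses to write a field that does.

```python
RS = "\u001e"  # U+001E, the record separator control character


def join_fields(fields):
    """Join text fields into one line, using U+001E where CSV uses a comma."""
    for field in fields:
        # Assumption of this sketch: the delimiter never occurs in the data.
        if RS in field:
            raise ValueError("field contains U+001E and cannot be stored as-is")
    return RS.join(fields)


def split_fields(line):
    """Split a line produced by join_fields back into its fields."""
    return line.split(RS)


row = ["Doe, Jane", 'said "hi"; then left!', "42"]
line = join_fields(row)
restored = split_fields(line)
```

Commas, semicolons, quotes and exclamation points in the fields survive the round trip without any quoting rules, which is the appeal. The `ValueError` branch is the catch: the scheme only works for as long as the delimiter stays out of the data.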

@phrawzty We had this in ASCII and nobody knew about it!
@mhoye @phrawzty And yet somehow gettext used \u0004 END OF TRANSMISSION as a separator! (Which, because I learned about it from gettext, I also have)
@Jamessocol @mhoye @phrawzty if you’re designing a new format, please please use well-defined escaping or length-prefixing instead of trying to find a less common delimiter and hoping for the best
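
For anyone wondering what length-prefixing looks like in practice, here is a rough Python sketch. It is loosely modelled on the netstring idea (a decimal byte count, a colon, then the payload), but it is not any particular standard format, and `encode_fields` and `decode_fields` are hypothetical names chosen for this example.

```python
def encode_fields(fields):
    """Encode each field as b'<byte length>:<utf-8 bytes>', back to back."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += str(len(data)).encode("ascii") + b":" + data
    return bytes(out)


def decode_fields(blob):
    """Decode a byte string produced by encode_fields."""
    fields = []
    pos = 0
    while pos < len(blob):
        colon = blob.index(b":", pos)  # raises ValueError if the prefix is malformed
        length = int(blob[pos:colon].decode("ascii"))
        start = colon + 1
        end = start + length
        if end > len(blob):
            raise ValueError("truncated field")
        # The payload is taken by length, so its content is never scanned.
        fields.append(blob[start:end].decode("utf-8"))
        pos = end
    return fields


tricky = ["a,b", "has \u001e inside", "", "10:colon too"]
encoded = encode_fields(tricky)
decoded = decode_fields(encoded)
```

Because the reader consumes exactly as many bytes as the prefix announces, any character can appear in a field, including U+001E, commas and colons.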