Today I learned that there is a specific #unicode "record separator" symbol, formally known as "U+001E Information Separator Two".

https://codepoints.net/U+001E

It is meant to be used to indicate a separation between two units of information. An example of where this could be used is in a separated-value file, e.g. a CSV, but using this symbol instead of a comma.

This is interesting because there are vanishingly few contexts in which the record separator symbol would appear, but many in which a comma does. Using this symbol instead of a comma (or a semicolon, or an exclamation point, or any of the usual separators) could make some data-hygiene scenarios, such as storing free text that itself contains commas, much more straightforward.
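A minimal sketch of the idea, assuming the classic ASCII convention of U+001F (UNIT SEPARATOR) between fields and U+001E (RECORD SEPARATOR) between records. The names `dump_records` and `load_records` are hypothetical, and rejecting data that already contains a separator is one design choice among several:

```python
US = "\x1f"  # Information Separator One (ASCII UNIT SEPARATOR): between fields
RS = "\x1e"  # Information Separator Two (ASCII RECORD SEPARATOR): between records


def dump_records(rows: list[list[str]]) -> str:
    """Join fields with US and records with RS; refuse data containing either."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError(f"field contains a separator: {field!r}")
    return RS.join(US.join(row) for row in rows)


def load_records(text: str) -> list[list[str]]:
    """Inverse of dump_records: split into records on RS, then into fields on US."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


# Commas, quotes and newlines need no quoting or escaping here.
rows = [["name", "note"], ["Ada", 'likes commas, "quotes" and\nnewlines']]
assert load_records(dump_records(rows)) == rows
```

Unlike CSV, no quoting rules are needed; the trade-off is that any field that happens to contain US or RS has to be rejected (as here) or escaped.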

U+001E INFORMATION SEPARATOR TWO*: โž โ€“ Unicode

โž, codepoint U+001E INFORMATION SEPARATOR TWO* in Unicode, is located in the block โ€œBasic Latinโ€. It belongs to the Common script and is a Control.

Codepoints.net
@phrawzty We had this in ASCII and nobody knew about it!
@mhoye @phrawzty And yet somehow gettext used \u0004 END OF TRANSMISSION as a separator! (Which, because I learned about it from gettext, I have also used.)
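For context: GNU gettext builds the lookup key for a message that has a context (msgctxt) by gluing the context and the msgid together with byte 0x04. A minimal sketch of that key scheme; `make_key` and `split_key` are hypothetical helpers, not part of gettext's API:

```python
from typing import Optional, Tuple

EOT = "\x04"  # END OF TRANSMISSION, used by gettext as the context "glue"


def make_key(context: Optional[str], msgid: str) -> str:
    """Combine an optional message context and a msgid into one lookup key."""
    return f"{context}{EOT}{msgid}" if context is not None else msgid


def split_key(key: str) -> Tuple[Optional[str], str]:
    """Undo make_key: everything before the first EOT is the context."""
    context, sep, msgid = key.partition(EOT)
    return (context, msgid) if sep else (None, key)


print(repr(make_key("menu", "Open")))  # 'menu\x04Open'
```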
@Jamessocol @mhoye @phrawzty if you’re designing a new format, please please use well-defined escaping or length-prefixing instead of trying to find a less common delimiter and hoping for the best
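One well-known length-prefixing scheme is the netstring format, which writes each field as `<decimal byte length>:<bytes>,`. A minimal sketch (the `encode`/`decode` names are hypothetical); because the reader knows each length up front, a field may contain commas, RS, or any other byte:

```python
def encode(fields: list[bytes]) -> bytes:
    """Concatenate fields as netstrings: b'<len>:<data>,' for each field."""
    return b"".join(str(len(f)).encode("ascii") + b":" + f + b"," for f in fields)


def decode(data: bytes) -> list[bytes]:
    """Parse a sequence of netstrings back into fields; raise on malformed input."""
    fields, i = [], 0
    while i < len(data):
        colon = data.index(b":", i)   # ValueError if the ':' is missing
        length = int(data[i:colon])   # ValueError if the length is not a number
        start, end = colon + 1, colon + 1 + length
        if data[end:end + 1] != b",":
            raise ValueError("malformed netstring: missing trailing comma")
        fields.append(data[start:end])
        i = end + 1
    return fields


print(encode([b"a,b", b"\x1e", b""]))  # b'3:a,b,1:\x1e,0:,'
```

The delimiter never has to be "rare", because the parser never scans the payload for it.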
@mhoye
REJECT MODERNITY
EMBRACE TRADITION(AL ASCII)

@phrawzty This would be way better than using CSV, which is a disaster format. In particular, these characters are not Unicode interchange valid, so they can never appear in the text fields inside the table…

https://wiesmann.codiferes.net/wordpress/archives/19862

More CSV Evil

good,⃣evil My post on CSV parsing got quite some attention, with various systems parsing them quite differently, one google+ posting by Kristian …

Thias ใฎ blog
KYLI - because it is superior to JSON

This is a (silly) attempt to fix some of the shortcomings of JSON. Hence it is named after the goddess of music. It uses C0 Control Characters. Here is an example: ␜ ␝ This is a KYLI document ␂ ␝ GroupName ␞ data ␟ value ␛ Comments are supported too! They can be multilined easily. ␙ I've used Unicode Control Pictures so you can see what's happening.…
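The Control Pictures block mentioned above places a visible glyph for each C0 control at U+2400 plus the control's code point (so U+001E is pictured as ␞, U+241E), with ␡ (U+2421) standing in for DEL. A small sketch for making separators visible when inspecting such data; `show_controls` is a hypothetical helper:

```python
def show_controls(text: str) -> str:
    """Replace C0 controls and DEL with their Control Pictures glyphs.

    Note that this also converts tab, LF and CR (to ␉, ␊, ␍).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            out.append(chr(0x2400 + cp))  # U+2400..U+241F mirror U+0000..U+001F
        elif cp == 0x7F:
            out.append("\u2421")          # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


print(show_controls("GroupName\x1edata\x1fvalue"))  # GroupName␞data␟value
```

This is display-only: the pictures are ordinary printable characters, so they should not be written back into the data in place of the real controls.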

Terence Eden’s Blog
@Edent Ooh that's fun! I like how you're using all the symbols there. I get that it's all a bit tongue-in-cheek, but your "Why this is better" list actually is better :D

@Edent
@phrawzty
Yeah, here is a repository that puts even more effort into this: https://github.com/SixArm/usv (2022, so your blog post predates it)

The issue, obviously, is that the symbols are non-printable and non-tech people do not know how to deal with them.

GitHub - SixArm/usv: Unicode Separated Values (USV) data markup for units, records, groups, files, streaming, and more.

Unicode Separated Values (USV) data markup for units, records, groups, files, streaming, and more. - SixArm/usv

GitHub
@tajpulo @phrawzty
Are non-techies going to be looking at JSON?
@phrawzty I’m a layperson, but I still want to code sometimes. I like separators that are always visible. Perhaps that is naive. Anyway, I can see RS in vim as ^^, but I’m not sure that uniquely identifies RS. When I test RS in the fonts on my system, half display it as whitespace, and the other half tell me RS does not exist. Not sure what that means.
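On the vim question: vim, like most terminals, displays C0 control characters in caret notation, a caret followed by the character whose code is the control's code XOR 0x40. That mapping is one-to-one, so among control characters ^^ can only be RS (0x1E); a literally typed "^^" is two ordinary characters, which vim typically highlights differently, and `ga` on the character shows its actual code. A sketch of the mapping; `caret_notation` is a hypothetical helper:

```python
def caret_notation(ch: str) -> str:
    """Show a control character as '^' + chr(code XOR 0x40); leave others alone."""
    cp = ord(ch)
    if cp < 0x20 or cp == 0x7F:
        return "^" + chr(cp ^ 0x40)
    return ch


for name, ch in [("FS", "\x1c"), ("GS", "\x1d"), ("RS", "\x1e"), ("US", "\x1f")]:
    print(name, caret_notation(ch))
# FS ^\
# GS ^]
# RS ^^
# US ^_
```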

@kornelis Font support is a whole thing, for sure. As is whatever underlying structure is interpreting, storing, and rendering the symbols. For example, when I try to paste the symbol into the web interface of Mastodon (i.e. right now), it pastes as a Unicode box, but the stored and/or rendered result just omits it entirely.

fun!

@phrawzty What's the difference between "Information Separator One" and "Information Separator Two"? (besides them having different code points, of course)

EDIT: Oh, I see there are at least four of them! 🤔

EDIT 2: Now I see that they map directly to the ASCII control characters group. I find it strange that the Unicode database does not keep any reference to their original names in the ASCII context.
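For reference, the four information separators correspond to the ASCII FS, GS, RS and US controls and form a hierarchy from largest to smallest division: file, group, record, unit. A sketch that prints the correspondence; the table is written out by hand (an assumption of this sketch, not something read from the Unicode database), and only the general category is looked up via `unicodedata`:

```python
import unicodedata

# Hand-written table: code point -> (Unicode-style name, ASCII name, ASCII abbreviation)
SEPARATORS = {
    0x1C: ("INFORMATION SEPARATOR FOUR", "FILE SEPARATOR", "FS"),
    0x1D: ("INFORMATION SEPARATOR THREE", "GROUP SEPARATOR", "GS"),
    0x1E: ("INFORMATION SEPARATOR TWO", "RECORD SEPARATOR", "RS"),
    0x1F: ("INFORMATION SEPARATOR ONE", "UNIT SEPARATOR", "US"),
}


def describe(cp: int) -> str:
    """One line per separator: code point, both names, and its Control Pictures glyph."""
    is_name, ascii_name, abbr = SEPARATORS[cp]
    picture = chr(0x2400 + cp)  # matching glyph in the Control Pictures block
    return f"U+{cp:04X} {is_name} = ASCII {ascii_name} ({abbr}), pictured as {picture}"


for cp in SEPARATORS:
    # All four are general category Cc ("Control"), as the codepoints.net card says.
    assert unicodedata.category(chr(cp)) == "Cc"
    print(describe(cp))
```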

@castarco The group is all meant to be used together! This comment from @Edent has a link to a toy implementation that makes the usages of the group more clear. https://mastodon.social/@Edent/114029501886306300