Mastodawn

My calendar has had a fun and silly event marked on it for the end of 2022 for .. a while.

Four years ago I spent part of the holiday break implementing BOCU-1 (an obscure Unicode encoding). Just for kicks, because it's cute and interesting.

Of course, it was patented! But the patent had a royalty-free promise associated with it and was near expiry anyway. After a year of runaround trying to get said royalty-free license (including briefly hiring a patent lawyer) I decided just to wait until the patent expired.

Last month it finally did! So I have finally published this most gratuitously delayed, expensive, tiny, mostly-useless crate: https://crates.io/crates/bocu1

(cc @manishearth possibly the only person in the world who could be interested in its content)

Erik Novales Dec 28, 2022

@graydon I wouldn't be surprised if someone out there finds it genuinely useful and a lifesaver, somewhere down the line. Very cool project!

Ryan Dec 28, 2022

@graydon @manishearth real annoying that they'd be jerks about it like that. Why patent something that isn't very useful, tell people they can use it if they just jump through some hoops, but then just sit on it instead?

Ryan Dec 28, 2022

@graydon @manishearth also, good job and congratulations for getting it out!

Peter Bindels Dec 28, 2022

@graydon @manishearth I think you underestimate the appeal of such a thing. On a more practical note - a company patenting (and limiting) an interchange format is ... so fundamentally stupid? Greedy? Not sure which applies more, but there are no positive options.

The point of interchange is that others can read it. From what I gather from Wikipedia though, it's only *fully compliant* decoders and encoders. Just file a single bug on "technical noncompliance" and you're good.

thom (zuurr) Dec 28, 2022

@graydon @manishearth oh, I wrote in to IBM asking for a license to use this a couple years ago (for similar "it's cute and kind of fun" reasons -- I've never come across a real use), and never got a response from them, so it slipped out of my mind.

Glad to see it's no longer patented!

thom (zuurr) Dec 28, 2022

@graydon @manishearth Ah wait, the one I wrote about was SCSU not BOCU-1... Close, though!

Manish Dec 28, 2022

@graydon oh hey excited to see this out!

Graydon Hoare Dec 28, 2022

@manishearth hooray! I hope it does good somewhere! Also implementing it made me think it not totally implausible that a faster version could be built that avoids integer division and modulus -- a bitwise ops BOCU-1, or "BOBOCU-1"...

Mark Davis (Unicode) Dec 28, 2022

@graydon @manishearth it was fun working with Markus Scherer to come up with it.

Winter Jo ❄️ Jan 5, 2023

@graydon @manishearth
> Codepoint-order preserving. You can memcmp() BOCU-1 strings, or integer-compare them if small strings are packed into integers, and the comparison will obey the lexicographic Unicode codepoint order of the strings. This is probably the main use of the form: you can build a really fast ordered dictionary type or database key range using BOCU-1 small strings packed into integers (if you're ok with codepoint order).

oh WOW
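
The packing idea in the quoted passage can be sketched roughly as follows. This is only an illustration under stated assumptions: `pack_key` is a hypothetical helper, not part of the `bocu1` crate (whose API is not shown in this thread), and the byte values are arbitrary stand-ins rather than real BOCU-1 output. The sketch also assumes keys of at most 8 bytes with no 0x00 byte, since a zero byte would be indistinguishable from the padding.

```rust
/// Hypothetical helper: pack up to 8 bytes into a u64, big-endian,
/// zero-padded on the right. Returns None if the input is longer than
/// 8 bytes or contains a 0x00 byte (which would collide with padding).
fn pack_key(bytes: &[u8]) -> Option<u64> {
    if bytes.len() > 8 || bytes.contains(&0) {
        return None;
    }
    let mut buf = [0u8; 8];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(u64::from_be_bytes(buf))
}

fn main() {
    // Arbitrary stand-in byte strings (NOT real BOCU-1 encodings).
    let a: &[u8] = &[0x50, 0x91];
    let b: &[u8] = &[0x50, 0x91, 0x10];
    let c: &[u8] = &[0x51];

    // Slice comparison in Rust is lexicographic by byte, like memcmp
    // with the shorter string sorting first on a shared prefix.
    assert!(a < b && b < c);

    // The packed integers compare in the same order.
    let pa = pack_key(a).unwrap();
    let pb = pack_key(b).unwrap();
    let pc = pack_key(c).unwrap();
    assert!(pa < pb && pb < pc);
}
```

Because the most significant byte of the integer holds the first byte of the string, a single integer comparison gives the same answer as a byte-wise comparison. If the encoding itself preserves codepoint order, as the quoted text says BOCU-1 does, the integer order then follows codepoint order too.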

Winter Jo ❄️ Jan 5, 2023

@graydon @manishearth tbh this little bit right here might be exactly what turns this from a niche interesting oddity into a real performance-optimisation feature, if one wants to have efficient prefix-hashmap lookups, of which i can be pretty sure are several cases for