'On November 28th, 2012, Randall Munroe published an xkcd comic that was a calendar in which the size of each date was proportional to how often each date is referenced by its ordinal name (…) "In months other than September, the 11th is mentioned substantially less often than any other date. It's been that way since long before 9/11 and I have no idea why." After digging into the raw data, I believe I have figured out why.'

https://drhagen.com/blog/the-missing-11th-of-the-month/

The Missing 11th of the Month - David R Hagen

Personal website of David R Hagen, scientific software engineer

@texttheater Do you know if similar calendars exist for books in other languages, European ones for example? The xkcd one is rather... USA-centered. Of course "September 11th" will be extremely big in European languages too, but there will be other dates shining: July 14th, May 8th, May 5th, October 3rd... and of course December 24th instead of 25th.
@texttheater fear the single-minded exhaustive and absolutely commercially unsupportable analysis of some rando you accidentally nerdsniped. :)
@texttheater This was very enjoyable, thank you! 🤩🙏
@texttheater If nobody ever mentioned August 11th in Brazil, maybe the lawyers would do less damage.
@texttheater This is the kind of quality content I love on Mastodon.

@texttheater this is awesome data sleuthing and a great writeup, thanks for introducing me to this.

It is also the sort of thing where I would really love quote posts, btw, to translate the awesomeness of serendipitous gold nuggets in my timeline to different unsuspecting audiences.

And it is also, also a reminder that blogging is hella cool, even if it is infrequent.

@texttheater The author says the numeral 1 was often confused for lookalikes by Google's OCR, but it goes deeper than that: traditional typewriters rarely had dedicated 0 or 1 keys. Operators used uppercase oh and lowercase ell instead. So, at least in some cases, the text on the original page really was "January llth"!

(I'm not aware of this ever being common practice for printers, even if they ran out of digits, but I wouldn't be surprised either way.)
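The "add the lookalikes back" idea from the linked post can be sketched in a few lines of Python. This is only an illustration, not the author's actual analysis: the function name `count_ordinal_dates`, the set of lookalike spellings (`llth`, `iith`, `nth`, and mixed forms such as `1lth`), and the rule that a lookalike counts as a date only when it directly follows a month name are all assumptions made for this example.

```python
import re
from collections import Counter

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Assumed lookalikes for "11th": any two of 1/l/i followed by "th", or "nth".
LOOKALIKE = r"[1li]{2}th|nth"

# A month name followed by either a normal ordinal or a lookalike.
DATE_RE = re.compile(
    rf"\b({MONTHS})\s+(\d{{1,2}}(?:st|nd|rd|th)|{LOOKALIKE})\b",
    re.IGNORECASE,
)
LOOKALIKE_RE = re.compile(LOOKALIKE)


def count_ordinal_dates(text):
    """Count 'Month Nth' mentions, folding lookalike spellings into '11th'."""
    counts = Counter()
    for match in DATE_RE.finditer(text):
        month = match.group(1).capitalize()
        ordinal = match.group(2).lower()
        if LOOKALIKE_RE.fullmatch(ordinal):
            # Assumption: after a month name, these spellings mean "11th".
            ordinal = "11th"
        counts[f"{month} {ordinal}"] += 1
    return counts


sample = ("Signed January llth, 1923. Received January 11th. "
          "Due March nth and March 12th.")
counts = count_ordinal_dates(sample)
print(counts["January 11th"])  # 2
```

Requiring a month name in front keeps ordinary uses such as "the nth term" out of the tally, though on a real corpus that rule would still need checking against the scanned pages.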

@texttheater what an i11uminating post
@texttheater How about the 11th month? And the 11th day of the 11th month? All the birthday cards I've missed! And presents.

@texttheater
Salvador Allende? The CIA-backed military coup on 11 September 1973, replacing a democratically elected president with a dictator?

@Suran @texttheater that's what I remember anyway (2nd generation Chileans here, Frisk and I ~Chara)

    @texttheater

    Many years ago I was involved in a system which did OCR of Giros (= European Postal Cheques).

    Our OCR-wizards (and they were!) showed us that handwritten '4' came in several dialects in Denmark.

    OCR is a much more complex subject than most people imagine.

    @texttheater I bet that the 'n' in 'nth' is a result of 'llth' being not a number... so it's actually shorthand for 'NaNth' 😜!

    Fascinating analysis of why the 11th of the month has less representation in print.

    Google's Ngram database catalogs the usage of words in print and books from the 1800s to 2008. But why would the 11th of the month be underrepresented? Dig in to find out!

    @texttheater awesome story! My two takeaways: The more complex/black-box an algorithm, the harder it is to resolve (potentially harmful) biases. Legible fonts like `Atkinson Hyperlegible Next` could have prevented this.

    https://www.brailleinstitute.org/freefont/

    Atkinson Hyperlegible Font - Braille Institute

    Read easier with Atkinson Hyperlegible Font, crafted for low-vision readers. Download for free and enjoy clear letters and numbers on your computer!

    @texttheater excellent work! In the 70s I used a typewriter without a proper 1 (one) key. Didn't like it at the time, but I also never thought about the long-term consequences.

    @hembrow @texttheater you can be forgiven for not accounting for the possibility of systems that started existing only 50 years later.

    But consider this: Alan Turing was already working on character recognition software on the Manchester Mk I.

    @texttheater this is an excellent post, thank you so much for the analysis and the writeup, it's easy and enjoyable to read.
    @nicopap It's not my post
    @texttheater oops my bad. Thanks for the info. Doing my research now, finding out the post is 10 years old. Ah, thanks for sharing in any case

    @texttheater this is a great review of database data quality. I am still unconvinced by xkcd's calendar rendering of September 21st, which surely ought to be WAYYY larger

    I mean who doesn't remember the 21st night of September 🎶🎹

    re: lb: it's cuz `11th` looks like (or literally was, on typewriters) `llth`, `iith`, `nth`, etc to google's OCR digitizing the corpus. adding back the misread dates removes the anomaly.

    @texttheater
    I wonder whether the nth interpretations of 11th might have been due to shorter 1 characters used in fonts that had so-called Text Figures, where the 0, 1 and 2 were actually shorter than other figures, which had ascenders and descenders. In that case, the two 1s would have been about the same height as an n.

    Wikipedia has more to say, and despite the example below being from a more recent font, Text Figures were popular much earlier.
    https://en.m.wikipedia.org/wiki/Text_figures