“Input character sequence” is indeed a generic term, and needs to be tied to a process or algorithm to which the character sequence is an input. In this case, it’s an input to a shaping system, such as HarfBuzz.

Lontar GmbH Jan 8

@TiroTypeworks

I might have called your “shaping order” the “initial glyph sequence”, but as long as we know what we mean, the name doesn’t matter that much to me. And yes, this point in shaping is important because it’s the first time the font gets to see and influence what’s going on.

No, “encoding order” is not tied to any particular step. I use the term to specify how text for a given script *should be* encoded, what we consider “correct”. An actual character sequence may not conform to the specified encoding order. That’s why the USE and some other shaping engines validate their input and insert dotted circles where text does not conform to the specified encoding order.

Lontar GmbH Jan 8

@TiroTypeworks

“The moment when layout passes from character-level processing to glyph-level processing” isn’t well-defined in the Universal Shaping Engine – see USE bug 270.

An interesting point to look at glyph order is just before application of lookups starts. By then shaping systems have processed the input character sequence in several steps:

• (optional) apply full or partial Unicode normalization

• (USE) apply specified partial Unicode normalization

• (some shaping engines, incl. USE) insert dotted circles to ensure valid clusters

• (some shaping engines, not USE) decompose, insert, reorder specific characters

• map characters to glyphs per cmap, possibly applying (de-)composition

Would that be your shaping order?
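
To make the steps above concrete, here is a minimal Python sketch of how a character sequence might become an initial glyph sequence. Everything in it is an assumption for illustration: the validation rule (a combining mark needs a preceding letter or mark) is far simpler than the USE cluster model, the choice of NFC is arbitrary, and the `cmap` dictionary and glyph names are hypothetical.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

def insert_dotted_circles(text: str) -> str:
    """Toy validation: give every combining mark without a base a dotted circle.

    Not the USE cluster model; it only checks that a mark follows
    a letter or another mark.
    """
    out = []
    prev_can_carry_mark = False
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        if is_mark and not prev_can_carry_mark:
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev_can_carry_mark = is_mark or category.startswith("L")
    return "".join(out)

def map_to_glyphs(text: str, cmap: dict[int, str]) -> list[str]:
    """Map each character to a glyph name; unmapped characters get .notdef."""
    return [cmap.get(ord(ch), ".notdef") for ch in text]

def initial_glyph_sequence(text: str, cmap: dict[int, str]) -> list[str]:
    """Run the toy pipeline: normalize, validate, then map through cmap."""
    text = unicodedata.normalize("NFC", text)  # optional normalization step
    text = insert_dotted_circles(text)         # validation step
    return map_to_glyphs(text, cmap)           # character-to-glyph mapping

# Hypothetical cmap for a Devanagari font.
cmap = {0x25CC: "dottedcircle", 0x0915: "ka-deva", 0x093F: "iMatra-deva"}
valid = initial_glyph_sequence("\u0915\u093F", cmap)    # ka + vowel sign i
invalid = initial_glyph_sequence("\u093F\u0915", cmap)  # vowel sign i first
```

In this sketch the lookups would only start after `initial_glyph_sequence` returns, which is the point in the process the question is about.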

Lontar GmbH Jan 8

@TiroTypeworks

I think there should be well-defined encoding orders for Brahmic scripts shared between all stages of processing. Sadly the Unicode Standard doesn’t define such encoding orders, or defines them incompletely, or (for Khmer) defines one that’s unworkable. So for the scripts supported by the Universal Shaping Engine the encoding orders defined by the USE are currently your best bet.

I wrote in much more detail about this topic in “Order and disorder in Unicode”:

https://lontar.eu/en/notes/order-and-disorder-in-unicode/

Lontar GmbH Jan 7

@TiroTypeworks

I do mean “encoding order”, the order in which characters should appear within a Unicode character sequence. The Universal Shaping Engine defines its cluster model in terms of Unicode character properties, and validates before any glyph-level processing, so it seems to agree with that.

One issue is the USE’s lack of compatibility with Unicode normalization. The USE applies some decompositions, and rendering systems may apply more steps (decomposition, reordering, composition) before passing text to the USE. However, the USE is not designed to guarantee that text is rendered identically independent of normalization. See USE bugs 905 and 568.
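
A small Python sketch of why normalization matters here, using only the standard `unicodedata` module. The helper name `normalization_forms` and the sample strings are my own choices for illustration. The sketch only shows that canonically equivalent text can reach a shaping engine as different character sequences; it says nothing about how any particular engine renders them.

```python
import unicodedata

def normalization_forms(text: str) -> dict[str, str]:
    """Return the NFC and NFD forms of a string (hypothetical helper)."""
    return {form: unicodedata.normalize(form, text) for form in ("NFC", "NFD")}

def code_points(text: str) -> list[str]:
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Bengali ka + vowel sign o: one vowel sign in NFC, two parts in NFD.
bengali = normalization_forms("\u0995\u09CB")

# Devanagari ka + virama + nukta: normalization reorders the two marks
# because the nukta has a lower canonical combining class than the virama.
devanagari = normalization_forms("\u0915\u094D\u093C")

for forms in (bengali, devanagari):
    for name, value in forms.items():
        print(name, code_points(value))
```

A shaping engine that wants to render such text identically has to handle each of these sequences, or its caller has to normalize to the form the engine expects.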

Do you know of other issues?

I’m not familiar with “shaping order” – can you point me to a definition?

Lontar GmbH Jan 7

Updated article: Encoding orders of Brahmic scripts

Documents the encoding orders that the OpenType Universal Shaping Engine assumes for the Brahmic scripts it supports. Understanding encoding orders is necessary when rendering or otherwise interpreting text in these scripts, as well as when entering text using input methods or otherwise generating text.

Updated for Unicode 17.0 and the latest USE data.

https://lontar.eu/en/notes/encoding-orders-of-brahmic-scripts/index.html

Lontar GmbH Dec 31

Muthu Nedumaran Dec 31

A smart component trick for adjusting spacing in Devanagari conjuncts while keeping automatic alignment intact in Glyphs

https://muthunedumaran.com/2025/12/30/a-spacer-trick-for-conjunct-spacing-in-glyphs/

Lontar GmbH Nov 12

New app released: Fontburst for iPhone and iPad

The Fontburst app lets you discover almost 400 font families that Apple makes available to users of iPhone and iPad. For each family, it shows sample strings in each script that the fonts support, so you know what works and what it looks like. If you like what you see, and the font isn’t installed yet, download it right from the app. You can then use the font in apps that let you choose fonts, such as Pages or Keynote.

https://apps.apple.com/us/app/fontburst/id6449231969

Lontar GmbH Sep 15, 2025

Norbert Lindenberg Sep 15, 2025

Unicode 17 includes a change that may improve line breaking, backspacing, and other behavior for Khmer, Myanmar, and twelve other Brahmic scripts: Extended grapheme cluster breaks, which may be used in such processes to identify “characters”, no longer occur within sequences of a conjoiner and a consonant in these scripts. Such sequences represent conjunct forms that users see as indivisible entities.

See

https://www.unicode.org/reports/tr44/tr44-36.html#Derivation_InCB

https://www.unicode.org/L2/L2024/24058r-conjuncts.pdf
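
A toy Python sketch of the effect, not an implementation of the Unicode segmentation rules: it treats any combining mark as attaching to the preceding cluster, hard-codes the Khmer coeng (U+17D2) as the only conjoiner, and assumes U+1780 to U+17A2 as the consonant range. The `keep_conjuncts` flag is a hypothetical switch between the older and the newer behavior.

```python
import unicodedata

KHMER_COENG = "\u17D2"  # the conjoiner used in this sketch

def is_khmer_consonant(ch: str) -> bool:
    """Assumed consonant range for illustration: U+1780..U+17A2."""
    return "\u1780" <= ch <= "\u17A2"

def toy_clusters(text: str, keep_conjuncts: bool) -> list[str]:
    """Split text into simplified clusters.

    Marks always attach to the preceding cluster. With keep_conjuncts,
    a consonant directly after a coeng also stays in a cluster that
    started with a consonant.
    """
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        continues_conjunct = bool(
            keep_conjuncts
            and clusters
            and clusters[-1].endswith(KHMER_COENG)
            and is_khmer_consonant(clusters[-1][0])
            and is_khmer_consonant(ch)
        )
        if clusters and (is_mark or continues_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

word = "\u1781\u17D2\u1798\u17C2\u179A"  # the word "Khmer" in Khmer script
before = toy_clusters(word, keep_conjuncts=False)
after = toy_clusters(word, keep_conjuncts=True)
```

With the flag off, the sketch breaks between the coeng and the following consonant, so a backspace or line break based on these clusters could split the conjunct; with the flag on, the conjunct stays in one cluster.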

Lontar GmbH Sep 12, 2025

W3C Internationalization, i18n Sep 12, 2025

I18N GAP FIXED: Line-breaking now works for Javanese & other SE Asian scripts in all major browser engines.

Javanese, Balinese, and some other scripts can't wrap at word boundaries because those often lie in letter stacks which must be kept together. They therefore wrap at orthographic syllable boundaries. Thanks to Norbert Lindenberg, Unicode now supports this type of line-breaking.

Gap report: https://www.w3.org/TR/java-gap/#issue40_line_breaking

Article about line breaking: https://www.w3.org/International/articles/typography/linebreak.en.html#sec_se_asia2