New preprint! Let me tell you a story about trees, caterpillars, brooms, entropy, and getting scooped by 50 years.

Rooted trees of all shapes and sizes crop up all over biology, computing and elsewhere. How can we best compare the shapes of these myriad trees? 1/14

https://arxiv.org/abs/2507.08615

Expected and minimal values of a universal tree balance index

Although the analysis of rooted tree shape has wide-ranging applications, notions of tree balance have developed independently in different domains. In computer science, a balanced tree is one that enables efficient updating and retrieval of data, whereas in biology tree balance quantifies bias in evolutionary processes. The lack of a precise connection between these concepts has stymied the development of universal indices and general results. We recently introduced a new tree balance index, $J^1$, that, unlike prior indices popular among biologists, permits meaningful comparison of trees with arbitrary degree distributions and node sizes. Here we explain how our new index generalizes a concept that underlies the definition of the weight-balanced tree, an important type of self-balancing binary search tree. Our index thus unifies the tree balance concepts of biology and computer science. We provide new analytical results to support applications of this universal index. First, we quantify the accuracy of approximations to the expected values of $J^1$ under two important null models: the Yule process and the uniform model. Second, we investigate minimal values of our index. These results help establish $J^1$ as a universal, cross-disciplinary index of tree balance that generalizes and supersedes prior approaches.

arXiv.org
Tree shape has various aspects including diversity (effective number of leaves), bushiness (average outdegree), and balance. Our preprint focusses on quantifying tree balance. Roughly speaking, this is the extent to which the tree's leaves are evenly distributed among its branches. 2/14
Although many tree balance indices have been defined, almost all have important shortcomings that make them unsuitable for comparing trees with different numbers of leaves or different outdegree distributions. Recently, we introduced an index called J1, which solves these problems. 3/14
To calculate J1, we first assign each internal node a balance score, which is the normalized Shannon entropy of the sizes of the subtrees that descend from the node. J1 is a weighted average of these node balance scores (see https://academic.oup.com/sysbio/article/71/5/1210/6567363 for details). 4/14
Robust, Universal Tree Balance Indices

Abstract. Balance indices that quantify the symmetry of branching events and the compactness of trees are widely used to compare evolutionary processes or tree-

OUP Academic

When we defined J1 in 2022, I was surprised nobody had this idea before. Turns out soneone had! As we describe in the preprint, Jürg Nievergelt and colleagues used the same concept in 1972 when introducing the weight-balanced tree - a data structure used in computing. 5/14

https://inf.ethz.ch/news-and-events/spotlights/2019/11/nievergelt-obituary.html

Farewell, Prof. Jürg Nievergelt

       

Department of Computer Science
So we were scooped by 50 years! But, whereas computer scientists regarded J1 merely as an accessory to defining weight-balanced trees, we contend it's useful in its own right as the best way to quantify tree balance. To support adoption of J1, we need to describe its mathematical properties. 6/14
First, we want to be able to compare empirical trees to null models: Are the observed trees more or less balanced than a model predicts? Hence in the preprint we derive highly accurate approximations to the expected values of J1 under the two most common tree-generating models. 7/14
Second, we want to know which trees are considered the least balanced, according to J1. Since J1 accounts for node sizes, any non-linear tree, regardless of topology, can be made arbitrarily unbalanced (J1→0) by adjusting its relative node sizes. But what if we fix the node sizes? 8/14
We focus on leafy trees with equally sized leaves. These can be interpreted as cladograms in which leaves represent extant types and internal nodes are extinct common ancestors. If we further require that all internal nodes have outdegree 2 then the least balanced tree is a caterpillar. 9/14
But if we permit higher outdegrees then the caterpillar is no longer optimal. Instead we show that, when the leaf count exceeds 4, the least balanced tree belongs to a class we call brooms, in which every internal node except the most distant from the root has outdegree 2. 10/14
Counterintuitively, the proportional size of the "head" of the least balanced broom approaches 1 as the leaf count goes to infinity. Yet the difference in J1 values between the minimally balanced broom and the caterpillar is always small and decreases rapidly as the number of leaves increases. 11/14
Is it odd that caterpillar trees do not always minimize J1? We argue not as there is no prior “correct” answer as to whether a given non-bifurcating broom tree is more or less balanced than the caterpillar with the same number of leaves. Our index provides a reasonable solution to this problem. 12/14

Our preprint concludes: "our results strengthen the case for J1 as the most useful cross-disciplinary index of tree balance and provide a firm foundation for using J1 instead of conventional indices to compare and categorize empirical trees." 13/14

https://arxiv.org/html/2507.08615v1

Expected and minimal values of a universal tree balance index

Credit is due to my former PhD student Ves Manojlović (now a postdoc in Cambridge), long-time collaborator Yannick Viossat, and extraordinarily talented undergraduate Armaan Ahmed. If you're looking for a PhD student in math bio then you won't find better than Armaan (dm me for details). 14/14