1. “Imagine we land a space probe on one of Jupiter’s moons, take up a sample of material, and find it is full of organic molecules. How can we tell whether those molecules are just randomly assembled goo or the outcome of some evolutionary process taking place on the planet?”

#science #scicomm #assemblytheory #exobiology

2. This is the question at the core of the now infamous Assembly Theory paper published last week in Nature and thoroughly panned on social media.

https://www.nature.com/articles/s41586-023-06600-9

My view? There is actually some very cool science here — it’s just extremely well hidden. This thread is my attempt to explain.

Assembly theory explains and quantifies selection and evolution - Nature

Assembly theory conceptualizes objects as entities defined by their possible formation histories, allowing a unified language for describing selection, evolution and the generation of novelty.

Nature

3. Let's get a few things out of the way first.

a) The main text of the paper is terribly written. Terribly.

b) It’s obvious why people misunderstood it and were skeptical, to say the least.

c) Nature failed both the authors and its readers by publishing it in its present form.

4. I have two potential COIs, one of which matters and one of which people might claim matters.

The one that matters is that I've collaborated with author Michael Lachmann for nearly 30 years, and we are close friends. This matters because if I didn't know Michael so well, I probably wouldn't have taken the time to figure this paper out.

5. The one that people might claim matters is that I’m currently funded by the Templeton World Charity Foundation.

I would disagree, because they’re an entirely separate organization from the Templeton foundation that funded some of the Assembly Theory research, they don’t care what I say, and I wouldn’t pander to them even if they did.

But I want to be upfront about it.

6. With that out of the way, what does the paper do? The heart of the paper is a simple and elegant exercise in discrete mathematics. Imagine a world of arbitrary objects that can be assembled, combinatorially, to produce additional objects.

7. You begin with a set of elemental objects that require no assembly at all, and you have a set of assembly rules for when two objects can be joined. This then gives you a set of objects that can be created in a single step.

Below, an illustration.

8. Within this framework, we can take any object and calculate the minimum number of unique assembly steps that would have been required to produce it given our set of basic elements and our assembly rules.

9. This gives the ASSEMBLY INDEX of an object.

Example below: The object at left has an assembly index of four: 1) join C-D, 2) join two C-D pairs, 3) join two C-D tetramers at an interior D, 4) join two of the resulting octamers.

The one at right has an assembly index of five: every link is a unique step.
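
To make the counting concrete, here is a minimal sketch for a toy world where objects are strings, the basic elements are single letters, and the only assembly rule is concatenation. These are my simplifying assumptions: the paper works with molecular graphs, and the function name `assembly_index` and the brute-force search are purely illustrative, not the authors' algorithm.

```python
from itertools import product


def assembly_index(target: str) -> int:
    """Minimum number of joins needed to build `target` from single letters,
    where anything already built can be reused for free (toy string model)."""
    if len(target) <= 1:
        return 0  # basic elements require no assembly
    basics = frozenset(target)

    def search(built: frozenset, steps_left: int) -> bool:
        # Depth-limited search over sets of objects built so far.
        if target in built:
            return True
        if steps_left == 0:
            return False
        for a, b in product(built, repeat=2):
            new = a + b
            # Only substrings of the target can ever end up inside it.
            if new in target and new not in built:
                if search(built | {new}, steps_left - 1):
                    return True
        return False

    # Iterative deepening: try 1 join, then 2, ... until the target is reachable.
    limit = 1
    while not search(basics, limit):
        limit += 1
    return limit


print(assembly_index("CDCDCDCD"))  # 3: CD, then CDCD, then CDCDCDCD
print(assembly_index("ABCDEF"))    # 5: no repeats, so every link is its own step
```

The modular string is longer yet needs fewer steps, because each intermediate is reused. The search is exponential, so this is only practical for short strings.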

10. (This previous example highlights an interesting relation between modularity / compositionality and the emergence of elaborate form. The structure at left is maximally modular and thus has low assembly index despite large size; the structure at right is minimally modular and so has the reverse.)

11. In a system such as this, one can work out the mathematics of how the universe of possible objects increases with increasing assembly index. In general, it blows up super-exponentially. The Nature paper does this, though most of the details are hidden in the supplementary material.
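
A rough way to see the blow-up is to enumerate it in a toy world. The sketch below assumes objects are strings over a two-letter alphabet joined by concatenation (my assumption, not the paper's chemistry), and the helper name `reachable_objects` is hypothetical. It tracks every set of objects that can be built in a given number of joins and records the first step at which each object appears.

```python
from collections import Counter


def reachable_objects(basics: str, max_steps: int) -> dict:
    """Map each buildable string to the minimum number of joins it needs,
    for every string reachable within `max_steps` joins (toy string model)."""
    best = {b: 0 for b in basics}
    frontier = {frozenset(basics)}  # each state = set of objects built so far
    for step in range(1, max_steps + 1):
        next_frontier = set()
        for built in frontier:
            for a in built:
                for b in built:
                    new = a + b
                    if new in built:
                        continue  # a repeated join adds nothing new
                    # Steps are visited in increasing order, so the first
                    # time we see an object is its minimum join count.
                    best.setdefault(new, step)
                    next_frontier.add(built | {new})
        frontier = next_frontier
    return best


best = reachable_objects("AB", 3)
counts = Counter(best.values())
for index in sorted(counts):
    print(index, counts[index])
```

For two letters the counts at index 0, 1 and 2 are 2, 4 and 12, and they keep climbing from there. The number of states explodes quickly too, so keep `max_steps` small.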

12. But what happens if not all assembly rules are equally likely to be applied, not all objects are equally likely to be incorporated into downstream objects, or not all objects are equally likely to survive?

One can treat that mathematically as well, and the space of observed objects can collapse.

13. Moreover, with strong enough biases in how assembly proceeds, the objects that are produced can appear in high multiplicity, or “copy number”. This can occur even for objects that have high assembly index.

Notice that thus far we are still talking about a simple model in discrete mathematics.
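
A small simulation can illustrate the collapse. This is a sketch under assumptions of my own, not the model from the paper: objects are strings, an unbiased event joins two uniformly chosen object types, and a biased event instead re-makes an already-built object with probability proportional to its copy number (a crude stand-in for catalysis or selection). The name `simulate` and the parameter `copy_bias` are hypothetical.

```python
import random
from collections import Counter


def simulate(events: int, copy_bias: float, seed: int = 0) -> Counter:
    """Run `events` assembly events and return copy numbers per object.

    copy_bias = 0.0 means every event is a random join (no preference);
    higher values mean existing built objects are preferentially re-made.
    """
    rng = random.Random(seed)
    pool = Counter(["A", "B", "C", "D"])  # one copy of each basic element
    for _ in range(events):
        built = [obj for obj in pool if len(obj) > 1]
        if built and rng.random() < copy_bias:
            # Biased event: favor objects that are already abundant.
            weights = [pool[obj] for obj in built]
            obj = rng.choices(built, weights=weights, k=1)[0]
        else:
            # Unbiased event: join two object types chosen uniformly.
            kinds = list(pool)
            obj = rng.choice(kinds) + rng.choice(kinds)
        pool[obj] += 1
    return pool


unbiased = simulate(2000, copy_bias=0.0, seed=1)
biased = simulate(2000, copy_bias=0.9, seed=1)
print("distinct objects:", len(unbiased), "vs", len(biased))
print("max copy number:", max(unbiased.values()), "vs", max(biased.values()))
```

With the same number of events, the unbiased run spreads itself over many one-off objects, while the biased run concentrates its copies in far fewer object types.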

14. If we see a world with a high diversity of objects with low assembly index, it suggests that objects are merely randomly assembling and/or disassembling with no particular preference among assembly rules nor much propensity for some forms to survive better than others.

15. If instead we see a world with a low diversity of objects with high assembly index, we then need some explanation for why we see these objects instead of the many others that could exist. This explanation might involve biases in assembly (think catalysis) or in survival (think selection).

16. Here’s an example. Suppose we observe world 1 at left. The objects are low assembly index, low copy number. Much of the possibility space at observed assembly indices is filled out.

17. Suppose instead we observe world 2 at right. The objects are high assembly index, high copy number. There’s clearly something special about that AA-B-CC-DD structure; it’s either really easy to form, or really stable once formed, or both.

18. Moreover, these mechanisms creating preferences for some objects over others are making it possible to create and explore more of the object space for high assembly index objects, instead of getting bogged down in the already massive space of low assembly index possibilities.

19. And this brings us to the money figure from the Nature paper, reproduced below. At the left of the figure we see a world like world 1 above. At right, a world like world 2.

20. At least at the metaphorical level, life on Earth, of course, is like the world at right. Highly complex molecules, organisms, etc., at high assembly indices, tightly clustered in possibility space. This paper helps us solidify what is special about the biological complexity we observe here.

21. Returning at long last to the question at the start of the thread: how do we know if our organic soup from a Jovian moon is the product of some evolutionary process?

If the discrete math model from the paper’s supplementary material can be ported to real-world chemical environments using e.g. mass spectrometry, we basically know how to build an evolution-detector that we could put on a space probe.
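
In its crudest form, the logic of such a detector might look like the sketch below. Everything here is a placeholder of mine: the `Observation` record, the function `flag_candidates`, and especially the threshold values, which are arbitrary numbers for illustration and not calibrated figures from the paper. It assumes some upstream instrument has already estimated an assembly index and a copy number for each detected species.

```python
from typing import List, NamedTuple


class Observation(NamedTuple):
    name: str
    assembly_index: int  # assumed to be estimated upstream, e.g. from spectra
    copy_number: int     # how many copies of this species were detected


def flag_candidates(sample: List[Observation],
                    min_index: int,
                    min_copies: int) -> List[Observation]:
    """Return the species that are both hard to assemble and abundant.

    High index alone could be a one-off fluke; high copy number alone could
    be simple chemistry. The combination is what calls for an explanation.
    """
    return [obs for obs in sample
            if obs.assembly_index >= min_index and obs.copy_number >= min_copies]


# Hypothetical sample and hypothetical thresholds, for illustration only.
sample = [
    Observation("simple-and-common", assembly_index=2, copy_number=10_000),
    Observation("complex-one-off", assembly_index=20, copy_number=1),
    Observation("complex-and-common", assembly_index=20, copy_number=5_000),
]
for obs in flag_candidates(sample, min_index=10, min_copies=100):
    print(obs.name)  # complex-and-common
```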

22. But what I find even cooler is that this paper gives me a new way to think about how the complexity and compositionality of structures in the universe relates to the processes that led them to come into existence and persist long enough to be observed. At its core, that is assembly theory.

@ct_bergstrom Thanks for the careful review, Carl. I dreaded having to try to pierce the offputting writing and the negative reviews of this to understand its core value.

@ct_bergstrom Long story short:

If you have many complex molecules that are all the same - that's probably life.

Still really interesting :)

@ct_bergstrom complete noob here. i appreciate the effort, it’s lots to think about.

@ct_bergstrom
Reminds me a little of using Zipf's Law as a rough statistical test for language - for instance, in the case of the endlessly discussed Voynich manuscript.

Definitely interesting stuff!

@ct_bergstrom one problem with this is that it treats molecules as classical objects, i.e. based on independent point masses, which violates Heisenberg's assumption that the true system state is a higher-dimensional function dependent on all of the variables coupled together, simultaneously, and which is in principle unobservable. even the concept of a "particle" is just an approximation of this high dimensional function in the form of a factorization. it's unlikely that biology is so simple

@ct_bergstrom Fantastic, thanks for the summary. Any thoughts on how that relates to something like Kolmogorov complexity?

@ct_bergstrom I feel like this is related to the Boltzmann brain thought experiment somehow, but I can't quite put my finger on how to connect them properly

https://en.wikipedia.org/wiki/Boltzmann_brain

(maybe this came up in the early feedback/critique, so apologies if this has already been discussed to (heat) death; I just learned of this paper through your thread)

Boltzmann brain - Wikipedia

@ct_bergstrom

In this regard – the processes that let complex structures come to be – I am reminded here of:

"On the statistical mechanics of life: Schrödinger revisited" by Jeffery, Pollack and Rovelli 2019 https://arxiv.org/abs/1908.08374

Where it says:

"We question some common assumptions about the thermodynamics of life and illustrate how, contrary to widespread belief, even in a closed system entropy growth can accompany an increase in macroscopic order."

On the statistical mechanics of life: Schrödinger revisited

We study the statistical underpinnings of life. We question some common assumptions about the thermodynamics of life and illustrate how, contrary to widespread belief, even in a closed system entropy growth can accompany an increase in macroscopic order. We consider viewing metabolism in living things as microscopic variables directly driven by the second law of thermodynamics, while viewing the macroscopic variables of structure, complexity and homeostasis as mechanisms that are entropically favored because they open channels for entropy to grow via metabolism. This perspective reverses the conventional relation between structure and metabolism, by emphasizing the role of structure for metabolism rather than the other way around. Structure extends in time, preserving information along generations, particularly in the genetic code, but also in human culture. We also consider why the increase in order/complexity over time is often stepwise and sometimes collapses catastrophically. We point out the relevance of the notions of metastable states and channels between these, which are discovered by random motion of the system and lead it into ever-larger regions of the phase space, driven by thermodynamics. We note that such changes in state can lead to either increase or decrease in order; and sometimes to complete collapse, as in biological extinction. Finally, we comment on the implications of these dynamics for the future of humanity.

arXiv.org

@ct_bergstrom
Thank you for this writeup! I guess I'm ready to brave the paper now so armed with insights to look out for.

I've been thinking about these characteristics as probabilities of occurrence of assembly events and their *local* diversity. I think locality is the misunderstood missing piece for physics. Local system effects arise in nonreciprocal systems:
https://www.quantamagazine.org/a-new-theory-for-systems-that-defy-newtons-third-law-20211111/

These arise in several different scenarios, like condensation of solar systems. Solar systems become very stable within their dynamic activity when they "survive" whatever catastrophe (planetary collisions) was inherent in their formation, which is very difficult to predict.

So we can note that we ourselves, phenotypes of R/DNA genotypes, are very unlikely in general, but less unlikely under conditions at this location. There are several layers of assembly processes involved.

A New Theory for Systems That Defy Newton’s Third Law | Quanta Magazine

In nonreciprocal systems, where Newton’s third law falls apart, “exceptional points” are helping researchers understand phase transitions and possibly other phenomena.

Quanta Magazine

@ct_bergstrom Excellent description. Agreed the writing was awful and the association fallacy. There is a kernel of a good idea there.

@ct_bergstrom
Highly reminiscent of the theory of Big Bang and supernova nucleosynthesis! Makes sense as an analytical approach.

@ct_bergstrom This is a wonderful explanation. I saw the paper go by but ignored it.

Tom's Law of the Internet: "Eventually everything will get a simple explanation from an expert, you just have to find it in the noise."

@ct_bergstrom This seems to assume two things - that life takes on the same form as life on earth, and that humans know what they are looking for.

In both cases, with infinite probability - what if we have "life" all wrong - we assume sentience in these cases? But what about basic consciousness + life, how could we spot something the universe has decided to try differently?

We so far have only one lab, and even then we're not entirely sure how it works.

@ct_bergstrom Here is an alternative approach that has been developed by Yu Liu and myself, which can also be applied to spatial structures: https://www.mdpi.com/1099-4300/24/8/1082

Ladderpath Approach: How Tinkering and Reuse Increase Complexity and Information

The notion of information and complexity are important concepts in many scientific fields such as molecular biology, evolutionary theory and exobiology. Many measures of these quantities are either difficult to compute, rely on the statistical notion of information, or can only be applied to strings. Based on assembly theory, we propose the notion of a ladderpath, which describes how an object can be decomposed into hierarchical structures using repetitive elements. From the ladderpath, two measures naturally emerge: the ladderpath-index and the order-index, which represent two axes of complexity. We show how the ladderpath approach can be applied to both strings and spatial patterns and argue that all systems that undergo evolution can be described as ladderpaths. Further, we discuss possible applications to human language and the origin of life. The ladderpath approach provides an alternative characterization of the information that is contained in a single object (or a system) and could aid in our understanding of evolving systems and the origin of life in particular.

MDPI

@ct_bergstrom
Hi Prof. Bergstrom
You might also find this work interesting (https://www.researchsquare.com/article/rs-3440555/v2), which is motivated by Francois Jacob's concept of tinkering and similar ideas, and falls within the broader category of Algorithmic Information Theory.
More importantly, it can characterize the hierarchical and nested relationships among repetitive substructures.

Evolutionary Tinkering Enriches the Hierarchical and Nested Structures in Amino Acid Sequences

Genetic information often exhibits hierarchical and nested relationships, achieved through the reuse of repetitive subsequences such as duplicons and transposable elements, a concept termed “evolutionary tinkering” by François Jacob. Current bioinformatics tools often struggle to capture...

@ct_bergstrom ( "the space of observed objects can collapse." would make a nice warning sign for an abstract area of danger. )

@tomtrottel @ct_bergstrom Hazel: I feel like you could build a whole SCP around that sentence alone

@wertercatt @tomtrottel Wait until you read book 3 of the Three Body Problem trilogy.

@ct_bergstrom ( easy, just be the center :-) ) ( never read the science fiction novels, worth ? )

@tomtrottel I loved them and I read only a small amount of science fiction these days. Definitely male-coded hard scifi.

@ct_bergstrom @wertercatt @tomtrottel Thanks for the review. I have always found it hard to see the main difference between assembly theory and algorithmic complexity (like Kolmogorov complexity), what is the main difference? Is it maybe the concept of the “copy number”?

@bjorn_hogberg @wertercatt @tomtrottel I don't fully understand this myself. My first thought was also that this was a reformulation of algorithmic complexity. Copy number is an important addition for sure. I look forward to a discussion emerging around this, which of course was my main motivation for writing the thread.

@bjorn_hogberg @wertercatt @tomtrottel

The authors have a short, dryly witty paragraph on this that I suspect will be lost on 99.99% of their audience; it was lost on me.

After further consideration, my interpretation of what they are saying is that until we get way down the evolutionary pathway toward complex life, what a universal Turing machine can do (Kolmogorov complexity) is irrelevant because there are no universal Turing machines until very late in the process.

@ct_bergstrom ( forgive my unqualified comment, but that sounds like some kind of a chaos infused fuzzy compiler is needed , recompiling part of itself at runtime to compile new parts :-) )

@ct_bergstrom @wertercatt @tomtrottel I didn’t understand that. From that paragraph it sounds like AT is not to be used for actual biology, but rather for its “progenitor molecules”, i.e. stuff that would not require assemblers/constructors, like ribosomes and polymerases (Turing machine-like things)?

@bjorn_hogberg @wertercatt @tomtrottel

At the level of molecules, that's probably right?

Maybe there are applications of AT at higher levels of structure, e.g. assemblages of proteins? I'm not sure. I had never heard the term until Wednesday.

@ct_bergstrom @bjorn_hogberg @wertercatt @tomtrottel I suppose that, if your instrument is a certain kind of mass-spec with a limited number of tuning options, everything looks like a molecule between mass A and B. Larger stuff may need to be fractionated before going in, and if it's not an earth-kind of macromolecule you may have no way to "sequence" it at the stage of this investigation.

@ct_bergstrom @bjorn_hogberg @wertercatt @tomtrottel My understanding is the same as yours. Kolmogorov complexity assumes strong abstract computational powers (loops!), far beyond what we would expect from any real-world compositional mechanism. For instance, the K complexity of the sequences of natural numbers or of the cubes (say) is very small but we would be very surprised to see anything like this arising organically (either in the narrow sense of biology or the broader sense of chemistry).

@minimaliste13 @ct_bergstrom @wertercatt @tomtrottel Crystals are exactly that, low Kolmogorov complexity, and they are everywhere in nature.

@bjorn_hogberg @ct_bergstrom @wertercatt @tomtrottel Good point. Perhaps a refined thesis is that low Kolmogorov complexity chemical structures (e.g. crystals) are too rigid to support organic processes.

@ct_bergstrom I think here Kolmogorov is just playing the customary complexity straw man, as if there's no third choice. Unfortunately, their attack that it "reflects nothing of the underlying process" largely applies to AT as well.

Their "conservative" assumption, taken seriously, requires an "underlying process" capable of performing constant effort searches of a hyperexponential space.

Rather than a third choice like, you know, looking at what reactions are actually happening in the 'goo'.

@ct_bergstrom
Throwing out the science because of the association informal fallacy is also a fallacy IMO. Again, furiously agreed.

@ct_bergstrom COI=Conflict of interest, for those who aren’t familiar