In a new preprint, I explore some problematic aspects of phylogeographic inference in continuous space, with two main findings:

1) Estimating velocities of individual lineages makes little sense.

2) If population densities or habitats change over time, inferences of dispersal parameters and ancestral locations are unreliable.

https://www.biorxiv.org/content/10.1101/2024.07.03.601889v1

[1/6]

Apart from directed migrations, organisms explore their surroundings in an undirected manner, which is typically modeled as diffusion. Yet many studies estimate “lineage velocities” by dividing the inferred distance traveled along a branch by the length of that branch.

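But for diffusion, this doesn’t make any sense: the distance covered grows only with the square root of time, so distance divided by time is larger on shorter branches. Since denser sampling produces shorter branches, the result depends on sample size and thus cannot even be compared between two samples from the same population, let alone between different populations.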
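
A minimal Python sketch of the sample-size effect, under assumptions that are mine rather than the preprint's: the samples are related by a Kingman coalescent tree, movement is isotropic 2D Brownian motion with diffusion constant D, and the true displacement along every branch is treated as known. Over a branch of length t the expected displacement is sqrt(πDt), so distance/time scales as 1/sqrt(t). All function names are hypothetical.

```python
import math
import random
import statistics


def coalescent_branch_lengths(n, rng, tc=1.0):
    """Branch lengths of one Kingman coalescent tree with n sampled tips.

    While k lineages remain, the next merger happens after an exponential
    waiting time with rate k*(k-1)/2 / tc (tc = pairwise coalescence time).
    """
    start = [0.0] * n  # time (into the past) at which each active lineage begins
    t = 0.0
    lengths = []
    while len(start) > 1:
        k = len(start)
        t += rng.expovariate(k * (k - 1) / 2 / tc)
        # merge two distinct lineages; pop the larger index first
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        lengths.append(t - start.pop(i))
        lengths.append(t - start.pop(j))
        start.append(t)  # the ancestral node starts a new lineage
    return lengths


def naive_velocities(n, D, rng):
    """Distance / branch length for every branch under 2D Brownian motion.

    Assumption: the true displacement on each branch is known exactly,
    i.e. ancestral locations are reconstructed perfectly.
    """
    velocities = []
    for length in coalescent_branch_lengths(n, rng):
        sd = math.sqrt(2 * D * length)  # per-coordinate displacement s.d.
        dx, dy = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
        velocities.append(math.hypot(dx, dy) / length)
    return velocities


def median_velocity(n, D=1.0, replicates=50, seed=1):
    """Median naive velocity pooled over independent simulated trees."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(replicates):
        pooled.extend(naive_velocities(n, D, rng))
    return statistics.median(pooled)


medians = {n: median_velocity(n) for n in (10, 50, 200)}
for n, v in medians.items():
    print(f"n = {n:>3}: median naive velocity = {v:.2f}")
```

In this toy setting the diffusion constant is identical for every run, yet the median “velocity” should rise with the number of samples, because more samples mean shorter branches.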

[2/6]

A more fundamental shortcoming of phylogeography is that it typically assumes the replication rate is independent of spatial location. But populations grow where resources are abundant and contract when conditions deteriorate. Ignoring this coupling between growth and spatial location can strongly distort inferences.

[3/6]

The problems of ignoring the coupling between space and replication come in layers. Even in a static habitat of size L x L, it matters how the diffusion constant D compares to L^2/T_c, where T_c is the coalescent timescale of a panmictic population.

If D << L^2/T_c, the population fragments into weakly coupled subpopulations with T_mrca >> T_c, violating model assumptions.
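
One way to see this regime numerically is a two-lineage structured coalescent on a K x K grid of demes, a stepping-stone stand-in for continuous space chosen here for simplicity (it is not necessarily the preprint's model). Lineages hop to a neighbouring deme at rate m and coalesce at rate 1/N_d when they share a deme, so the panmictic timescale is T_c = K^2 N_d and, in units of deme spacing, D = m/4. The sketch reports the mean coalescence time, in units of T_c, for two lineages that start far apart; names and parameter values are illustrative.

```python
import random
import statistics

# unit steps to the four nearest-neighbour demes
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def pair_coalescence_time(K, Nd, m, rng):
    """Continuous-time structured coalescent of two lineages on a K x K torus.

    Each lineage jumps to a random neighbouring deme at rate m; two lineages
    in the same deme coalesce at rate 1/Nd. They start maximally far apart.
    """
    pos = [(0, 0), (K // 2, K // 2)]
    t = 0.0
    while True:
        coal_rate = 1.0 / Nd if pos[0] == pos[1] else 0.0
        total = 2 * m + coal_rate
        t += rng.expovariate(total)
        if rng.random() * total < coal_rate:
            return t
        i = rng.randrange(2)  # pick which lineage moves
        dx, dy = rng.choice(STEPS)
        x, y = pos[i]
        pos[i] = ((x + dx) % K, (y + dy) % K)


def relative_tmrca(ratio, K=6, Nd=20, replicates=200, seed=0):
    """Mean pair coalescence time divided by the panmictic T_c = K*K*Nd.

    With unit deme spacing the per-coordinate variance grows as (m/2) t = 2 D t,
    so D = m/4 and, with L = K, ratio = D * T_c / L**2 = m * Nd / 4.
    """
    rng = random.Random(seed)
    m = 4 * ratio / Nd
    times = [pair_coalescence_time(K, Nd, m, rng) for _ in range(replicates)]
    return statistics.mean(times) / (K * K * Nd)


results = {r: relative_tmrca(r) for r in (0.01, 0.1, 1.0, 10.0)}
for r, rel in results.items():
    print(f"D*T_c/L^2 = {r:>5}: mean T_mrca / T_c = {rel:.2f}")
```

In this sketch, small values of D*T_c/L^2 should give mean coalescence times many times larger than T_c, while large values should bring it close to T_c.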

If D >> L^2/T_c, diffusion is so fast that ancestral locations are distributed almost uniformly within the boundaries of the habitat.
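
A sketch of this regime, assuming uniform population density in a static L x L square, so that a single lineage traced backward in time performs Brownian motion with reflecting walls and the two coordinates evolve independently (only one coordinate is simulated). The sample sits near one edge and the lineage is traced back for a time T_c; all names are hypothetical.

```python
import math
import random
import statistics


def reflect(x, L):
    """Fold a coordinate into [0, L]; equivalent to reflecting walls at 0 and L."""
    x = x % (2 * L)
    return 2 * L - x if x > L else x


def ancestor_positions(x0, D, L, T, lineages=1000, steps=100, seed=0):
    """Positions, a time T in the past, of lineages sampled at x0 (one coordinate).

    Assumption: uniform density, so each lineage is plain Brownian motion
    (variance 2*D*dt per step) with reflecting habitat boundaries.
    """
    rng = random.Random(seed)
    sd = math.sqrt(2 * D * T / steps)
    positions = []
    for _ in range(lineages):
        x = x0
        for _ in range(steps):
            x = reflect(x + rng.gauss(0.0, sd), L)
        positions.append(x)
    return positions


L, Tc = 1.0, 1.0
x0 = 0.1 * L  # sample taken close to one edge of the habitat
summary = {}
for ratio in (0.001, 0.01, 0.1, 10.0):  # ratio = D * Tc / L**2
    xs = ancestor_positions(x0, ratio * L**2 / Tc, L, Tc)
    summary[ratio] = statistics.mean(xs)
    print(f"D*T_c/L^2 = {ratio:>6}: mean ancestral position = {summary[ratio]:.3f}")
```

For small D*T_c/L^2 the mean ancestral position should stay near the sample at 0.1; for large values it should approach 0.5, the mean of a uniform distribution over the habitat, so the ancestor's location no longer reflects where the sample was taken.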

[4/6]

More complex issues arise when habitats shift over time. In this case, deep nodes in the tree may lie in regions without any samples, because the habitable region has moved. In such situations, phylogeographic inferences can be confidently wrong.

[5/6]

Phylogeographic and phylodynamic analyses often involve complex models that are computationally expensive to fit, yet still lack critical elements such as the coupling between population dynamics and spatial location.

In the absence of identifiable models that capture the essential features of population dynamics and evolution, simple models that extract signals from the data in a robust and transparent manner are preferable to complex “black box” procedures.

[6/6]