Dear Mastodon friends, it's my pleasure to share this new manuscript on distinguishing gene flow from incomplete lineage sorting (ILS)

https://www.biorxiv.org/content/10.1101/2023.07.06.547897v1

[1/6]

It is well known that, in addition to affecting topology, ILS tends to generate tall gene trees, and gene flow short gene trees. The newly introduced Aphid method formalizes this in the maximum likelihood framework via a mixture model. So Aphid goes beyond ABBA-BABA-like methods by also analyzing gene tree branch lengths.
[2/6]
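To illustrate the tall-vs-short idea, here is a toy two-component mixture in Python. It is only a sketch under stated assumptions and not Aphid's actual likelihood: the shifted-exponential densities, the function names and all parameter values (`w_gf`, `gf_start`, `ils_start`, `mean_wait`) are hypothetical and chosen for illustration. It gives the posterior probability that a discordant gene tree of a given height comes from gene flow rather than ILS.

```python
import math


def shifted_exp_pdf(x, start, mean_wait):
    """Density of start + Exponential(mean=mean_wait); zero before start."""
    if x < start:
        return 0.0
    return math.exp(-(x - start) / mean_wait) / mean_wait


def posterior_gene_flow(height, w_gf, gf_start, ils_start, mean_wait):
    """Posterior probability that a discordant gene tree comes from gene flow.

    Toy assumptions (hypothetical, not taken from the manuscript):
    - gene-flow trees coalesce some time after gf_start (recent, short trees);
    - ILS trees coalesce some time after ils_start (deeper, tall trees);
    - w_gf is the prior share of discordant trees that are due to gene flow.
    """
    gf = w_gf * shifted_exp_pdf(height, gf_start, mean_wait)
    ils = (1.0 - w_gf) * shifted_exp_pdf(height, ils_start, mean_wait)
    total = gf + ils
    if total == 0.0:
        raise ValueError("height is impossible under both components")
    return gf / total


# A short tree can only be explained by the gene-flow component here.
print(posterior_gene_flow(0.5, w_gf=0.5, gf_start=0.2, ils_start=1.0, mean_wait=1.0))
# A taller tree is more likely to come from ILS than from gene flow.
print(posterior_gene_flow(1.5, w_gf=0.5, gf_start=0.2, ils_start=1.0, mean_wait=1.0))
```

With these made-up values the first call returns 1.0 and the second about 0.31, so branch lengths carry information that topology counts alone would miss.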
Besides partitioning the phylogenetic conflict into gene flow vs. ILS, Aphid also returns gene flow-aware estimates of speciation times and ancestral effective population size. Simulations are reassuring regarding its accuracy/robustness/competitiveness.
[3/6]
Interestingly, data analysis in apes suggests significant gene flow between the human, chimpanzee and gorilla lineages shortly after the human/chimp split - this is predicted to concern ~12% of the ape genome, and account for roughly half of the phylogenetic conflict in this triplet.
[4/6]
This is kind of a secret project I was doing in my spare time, so it is largely a one-brain project. There's probably room for improvement, and comments are more than welcome!
Aphid is easy to install and quite fast, so maybe take a look and send feedback: https://gitlab.mbb.cnrs.fr/ibonnici/aphid/
[5/6]
PS: submitted to the @PeerCommunityIn of course
[6/6]

Hi @GaltierNicolas

I haven't had time to read your paper in detail yet (sorry), but it makes me think of two things:

- As @Julien_JOSEPH said, there is our paper in PLOS Biol, where we show that HGT can produce long branches (instead of short ones) as soon as some species are missing, which may not be that rare in most cases.

- Also this paper whose title is quite close:
https://bsapubs.onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ajb2.1064

Happy to talk more if you want!

@damdevienne @Julien_JOSEPH
Thanks for the ref to L. Knowles' paper; a very similar goal, but using an unpublished machine learning method so I wasn't sure how to cite/compare it.

@GaltierNicolas Hi! Have you seen this article by colleagues from the LBBE? @damdevienne
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001776

"Introgression from a ghost lineage (X) from outside the ingroup of interest produces a phylogenetic tree with increased branch lengths compared to the species tree (scenario C),"

It seems that gene flow (GF) does not always lead to shorter branches...

But maybe there is a way to control for that?

Ghost lineages can invalidate or even reverse findings regarding gene flow

Extinct and unknown species are overlooked in evolutionary studies of gene flow. This paper shows that taking such "ghost lineages" into account changes the conclusions of several studies on this subject.

@Julien_JOSEPH @damdevienne
yes!
Gene flow from a ghost lineage is a potential confounder, which I could indeed have discussed.
This might generate tall discordant gene trees with significant imbalance between the two discordant topologies. I haven't witnessed this pattern so far in my analysis, but it is clearly something to keep in mind!