Damien de Vienne

Tree of Life, ghost lineages, horizontal gene flow, #Lifemap App. CNRS researcher, LBBE Univ. Lyon 1.

I am very pleased to share with the Mastodon community the first part of my PhD work:

High prevalence of Prdm9-independent recombination hotspots in placental #mammals

https://www.biorxiv.org/content/10.1101/2023.11.17.567540v1

This work was done in collaboration with @djivanprentout, Alexandre Laverré, Théo Tricou and @duret_lbbe. (1/8)

#Recombination #PopGen #Evolution #gBGC #PRDM9

PhylteR was developed and written over the years with great students/colleagues,
Aurore Comte, Theo Tricou, Eric Tannier, @Julien_JOSEPH, Aurélie Siberchicot, Simon Penel, Rémi Allio, Frédéric Delsuc and Stéphane Dray.

Thanks! 🙏

7/ PhylteR is an R package available on CRAN (https://cran.r-project.org/web/packages/phylter/index.html), and also as Singularity and Docker containers.
Extensive documentation can be found at https://damiendevienne.github.io/phylter/index.html.
phylter: Detect and Remove Outliers in Phylogenomics Datasets

Analysis and filtering of phylogenomics datasets. It takes as input either a collection of gene trees (then transformed to matrices) or directly a collection of gene matrices and performs an iterative process to identify which species in which genes are outliers, and whose elimination significantly improves the concordance between the input matrices. The method builds upon the Distatis approach (Abdi et al. (2005) <doi:10.1101/2021.09.08.459421>), a generalization of classical multidimensional scaling to multiple distance matrices.

6/ Well, this was quick, but you get the idea!? At the end, PhylteR users obtain a list of identified outliers. It is then up to them to decide what to do with it (filter MSAs, prune gene trees, explore outliers, etc.).

For more details read the paper!
And **GIVE IT A TRY!!**

5/ From this 2WR matrix, we detect outlier values, we store them in a list, we remove these outliers directly from the initial distance matrices, and we compute the new compromise matrix. If the compromise is improved, we repeat the loop and look for new outliers (if any). And so on.
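The detection part of step 5/ could be sketched as follows. This is only an illustration: the post does not say which rule PhylteR uses to call a value an outlier, so the quartile-based threshold (Q3 + k × IQR), the function name `find_outliers`, the parameter `k` and the toy values are all assumptions made for this example.

```python
from statistics import quantiles

def find_outliers(two_wr, genes, species, k=3.0):
    """Return (gene, species) pairs whose 2WR value is unusually large.

    Assumed rule (not necessarily PhylteR's): a value is an outlier
    when it exceeds Q3 + k * IQR computed over the whole matrix.
    """
    values = [v for row in two_wr for v in row]
    q1, _, q3 = quantiles(values, n=4)
    threshold = q3 + k * (q3 - q1)
    return [
        (genes[g], species[s])
        for g, row in enumerate(two_wr)
        for s, v in enumerate(row)
        if v > threshold
    ]

# Toy 2WR matrix (rows = genes, columns = species); values are made up.
genes = ["gene1", "gene2", "gene3"]
species = ["sp1", "sp2", "sp3", "sp4"]
two_wr = [
    [0.10, 0.12, 0.09, 0.11],
    [0.13, 0.10, 2.50, 0.12],  # sp3 sits far from its average position in gene2
    [0.08, 0.11, 0.10, 0.09],
]
outliers = find_outliers(two_wr, genes, species)
print(outliers)
```

In the full loop described in the post, the flagged cells would then be removed from the initial distance matrices and the compromise recomputed, continuing only while the compromise improves.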
4/ Then, on this same space, each individual matrix is projected, so that the position of each species (small dots) in each matrix can be compared to its average position (large dots).
This is actually very cool! Because one can then compute a matrix from these projections, giving, for each species in each individual gene, its distance to its average position according to the compromise. We call this the 2-way reference (2WR) matrix, a gene x species matrix where outliers (large values) can then be spotted.
3/ The compromise matrix is then projected on the "compromise space". There, each dot represents the average position of each species with respect to the others; distance between dots reflects the distance between the species in the compromise matrix.
2/ These weights are used in the creation of the "Compromise Matrix", a distance matrix obtained by computing the weighted average of the individual distance matrices.
Then the process at the heart of PhylteR starts. It is based on DISTATIS, an extension of multidimensional scaling to three dimensions. Here is what happens (simplified):
1/ RV-coefficients (~correlation) between matrices are computed and used to assign a weight to each matrix (matrices that are very dissimilar to the others are assigned a lower weight).
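Steps 1/ and 2/ can be illustrated with a minimal pure-Python sketch. It follows the usual DISTATIS recipe as an assumption: each distance matrix is first double-centered into a cross-product matrix (a detail the simplified description skips), RV coefficients are computed between these, and the weights come from the first eigenvector of the RV matrix, rescaled to sum to 1. The per-matrix normalization that DISTATIS normally applies is omitted for brevity, and all function names and toy matrices are hypothetical. This is not PhylteR's own code.

```python
import math

def double_center(d):
    """Convert a symmetric distance matrix into a cross-product matrix."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]  # row means (= column means, by symmetry)
    total = sum(row) / n            # grand mean
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]

def inner(a, b):
    """Sum of element-wise products of two matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rv(a, b):
    """RV coefficient: a correlation-like similarity between two matrices."""
    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))

def first_eigenvector(m, iters=200):
    """Power iteration; adequate for a small RV matrix with positive entries."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def compromise(dist_matrices):
    """Return (weights, compromise), the compromise being a weighted average."""
    cross = [double_center(d) for d in dist_matrices]
    rv_matrix = [[rv(a, b) for b in cross] for a in cross]
    vec = first_eigenvector(rv_matrix)
    total = sum(vec)
    weights = [x / total for x in vec]
    n = len(cross[0])
    comp = [[sum(w * s[i][j] for w, s in zip(weights, cross))
             for j in range(n)] for i in range(n)]
    return weights, comp

# Toy data: three genes, three species. gene_c disagrees with the other two.
gene_a = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
gene_b = [[0, 1.1, 2], [1.1, 0, 0.9], [2, 0.9, 0]]
gene_c = [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
weights, comp = compromise([gene_a, gene_b, gene_c])
print([round(w, 3) for w in weights])
```

With these toy matrices, the discordant `gene_c` receives the smallest weight, which is the behaviour described in step 1/.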