Mastodawn

Damien de Vienne Nov 6, 2023

5/ From this 2WR matrix, we detect outlier values and store them in a list, remove them directly from the initial distance matrices, and compute a new compromise matrix. If the compromise is improved, we start a new iteration and look for new outliers (if any), and so on.
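
One round of this step could look roughly like the sketch below. It is not PhylteR's code: the function names are hypothetical, and the "third quartile plus k times the interquartile range" rule is a simple stand-in, since the post does not say how outliers are detected. Removed cells are set to NaN so that they can be imputed again before the compromise is recomputed.

```python
import numpy as np

def detect_outliers(two_wr, k=3.0):
    """List (gene, species) cells of the 2WR matrix that are unusually large.

    Assumption: a cell is an outlier if it exceeds Q3 + k * IQR over all cells.
    """
    q1, q3 = np.percentile(two_wr, [25, 75])
    threshold = q3 + k * (q3 - q1)
    genes, species = np.where(two_wr > threshold)
    return list(zip(genes.tolist(), species.tolist()))

def remove_outliers(dists, outliers):
    """Blank out (NaN) the row and column of each outlier species in its gene matrix."""
    cleaned = np.array(dists, dtype=float)  # works on a copy, input is left untouched
    for gene, sp in outliers:
        cleaned[gene, sp, :] = np.nan
        cleaned[gene, :, sp] = np.nan
    return cleaned
```

The full loop would then impute the blanked cells, recompute the compromise, and keep going only while some quality measure of the compromise improves; that measure is not given in the post, so it is left out here.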

Damien de Vienne Nov 6, 2023

[...] a matrix from these projections, giving, for each species in each individual gene, its distance to its average position according to the compromise. We call this the 2-way reference (2WR) matrix, a gene x species matrix where outliers (large values) can then be spotted.
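
As a minimal sketch, assuming the projections are already available as NumPy arrays (the array layout and the function name are hypothetical), the 2WR matrix is just a table of Euclidean distances between each small dot and its large dot:

```python
import numpy as np

def two_way_reference(average_pos, gene_pos):
    """Build the gene x species matrix of distances to the average position.

    average_pos: array (species, axes), position of each species in the compromise.
    gene_pos:    array (genes, species, axes), position of each species in each gene.
    """
    average_pos = np.asarray(average_pos, dtype=float)
    gene_pos = np.asarray(gene_pos, dtype=float)
    # Broadcast the average positions over genes, then take the norm over axes.
    return np.linalg.norm(gene_pos - average_pos[None, :, :], axis=2)
```

A large value in row g, column s means that species s sits far from its usual place in gene g.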

Damien de Vienne Nov 6, 2023

4/ Then, on this same space, each individual matrix is projected, so that the position of each species (small dots) in each matrix can be compared to its average position (large dots).
This is actually very cool! Because one can then compute [...]
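
A hedged sketch of this projection, following the usual DISTATIS recipe rather than PhylteR's actual code: each matrix is turned into a cross-product matrix and multiplied by the compromise eigenvectors scaled by the inverse square root of their eigenvalues. Function names are hypothetical, and squaring the distances before double-centering is an assumption borrowed from classical multidimensional scaling.

```python
import numpy as np

def double_center(dist):
    """Cross-product matrix from squared distances (classical MDS step, assumed here)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ (dist ** 2) @ centering

def project_on_compromise(gene_dists, compromise, n_axes=2):
    """Return average positions (large dots) and per-gene positions (small dots)."""
    cross = double_center(compromise)
    eigvals, eigvecs = np.linalg.eigh(cross)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_axes]          # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projector = eigvecs / np.sqrt(eigvals)              # assumes kept eigenvalues are > 0
    average_pos = cross @ projector
    gene_pos = np.array([double_center(d) @ projector for d in gene_dists])
    return average_pos, gene_pos
```

A gene whose matrix equals the compromise lands exactly on the large dots; the more a gene disagrees about a species, the further its small dot drifts.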

Damien de Vienne Nov 6, 2023

3/ The compromise matrix is then projected on the "compromise space". There, each dot represents the average position of each species with respect to the others; distance between dots reflects the distance between the species in the compromise matrix.
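
One way to picture this step is classical multidimensional scaling applied to the compromise. The sketch below assumes exactly that (double-centering of squared distances followed by an eigendecomposition); the function name is hypothetical and PhylteR's implementation may differ in its details.

```python
import numpy as np

def compromise_space(compromise, n_axes=2):
    """Coordinates of each species (one row per species) in the compromise space."""
    n = compromise.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    cross = -0.5 * centering @ (compromise ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(cross)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_axes]      # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)           # guard against tiny negative values
    return eigvecs * np.sqrt(eigvals)
```

When the compromise distances are Euclidean and enough axes are kept, the distances between the dots reproduce the compromise matrix exactly; with fewer axes they approximate it.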

Damien de Vienne Nov 6, 2023

(matrices that are very dissimilar to the others are assigned a lower weight).
2/ These weights are used in the creation of the "Compromise Matrix", a distance matrix obtained by computing the weighted average of the individual distance matrices.
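
In code, the weighted average is a one-liner. A minimal sketch, assuming all matrices already share the same species order (the function name is hypothetical):

```python
import numpy as np

def compromise_matrix(dists, weights):
    """Weighted average of a stack of distance matrices."""
    stack = np.asarray(dists, dtype=float)        # shape (genes, species, species)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # make sure the weights sum to 1
    # Contract the gene axis: sum over k of weights[k] * stack[k].
    return np.tensordot(weights, stack, axes=1)
```

For example, averaging distances 2 and 4 with weights 0.75 and 0.25 gives 2.5, so the matrix with the higher weight pulls the compromise toward itself.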

Damien de Vienne Nov 6, 2023

Then the process at the heart of PhylteR starts. It is based on DISTATIS, an extension of multidimensional scaling to three-way data (a whole collection of distance matrices at once). Here is what happens (simplified):
1/ RV-coefficients (~correlation) between matrices are computed and used to assign a weight to each matrix
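
A sketch of this weighting step, following the standard DISTATIS description rather than PhylteR's code. The function names are hypothetical, and two choices are assumptions: distances are squared and double-centered first, and the weights are taken from the leading eigenvector of the RV matrix, rescaled to sum to 1.

```python
import numpy as np

def cross_product(dist):
    """Double-center squared distances, as in classical MDS (assumed preprocessing)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ (dist ** 2) @ centering

def rv_coefficient(s1, s2):
    """RV coefficient between two cross-product matrices (a matrix analogue of correlation)."""
    return np.sum(s1 * s2) / np.sqrt(np.sum(s1 * s1) * np.sum(s2 * s2))

def matrix_weights(dists):
    """Return the RV matrix and one weight per distance matrix."""
    cross = [cross_product(np.asarray(d, dtype=float)) for d in dists]
    rv = np.array([[rv_coefficient(a, b) for b in cross] for a in cross])
    eigvals, eigvecs = np.linalg.eigh(rv)     # ascending eigenvalues
    leading = np.abs(eigvecs[:, -1])          # eigenvector of the largest eigenvalue
    return rv, leading / leading.sum()
```

With two identical matrices and one that disagrees with them, the odd one out receives the smallest weight.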

Damien de Vienne Nov 6, 2023

PhylteR starts from a collection of distance matrices (pairwise patristic distances between species) retrieved from individual gene trees (or, optionally, directly from multiple sequence alignments).
Missing data (if any) are imputed to ensure equal dimensions of all matrices.
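
To make the imputation idea concrete, here is a small sketch. The post does not say how missing values are filled in, so the rule used here (take the mean of the same cell over the genes that do have it) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def align_and_impute(matrices, species_lists):
    """Expand each matrix to the union of species and fill the gaps.

    matrices:      list of square distance matrices, one per gene.
    species_lists: list of species names, in the row order of each matrix.
    Assumption: a missing cell is replaced by its mean over the other genes.
    """
    all_species = sorted(set().union(*species_lists))
    index = {name: i for i, name in enumerate(all_species)}
    n = len(all_species)
    stack = np.full((len(matrices), n, n), np.nan)
    for k, (dist, names) in enumerate(zip(matrices, species_lists)):
        pos = [index[name] for name in names]
        stack[k][np.ix_(pos, pos)] = dist       # copy the known distances in place
    cell_means = np.nanmean(stack, axis=0)      # per-cell mean, ignoring missing values
    filled = np.where(np.isnan(stack), cell_means, stack)
    return all_species, filled
```

After this step every gene has a matrix of the same size and species order, which is what the later steps need. A pair of species that never occurs together in any gene would stay missing in this sketch.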

Damien de Vienne Nov 6, 2023

PhylteR, our new tool for filtering phylogenomics datasets, is now out!

https://doi.org/10.1093/molbev/msad234

PhylteR identifies with precision, from a collection of gene trees, the "outlier" sequences responsible for a lack of concordance among gene trees.

How does it work? A short thread 👇

#phylogenomics

PhylteR: efficient identification of outlier sequences in phylogenomic datasets

Abstract. In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and

OUP Academic

Damien de Vienne Nov 23, 2022

Very nice general article (in French) in this month's #Epsiloon magazine, on #ghost lineages!!

Damien de Vienne Nov 18, 2022

Thanks to l'#Humanité_magazine for this double page (in French) on #ghosts and their impact on the study of gene flow!!!

It's really nice to see the PhD work of Theo Tricou (with Eric Tannier and myself) being so well covered by mainstream media!