| website | neherlab.org |
| twitter | https://twitter.com/richardneher |
| github | https://github.com/rneher |
| orcid | https://orcid.org/my-orcid?orcid=0000-0003-2525-1407 |
Within the syntenic core genome, linkage between SNPs decays rapidly with distance: LD approaches background levels after about 1000 bases. The background level itself is often set by population structure, with little linkage within subgroups but genome-wide coupling across subgroups.
[4/6]
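A minimal sketch of how LD decay with distance could be measured, assuming biallelic SNPs encoded as 0/1 per genome. The function names `r_squared` and `ld_by_distance`, the input layout, and the choice of r² as the LD statistic are illustrative assumptions, not the analysis behind the post.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """Squared allele correlation r^2 between two biallelic SNPs (lists of 0/1)."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else None  # None for monomorphic sites


def ld_by_distance(positions, alleles, bin_size=100):
    """Average r^2 over all SNP pairs, binned by their distance in bases.

    positions: genome coordinate of each SNP (hypothetical input layout)
    alleles:   one 0/1 list per SNP, with one entry per genome
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(positions)), 2):
        r2 = r_squared(alleles[i], alleles[j])
        if r2 is None:
            continue
        b = abs(positions[j] - positions[i]) // bin_size
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    # keys are the lower edge of each distance bin
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting the returned averages against the bin edges would show where the curve flattens to its background level.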
Phages are known to recombine with each other and have flexible and fluid genomes. We analyzed diversity within the clusters of the Acinetobacteriophage Database by organizing their genomes into pangenome graphs based on homologous phams (protein families).
[2/6]
We then use these improved estimates of neutral mutation rates to look for regions where synonymous or non-coding mutations show evidence of purifying selection. Most of the clear signals correspond to well-known structures like the ORF1a/b frameshift and TRS. But there are two signals for which we could not find an explanation (in E, and at the M/ORF6 boundary).
[6/N]
Rates also depend on neighboring bases, sometimes by more than 10-fold. This neighbor dependence is highly strand-symmetric for some mutations (e.g. T>G and A>C), but not for others.
A symmetry between strands would be expected for mutations associated with replication (the genome is copied from + to - and back to +), while processes like deamination would depend mostly on the accessibility of the base.
[3/N]
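A small sketch of what strand symmetry means for context-dependent rates: each mutation in a given 5'/3' context has a partner on the opposite strand, obtained by reverse-complementing the context. The tuple layout `(five_prime, ref, alt, three_prime)` and the function names are assumptions made for illustration.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_partner(context):
    """Return the same mutation as read on the opposite strand.

    context is (five_prime, ref, alt, three_prime). Reverse-complementing
    swaps the two neighbors and complements every base.
    """
    five_prime, ref, alt, three_prime = context
    return (COMPLEMENT[three_prime], COMPLEMENT[ref],
            COMPLEMENT[alt], COMPLEMENT[five_prime])


def strand_asymmetry(rates):
    """log2 ratio of each context's rate to its opposite-strand partner.

    rates: dict mapping context tuples to positive rates (hypothetical input).
    Values near 0 indicate strand-symmetric neighbor dependence.
    """
    return {
        ctx: math.log2(rate / rates[strand_partner(ctx)])
        for ctx, rate in rates.items()
        if strand_partner(ctx) in rates
    }
```

For example, T>G with a 5' A and a 3' C pairs with A>C with a 5' G and a 3' T, which is why T>G and A>C are compared with each other.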
A simple linear model with genomic region, 5' and 3' neighborhood, and secondary-structure pairing explains between 15% and 60% of the fold-variation of the rates.
[5/N]
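To illustrate what "explains X% of the fold-variation" can mean, here is a hedged sketch that fits an additive model of categorical predictors to log rates and reports the fraction of variance explained. The fitting method (simple backfitting), the factor names, and the data layout are assumptions for illustration and are not taken from the analysis itself.

```python
def fit_additive(rows, log_rates, n_iter=50):
    """Fit log_rate ~ intercept + one effect per factor level by backfitting.

    rows:      list of dicts mapping factor name -> level (hypothetical layout)
    log_rates: list of log-transformed rates, one per row
    Returns a function that predicts the log rate for a row.
    """
    intercept = sum(log_rates) / len(log_rates)
    factors = sorted(rows[0])
    effects = {f: {} for f in factors}

    def predict(row, skip=None):
        return intercept + sum(
            effects[f].get(row[f], 0.0) for f in factors if f != skip
        )

    for _ in range(n_iter):
        for f in factors:
            # each level's effect is the mean residual left by the other factors
            sums, counts = {}, {}
            for row, y in zip(rows, log_rates):
                resid = y - predict(row, skip=f)
                sums[row[f]] = sums.get(row[f], 0.0) + resid
                counts[row[f]] = counts.get(row[f], 0) + 1
            effects[f] = {lvl: sums[lvl] / counts[lvl] for lvl in sums}
    return predict


def variance_explained(y, y_hat):
    """R^2: one minus residual sum of squares over total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

Working on log rates means the explained variance refers to fold-changes rather than absolute differences.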
More complex issues arise when habitats are shifting in time. In this case, the location of deep nodes in the tree could be in parts of space without samples because the habitable region has shifted over time. In such situations, phylogeographic inferences can be confidently wrong.
[5/6]
A more fundamental shortcoming of phylogeography is that it typically assumes that replication rate is independent of spatial location. But populations grow where resources are abundant and contract when conditions deteriorate. Ignoring this coupling between growth and spatial location can strongly distort inferences.
[3/6]
Outside of directed migrations, organisms explore their surroundings in an undirected manner and this is typically modeled as diffusion. Yet, many studies estimate “lineage velocities” by dividing the inferred distance traveled along a branch by the length of the branch.
But for diffusion, this doesn’t make any sense. The result will depend on sample size and thus cannot even be compared between two samples from the same population, let alone between different populations.
[2/6]
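A short simulation sketch of why distance divided by branch length is not a meaningful velocity under diffusion. For two-dimensional diffusion the typical displacement grows like the square root of time, so the ratio scales as 1/sqrt(t) and is larger on shorter branches. Denser sampling tends to produce shorter branches, which is one way the sample-size dependence can arise. The function name, the diffusion constant, and the branch lengths are illustrative assumptions.

```python
import math
import random


def apparent_velocity(branch_length, diffusion_constant=1.0,
                      n_rep=20000, seed=1):
    """Mean of displacement / branch_length for 2D diffusion.

    Each coordinate of the displacement is Gaussian with
    variance 2 * D * t, the standard result for diffusion.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusion_constant * branch_length)
    total = 0.0
    for _ in range(n_rep):
        dx = rng.gauss(0.0, sigma)
        dy = rng.gauss(0.0, sigma)
        total += math.hypot(dx, dy) / branch_length
    return total / n_rep


if __name__ == "__main__":
    # same diffusion constant, different branch lengths
    for t in (1.0, 0.1, 0.01):
        print(t, apparent_velocity(t))
```

The expected value is sqrt(pi * D / t), so a 100-fold shorter branch yields a roughly 10-fold higher apparent velocity even though the underlying process is unchanged.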
Mutation calling relative to clade/lineage founders is enabled by default in all datasets. Comparison to individual strains such as vaccine strains needs to be specified in the dataset by the dataset maintainers. These mutation calls are exported in the tabular and JSON output files.
[4/4]