Alex Diaz-Papkovich

@diazale
Statistician and mathematician. Doing PhD research in population genetics.
Pronouns: He/Him
Code: https://github.com/diazale/
On Friday, @diazale successfully defended a wonderful thesis on modelling high-dimensional genetic data. A must-read if you are working with UMAP or genetic clustering, or if you just like well-written and thoughtful scientific writing. Congratulations!

Our latest on the dramatic impact of colonial admixture and slave trade in South Africa. We identify strong male founder effects and displacement of Khoe-San peoples. 🇿🇦

https://www.biorxiv.org/content/10.1101/2023.09.06.556626v1

Thinking about kinds of analysis errors. There are errors of design, doing the wrong analysis intentionally, & also errors of execution, doing the wrong analysis by accident. Like this 2016 example, in which country codes were entered as a continuous variable by accident. https://www.cell.com/current-biology/fulltext/S0960-9822(16)30670-4

Working on ways to prevent such errors is important. I simulate synthetic data to validate my analysis. Even in simple contexts, accidents can happen. Because computers suck and are out to get us.
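
A minimal sketch of that kind of check, assuming a toy setup that is not taken from the post or the linked paper: outcomes are simulated with a known mean for each arbitrary group code, then analysed twice, once with the code treated as a category and once with it mistakenly treated as a number. All names and values here (`simulate`, `fit_categorical`, `fit_continuous`, the `truth` effects) are hypothetical and chosen only for illustration.

```python
import random

def simulate(n_per_group, effects, noise_sd, seed=0):
    """Draw synthetic outcomes with a known mean for each arbitrary group code."""
    rng = random.Random(seed)
    codes, outcomes = [], []
    for code, mean in effects.items():
        for _ in range(n_per_group):
            codes.append(code)
            outcomes.append(rng.gauss(mean, noise_sd))
    return codes, outcomes

def fit_categorical(codes, outcomes):
    """Intended analysis: estimate one mean per group code."""
    groups = {}
    for c, y in zip(codes, outcomes):
        groups.setdefault(c, []).append(y)
    return {c: sum(ys) / len(ys) for c, ys in groups.items()}

def fit_continuous(codes, outcomes):
    """Accidental analysis: treat the code as a number and fit a straight line."""
    n = len(codes)
    mean_x = sum(codes) / n
    mean_y = sum(outcomes) / n
    sxx = sum((x - mean_x) ** 2 for x in codes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(codes, outcomes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {c: intercept + slope * c for c in set(codes)}

def max_error(estimates, truth):
    """Largest absolute gap between an estimated and a true group mean."""
    return max(abs(estimates[c] - truth[c]) for c in truth)

# Hypothetical ground truth: the numeric codes carry no ordering information.
truth = {1: 5.0, 2: -3.0, 3: 4.0, 4: -2.0}
codes, outcomes = simulate(n_per_group=500, effects=truth, noise_sd=1.0)

cat_error = max_error(fit_categorical(codes, outcomes), truth)
cont_error = max_error(fit_continuous(codes, outcomes), truth)

print(f"max error, codes as categories: {cat_error:.2f}")
print(f"max error, codes as numbers:    {cont_error:.2f}")
```

Because the true effects are known in advance, an analysis that fails to recover them flags the execution error before any real data are touched.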

Out now! We study genetic structure in large biobanks using topological data analysis via UMAP and HDBSCAN.

This approach is fast, easy-to-use, fits into existing pipelines, uses data you already have, and is downright fascinating.

Our pre-print is on bioRxiv and we have our code with a demo up on github: https://github.com/diazale/topstrat

https://www.biorxiv.org/content/10.1101/2023.07.06.548007

GitHub - diazale/topstrat: Topological stratification of biological data
Shadi Zabad’s latest work on predicting genetic risk:
https://www.cell.com/ajhg/fulltext/S0002-9297(23)00093-9
Variational Inference for Polygenic Risk Score (VIPRS) consistently competes with or outperforms leading approaches.
Fast and accurate Bayesian polygenic risk modeling with variational inference

We present VIPRS, a fast and accurate variational Bayesian method for estimating polygenic risk scores from genome-wide association study (GWAS) data. The method is shown to be robust and competitively accurate against popular baselines and scales well to dense genotype array data.

The American Journal of Human Genetics

It's official: it won't open this season, for the first time ever. 😔

It was clear this would happen eventually, but I didn't think it would be this soon.

Tracking the opening date of the Rideau Canal Skateway, it's clear that as the climate warms in Ottawa, the skating season starts later. This season will be the latest ever, beating the Feb 2 opening in 2002.
@librarianshipwreck I think it's time to start applying Goodhart's law to carbon emissions
@cgroza guess i'm posting my research here now 😬