STATGEN 2024 talk
A Kernel-Based Neural Network for High-dimensional Risk Prediction on Massive Genetic Data
Qing Lu

Neural Network
Nonlinear
Non-additive

Kernel-Based Neural Network (KNN)
Kernel matrices constructed from the genetic variables.
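A minimal sketch of what building kernel matrices from genetic variables can look like. The talk notes do not say which kernels were used, so the two below (a standardized linear kernel and a Gaussian kernel) are illustrative assumptions, as are the function names and the simulated genotypes.

```python
import numpy as np

def linear_kernel(G):
    """Linear (GRM-style) kernel from an n x p genotype matrix coded 0/1/2."""
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic variants become all-zero columns
    Z = (G - mu) / sd
    return Z @ Z.T / G.shape[1]

def gaussian_kernel(G, bandwidth=None):
    """Gaussian kernel exp(-||g_i - g_j||^2 / bandwidth) between samples."""
    G = np.asarray(G, dtype=float)
    sq = np.sum(G ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G @ G.T, 0.0)
    if bandwidth is None:
        bandwidth = G.shape[1]  # assumed default: number of variants
    return np.exp(-d2 / bandwidth)

# Hypothetical data: 5 samples, 20 variants, allele frequency 0.3.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(5, 20))
K_lin = linear_kernel(G)
K_rbf = gaussian_kernel(G)
```

Both functions return an n x n symmetric, positive semi-definite matrix, which is the shape of input a kernel-based model expects.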

Related preprint:
An Association Test Based on Kernel-Based Neural Networks for Complex Genetic Association Analysis
https://arxiv.org/abs/2312.06669

1/

#STATGEN2024 #Genetics #StatisticalGenetics

An Association Test Based on Kernel-Based Neural Networks for Complex Genetic Association Analysis

The advent of artificial intelligence, especially the progress of deep neural networks, is expected to revolutionize genetic research and offer unprecedented potential to decode the complex relationships between genetic variants and disease phenotypes, which could mark a significant step toward improving our understanding of the disease etiology. While deep neural networks hold great promise for genetic association analysis, limited research has been focused on developing neural-network-based tests to dissect complex genotype-phenotype associations. This complexity arises from the opaque nature of neural networks and the absence of defined limiting distributions. We have previously developed a kernel-based neural network model (KNN) that synergizes the strengths of linear mixed models with conventional neural networks. KNN adopts a computationally efficient minimum norm quadratic unbiased estimator (MINQUE) algorithm and uses KNN structure to capture the complex relationship between large-scale sequencing data and a disease phenotype of interest. In the KNN framework, we introduce a MINQUE-based test to assess the joint association of genetic variants with the phenotype, which considers non-linear and non-additive effects and follows a mixture of chi-square distributions. We also construct two additional tests to evaluate and interpret linear and non-linear/non-additive genetic effects, including interaction effects. Our simulations show that our method consistently controls the type I error rate under various conditions and achieves greater power than a commonly used sequence kernel association test (SKAT), especially when involving non-linear and interaction effects. When applied to real data from the UK Biobank, our approach identified genes associated with hippocampal volume, which can be further replicated and evaluated for their role in the pathogenesis of Alzheimer's disease.
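The abstract states that the test statistic follows a mixture of chi-square distributions. As a rough illustration of how a p-value can be read off such a distribution, here is a Monte Carlo sketch. The weights are hypothetical, and the simulation approach is an assumption made for illustration; it is not taken from the preprint.

```python
import numpy as np

def mixture_chi2_pvalue(q_obs, weights, n_draws=200_000, seed=0):
    """Approximate P(sum_i w_i * chi2_1 >= q_obs) by simulation."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Each row holds one independent chi-square(1) draw per weight.
    draws = rng.chisquare(df=1, size=(n_draws, weights.size)) @ weights
    return float(np.mean(draws >= q_obs))

# Hypothetical mixture weights and observed statistic.
p_value = mixture_chi2_pvalue(q_obs=6.0, weights=[1.5, 0.8, 0.2])
```

With a single weight of 1 the mixture reduces to an ordinary chi-square with one degree of freedom, which gives a quick sanity check.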

arXiv.org

STATGEN 2024 talk
Improved methods for empirical Bayes multivariate multiple testing and effect size estimation
Yunqi Yang

Empirical Bayes multivariate normal means (EBMNM) model [Urbut et al., 2019]

Allow for heterogeneous sharing of eQTLs in multiple tissues (e.g., some are shared across all tissues, some are shared only within brain tissues, etc.)

Truncated Eigenvalue Decomposition
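A minimal sketch of the truncated eigenvalue decomposition idea, assuming the simplest setting where every observation has identity residual covariance. Under that assumption, a prior covariance estimate can be obtained from a (weighted) sample covariance S by subtracting 1 from each eigenvalue and truncating negatives at zero. The function name `ted_update` is hypothetical, and the udr package may implement this step differently.

```python
import numpy as np

def ted_update(S):
    """Return Q * max(Lambda - I, 0) * Q^T for symmetric S = Q * Lambda * Q^T."""
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2.0  # guard against small asymmetries
    evals, evecs = np.linalg.eigh(S)
    shrunk = np.maximum(evals - 1.0, 0.0)  # truncate at zero
    return (evecs * shrunk) @ evecs.T

# Hypothetical 2 x 2 sample covariance.
U = ted_update(np.array([[3.0, 0.0], [0.0, 0.5]]))
```

The truncation keeps the estimate positive semi-definite, at the cost of possibly producing a rank-deficient matrix.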

udr: Ultimate Deconvolution in R
https://stephenslab.github.io/udr/

#STATGEN2024 #Genetics #StatisticalGenetics

Ultimate Deconvolution for Multivariate Normal Means

Implements fast statistical algorithms for solving the multivariate normal means problem via empirical Bayes, building on the "Extreme Deconvolution" method <DOI:10.1214/10-AOAS439>.

STATGEN 2024 talk
MultiSTAAR: A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies
Xihao Li

Functionally-informed Multi-Trait MultiSTAAR approach.

MultiSTAAR-O: Omnibus test
1. Burden
2. SKAT
3. ACAT-V
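The post does not say how the omnibus test combines the three component tests. One common choice for this kind of omnibus p-value is the Cauchy combination used by ACAT, sketched below under that assumption. The function name and the example p-values are hypothetical.

```python
import numpy as np

def cauchy_combine(pvalues, weights=None):
    """Combine p-values (each strictly between 0 and 1) with the Cauchy combination."""
    p = np.asarray(pvalues, dtype=float)
    if weights is None:
        weights = np.full(p.size, 1.0 / p.size)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the statistic is standard Cauchy under the null
    stat = np.sum(w * np.tan((0.5 - p) * np.pi))
    return float(0.5 - np.arctan(stat) / np.pi)

# Hypothetical p-values from Burden, SKAT and ACAT-V for one variant set.
p_omnibus = cauchy_combine([0.01, 0.2, 0.03])
```

The combination does not require the component tests to be independent, which is why it is attractive for tests computed on the same variant set.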

Li X et al. A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. bioRxiv doi: 10.1101/2023.10.30.564764.

#STATGEN2024 #Genetics #StatisticalGenetics

STATGEN 2024 talk
Adventures in Human Genetics: Purpose, Serendipity, Innovation
Gonçalo Abecasis

"It is important to think carefully about what is the right question, and what are the right statistics. But there is a lot of opportunity in thinking about what is the best design to answer the question."

Goal
Understand disease
Treat
Predict disease
Prevent

Can learn from natural experiments in millions of people.

1/

#STATGEN2024 #Genetics #StatisticalGenetics

STATGEN 2024 talk
Working towards Inclusivity in Genetic Studies: Estimating accurate population structure with Small Reference Sample Sizes
Souha Tifour

Arriaga-MacKenzie et al. Summix: A method for detecting and adjusting for population structure in genetic summary data. Am J Hum Genet. 2021 Jul 1;108(7):1270-1282. doi: 10.1016/j.ajhg.2021.05.016.

Summix relies on reference populations, but what if the ref pop is small?
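To make the dependence on reference populations concrete, here is a minimal sketch of mixture-proportion estimation from allele frequencies: find non-negative proportions that sum to 1 and minimize the squared difference between observed frequencies and a mixture of reference frequencies. This is an illustration of the general idea only. The solver (projected gradient descent), the function names and the simulated frequencies are assumptions, not the Summix implementation.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_proportions(ref_af, obs_af, n_iter=5000):
    """Least-squares mixture proportions via projected gradient descent."""
    A = np.asarray(ref_af, dtype=float)  # variants x reference groups
    b = np.asarray(obs_af, dtype=float)  # observed allele frequencies
    k = A.shape[1]
    pi = np.full(k, 1.0 / k)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ pi - b)
        pi = project_to_simplex(pi - step * grad)
    return pi

# Hypothetical data: 500 variants, 3 reference groups, noiseless mixture.
rng = np.random.default_rng(1)
ref_af = rng.uniform(0.05, 0.95, size=(500, 3))
true_pi = np.array([0.6, 0.3, 0.1])
obs_af = ref_af @ true_pi
est_pi = estimate_proportions(ref_af, obs_af)
```

In this sketch the reference frequencies are treated as known exactly. With a small reference sample they are themselves noisy estimates, which is the situation the talk addresses.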

1/

#STATGEN2024 #Genetics #StatisticalGenetics

STATGEN 2024 talk
Genotype prediction of 336,463 samples from public expression data
Afrooz Razi

recount3: uniformly processed RNA-seq
https://rna.recount.bio/

We developed a statistical model to predict genotypes from the recount3 data.

It has high prediction accuracy.

1/

#STATGEN2024 #Genetics #StatisticalGenetics


STATGEN 2024 talk
BRCAPRO+BCRAT: extending a Mendelian breast cancer risk prediction model to include non-genetic risk factors
Zoe Guan

BRCAPRO: Mendelian model, genes

BCRAT: 1st-degree family hx, hormonal risk factors, hx of benign disease

Combine these complementary models.

https://www.mdpi.com/2072-6694/15/4/1090

#STATGEN2024 #Genetics #BreastCancer #RiskPrediction #StatisticalGenetics

Combining Breast Cancer Risk Prediction Models

Accurate risk stratification is key to reducing cancer morbidity through targeted screening and preventative interventions. Multiple breast cancer risk prediction models are used in clinical practice, and often provide a range of different predictions for the same patient. Integrating information from different models may improve the accuracy of predictions, which would be valuable for both clinicians and patients. BRCAPRO is a widely used model that predicts breast cancer risk based on detailed family history information. A major limitation of this model is that it does not consider non-genetic risk factors. To address this limitation, we expand BRCAPRO by combining it with another popular existing model, BCRAT (i.e., Gail), which uses a largely complementary set of risk factors, most of them non-genetic. We consider two approaches for combining BRCAPRO and BCRAT: (1) modifying the penetrance (age-specific probability of developing cancer given genotype) functions in BRCAPRO using relative hazard estimates from BCRAT, and (2) training an ensemble model that takes BRCAPRO and BCRAT predictions as input. Using both simulated data and data from Newton-Wellesley Hospital and the Cancer Genetics Network, we show that the combination models are able to achieve performance gains over both BRCAPRO and BCRAT. In the Cancer Genetics Network cohort, we show that the proposed BRCAPRO + BCRAT penetrance modification model performs comparably to IBIS, an existing model that combines detailed family history with non-genetic risk factors.
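A minimal sketch of the first approach in the abstract, modifying a penetrance curve with a relative hazard. It assumes proportional hazards with a constant relative hazard, treats penetrance as the cumulative probability by each age, and ignores competing risks. Under those assumptions the survival function is raised to the power of the relative hazard. The numbers are hypothetical and the paper's actual procedure may differ.

```python
import numpy as np

def modify_penetrance(cumulative_penetrance, relative_hazard):
    """Scale a cumulative penetrance curve by a constant relative hazard.

    Under proportional hazards, S_new(t) = S(t) ** relative_hazard,
    so F_new(t) = 1 - (1 - F(t)) ** relative_hazard.
    """
    F = np.asarray(cumulative_penetrance, dtype=float)
    return 1.0 - (1.0 - F) ** relative_hazard

# Hypothetical cumulative penetrance at ages 40, 50, 60, 70.
baseline = np.array([0.02, 0.05, 0.10, 0.15])
modified = modify_penetrance(baseline, relative_hazard=2.0)
```

A relative hazard of 1 leaves the curve unchanged, and the modified curve always stays between 0 and 1.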

MDPI

STATGEN 2024 talk
Polygenic risk score analysis for multiethnic populations
Chris Amos

Polygenic Risk Scores (PRS)
* Inform about biological processes
* Identify some at higher risk
* Might motivate behavioral change

PRS could inform when to start screening.
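For readers new to the term, a PRS in its basic form is a weighted sum of risk-allele counts. A minimal sketch, with hypothetical dosages and effect sizes:

```python
import numpy as np

def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (samples x variants) by per-variant effects."""
    X = np.asarray(dosages, dtype=float)
    beta = np.asarray(effect_sizes, dtype=float)
    return X @ beta

# Hypothetical data: 2 individuals, 3 variants.
dosages = [[0, 1, 2],
           [2, 0, 1]]
effect_sizes = [0.1, -0.2, 0.3]  # e.g. log odds ratios from a GWAS
scores = polygenic_risk_score(dosages, effect_sizes)
```

Methods for multiethnic populations differ mainly in how the effect sizes are estimated, not in this final summation step.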

"Measles plot instead of a Manhattan plot": excessive false positives all over the genome.

A lung cancer risk SNP is also related to response to smoking cessation.

1/

#STATGEN2024 #Polygenic #StatisticalGenetics #Genetics #PRS

STATGEN 2024 talk
Bayesian Meta-Analysis of Penetrance for Cancer Risk with Adjustment for Ascertainment Bias
Swati Biswas

Need accurate estimates of age-specific penetrance for cancer risk variants.
https://arxiv.org/abs/2304.01912

Heterogeneous studies w/ different measures of risk
Marabelli et al. Penetrance of ATM Gene Mutations in Breast Cancer: A Meta-Analysis of Different Measures of Risk. Genet Epidemiol. 2016. doi: 10.1002/gepi.21971

1/

#STATGEN2024 #Genetics #StatisticalGenetics #Penetrance

Bayesian Meta-Analysis of Penetrance for Cancer Risk

Multi-gene panel testing allows many cancer susceptibility genes to be tested quickly at a lower cost making such testing accessible to a broader population. Thus, more patients carrying pathogenic germline mutations in various cancer-susceptibility genes are being identified. This creates a great opportunity, as well as an urgent need, to counsel these patients about appropriate risk reducing management strategies. Counseling hinges on accurate estimates of age-specific risks of developing various cancers associated with mutations in a specific gene, i.e., penetrance estimation. We propose a meta-analysis approach based on a Bayesian hierarchical random-effects model to obtain penetrance estimates by integrating studies reporting different types of risk measures (e.g., penetrance, relative risk, odds ratio) while accounting for the associated uncertainties. After estimating posterior distributions of the parameters via a Markov chain Monte Carlo algorithm, we estimate penetrance and credible intervals. We investigate the proposed method and compare with an existing approach via simulations based on studies reporting risks for two moderate-risk breast cancer susceptibility genes, ATM and PALB2. Our proposed method is far superior in terms of coverage probability of credible intervals and mean square error of estimates. Finally, we apply our method to estimate the penetrance of breast cancer among carriers of pathogenic mutations in the ATM gene.

arXiv.org

STATGEN 2024 talk
Improving Genetic Risk Prediction with Genetic Architecture and Functional Annotations
Wei Jiang

Genome-wide Empirical Bayes to use both genetic architecture and functional annotations in a computationally efficient way.

* Summary-statistics-based
* No parameter tuning needed
* Has improved prediction accuracy over existing methods.

https://www.researchsquare.com/article/rs-3266942/v1

#STATGEN2024 #Genetics #StatisticalGenetics

Estimating genetic architecture and integrating functional annotations improve polygenic risk scores derived from GWAS summary statistics

Constructing accurate polygenic risk scores (PRS) can benefit the prevention and early treatment of complex diseases. This can be accomplished by leveraging different information characterizing the effects of genetic variants on the diseases, and incorporating functional annotations. In this ...