bioRxiv Genomics

@biorxiv_genomic@biologists.social
Persistent Activation of Endothelial Cells is Linked to Thrombosis and Inflammation in Cerebral Cavernous Malformation Disease https://www.biorxiv.org/content/10.1101/2025.06.29.662238v1?med=mas

BACKGROUND: Cerebral cavernous malformations (CCM) are neurovascular lesions that affect both children and adults; morbidity often results from thrombosis, bleeding, and neurological dysfunction. Studies indicate that inflammation-related activation of endothelial cells contributes significantly to the worsening of CCM disease, suggesting that ongoing vascular inflammation and endothelial dysfunction are key factors associated with thrombosis and bleeding in CCM. However, the inflammatory mechanisms that alter brain endothelial cell function and confer a high propensity for thrombosis, inflammation, and dysfunction are not fully understood. METHODS: Multi-omic analyses were conducted by performing simultaneous high-throughput single-nucleus RNA sequencing (snRNA-seq) and single-nucleus assay for transposase-accessible chromatin sequencing (snATAC-seq) on the 10x Genomics multiome platform, combined with immunofluorescence, to study CCM pathogenesis in female and male mice with CCM disease (Slco1c1-CreERT2; Pdcd10fl/fl). The analysis was complemented with bulk RNA-seq, bulk ATAC-seq, and chromatin immunoprecipitation sequencing (ChIP-seq) in an in vitro human CCM model. An AAV-BR1 viral system that selectively upregulates the activator protein-1 (AP-1) transcription factor JUNB in brain endothelial cells was used to evaluate JUNB's capacity to maintain a persistently activated cell state during CCM pathogenesis. RESULTS: We found that epigenetics significantly influences the subtype identity and function of brain endothelial cells along the arteriovenous axis. Multi-omic analyses identified specific regulatory elements and enhancers (candidate cis-regulatory elements, cCREs) in mouse brain endothelial cells that influence subtype-specific transcriptional programs, as well as the transcription factors responsible for establishing the various brain endothelial cell subtypes.
Additionally, large-scale epigenomic reprogramming of brain endothelial cell subtypes was observed during CCM pathogenesis. Among the most significant changes were alterations in the chromatin state of endothelial cells, along with transcriptional processes associated with a persistently activated endothelial cell state that renders the cells susceptible to inflammation and thrombosis. The AP-1 transcription factor JUNB was identified as a key regulator of the persistently activated endothelial state during chronic neuroinflammation. Moreover, trans- and cis-regulatory factors conserved between mice and humans were identified that contribute to the progression of chronic CCM disease. CONCLUSIONS: Epigenetics plays a crucial role in determining the transcriptional patterns and functions of brain arteriovenous endothelial cells. JUNB is identified as a driver of chronic brain vascular inflammation, inducing a persistently activated endothelial cell state through epigenome reprogramming. ### Competing Interest Statement The authors have declared no competing interest. National Institute of Neurological Disorders and Stroke, R01NS121070; National Institutes of Health, National Heart, Lung, and Blood Institute, P01HL151433, R01HL163931

bioRxiv
Multi-season analysis reveals hundreds of drought-responsive genes in sorghum https://www.biorxiv.org/content/10.1101/2025.06.27.662006v1?med=mas

Persistent drought affects global crop production and has become more severe in many parts of the world in recent decades. Deciphering how plants respond to drought will facilitate the development of flexible mitigation strategies. Sorghum bicolor (L.) Moench (sorghum), a major cereal crop and an emerging bioenergy crop, exhibits remarkable resilience to drought. To better understand the molecular traits underlying this drought tolerance, we undertook a large-scale sorghum gene expression profiling effort, totaling nearly 1,500 transcriptome profiles, across a 3-year field study with replicated plots in California's Central Valley. The study included time-resolved gene expression data from roots and leaves of two sorghum genotypes, BTx642 and RTx430, which have different pre-flowering and post-flowering drought-tolerance adaptations, under control and drought conditions. Quantification of genotype-specific drought tolerance effects was enabled by de novo sequencing, assembly, and annotation of both the BTx642 and RTx430 genomes. These reference-quality genomes were used to construct a pan-gene set for characterizing conserved and genotype-specific expression. By integrating time-resolved transcriptomic responses to drought in the field across three consecutive years, we identified a set of drought-responsive genes that responded similarly in all three years of the field study. This expansive dataset is a unique resource for the sorghum and drought research communities and provides a methodological framework for integrating multifaceted, time-resolved transcriptomic datasets. ### Competing Interest Statement The authors have declared no competing interest. United States Department of Energy, https://ror.org/01bj3aw27, DE-SC0014081
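The cross-season criterion above (genes that responded similarly in all three years) can be sketched as a set intersection followed by a direction check. This is a minimal illustration under assumed inputs: each season's significant genes and their log2 fold changes are taken as already computed upstream, and the season labels, gene IDs, and values are hypothetical. The abstract does not describe the authors' statistical model, so this is not their method.

```python
from typing import Dict, Set

def consistent_responders(per_season: Dict[str, Dict[str, float]]) -> Set[str]:
    """Genes called drought-responsive in every season, moving the same way each time.

    per_season maps a season label to {gene_id: log2 fold change (drought vs control)}
    for genes that already passed that season's significance filter (assumed upstream).
    Fold changes are assumed to be nonzero for significant genes.
    """
    seasons = list(per_season)
    if not seasons:
        return set()
    # Keep only genes that were significant in every season.
    shared = set(per_season[seasons[0]])
    for season in seasons[1:]:
        shared &= set(per_season[season])
    # Of those, keep genes whose direction of change (up or down) never flips.
    return {
        gene for gene in shared
        if len({per_season[s][gene] > 0 for s in seasons}) == 1
    }

# Hypothetical toy input: three seasons, made-up gene IDs and fold changes.
calls = {
    "season1": {"geneA": 2.1, "geneB": -1.4, "geneC": 0.9},
    "season2": {"geneA": 1.7, "geneB": -0.8, "geneC": -1.2},
    "season3": {"geneA": 3.0, "geneB": -1.1},
}
print(sorted(consistent_responders(calls)))  # ['geneA', 'geneB']
```

Here geneC drops out twice over: it flips direction between seasons and is absent from the third season's significant set.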

Perplexity as a Metric for Isoform Diversity in the Human Transcriptome https://www.biorxiv.org/content/10.1101/2025.07.02.662769v1?med=mas

Long-read sequencing (LRS) has revealed a far greater diversity of RNA isoforms than earlier technologies, heightening the need to determine which, and how many, isoforms per gene are biologically meaningful. To define the space of relevant isoforms from LRS, many existing analysis pipelines rely on arbitrary expression cutoffs, but a single threshold cannot accommodate the broad variability in isoform complexity across genes, cell types, and disease states captured by LRS. To address this, we propose using perplexity, an interpretable entropy-derived measure that quantifies the effective number of isoforms per gene from the full, unfiltered distribution of isoform ratios. Calculating perplexity for 124 ENCODE4 PacBio LRS datasets spanning 55 human cell types, we show that it provides intuitive assessments of isoform diversity and captures uncertainty across genes of varying complexity. Perplexity can be calculated at multiple gene regulatory levels, from transcript to protein, to compare how isoform diversity is reduced across stages of gene expression. On average, genes have an ORF-level perplexity of 2.1, indicating the production of roughly two distinct protein isoforms. We extended this analysis to evaluate expression variation across tissues and identified 4,593 ORFs across 3,102 genes with moderate to extreme tissue specificity. We propose perplexity as a consistent, quantitative metric for interpreting isoform diversity across genes, cell types, and disease states. All results are compiled into a community resource to enable cross-study comparisons of novel isoforms. ### Competing Interest Statement G.M.S. is on the scientific advisory board of Quantum-Si Incorporated and holds stock in Quantum-Si Incorporated.
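Perplexity in the usual information-theoretic sense is the exponential of Shannon entropy: for isoform ratios p_i of one gene, PP = exp(-Σ p_i ln p_i), or equivalently 2^H when H uses log base 2. A gene whose expression is split evenly over k isoforms gets PP = k, which is why an ORF-level value of 2.1 reads as about two effective protein isoforms. The sketch below assumes that standard definition and raw per-isoform abundances as input; how the authors normalise ENCODE4 counts is not stated in the abstract.

```python
import math
from typing import Sequence

def isoform_perplexity(counts: Sequence[float]) -> float:
    """Effective number of isoforms for one gene: exp(Shannon entropy of isoform ratios).

    `counts` are the gene's per-isoform abundances (reads or TPM); no expression
    cutoff is applied, so every isoform contributes according to its ratio.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("gene has no detected expression")
    entropy = 0.0
    for c in counts:
        if c > 0:  # isoforms with zero abundance contribute 0 (0 * log 0 := 0)
            p = c / total
            entropy -= p * math.log(p)  # natural log, paired with exp() below
    return math.exp(entropy)

print(isoform_perplexity([50, 50]))      # ≈ 2.0: two equally used isoforms
print(isoform_perplexity([100, 0, 0]))   # 1.0: a single isoform carries all expression
print(isoform_perplexity([90, 5, 5]))    # ≈ 1.48: one dominant, two minor isoforms
```

Because low-abundance isoforms only nudge the value rather than being included or excluded wholesale, no arbitrary expression threshold is needed.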

microRNA-206 is a reproducibly sensitive and specific plasma biomarker of amyotrophic lateral sclerosis https://www.biorxiv.org/content/10.1101/2025.06.27.662023v1?med=mas

Amyotrophic lateral sclerosis (ALS) is a devastating and fatal neurodegenerative disease with no current therapy that modifies disease progression. Reliable biomarkers for ALS are essential for improving diagnosis and evaluating therapeutic efficacy. We combined small-RNA sequencing from a discovery cohort of ALS patients and healthy controls with sequencing data from a previously published ALS cohort to identify candidate biomarkers. Machine learning analysis identified hsa-miR-206 as a strong classifier of ALS status in both cohorts. This finding was validated in an independent ALS cohort using droplet digital PCR (ddPCR), confirming the biomarker's sensitivity and specificity in identifying ALS. Importantly, hsa-miR-206 also displayed high accuracy in differentiating ALS from Parkinson's disease. These results further support hsa-miR-206 as a circulating small-RNA biomarker for ALS with potential utility in diagnosis and therapeutic monitoring, although further studies in larger, more diverse cohorts will be needed to validate its clinical applicability. ### Competing Interest Statement The authors have declared no competing interest.
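Sensitivity and specificity, as cited for the ddPCR validation, are ratios from a confusion matrix at one chosen cutoff, while the area under the ROC curve (AUC) summarises discrimination across all cutoffs. The sketch below uses made-up miR-206 levels, assumes higher levels indicate ALS, and uses a hypothetical threshold of 6.0; neither the cutoff nor the classifier the authors used is given in the abstract.

```python
from typing import List, Tuple

def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called ALS (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC = probability that a random case scores above a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for a in cases:
        for b in controls:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(cases) * len(controls))

# Made-up plasma miR-206 levels (arbitrary units) and labels (1 = ALS, 0 = control).
levels = [8.1, 5.5, 7.4, 3.2, 6.5, 2.8, 4.1, 7.0]
status = [1,   1,   1,   0,   0,   0,   0,   1]
print(sens_spec(levels, status, threshold=6.0))  # (0.75, 0.75)
print(roc_auc(levels, status))                   # 0.9375
```

The pairwise AUC is the Mann-Whitney formulation; it is quadratic in cohort size, which is fine for cohorts of this scale.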

An optimised computational approach for the identification of somatic structural variants in cancer https://www.biorxiv.org/content/10.1101/2025.07.01.662575v1?med=mas

Structural variants play a critical role in tumorigenesis. At present, these events are most commonly identified using short-read whole-genome sequencing data, and a number of computational tools are available for this purpose. Consensus approaches have been used to improve precision but may reduce sensitivity. The optimal number and combination of callers remains unclear, in part because of the lack of gold-standard real-world datasets for validation. Here, we benchmark the performance of Delly, GRIDSS, LUMPY, Manta and SvABA using a validation set of consensus calls from the Pan-Cancer Analysis of Whole Genomes Consortium. Manta showed the best standalone performance, identifying 88% of the validation set calls, and was included in all of the best-performing caller combinations. A consensus approach comprising Delly, GRIDSS, Manta and SvABA was selected as the optimal approach among those tested. We provide a Nextflow implementation of our optimised consensus approach as a resource for the cancer genomics community. ### Competing Interest Statement The authors have declared no competing interest. Pathological Society, https://ror.org/037tx1q23, JSPS CLG 2019 01, TSGS 0421 1297
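One simple way to express a multi-caller consensus like the one described above is to keep an event only when enough distinct callers report a matching breakpoint pair. The sketch below assumes intrachromosomal calls already parsed into (chromosome, start, end, type) records, a hypothetical 500 bp matching window, and a hypothetical minimum-support threshold. It does not parse real Delly, GRIDSS, Manta or SvABA VCFs and is not the paper's Nextflow pipeline or its exact consensus rule.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class SVCall:
    """One simplified intrachromosomal SV call (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"

def same_event(a: SVCall, b: SVCall, window: int) -> bool:
    """Same chromosome and type, with both breakpoints within `window` bp."""
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.start - b.start) <= window
            and abs(a.end - b.end) <= window)

def consensus_calls(calls_by_caller: Dict[str, List[SVCall]],
                    min_callers: int, window: int = 500) -> List[SVCall]:
    """Keep events reported by at least `min_callers` distinct callers.

    Each retained event is reported once, using the first matching call seen.
    """
    kept: List[SVCall] = []
    for caller_calls in calls_by_caller.values():
        for call in caller_calls:
            if any(same_event(call, k, window) for k in kept):
                continue  # this event was already accepted via another caller
            # Count callers (including this one) with at least one matching call.
            support = sum(
                1 for other_calls in calls_by_caller.values()
                if any(same_event(call, o, window) for o in other_calls)
            )
            if support >= min_callers:
                kept.append(call)
    return kept

# Hypothetical toy calls; caller names are just labels, not parsed tool output.
calls = {
    "Delly":  [SVCall("chr1", 10_000, 15_000, "DEL")],
    "GRIDSS": [SVCall("chr1", 10_020, 14_990, "DEL"), SVCall("chr2", 5_000, 9_000, "INV")],
    "Manta":  [SVCall("chr1", 9_980, 15_010, "DEL")],
    "SvABA":  [SVCall("chr3", 1_000, 2_000, "DUP")],
}
print(consensus_calls(calls, min_callers=2))
# [SVCall(chrom='chr1', start=10000, end=15000, svtype='DEL')]
```

Raising `min_callers` keeps fewer, better-supported events, which is the precision-versus-sensitivity trade-off the abstract describes.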

Integrative Transcriptomic and Machine Learning Approaches to decipher Mitochondrial Gene Regulation in severe Plasmodium vivax Malaria https://www.biorxiv.org/content/10.1101/2025.07.01.662590v1?med=mas

Mitochondria in Plasmodium vivax are functionally vital despite possessing a highly reduced genome and differing substantially from the human organelle. Beyond their classical role in energy production, they dynamically coordinate processes such as pyrimidine biosynthesis and heme metabolism, adapting their functions across the intraerythrocytic developmental cycle (IDC). Their unique architecture and stage-specific roles enable the parasite to fine-tune mitochondrial gene expression, which includes both sense transcripts and natural antisense transcripts (NATs), a class of long non-coding RNAs. This study analyzes transcriptomic data to identify significantly differentially expressed genes, in both the sense and NAT categories, associated with severe malaria manifestations, emphasizing the critical role of mitochondrial gene regulation in disease severity. These genes were statistically ranked and then used as input features for machine learning analysis for verification. Machine learning served as a hypothesis-testing framework, enabling refinement of the gene lists and strengthening biological interpretation. Further, a comprehensive gene enrichment analysis was performed for both sense transcripts and NATs to investigate the mitochondrial and other cellular pathways affected during severe malaria. The findings revealed that NATs have a striking association with mitochondrial pathways and translation machinery, indicating that NATs are not merely by-products of transcription but also play a regulatory role in fine-tuning mitochondrial gene expression in severe disease. This work highlights mitochondrial NATs as critical regulators of parasite biology and positions the Plasmodium mitochondrion as a promising target for antimalarial drug development and therapeutic intervention. ### Competing Interest Statement The authors have declared no competing interest. Indian Council of Medical Research, https://ror.org/0492wrx28, PID: 2019-1121

Haplotype-Resolved DNA Methylation at the APOE Locus identifies Allele-Specific Epigenetic Signatures Relevant to Alzheimer's Disease Risk https://www.biorxiv.org/content/10.1101/2025.07.01.662592v1?med=mas

The APOE gene encodes a key lipid transport protein and plays a central role in Alzheimer's disease (AD) pathogenesis. Three common APOE alleles, ϵ2 (rs7412 C>T), ϵ3 (reference), and ϵ4 (rs429358 T>C), arise from two coding variants in exon 4 and confer distinct AD risk profiles, with ϵ4 increasing risk and ϵ2 providing protection. The ϵ3-linked APOE variant rs769455[T] has also been associated with elevated AD risk in individuals of African ancestry carrying both rs769455[T] and ϵ4 alleles. These single nucleotide variants (SNVs) reside in a cytosine-phosphate-guanine (CpG) island, a region with a higher frequency of CpG sites than the rest of the genome. CpG sites are subject to 5-methylcytosine (5mC) methylation by DNA methyltransferases, which add a methyl group to the fifth carbon of the cytosine residue in a CpG site. SNVs can disrupt this process, making these regions prime targets for differential methylation; however, allele-specific methylation patterns in APOE remain poorly resolved owing to technical limitations of conventional bisulfite and methylation array-based methods, including degraded DNA quality, sparse CpG coverage, and lack of haplotype phasing. Here, we leverage high-accuracy long-read sequencing data to generate haplotype-resolved methylation profiles of the APOE locus in 332 postmortem brain samples from two ancestrally distinct cohorts: 201 individuals of European ancestry from the North American Brain Expression Consortium (NABEC), comprising 402 haplotypes (48 ϵ2 and 58 ϵ4 alleles), and 131 individuals of African and African admixed ancestry from the Human Brain Core Collection (HBCC), comprising 262 haplotypes (25 ϵ2, 64 ϵ4, and 7 rs769455 alleles). A linear regression analysis identified 18 novel differentially methylated CpG sites (DMCs) associated with APOE ϵ2, ϵ4, and rs769455 within a gene cluster spanning TOMM40, APOE, APOC1, and APOC4-APOC2.
This represents the most comprehensive haplotype-resolved methylation study of APOE in human brain tissue to date. Our results uncover distinct allele-specific methylation signatures and demonstrate the power of long-read sequencing for resolving epigenetic variation relevant to AD risk. ### Competing Interest Statement Some authors' participation in this project was part of a competitive contract awarded to DataTecnica LLC by the National Institutes of Health to support open science research. M.A.N. owns stock in Character Bio Inc. and Neuron23 Inc.
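The abstract describes a CpG island only as a region richer in CpG sites than the rest of the genome. One common way to make that operational is the observed/expected CpG ratio, (CpG count × length) / (C count × G count), combined with the classic Gardiner-Garden and Frommer thresholds: length of at least 200 bp, GC fraction above 0.5, and observed/expected ratio above 0.6. The sketch below applies those thresholds to a single window; it is illustrative only, is not the authors' method, and the abstract does not say how the APOE CpG island was annotated.

```python
from typing import Dict

def cpg_stats(seq: str) -> Dict[str, float]:
    """Length, GC fraction, and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (CpG count * length) / (C count * G count).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp_cpg": obs_exp}

def is_cpg_island_like(seq: str, min_len: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the classic length / GC / CpG o/e thresholds to one window."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] > min_gc
            and st["obs_exp_cpg"] > min_obs_exp)

print(cpg_stats("CCGG" * 50))  # {'length': 200, 'gc_fraction': 1.0, 'obs_exp_cpg': 1.0}
print(is_cpg_island_like("CCGG" * 50))             # True
print(is_cpg_island_like("G" * 100 + "C" * 100))   # False: GC-rich but contains no CpG
```

Genome-wide annotations usually slide such a window along the sequence and merge passing windows; a single-window check keeps the sketch short.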

Whole-genome sequencing in Galicia reveals male-biased pre-Islamic North African ancestry, subtle population structure, and micro-geographic patterns of disease risk https://www.biorxiv.org/content/10.1101/2025.06.27.662083v1?med=mas
Increased rate of de novo single nucleotide mutation in house mice born through assisted reproduction https://www.biorxiv.org/content/10.1101/2025.06.27.662069v1?med=mas
Super-silencers are crucial for development and carcinogenesis in B cells. https://www.biorxiv.org/content/10.1101/2025.06.27.662063v1?med=mas