StringTie3 improves total RNA-seq assembly by resolving nascent and mature transcripts
Nature Methods
Accurate assembly of rRNA-depleted (total) RNA sequencing (RNA-seq) remains challenging because existing methods often conflate incomplete, nascent RNA with fully processed mature isoforms, leading to misassemblies and quantification errors. Here, we present StringTie3, a major update to the widely used StringTie assembler, specifically designed for total RNA-seq. StringTie3 introduces a nascent mode that models co-transcriptional splicing to separate nascent from mature transcripts, and a refined long-read module that distinguishes genuine polyadenylation sites from poly(A)-priming artifacts. Across short-, long- and hybrid-read datasets, StringTie3 substantially reduces assembly errors and outperforms existing tools. In Argonaute knockout experiments, nascent-mode analysis reveals that single knockouts predominantly alter nascent transcripts while leaving mature RNA largely unchanged, whereas double or triple knockouts disrupt both fractions. In breast cancer samples, certain extracellular matrix and tumor suppressor genes show discordant nascent and mature expression, suggesting posttranscriptional regulation. StringTie3 provides a framework for investigating transcriptional and posttranscriptional processes in total RNA-seq data.

StringTie3 shows enhanced performance in total RNA-seq assembly and quantification by modeling both nascent and mature transcripts, across short-, long- and hybrid-read sequencing data.
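To make the poly(A)-priming problem concrete, the sketch below shows a generic, widely used heuristic for flagging a candidate 3' end as a possible internal-priming artifact: check whether the genomic sequence immediately downstream of the site is A-rich, since an oligo(dT) primer can anneal to such a stretch inside a transcript. This is not StringTie3's algorithm, which the abstract does not describe. The function names, the 10-base window, and the 0.7 threshold are all illustrative assumptions, and the sketch handles plus-strand sites only.

```python
# Hypothetical illustration of a common internal-priming heuristic.
# NOT StringTie3's method; names, window size and threshold are assumptions.

def downstream_a_fraction(genome_seq: str, site_end: int, window: int = 10) -> float:
    """Fraction of A bases in the `window` genomic bases starting at `site_end`.

    `site_end` is the 0-based index of the first genomic base after the
    candidate 3' end on the plus strand.
    """
    segment = genome_seq[site_end:site_end + window].upper()
    if not segment:
        # Site lies at or beyond the end of the sequence: nothing to inspect.
        return 0.0
    return segment.count("A") / len(segment)


def looks_like_internal_priming(genome_seq: str, site_end: int,
                                window: int = 10,
                                min_a_fraction: float = 0.7) -> bool:
    """Flag a candidate site whose downstream genomic window is A-rich."""
    return downstream_a_fraction(genome_seq, site_end, window) >= min_a_fraction


# Toy sequences (made up for illustration, not real genomic data).
a_rich_genome = "GATTACAGGCTC" + "AAAAAAAAGA" + "CCGT"
mixed_genome = "GATTACAGGCTCTGCCGTAGCTAG"

suspect = looks_like_internal_priming(a_rich_genome, 12)   # A-rich downstream
plausible = looks_like_internal_priming(mixed_genome, 12)  # mixed downstream
```

In this toy case the first site would be flagged because an encoded genomic A-stretch follows it, whereas the second would not. A real analysis would also need to handle minus-strand sites (T-rich sequence on the reference strand) and would typically combine this signal with other evidence, such as the presence of a polyadenylation signal motif upstream of the site.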