TATAT: a containerized software for generating annotated coding transcriptomes from raw RNA-seq data https://www.biorxiv.org/content/10.1101/2025.07.09.663867v1?med=mas

Motivation: Many transcriptome creation workflows are not standardized, are difficult to install or share, are prone to breaking as dependencies update or cease to be maintained, and are resource intensive. Due to a lack of authoritative literature, many also overlook potentially important steps, such as thinning contig over-assembly or identifying transcript consensus across samples, which reduce resource demands during annotation and increase the accuracy of the final transcripts.

Results: We developed TATAT, a modular, Dockerized software package that contains all the tools necessary to generate an annotated coding transcriptome from raw RNA-seq data. The tools remain in a static state and can be coordinated with the bash and Python scripts provided with them, making TATAT a standardized, reproducible workflow that can easily be shared and installed. We preferentially incorporate tools that are not only accurate but also fast and less RAM-intensive, and we show that TATAT can generate a comprehensive transcriptome for a non-model organism, the Egyptian rousette bat (Rousettus aegyptiacus), in ~8 hours in a high-performance computing (HPC) environment.

Availability and implementation: The TATAT code, instructions, and tutorial are available at https://github.com/viralemergence/tatat.

Competing Interest Statement: The authors have declared no competing interest.

Funding: National Science Foundation, https://ror.org/021nxhr62, 2515340

bioRxiv