🎉 New preprint out today! We present rastair - an ultra-fast SNP and methylation caller for TAPS or 5-Base data. Rastair takes less than 1h to process, e.g., a 50x 5-Base dataset, yet SNP call accuracy is nearly identical to GATK on WGS data 🔥

https://www.biorxiv.org/content/10.64898/2026.03.19.712983v1

#biorxiv #epigenetics #bioinformatics #science #preprint

Rastair: an integrated variant and methylation caller

Cytosine methylation is a crucial epigenetic mark that impacts tissue-specific chromatin conformation and gene expression. For many years, bisulfite sequencing (BS-seq), which converts all non-methylated cytosine (C) to thymine (T), remained the only approach to measure cytosine methylation at base resolution. Recently, however, several new methods that convert only methylated cytosines to thymine (mC→T) have become widely available. Here we present rastair, an integrated software toolkit for simultaneous SNP detection and methylation calling from mC→T sequencing data such as those created with Watchmaker's TAPS+ and Illumina's 5-Base chemistries. Rastair combines machine-learning-based variant detection with genotype-aware methylation estimation. Using NA12878 benchmark datasets, we show that rastair outperforms existing methylation-aware SNP callers and achieves F1 scores exceeding 0.99 for datasets above 30x depth, matching the accuracy of state-of-the-art tools run on whole-genome sequencing data. At the same time, rastair is significantly faster than other genetic variant callers: processing a 30x-depth file takes less than 30 minutes given 32 CPU cores on an Intel Xeon, and half as long when a GPU is available. By integrating genotyping with methylation calling, rastair reports an additional 500,000 positions in NA12878 where a SNP turns a non-CpG reference position into a "de-novo" CpG. Conversely, rastair also identifies positions where a variant disrupts a CpG and corrects their reported methylation levels. Rastair produces standards-compliant outputs in VCF, BAM and BED formats, facilitating integration into downstream analysis pipelines. Rastair is open-source and available via conda, Dockerhub, and as pre-compiled binaries from https://www.rastair.com.

Competing Interest Statement: Pascal Hertleif is an employee and owner of Softleif AB, a software development company. All other authors declare no competing financial interests.
Ludwig Institute For Cancer Research

bioRxiv
Rastair comes with optional GPU acceleration, which makes it ~2x faster than the CPU version and about 10x faster than any WGS variant caller. Meanwhile, F1 scores are very close to top-of-the-line callers.
By calling variants first, rastair identifies nearly 1M additional CpG positions in an average human genome where a SNP turns a CpH into a CpG 🤯 AFAIK, no other caller currently does this cleanly, not even Illumina's DRAGEN, which reports the C and the G part of a "de-novo CpG" in 2 files 🙈
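
To make the "de-novo CpG" idea concrete, here is a minimal Python sketch. It is not rastair's implementation; the function names, the toy sequences, and the simplification to a homozygous SNP on the forward strand are all assumptions made for illustration. It only shows why the genotype has to be known before CpG sites can be enumerated correctly.

```python
def cpg_positions(seq):
    """Return the 0-based index of the C in every CpG dinucleotide of seq."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def classify_snp(ref_seq, pos, alt):
    """Compare CpG sites before and after a (hypothetical, homozygous) SNP.

    'created' lists CpGs that exist only in the sample (de-novo CpGs);
    'disrupted' lists reference CpGs that the variant removes.
    """
    before = cpg_positions(ref_seq)
    after = cpg_positions(apply_snp(ref_seq, pos, alt))
    return {
        "created": sorted(after - before),
        "disrupted": sorted(before - after),
    }


# Toy example: reference "ACATG" has a CpH (CA) at index 1.
# An A>G SNP at index 2 turns it into a CpG.
print(classify_snp("ACATG", 2, "G"))  # {'created': [1], 'disrupted': []}

# Toy example: reference "ACGTT" has a CpG at index 1.
# A G>A SNP at index 2 disrupts it.
print(classify_snp("ACGTT", 2, "A"))  # {'created': [], 'disrupted': [1]}
```

A methylation caller that only looks at the reference would miss the first site entirely and would report a methylation level for a CpG that no longer exists in the second.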

Rastair is freely available as source code (and compilation is easy 😉), via Docker, conda or as pre-compiled binaries. If you want to learn more, check the website:

https://www.rastair.com

Introduction - Rastair

Rastair is a command-line tool to process epigenetic sequencing data. It can call genetic variants and DNA methylation simultaneously from Illumina™ 5base and TAPS+.