Happy to spread the word about our new paper in BioData Mining, mostly based on the work of my former student Jakub Horváth. We trained 3 different ML models to detect and classify long terminal repeats (LTRs) of plant #retrotransposons, then examined which features most influenced their predictive performance.

https://rdcu.be/d36gT

TATA box-related k-mers, the 4-6 bp at the sequence borders, and transcription factor binding sites were among the signals the models leaned on.
#Transposon #bioinformatics

Detection and classification of long terminal repeat sequences in plant LTR-retrotransposons and their analysis using explainable machine learning