Happy to spread the word about our new paper in BioData Mining, mostly based on the work of my former student Jakub Horváth. We trained three different ML models to detect/classify long terminal repeats (LTRs) from plant #retrotransposons, then looked at which features influenced their predictive ability the most.
TATA-box-related k-mers, the 4-6 bp at sequence borders, and transcription factor binding sites were among the signals the models leaned on.
#Transposon #bioinformatics