Is more data always better? Or should we select less data of higher quality?
🚨 🧬 #bioinformatics #computationalbiology #statistics #preprint 🧬🚨
(repost trying to improve my #hashtag game)
We explored this question in the context of predicting the fitness of #proteins 🧬 #mutations from MSA (multiple sequence alignment) #sequencedata, and found a scaling law that relates the performance of statistical models to two simple data descriptors: