Came across this brilliant #rstats package {groupdata2} for dividing data into groups.
The function I’m most appreciative of is collapse_groups(), which lets you combine already grouped data into a new, smaller set of groups whilst retaining the original grouping.
Super handy for k-fold CV when you want to split your data but there are structural elements you don’t want shared across folds, so that predictive ability isn’t inflated.
https://github.com/ludvigolsen/groupdata2
#MachineLearning #GenomicSelection
GitHub - LudvigOlsen/groupdata2: R-package: Methods for dividing data into groups. Create balanced partitions and cross-validation folds. Perform time series windowing and general grouping and splitting of data. Balance existing groups with up- and downsampling or collapse them to fewer groups.
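
For anyone who wants the idea without the package: below is a minimal sketch in plain Python, not groupdata2's implementation. The function name `group_kfold` and the toy `families` vector are made up for illustration. It assumes each row carries a group label (family, line, environment) and assigns whole groups to folds, largest first, so no group is ever split across folds.

```python
from collections import defaultdict

def group_kfold(group_ids, k):
    """Assign each row to one of k folds so that no group is split across folds."""
    # Collect row indices per group (e.g. per family or breeding line).
    rows_by_group = defaultdict(list)
    for row, gid in enumerate(group_ids):
        rows_by_group[gid].append(row)
    if k > len(rows_by_group):
        raise ValueError("k cannot exceed the number of groups")

    # Greedy balancing: place the largest groups first,
    # each into whichever fold is currently smallest.
    fold_sizes = [0] * k
    fold_of_row = [None] * len(group_ids)
    for gid, rows in sorted(rows_by_group.items(), key=lambda kv: -len(kv[1])):
        target = fold_sizes.index(min(fold_sizes))
        for row in rows:
            fold_of_row[row] = target
        fold_sizes[target] += len(rows)
    return fold_of_row

# Hypothetical example: ten individuals from five families, three folds.
families = ["A", "A", "A", "B", "B", "C", "C", "D", "E", "E"]
folds = group_kfold(families, k=3)
```

Because related individuals always land in the same fold, the held-out fold never contains relatives of the training set from the same group, which is the leakage the post is worried about.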


‘Since <trait> involves a myriad of factors, the traditional breeding strategy combined with MAS will be cumbersome. Therefore, exploiting genomic selection for improving <trait> will speed up the development of superior genotypes by combining high-throughput phenotyping and genotyping…’

We agree, I shall read on 👍 Although I would consider using a word a little 🤏 stronger than ‘cumbersome’.

#GenomicSelection #PlantBreeding

Anyone else find that although calculating selection indexes 🧮 with the Smith-Hazel formula is good at maximising economic gain, it completely ignores any balance between traits, which can be problematic in an agricultural system 🌱🌾🐄? 🧬 🖥️ #PlantBreeding #GenomicSelection
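
For reference, the index in its usual textbook form. The notation is an assumption for illustration: x is the vector of phenotypes in the index, P their phenotypic (co)variance matrix, G the genetic (co)variance matrix between index traits and objective traits, a the economic weights, and i the selection intensity.

```latex
\begin{aligned}
I &= \mathbf{b}^{\top}\mathbf{x}, \qquad \mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\,\mathbf{a} \\
\Delta\mathbf{g} &= \frac{i}{\sigma_I}\,\mathbf{G}^{\top}\mathbf{b}, \qquad \sigma_I = \sqrt{\mathbf{b}^{\top}\mathbf{P}\,\mathbf{b}}
\end{aligned}
```

The economic weights a are the only place preferences enter, so nothing in b constrains how the response Δg is spread across traits. That is why restricted and desired-gains variants of the index exist.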
1/🧵 Interested in how others are structuring their genomic pipelines post-alignment/variant calling for #GenomicPrediction #GenomicSelection #RiskScoring. Do you keep individual per-sample VCFs and extract genotypes from them for GEBV calculation, or merge them into a single multi-sample VCF for analysis? Or perhaps extract genotype calls per sample, store them in a DB, and then pull from the DB for routine downstream analysis? 🧬🖥️
🤔 Any potential issues with running k-folds in parallel for #bayesian GS models within the same R session? I would think everything in the calculation would be contained within the respective k-fold process. 🧬💻 #GenomicSelection #rstats
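
One thing worth checking for sampling-based models is shared random-number state. Below is a rough Python analogue (not R, and not any particular GS package) in which each fold gets its own seeded generator, so the result should not depend on whether folds run one after another or concurrently. `run_fold`, the seed, and the draw count are hypothetical stand-ins for a real model fit; threads are used only to keep the sketch short, and the same pattern applies to worker processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_fold(fold, base_seed=2024, n_draws=5):
    """Stand-in for fitting one fold of a sampling-based model."""
    # Private generator per fold: no shared global RNG state between workers.
    rng = random.Random(base_seed + fold)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    return fold, sum(draws) / n_draws

folds = range(5)

# Reference run, one fold at a time.
sequential = [run_fold(f) for f in folds]

# Same folds run concurrently; pool.map keeps the input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(run_fold, folds))
```

If the two result lists match, each fold's sampling is self-contained; a mismatch would point to state leaking between folds.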
Is anyone doing genomic selection in hops? 🌱 🧬 🍺 #Genomics #GenomicSelection #PlantBreeding
When you have finished preprocessing your massive genomic input files 🧬 and then submit them to the cloud ☁️ to explore large genomic prediction models 🧮 #Genomics #GenomicSelection #AWS
🧬🔍 Before investing in creating one, does anyone know of a great illustration depicting the limited resolution when tracking causal mutations using linkage mapping compared to large, diverse association mapping panels or genomic selection? 🤔 #GenomicSelection #ScienceIllustration
🔮 Wondering about visionary papers that leapfrogged their time, like the Meuwissen et al. 2001 #GenomicSelection proposal, made before genomic sequencing was affordable! 🧬 Thoughts on the next breeding concept that has been waiting for the rapid developments in genomics we are currently seeing?

Interested in #GenomicPrediction or #GenomicSelection?

👀Have a look at our course with @OscarGenomics & Evangelina Lopez de Maturana in February.

🔗https://physalia-courses.org/courses-workshops/course49b/

#Genomics #Bioinformatics #Rstats #GenomicPrediction
#GWAS

GENOME-WIDE PREDICTION OF COMPLEX TRAITS

Dates 24-28 March 2025 To foster international participation, this course will be held online
