
Jorge Miguel Silva Sep 25, 2025

PhenoQC: QC for phenotype tables in genomics. Plain-language summary → https://link.growkudos.com/1f7b2pmsh6o
DOI → https://doi.org/10.1016/j.imu.2025.101693
#genomics #bioinformatics #FAIRdata

PhenoQC: fast quality checks for clinical phenotype tables in genomic research

Phenotypic tables power genotype–phenotype studies. Errors, missing values, and inconsistent terms slow analysis and bias results. PhenoQC is a configuration-driven toolkit that combines three steps in one workflow: schema validation, ontology mapping, and missing-data imputation. It checks structure and types against a JSON schema, aligns phenotype text to standard ontologies (HPO, DO, MPO) with exact, synonym, and fuzzy matching, and fills gaps using baseline methods or KNN, MICE, or low-rank SVD. It audits imputation effects with the standardized mean difference, variance ratio, Kolmogorov–Smirnov statistic, population stability index, and Cramér’s V. It scales with chunk-based parallelism and runs via a CLI or a web GUI. In tests, PhenoQC processed up to 100k records with near-linear scaling, reached ≈97–99% ontology-mapping accuracy under text noise, and, on two UCI clinical datasets (CKD and Heart Disease), imputed all missing numeric cells and produced clean reports. The output is analysis-ready and reproducible.
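To make two of the steps above concrete, here is a minimal Python sketch of tiered term matching (exact, then synonym, then fuzzy) and of a simple imputation audit (standardized mean difference and variance ratio after mean imputation). It is not PhenoQC's code or API. The toy ontology, its `TOY:` IDs, the function names, the 0.8 fuzzy cutoff, and the pooled-SD form of the SMD are all assumptions made for illustration, using only the Python standard library.

```python
import difflib
import statistics

# Hypothetical toy ontology: ID -> (label, synonyms). Not real HPO/DO/MPO IDs.
TOY_ONTOLOGY = {
    "TOY:0001": ("seizure", ["epileptic seizure", "fits"]),
    "TOY:0002": ("hypertension", ["high blood pressure"]),
    "TOY:0003": ("fever", ["pyrexia"]),
}


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def map_term(text, ontology=TOY_ONTOLOGY, cutoff=0.8):
    """Return (term_id, method); method is exact, synonym, fuzzy, or unmapped."""
    query = normalize(text)
    labels = {normalize(label): tid for tid, (label, _) in ontology.items()}
    synonyms = {
        normalize(syn): tid
        for tid, (_, syns) in ontology.items()
        for syn in syns
    }
    if query in labels:
        return labels[query], "exact"
    if query in synonyms:
        return synonyms[query], "synonym"
    # Fuzzy fallback: closest label or synonym by difflib similarity ratio.
    candidates = {**synonyms, **labels}
    close = difflib.get_close_matches(query, list(candidates), n=1, cutoff=cutoff)
    if close:
        return candidates[close[0]], "fuzzy"
    return None, "unmapped"


def mean_impute(values):
    """Replace None with the mean of the observed values (a simple baseline)."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def audit_imputation(values, imputed):
    """Compare the observed values with the imputed column."""
    observed = [v for v in values if v is not None]
    var_obs = statistics.variance(observed)
    var_imp = statistics.variance(imputed)
    # SMD with the pooled SD sqrt((s1^2 + s2^2) / 2); assumed form.
    pooled_sd = ((var_obs + var_imp) / 2) ** 0.5
    smd = (statistics.fmean(imputed) - statistics.fmean(observed)) / pooled_sd
    return {"smd": smd, "variance_ratio": var_imp / var_obs}


if __name__ == "__main__":
    for raw in ["Hypertension", "High  blood pressure", "hypertensoin", "broken arm"]:
        print(raw, "->", map_term(raw))
    column = [1.0, 2.0, None, 3.0, None]
    print(audit_imputation(column, mean_impute(column)))
```

In this toy column, mean imputation leaves the mean unchanged (SMD of 0) but halves the variance (ratio of 0.5). A shift like that is what audit metrics of this kind are meant to surface.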

Jorge Miguel Silva Jul 29, 2025

📢 Just published our new work on federated random forests for privacy-preserving machine learning!
📄 “A Federated Random Forest Solution for Secure Distributed Machine Learning”
📌 IEEE: https://doi.org/10.1109/CBMS65348.2025.00159

📂 Supplementary slides:
🔗 https://doi.org/10.5281/zenodo.16539345

We're advancing secure AI without sharing data. Feedback & collaborations welcome! 🚀
#FederatedLearning #PrivacyPreservingAI #MachineLearning #OpenScience #IEEE #DataScience #Zenodo #ResearchSoftware #Reproducibility