Emma Griffiths

I make data specs and ontologies for public health/food safety genomics contextual data harmonization and integration. Crossing the streams without the world imploding.
Harmonizing and integrating metadata across labs and systems can be difficult. In our new publication we describe an open source One Health data standard to support collaborative genomic surveillance of MPXV (Mpox virus).
www.microbiologyresearch.org/content/jour...
@cidgoh and @pha4ge are developing ISO-based, modular, interoperable contextual data standards supporting global Mpox genomic surveillance and sharing. To learn more, find them here: One Health Mpox: https://github.com/cidgoh/MPox_Contextual_Data_Specification, Wastewater: https://github.com/pha4ge/Wastewater_Contextual_Data_Specification.
GitHub - cidgoh/MPox_Contextual_Data_Specification: A data specification for harmonizing MPox pathogen genomics contextual data. The specification provides standardized (ontology-based) fields and terms which are implemented via the DataHarmonizer, supported by field and reference guides as well as different curation and new term request SOPs.


GitHub
Integrating different types of data across sectors can be challenging. The Canadian One Health AMR genomics contextual data standard, built in partnership with the Canadian government, is readily adapted for international use and supports public data sharing. #GRDI https://osf.io/preprints/osf/xbf4t
OSF

📢 We're excited to introduce Pathoplexus, a specialized sequence database for human viral pathogens! 🌍 🧬
Launching with 4 viruses, http://Pathoplexus.org combines modern open-source software with transparent governance to improve global pathogen sequence sharing.

1/14

Pathoplexus | Home

Pathoplexus is a new, open-source database dedicated to the efficient sharing of human viral pathogen genomic data, fostering global collaboration and public health response.

Seq data of varying quality can be used for different public health applications (analysis, training, more). @pha4ge standardized QC tags help users share, triage, and identify a wider array of data in public repos. For more info, see: https://shorturl.at/1yaCw hot off the presses!
PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing useful but sub-optimal datasets requires careful annotation and documentation of known issues so that the data can be interpreted appropriately, are not mistaken for better quality information, and (along with their derivatives) are easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability.
The contextual data tags were developed through consultations with the community, including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies, which are community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA’s GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.

microbiologyresearch.org
Jurassic Park 🦖 is 30 years old & the science 🧬 is still relevant. For a bit of fun, @happykhan, @Lskatz & I watched it & commented on all the #science in the movie!
🎙️ https://soundcloud.com/microbinfie/real-bioinformaticians-react-to-jurassic-park
#JurassicPark
118 Real bioinformaticians react to Jurassic Park

Andrew, Nabil, and Lee react to the bioinformatics and the science overall in the 1993 film Jurassic Park. We looked at these YouTube clips: * https://youtu.be/mDTaykXudVI?si=I5aiUdBGStpIKHVC * http

SoundCloud
Awesome (as always) Microbinfie podcast episodes on the Global Microbial Identifier meeting that happened last week in Vancouver, Canada. Well done @AndrewPage, @LeeKatz and guests!
1) Guests Finlay Maguire & Emma Griffiths
https://soundcloud.com/microbinfie/global-microbial-identifier-with-finlay-maguire-and-emma-griffiths
2) Guest Will Hsiao
https://soundcloud.com/microbinfie/112-global-microbial-identifier-conference-with-will-hsiao
3) Guest Ruth Timme
https://soundcloud.com/microbinfie/113-global-microbial-identifier-conference-with-ruth-timme
111 Global Microbial Identifier with Finlay Maguire and Emma Griffiths

Andrew and Lee are at the Global Microbial Identifier conference 13 in Vancouver Canada. On the first day they talked to Dr Finlay Maguire and Dr Emma Griffiths about microbial genomics and Tim Horton

SoundCloud

Registration for the 2023 Global Microbial Identifier meeting (GMI13) on microbial genomics data sharing is now open!

Program and conference details can be found at gmi13.org.

Cost: $150
Where: Vancouver, Canada
Who is invited: Everyone!

Nakeema Stefflbauer's thread on the realities of #Sudan as told through a story of friendship is utterly heartbreaking - https://twitter.com/DocStefflbauer/status/1652136456868052994 - and should be read alongside reporting from Yousra Elbagir - https://twitter.com/SkyNews/status/1652570357042216962 - and Mat Nashed (e.g. https://twitter.com/matnashed/status/1652301760839417860)
Dr. Nakeema Stefflbauer on Twitter

“This 🧵 is about one of my oldest friends, who I've admired for literal decades, and who is Sudanese. I met her in Cairo, a million years ago when I was desperate to *not* live in Brooklyn and the Arab world felt like a revelation...”

Twitter
Do you wish that people would include more info on QC processes in their public repo submissions? Have you ever tried to find datasets with known QC issues for teaching/training purposes? @pha4ge has created high-level QC attribute "tags" to help. https://www.preprints.org/manuscript/202303.0037/v1
PHA4GE Quality Control Contextual Data Tags: Standardized Annotations for Sharing Public Health Sequence Datasets with Known Quality Issues to Facilitate Testing and Training

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower-quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing useful but sub-optimal datasets requires careful annotation and documentation of known issues so that the data can be interpreted appropriately, are not mistaken for better quality information, and (along with their derivatives) are easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability.
The contextual data tags were developed through consultations with the community, including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies, which are community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA’s GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.