At #eccb2024, Alan Bridge, head of the Swiss-Prot group at @SIB, presented a highlight talk on a benchmark set for curating enzyme reactions using LLMs. He also made the point that many AI initiatives are overlooking needed investments in benchmark sets. https://www.nature.com/articles/s41597-024-03835-7
EnzChemRED, a rich enzyme chemistry relation extraction dataset - Scientific Data

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea.

Nature
"Ultimately, the real problem is researchers"
#eccb2024 out of context
Daniele Raimondi from our lab gave a highlight talk on the balance of dataset size and model complexity in genome interpretation at #ECCB2024. You can read the original work at https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-03064-y
Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease - Genome Biology

Background: Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination (p ≫ n). Millions of variants are present in each individual while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects. Results: We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case–control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects present in the inflammatory bowel disease pathogenesis. Conclusions: In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.

BioMed Central

A quote worthy of bravely writing on a grant proposal: "I don't need to know what I'm doing. I just need to know what I've done."

(Jussi Taipale, #eccb2024)

Mastodonians, hit me up if you're at #eccb2024 in Turku

#bioinformatics @bioinformatics

🚀 Next week, several members of the Beacon team at the @EGAarchive will be attending the GA4GH 12th Plenary and ECCB 2024.

Be sure to check out the agendas:
🔗 #GA4GHPlenary: https://broadinstitute.swoogo.com/ga4gh-12th-plenary/4273671
🔗 #ECCB2024: https://eccb2024.fi/

Hope to see you there 👋!

GA4GH 12th Plenary

Who here is coming to #ECCB2024 in Turku?

#bioinformatics @bioinformatics

How will Europe's old #forests fare in the future? Scientists, conservationists and policymakers asked this question at the 7th European Congress of Conservation Biology #ECCB2024. You can find their findings here: https://idw-online.de/de/news835555
Permanently preserving Europe's old forests

How is it being an academic who tries not to fly for work #NoFly working in evolutionary biology and bioinformatics in central Europe this year?
#SMBE2024 @officialSMBE, #Evol2024 (which includes @eseb), and #ismb2024 are in N. America.
#EuroEvoDevo and #ECCB2024 in Finland, which can be reached from Switzerland without flying, but it takes 2 days each way.
We can do better! We need conference planning to take into account travel, and avoid peripheral locations.

RT ECCB 2024 Bologna
Do you know what time it is?
It's submission time! The Call for symposia, workshops and training courses for #ECCB2024 is now open!
Submit yours now and don't miss your chance!
Find all information here https://eccb2024.eu/symposia-workshop-trainingcourse/

🐦🔗: https://n.respublicae.eu/ECCB_2024/status/1684869131886002176

Call for Sym/W/TC - ECCB 2024

Submit your symposium, workshop or training course! Submissions are open! You can find more information regarding the important dates below. Please note: registration to the submission platform is not

ECCB 2024 - 7th European Congress of Conservation Biology