CareLab at #SMM4H-HeaRD 2025: Insomnia Detection and Food Safety Event Extraction with Domain-Aware Transformers

Zihan Liang, Ziwen Pan, Sumon Kanti Dey, Azra Ismail
https://arxiv.org/abs/2506.18185 https://arxiv.org/pdf/2506.18185 https://arxiv.org/html/2506.18185

arXiv:2506.18185v1 Announce Type: new
Abstract: This paper presents our system for the SMM4H-HeaRD 2025 shared tasks, specifically Task 4 (Subtasks 1, 2a, and 2b) and Task 5 (Subtasks 1 and 2). Task 4 focused on detecting mentions of insomnia in clinical notes, while Task 5 addressed the extraction of food safety events from news articles. We participated in all subtasks and report key findings across them, with particular emphasis on Task 5 Subtask 1, where our system achieved strong performance, securing first place with an F1 score of 0.958 on the test set. To achieve this result, we employed encoder-based models (e.g., RoBERTa) alongside GPT-4 for data augmentation. This paper outlines our approach, including preprocessing, model architecture, and subtask-specific adaptations.


arXiv.org
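The headline result above is an F1 score. For readers less familiar with the metric, here is a minimal sketch of binary F1 (the harmonic mean of precision and recall for the positive class). It assumes 0/1 labels with 1 as the positive class; the shared task's official scorer may average differently (e.g., per class), so treat this as an illustration rather than the task's evaluation script.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # precision = recall = 2/3, so F1 = 2/3
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which avoids computing precision and recall separately.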

ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents

Hoang-Thang Ta, Abu Bakar Siddiqur Rahman, Lotfollah Najjar, Alexander Gelbukh
https://arxiv.org/abs/2404.19714 https://arxiv.org/pdf/2404.19714

arXiv:2404.19714v1 Announce Type: new
Abstract: This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop, specifically targeting the classification challenges within tweet data. Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety. Task 5 is a binary classification task focusing on tweets reporting medical disorders in children. We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of the given tweets. We also explored several data augmentation methods to assess their impact on model performance. Our best systems obtained F1 scores of 0.627 in Task 3 and 0.841 in Task 5.
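The abstract does not say which augmentation methods were tried. As an illustration only, the sketch below shows two common label-preserving operations from the EDA ("Easy Data Augmentation") family, random word swap and random word deletion. The helper names and parameters (`augment`, `n_aug`, `p=0.1`) are hypothetical choices, not the authors' settings.

```python
import random


def random_swap(words: list[str], n_swaps: int, rng: random.Random) -> list[str]:
    """Return a copy of `words` with `n_swaps` random pairs of positions swapped."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: list[str], p: float, rng: random.Random) -> list[str]:
    """Drop each word with probability `p`, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(text: str, n_aug: int = 2, seed: int = 0) -> list[str]:
    """Hypothetical helper: produce `n_aug` variants of a tweet, alternating swap and deletion."""
    rng = random.Random(seed)
    words = text.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new = random_swap(words, n_swaps=1, rng=rng)
        else:
            new = random_deletion(words, p=0.1, rng=rng)
        variants.append(" ".join(new))
    return variants


print(augment("my son was diagnosed with asthma today"))
```

Each variant keeps the original tweet's label; whether such noisy variants help depends on the task, which is presumably what the authors set out to measure.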


This paper outlines the methods used in our participation in the #SMM4H 2023 Shared Tasks, including data preprocessing, continual pre-training, and fine-tuning optimization strategies. In particular, for the Named Entity Recognition (NER) task, we use the W2NER model architecture, which effectively improves the model's generalization ability.

https://arxiv.org/pdf/2312.10652.pdf
#NLP #Healthcare
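W2NER frames NER as word-word relation classification over a grid of word pairs: a "next-neighboring-word" (NNW) relation links consecutive words of an entity, and a "tail-head-word" (THW) relation links an entity's last word back to its first word and carries the entity type. The sketch below covers only the decoding step under that simplified view; it assumes the relations have already been predicted (the neural grid classifier is not shown), and the function name and input format are hypothetical.

```python
from collections import defaultdict


def decode_w2ner(
    nnw: set[tuple[int, int]],
    thw: dict[tuple[int, int], str],
) -> list[tuple[tuple[int, ...], str]]:
    """Decode entities from a simplified word-word relation grid.

    nnw: pairs (i, j) meaning word j is the next word after word i inside some entity.
    thw: maps (tail, head) -> entity type, marking an entity's boundary words.
    Returns (word indices, type) for every NNW path that runs from head to tail.
    """
    succ = defaultdict(list)
    for i, j in nnw:
        succ[i].append(j)

    entities = []
    for (tail, head), ent_type in sorted(thw.items()):
        # Depth-first search along NNW edges, starting at the head word.
        stack = [(head, (head,))]
        while stack:
            node, path = stack.pop()
            if node == tail:
                entities.append((path, ent_type))
                continue
            for nxt in succ[node]:
                if node < nxt <= tail:  # only move forward, never past the tail
                    stack.append((nxt, path + (nxt,)))
    return entities


# "aching in legs and shoulders": two overlapping, discontinuous mentions.
nnw = {(0, 1), (1, 2), (1, 4)}
thw = {(2, 0): "ADE", (4, 0): "ADE"}
print(decode_w2ner(nnw, thw))  # "aching in legs" and "aching in shoulders"
```

Because entities are recovered as paths rather than contiguous spans, the same decoder handles flat, overlapping, and discontinuous mentions, which is one reason this formulation is attractive for clinical and social-media text.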

MANTIS at #SMM4H 2023: Leveraging Hybrid and Ensemble Models for Detection of Social Anxiety Disorder on Reddit

Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz
https://arXiv.org/abs/2312.09451 https://arXiv.org/pdf/2312.09451

This paper presents the system we employed for the Social Media Mining for Health 2023 Shared Task 4: binary classification of English Reddit posts self-reporting a social anxiety disorder diagnosis. We systematically investigate and contrast the efficacy of hybrid and ensemble models that combine specialized medical domain-adapted transformers with BiLSTM neural networks. The evaluation results show that our best-performing model obtained 89.31% F1 on the validation set and 83.76% F1 on the test set.
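The abstract does not specify how the ensemble members are combined. One standard option for a binary task is weighted soft voting: average each model's predicted probability of the positive class, then threshold. The sketch below assumes each model outputs such probabilities; the weights and the 0.5 threshold are illustrative, not the authors' values.

```python
from typing import Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> list[int]:
    """Combine per-model P(positive) scores by weighted averaging, then threshold.

    model_probs[m][i] is model m's probability that post i is positive.
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_items = len(model_probs[0])
    labels = []
    for i in range(n_items):
        avg = sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        labels.append(1 if avg >= threshold else 0)
    return labels


probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.5]]  # three hypothetical models
print(soft_vote(probs))                     # [1, 0, 0]
print(soft_vote(probs, weights=[3, 1, 1]))  # [1, 0, 1]: the first model now dominates post 3
```

Soft voting keeps each model's confidence, unlike majority (hard) voting, so a strong, well-calibrated member can tip borderline cases.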


Graciela Gonzalez opening the Social Media Mining for Health 2023 (#SMM4H) workshop at #AMIA2023, showing data on how social media offers a highly representative sample of population demographics.

Also presented: general findings from the shared tasks on #socialmedia mining, with pre-trained transformer models dominating. #AI #MachineLearning #AMIA2023 #SMM4H

Will give an overview of our NIH-NLM-funded research on “Network Science to Model Health Vulnerabilities and Biases” at the #SMM4H workshop at #AMIA2023, St. Charles Ballroom, Hilton Riverside, New Orleans. Saturday, November 12, 10:30 am.

#networkscience #complexSystems
https://healthlanguageprocessing.org/smm4h-2023/

Social Media Mining for Health 2023 (#SMM4H)

The Social Media Mining for Health Applications (#SMM4H) Workshop serves as a venue for bringing together researchers interested in automati…

HLP @ Cedars-Sinai Computational Biomedicine
Looking forward to the AMIA Annual Symposium this Saturday in New Orleans, where I will give a keynote talk at the #SocialMediaMining for #Health Applications (#SMM4H) Workshop. #medicalinformatics #ComplexSystems #NetworkScience #machinelearning
https://healthlanguageprocessing.org/smm4h-2023/

tmn at #SMM4H 2023: Comparing Text Preprocessing Techniques for Detecting Tweets Self-reporting a COVID-19 Diagnosis

Anna Glazkova
https://arXiv.org/abs/2311.00732 https://arXiv.org/pdf/2311.00732

The paper describes a system developed for Task 1 at SMM4H 2023. The goal of the task is to automatically distinguish tweets that self-report a COVID-19 diagnosis (for example, a positive test, clinical diagnosis, or hospitalization) from those that do not. We investigate the use of different techniques for preprocessing tweets using four transformer-based models. The ensemble of fine-tuned language models obtained an F1-score of 84.5%, which is 4.1% higher than the average value.
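The abstract does not list the specific preprocessing techniques compared. A common starting point is to mask user mentions and URLs with placeholder tokens (the `@USER` / `HTTPURL` placeholders below mirror the normalization used by BERTweet) and optionally strip hashtag signs or lowercase. The sketch makes each step an independent switch so that variants can be compared; the flag names are hypothetical.

```python
import re

USER_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess_tweet(
    text: str,
    mask_users: bool = True,
    mask_urls: bool = True,
    strip_hashtag_sign: bool = False,
    lowercase: bool = False,
) -> str:
    """Apply optional, independently switchable normalization steps to a tweet."""
    if mask_urls:  # run first so characters inside URLs are not treated as mentions or hashtags
        text = URL_RE.sub("HTTPURL", text)
    if mask_users:
        text = USER_RE.sub("@USER", text)
    if strip_hashtag_sign:
        text = HASHTAG_RE.sub(r"\1", text)  # "#covid19" -> "covid19"
    if lowercase:
        text = text.lower()
    return " ".join(text.split())  # collapse runs of whitespace


t = "Tested positive for COVID today  @drsmith https://t.co/abc #covid19"
print(preprocess_tweet(t))
# Tested positive for COVID today @USER HTTPURL #covid19
```

Each preprocessing variant would then be paired with each of the fine-tuned models, and the resulting validation scores compared.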


UQ at #SMM4H 2023: ALEX for Public Health Analysis with Social Media

Yan Jiang, Ruihong Qiu, Yi Zhang, Zi Huang
https://arXiv.org/abs/2309.04213 https://arXiv.org/pdf/2309.04213

As social media becomes increasingly popular, more and more activities related to public health emerge on it. Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs). However, training in-domain LLMs for public health is especially expensive. Furthermore, in-domain social media datasets of this kind are generally imbalanced. To tackle these challenges, the data imbalance issue can be addressed by data augmentation and balanced training, and the ability of LLMs can be effectively utilized by prompting them properly. In this paper, a novel ALEX framework is proposed to improve the performance of public health analysis on social media by adopting an LLM explanation mechanism. Results show that our ALEX model achieved the best performance among all submissions in both Task 2 and Task 4, and a high score in Task 1, in Social Media Mining for Health 2023 (SMM4H) [1]. Our code has been released at https://github.com/YanJiangJerry/ALEX.
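The abstract mentions balanced training for imbalanced datasets without giving details. The simplest form of balancing is random oversampling: duplicate minority-class examples until every class matches the largest one. The sketch below is a generic illustration of that technique, not necessarily what ALEX does; the function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(
    texts: list[str], labels: list[int], seed: int = 0
) -> tuple[list[str], list[int]]:
    """Duplicate minority-class examples at random until every class matches the largest one."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must align")
    if not texts:
        return [], []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for t, y in zip(texts, labels):
        by_label[y].append(t)
    target = max(len(v) for v in by_label.values())
    out_texts, out_labels = [], []
    for y, items in by_label.items():
        # Sample with replacement to fill the gap up to the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for t in items + extra:
            out_texts.append(t)
            out_labels.append(y)
    # Shuffle so a training loop does not see all examples of one class in a row.
    paired = list(zip(out_texts, out_labels))
    rng.shuffle(paired)
    return [t for t, _ in paired], [y for _, y in paired]


xs, ys = random_oversample(["a", "b", "c", "d", "e", "x", "y"], [0, 0, 0, 0, 0, 1, 1])
print(ys.count(0), ys.count(1))  # 5 5
```

Oversampling only repeats existing examples, which is why it is often paired with augmentation that generates new minority-class text.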


DS4DH at #SMM4H 2023: Zero-Shot Adverse Drug Events Normalization using Sentence Transformers and Reciprocal-Rank Fusion

Anthony Yazdani, Hossein Rouhizadeh, David Vicente Alvarez, Douglas Teodoro
https://arXiv.org/abs/2308.12877 https://arXiv.org/pdf/2308.12877

This paper outlines the performance evaluation of a system for adverse drug event normalization, developed by the Data Science for Digital Health (DS4DH) group for the Social Media Mining for Health Applications (SMM4H) 2023 shared task 5. Shared task 5 targeted the normalization of adverse drug event mentions on Twitter to standard concepts of the Medical Dictionary for Regulatory Activities (MedDRA) terminology. Our system hinges on a two-stage approach: BERT fine-tuning for entity recognition, followed by zero-shot normalization using sentence transformers and reciprocal-rank fusion. The approach yielded a precision of 44.9%, a recall of 40.5%, and an F1 score of 42.6%. It outperformed the median performance in shared task 5 by 10% and achieved the highest performance among all participants. These results substantiate the effectiveness of our approach and its potential for adverse drug event normalization in social media text mining.
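Reciprocal-rank fusion (RRF) merges several ranked candidate lists, for example, the MedDRA concepts retrieved for one mention by different sentence-transformer models, by summing 1 / (k + rank) for each candidate across lists. The sketch below shows standard RRF with the commonly used k = 60; the candidate strings and the choice of k are illustrative assumptions, not details from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked candidate lists with Reciprocal Rank Fusion.

    Each ranking lists candidate IDs best-first; a candidate at 1-based rank r
    contributes 1 / (k + r). Returns (candidate, fused score), highest score first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, cand in enumerate(ranking, start=1):
            scores[cand] += 1.0 / (k + rank)
    # Sort by descending score; break exact ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical top-3 candidates from three retrievers for the mention "my head is pounding".
fused = reciprocal_rank_fusion([
    ["headache", "migraine", "dizziness"],
    ["migraine", "headache", "nausea"],
    ["headache", "nausea", "migraine"],
])
print([c for c, _ in fused])  # ['headache', 'migraine', 'nausea', 'dizziness']
```

RRF uses only ranks, not raw similarity scores, so it can combine retrievers whose scores live on different scales without any calibration, which suits a zero-shot setting.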
