🧪 Enhancing Skin Lesion Classification with Conformal Prediction and Vision Transformers
A recent study titled "Domain Adaptive Skin Lesion Classification via Conformal Ensemble of Vision Transformers" introduces CE-ViTs, a novel framework that combines Vision Transformers (ViTs) with conformal prediction to improve skin lesion classification, particularly under domain shifts.
๐Ÿ” The Role of Conformal Prediction
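Conformal prediction is central to CE-ViTs: it quantifies the uncertainty of each prediction.
Instead of a single label, it outputs a prediction set guaranteed to contain the true label with at least a user-specified probability (for example, 90%), provided the calibration and test data are exchangeable. This guarantee makes the model's outputs more trustworthy, but domain shift can violate the exchangeability assumption, which is the problem CE-ViTs targets.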
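To make the idea concrete, here is a minimal split conformal prediction sketch. It is an illustration under stated assumptions, not the CE-ViTs code: it assumes a classifier that outputs softmax probabilities, uses the common nonconformity score 1 minus the probability of the true class, and the function names `calibrate` and `predict_set` are hypothetical.

```python
import numpy as np


def calibrate(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal calibration: return the score threshold q_hat.

    cal_probs:  (n, K) softmax probabilities on a held-out calibration set.
    cal_labels: (n,) integer class labels for the same images.
    alpha:      allowed miscoverage, so sets target >= 1 - alpha coverage.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points: include every class
    return float(scores[k - 1])


def predict_set(probs: np.ndarray, q_hat: float) -> list[list[int]]:
    """For each row of (m, K) probabilities, keep every class with score <= q_hat."""
    return [np.flatnonzero(1.0 - p <= q_hat).tolist() for p in probs]


# Toy usage with random stand-in probabilities for 7 classes (as in HAM10000).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(7), size=1000)
# Draw each label from its own row so the toy "model" is calibrated.
labels = np.array([rng.choice(7, p=p) for p in probs])
q_hat = calibrate(probs[:500], labels[:500], alpha=0.1)
test_sets = predict_set(probs[500:], q_hat)
print(q_hat, test_sets[:3])
```

Confident images get small sets and ambiguous ones get larger sets, because a class is kept whenever its probability is at least 1 - q_hat.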
In CE-ViTs, conformal prediction is applied to an ensemble of ViT models trained on diverse datasets (HAM10000, Dermofit, and ISIC). Calibrating this ensemble on the combined datasets is intended to keep coverage high even under domain shift.
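The post does not say how CE-ViTs combines its members' outputs, so the sketch below assumes simple soft voting (averaging each ViT's softmax probabilities) and pools calibration examples from every source dataset before computing a single conformal threshold. It also assumes all sources share one K-class label space; the function names and random stand-in data are hypothetical.

```python
import numpy as np


def soft_vote(member_probs: list[np.ndarray]) -> np.ndarray:
    """Average the (n, K) softmax outputs of several ensemble members."""
    return np.mean(np.stack(member_probs, axis=0), axis=0)


def pooled_threshold(cal_sources: list[tuple[list[np.ndarray], np.ndarray]],
                     alpha: float = 0.1) -> float:
    """Conformal threshold from calibration data pooled across source datasets.

    Each entry is (member_probs, labels) for one source: member_probs holds one
    (n_src, K) probability array per ensemble member, labels is (n_src,).
    """
    probs = np.concatenate([soft_vote(members) for members, _ in cal_sources])
    labels = np.concatenate([y for _, y in cal_sources])
    n = len(labels)
    # Split-conformal step: sorted scores 1 - p(true class), corrected rank.
    scores = np.sort(1.0 - probs[np.arange(n), labels])
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float("inf") if k > n else float(scores[k - 1])


# Hypothetical stand-ins: 3 ensemble members scored on images from 3 sources
# (sizes are arbitrary), all sharing a K-class label space.
rng = np.random.default_rng(1)
K = 7
cal_sources = []
for n_src in (200, 150, 250):
    members = [rng.dirichlet(np.ones(K), size=n_src) for _ in range(3)]
    cal_sources.append((members, rng.integers(0, K, size=n_src)))
q_hat = pooled_threshold(cal_sources, alpha=0.1)
print(q_hat)
```

Pooling the calibration data means the threshold reflects scores from every source rather than from one domain alone, which is the intuition behind calibrating on the combined datasets.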
📊 Key Findings
Improved Coverage: CE-ViTs achieved a coverage rate of 90.38%, a 9.95% improvement over the model trained solely on the HAM10000 dataset.
Robustness to Domain Shifts: The ensemble approach, combined with conformal prediction, enhances the model's ability to generalize across different datasets, addressing challenges posed by domain shifts.
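Uncertainty Quantification: By providing prediction sets rather than single-label predictions, CE-ViTs give a more informative and reliable output, crucial for medical decision-making. For challenging misclassified samples, the ensemble raises the average prediction set size from 1.86 to 3.075, so the sets widen on exactly the cases the model gets wrong.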
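Both metrics behind these findings are simple to compute: empirical coverage is the fraction of test images whose prediction set contains the true label, and set size is the mean number of labels per set. A minimal sketch; the function name and toy data are hypothetical, and "misclassified" is assumed to mean the top-1 prediction disagrees with the true label.

```python
import numpy as np


def coverage_and_size(pred_sets, labels, mask=None):
    """Empirical coverage and mean prediction-set size.

    pred_sets: list of label lists, one per test image.
    labels:    (n,) true labels.
    mask:      optional boolean array selecting a subset (e.g. misclassified images).
    """
    idx = np.arange(len(labels)) if mask is None else np.flatnonzero(mask)
    hits = [labels[i] in pred_sets[i] for i in idx]
    sizes = [len(pred_sets[i]) for i in idx]
    return float(np.mean(hits)), float(np.mean(sizes))


# Toy example with hypothetical sets and top-1 predictions.
pred_sets = [[0], [1, 2], [0, 3, 4], [2]]
labels = np.array([0, 2, 1, 2])
top1 = np.array([0, 1, 0, 2])
print(coverage_and_size(pred_sets, labels))                       # (0.75, 1.75)
print(coverage_and_size(pred_sets, labels, mask=top1 != labels))  # (0.5, 2.5)
```

The unmasked call gives the kind of overall coverage figure quoted above, while the masked call restricts attention to misclassified images, mirroring the set-size comparison reported in the abstract.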
📄 Access the Full Paper: https://arxiv.org/abs/2505.15997
Domain Adaptive Skin Lesion Classification via Conformal Ensemble of Vision Transformers

Exploring the trustworthiness of deep learning models is crucial, especially in critical domains such as medical imaging decision support systems. Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. However, conformal prediction results suffer when the backbone model struggles in domain-shifted scenarios, such as data from different sources. To address this challenge, this paper proposes a novel framework termed Conformal Ensemble of Vision Transformers (CE-ViTs), designed to enhance image classification performance by prioritizing domain adaptation and model robustness while accounting for uncertainty. The proposed method leverages an ensemble of vision transformer models in the backbone, trained on diverse datasets including HAM10000, Dermofit, and Skin Cancer ISIC. This ensemble, calibrated on the combination of these datasets, aims to enhance domain adaptation through conformal learning. Experimental results show that the framework achieves a high coverage rate of 90.38%, an improvement of 9.95% over the HAM10000 model. This indicates that the prediction set is more likely to include the true label than with single models. Ensemble learning in CE-ViTs significantly improves conformal prediction performance, increasing the average prediction set size for challenging misclassified samples from 1.86 to 3.075.
