New research shows human‑aligned AI models like AligNet, built on Vision Transformers and SigLIP, outperform standard models on robustness tests with the THINGS and Levels datasets. Lukas Muttenthaler’s team demonstrates higher reliability across varied inputs—promising safer AI deployments. Dive into the findings! #AI #AligNet #VisionTransformers #SigLIP

🔗 https://aidailypost.com/news/human-aligned-ai-models-show-greater-robustness-reliability-study

🤖 We presented our paper "Vision Transformers: the threat of realistic adversarial patches" at #SPIE2025 in the AI for Security & Defence Applications track, revealing vulnerabilities in modern AI classification systems.
Vision Transformers show improved robustness over CNNs. However, they remain vulnerable to adversarial patches, with attack success rates ranging from 40% to 99%.

#DFR_ARC #AI #AdversarialAttacks #VisionTransformers #DefenseTech #SPIE #AISecurity
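The evaluation behind a number like "attack success rate" can be sketched in plain Python. This is a minimal illustration, not the paper's method: `classify` is a stand-in toy classifier, and real patch attacks optimize the patch contents against the model (e.g. by gradient ascent on a target logit) rather than pasting a fixed patch.

```python
# Minimal sketch of an adversarial-patch evaluation loop (assumption:
# images are 2-D lists of pixel values; `classify` is any callable).
# The patch here is fixed; we only measure how often it flips the
# model's prediction.

def apply_patch(image, patch, x=0, y=0):
    """Paste `patch` into a copy of `image` at column x, row y."""
    patched = [row[:] for row in image]
    for i, prow in enumerate(patch):
        for j, value in enumerate(prow):
            patched[y + i][x + j] = value
    return patched

def attack_success_rate(images, patch, classify):
    """Fraction of images whose predicted class changes after patching."""
    flipped = sum(
        classify(apply_patch(img, patch)) != classify(img)
        for img in images
    )
    return flipped / len(images)
```

Note that "success" here means "prediction changed"; papers often use the stricter criterion "prediction changed to the attacker's chosen target class".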


🌟 A new scientific achievement for Ayloul University College – Yarim 🌟

In keeping with the college's vision, mission, and goals of supporting and encouraging scientific research,

we proudly congratulate
Professor Dr. Mohammed Al-Sarem,
member of the college's advisory board,
on publishing a distinguished scientific paper in the college's name in a prestigious Swiss journal ranked Q2.

🔬 Paper title:
Historical Manuscripts Analysis: A Deep Learning System for Writer Identification Using Intelligent Feature Selection with Vision Transformers

📚 Publication details:
✔ Journal: J. Imaging 2025, 11, 204
✔ Ranking: Q2
✔ Indexed in: Scopus and Web of Science
✔ Impact factor: 3.3
✔ Publisher: MDPI

📎 To read the paper:
🔗 Click here

https://www.mdpi.com/2313-433X/11/6/204

Ayloul University College, where ambition and quality education meet.

📍 Address: Yarim – Western Ring Road – opposite Yarim General Hospital
📞 Contact: 779944553

Ayloul University College #إنجاز_علمي #MDPI #JImaging #Yareem #VisionTransformers #بحثعلمي

#يمنأكاديميك #انشطةالجامعاتالاهلية #بحوثعلمية #JImaging #MDPI #VisionTransformers #Yareem

Oh no, our beloved Vision Transformers have misplaced their registers! 🤦‍♂️ Like a toddler who can't find their sippy cup, these AI models are now wandering aimlessly in the land of mathematical jargon. 📚 Who knew that extra tokens were like the spare screws left over after assembling IKEA furniture? 🛠️
https://arxiv.org/abs/2309.16588 #VisionTransformers #AIModels #TechHumor #MachineLearning #DataScience #HackerNews #ngated
Vision Transformers Need Registers

Transformers have recently emerged as a powerful tool for learning visual representations. In this paper, we identify and characterize artifacts in feature maps of both supervised and self-supervised ViT networks. The artifacts correspond to high-norm tokens appearing during inference primarily in low-informative background areas of images, that are repurposed for internal computations. We propose a simple yet effective solution based on providing additional tokens to the input sequence of the Vision Transformer to fill that role. We show that this solution fixes that problem entirely for both supervised and self-supervised models, sets a new state of the art for self-supervised visual models on dense visual prediction tasks, enables object discovery methods with larger models, and most importantly leads to smoother feature maps and attention maps for downstream visual processing.

arXiv.org
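The fix the abstract describes can be sketched structurally: append a few extra "register" tokens to the patch-token sequence before the transformer stack, then discard their outputs so downstream code sees only patch features. A minimal sketch, assuming `encoder` stands in for the ViT blocks (in the paper the registers are learned parameters, not constants):

```python
NUM_REGISTERS = 4  # the paper uses a small fixed number of extra tokens

def forward_with_registers(patch_tokens, registers, encoder):
    """Run `encoder` over patch tokens plus register tokens, then drop
    the register outputs so downstream code sees only patch features."""
    tokens = patch_tokens + registers      # extend the input sequence
    outputs = encoder(tokens)              # transformer stack (stand-in)
    return outputs[:len(patch_tokens)]     # registers discarded here
```

The registers give the high-norm "internal computation" tokens somewhere to live other than low-informative background patches, which is why the feature and attention maps come out smoother.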

Vision Transformers Need Registers (extra tokens discarded after last layer)

https://arxiv.org/abs/2309.16588

#HackerNews #VisionTransformers #Registers #Tokens #AIResearch #DeepLearning #arXiv

Ah, behold! Another groundbreaking thesis on Vision Transformers, where we learn the three most essential things everyone already forgot by the time they finished reading the title. 🙄🔍 Sponsored by the Simons Foundation, because funding irrelevant #research is always in vogue. 💸✨
https://arxiv.org/abs/2203.09795 #VisionTransformers #SimonsFoundation #IrrelevantResearch #GroundbreakingThesis #HackerNews #ngated
Three things everyone should know about Vision Transformers

After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and video analysis. We offer three insights based on simple and easy to implement variants of vision transformers. (1) The residual layers of vision transformers, which are usually processed sequentially, can to some extent be processed efficiently in parallel without noticeably affecting the accuracy. (2) Fine-tuning the weights of the attention layers is sufficient to adapt vision transformers to a higher resolution and to other classification tasks. This saves compute, reduces the peak memory consumption at fine-tuning time, and allows sharing the majority of weights across tasks. (3) Adding MLP-based patch pre-processing layers improves Bert-like self-supervised training based on patch masking. We evaluate the impact of these design choices using the ImageNet-1k dataset, and confirm our findings on the ImageNet-v2 test set. Transfer performance is measured across six smaller datasets.

arXiv.org
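Insight (2) above amounts to a freezing rule: when adapting a ViT to a higher resolution or a new task, keep only the attention-layer weights trainable. A hedged sketch of that rule as name filtering over parameter names (the marker substrings are an assumption; real ViT implementations name their modules differently):

```python
def split_for_finetuning(param_names, attn_markers=("attn", "attention")):
    """Partition parameter names into (trainable, frozen): only
    attention-layer parameters stay trainable; everything else is frozen
    and can be shared across tasks."""
    trainable = [n for n in param_names
                 if any(m in n for m in attn_markers)]
    frozen = [n for n in param_names if n not in trainable]
    return trainable, frozen
```

In a framework like PyTorch you would then set `requires_grad = False` on the frozen set; per the abstract, this saves compute, reduces peak fine-tuning memory, and lets the bulk of the weights be shared across tasks.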
AI: Artificial intelligence imitates handwriting with striking fidelity

A research team has developed a program, based on artificial intelligence (AI), that is capable of imitating handwriting exactly.

Tarnkappe.info

Right now I am working on a Detection Transformer library and plan to include as many Transformer-based object detection models as possible.

The docs are a bit lacking at the moment. Is anyone willing to share ideas on how to move this library forward and what kinds of models to include?

PRs and contributions are very welcome.

https://github.com/sovit-123/detr-custom-training

#deeplearning #objectdetection #visiontransformers #detectiontransformer

GitHub - sovit-123/detr-custom-training: Training DETR (Detection Transformer) on custom object detection datasets.

GitHub
Newly Developed Camera Can Take In-Focus Photos Without a Lens

Researchers think that machine learning could be the key to a lensless camera.

PetaPixel