WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang

tl;dr: augmentation-robust user signature embedded in Stable Diffusion outputs.
#kornia used for data augmentation.

https://arxiv.org/abs/2306.04744

#computervision #deeplearning

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation. Although providing some mitigation, traditional fingerprinting mechanisms fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user. This approach, incorporating fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with a minimal impact on output quality. Through extensive evaluation, we show that our method outperforms baseline methods with an average improvement of 11% in handling image post-processes. Our method presents a promising and novel avenue for accountable model distribution and responsible use.

arXiv.org

Baseline submission notebook with all the best @kornia_foss #kornia image matching models:
- DISK
- LoFTR
- KeyNet-AffNet-HardNet

Be careful: there is lots of auxiliary code to import the results into the COLMAP database.
https://kaggle.com/code/eduardtrulls/imc-2023-submission-example
Clean local version:

https://github.com/ducha-aiki/imc2023-kornia-starter-pack

imc-2023-submission-example

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection

Anurag Ghosh, N. Dinesh Reddy, Christoph Mertz, Srinivasa G. Narasimhan

https://arxiv.org/abs/2303.14311

tl;dr: vanishing points help egocentric object detection.
#kornia used for geometry warps & DLT.

#computervision #deeplearning

Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection

Real-time efficient perception is critical for autonomous navigation and city scale sensing. Orthogonal to architectural improvements, streaming perception approaches have exploited adaptive sampling improving real-time detection performance. In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the 3D scene (a ground plane and a plane above) to resample images for efficient object detection. This significantly improves small and far-away object detection performance while also being more efficient both in terms of latency and memory. For autonomous navigation, using the same detector and scale, our approach improves detection rate by +4.1 $AP_{S}$ or +39% and in real-time performance by +5.3 $sAP_{S}$ or +63% for small objects over state-of-the-art (SOTA). For fixed traffic cameras, our approach detects small objects at image scales other methods cannot. At the same scale, our approach improves detection of small objects by 195% (+12.5 $AP_{S}$) over naive-downsampling and 63% (+4.2 $AP_{S}$) over SOTA.

arXiv.org

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

Arjun Karpur, Guilherme Perrotta, Ricardo Martin-Brualla, Howard Zhou, Andre Araujo

tl;dr: SuperGlue meets monodepth for matching objects, not scenes.
#kornia used for LoFTR baseline.

https://arxiv.org/abs/2303.12779

#computervision #deeplearning

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

Finding localized correspondences across different images of the same object is crucial to understand its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exists only small regions of co-visibility between image pairs (i.e. wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals - normalized object coordinates and monocular depth estimates - and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs - up to 8.6% compared to the 2D-only approach.

arXiv.org
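The abstract stresses that a suitable positional encoding of the low-dimensional 3D signal is critical. A minimal sketch of one common choice, a NeRF-style sinusoidal encoding (the paper's exact encoding may differ):

```python
import torch

def sinusoidal_encode(x: torch.Tensor, num_freqs: int = 4) -> torch.Tensor:
    """Sinusoidal positional encoding of a low-dimensional signal.

    x: (N, D), e.g. monocular depth (D=1) or normalized object coords (D=3).
    Returns (N, D * 2 * num_freqs): each dimension lifted onto sin/cos
    waves of geometrically increasing frequency.
    """
    freqs = 2.0 ** torch.arange(num_freqs)                 # (F,)
    angles = x.unsqueeze(-1) * freqs * torch.pi            # (N, D, F)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (N, D, 2F)
    return enc.flatten(1)

depth = torch.rand(100, 1)       # per-keypoint monocular depth estimates
enc = sinusoidal_encode(depth)   # (100, 8) with num_freqs=4
```

The lifting to higher dimensions is what lets an MLP-based matcher exploit a scalar like depth, which is otherwise too low-dimensional to be useful as a raw input feature.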

A Survey of Feature detection methods for localisation of plain sections of Axial Brain Magnetic Resonance Imaging

Jiří Martinů, Jan Novotný, Karel Adámek, Petr Čermák, Jiří Kozel, David Školoudík

tl;dr: #kornia HardNet works for matching brain images
https://arxiv.org/abs/2302.04173

A Survey of Feature detection methods for localisation of plain sections of Axial Brain Magnetic Resonance Imaging

Matching MRI brain images between patients or mapping patients' MRI slices to the simulated atlas of a brain is key to the automatic registration of MRI of a brain. The ability to match MRI images would also enable such applications as indexing and searching MRI images among multiple patients or selecting images from the region of interest. In this work, we have introduced robustness, accuracy and cumulative distance metrics and methodology that allows us to compare different techniques and approaches in matching brain MRI of different patients or matching MRI brain slice to a position in the brain atlas. To that end, we have used feature detection methods AGAST, AKAZE, BRISK, GFTT, HardNet, and ORB, which are established methods in image processing, and compared them on their resistance to image degradation and their ability to match the same brain MRI slice of different patients. We have demonstrated that some of these techniques can correctly match most of the brain MRI slices of different patients. When matching is performed with the atlas of the human brain, their performance is significantly lower. The best performing feature detection method was a combination of SIFT detector and HardNet descriptor that achieved 93% accuracy in matching images with other patients and only 52% accurately matched images when compared to atlas.

arXiv.org

Nailfold capillaroscopy and deep learning in diabetes

Reema Shah, Jeremy Petch, Walter Nelson, Karsten Roth, Michael D Noseworthy, Marzyeh Ghassemi, Hertzel C Gerstein

tl;dr: CNNs for diabetes detection from nailfold photos. #kornia used for data augmentation

https://onlinelibrary.wiley.com/doi/epdf/10.1111/1753-0407.13354

Exploring Image Augmentations for Siamese Representation Learning with Chest X-Rays

Rogier van der Sluijs, Nandita Bhaskhar, Daniel Rubin, Curtis Langlotz, Akshay Chaudhari

tl;dr: #kornia RandomResizedCrop+Contrast+Brightness are good for X-Ray representation learning

https://arxiv.org/abs/2301.12636

#computervision #deeplearning

Exploring Image Augmentations for Siamese Representation Learning with Chest X-Rays

Image augmentations are quintessential for effective visual representation learning across self-supervised learning techniques. While augmentation strategies for natural imaging have been studied extensively, medical images are vastly different from their natural counterparts. Thus, it is unknown whether common augmentation strategies employed in Siamese representation learning generalize to medical images and to what extent. To address this challenge, in this study, we systematically assess the effect of various augmentations on the quality and robustness of the learned representations. We train and evaluate Siamese Networks for abnormality detection on chest X-Rays across three large datasets (MIMIC-CXR, CheXpert and VinDR-CXR). We investigate the efficacy of the learned representations through experiments involving linear probing, fine-tuning, zero-shot transfer, and data efficiency. Finally, we identify a set of augmentations that yield robust representations that generalize well to both out-of-distribution data and diseases, while outperforming supervised baselines using just zero-shot transfer and linear probes by up to 20%. Our code is available at https://github.com/StanfordMIMI/siaug.

arXiv.org

Unsupervised Volumetric Animation

Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

tl;dr: video → face keypoints → PnP → learning to animate faces

#kornia PnP reported as not helpful.
https://arxiv.org/abs/2301.11326
#computervision #deeplearning

Unsupervised Volumetric Animation

We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns the underlying object geometry and parts decomposition in an entirely unsupervised manner. This allows it to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. We primarily evaluate the framework on two video datasets: VoxCeleb $256^2$ and TEDXPeople $256^2$. In addition, on the Cats $256^2$ image dataset, we show it even learns compelling 3D geometry from still images. Finally, we show our model can obtain animatable 3D objects from a single or few images. Code and visual results available on our project website, see https://snap-research.github.io/unsupervised-volumetric-animation .

arXiv.org

🚀#kornia v0.6.9!

📢 Revamp of kornia.geometry, including new primitives: Hyperplane, ParametrizedLine/Ray, Quaternion, Vector3, Scalar, and Lie algebra. Big improvements to CI, typing robustness, and much more.

👉 https://github.com/kornia/kornia/releases/tag/v0.6.9

#computervision #opensource #ai #PyTorch #deeplearning

Release v0.6.9 Revamp kornia.geometry: Hyperplane, Ray, Quaternion, liegroup; restructure CI and typing robustness · kornia/kornia

What's Changed Quaternion pow bug fix (div by zero) by @cjpurackal in #1946 fix cuda init by @ducha-aiki in #1953 Bump accelerate from 0.13.1 to 0.13.2 by @dependabot in #1957 add kornia.testing a...

GitHub

Hi everyone.
We are kornia, a differentiable low-to-high-level computer vision library in #PyTorch with ~1M downloads/month.

Everything you can think of in OpenCV is, or will be, in kornia, as well as new #deeplearning #augmentation things. That said, we are not focused on models, but on functions & operators.

We promote papers that use #kornia, and we are open to discussions.

Docs: https://kornia.readthedocs.io/en/latest/
GH: https://github.com/kornia/kornia
Sponsor us: https://opencollective.com/kornia
#introduction

Kornia