
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang

tl;dr: an augmentation-robust user signature embedded in Stable Diffusion outputs.
#kornia used for data augmentation.

https://arxiv.org/abs/2306.04744

#computervision #deeplearning


The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation. Although providing some mitigation, traditional fingerprinting mechanisms fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user. This approach, incorporating fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with a minimal impact on output quality. Through extensive evaluation, we show that our method outperforms baseline methods with an average improvement of 11% in handling image post-processes. Our method presents a promising and novel avenue for accountable model distribution and responsible use.


Baseline submission notebook with all the best
@kornia_foss

#kornia image matching models:
- DISK
- LoFTR
- KeyNet-AffNet-HardNet

Be careful: there is a lot of auxiliary code for importing results into the COLMAP database.
https://kaggle.com/code/eduardtrulls/imc-2023-submission-example
Clean local version:

https://github.com/ducha-aiki/imc2023-kornia-starter-pack


Image Matching Challenge 2023 starts NOW!

Task: 3D reconstructions from 10-100 images
Entry Deadline: June 6, 2023.
Prize Money: $50,000

#IMC2023 #CVPR2023
#deeplearning #computervision

https://kaggle.com/competitions/image-matching-challenge-2023/overview


🚀Kornia 0.6.11 is out!

🔥Amazing DISK local feature by Michał Tyszkiewicz

(used in #IMC2021-winning solutions)
🔥 RandomMedianBlur, RandomSnow, RandomRain augs
✅many 🐞-fixes

📚Release notes:
https://github.com/kornia/kornia/releases/tag/v0.6.11

#computervision #machinelearning #ai #deeplearning #PyTorch #opensource


Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection

Anurag Ghosh, N. Dinesh Reddy, Christoph Mertz, Srinivasa G. Narasimhan

https://arxiv.org/abs/2303.14311

tl;dr: vanishing points help egocentric object detection.
#kornia used for geometry warps & DLT.

#computervision #deeplearning


Real-time efficient perception is critical for autonomous navigation and city scale sensing. Orthogonal to architectural improvements, streaming perception approaches have exploited adaptive sampling improving real-time detection performance. In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the 3D scene (a ground plane and a plane above) to resample images for efficient object detection. This significantly improves small and far-away object detection performance while also being more efficient both in terms of latency and memory. For autonomous navigation, using the same detector and scale, our approach improves detection rate by +4.1 $AP_{S}$ or +39% and in real-time performance by +5.3 $sAP_{S}$ or +63% for small objects over state-of-the-art (SOTA). For fixed traffic cameras, our approach detects small objects at image scales other methods cannot. At the same scale, our approach improves detection of small objects by 195% (+12.5 $AP_{S}$) over naive-downsampling and 63% (+4.2 $AP_{S}$) over SOTA.


LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

Arjun Karpur, Guilherme Perrotta, Ricardo Martin-Brualla, Howard Zhou, Andre Araujo

tl;dr: SuperGlue meets monodepth for matching objects, not scenes.
#kornia used for LoFTR baseline.

https://arxiv.org/abs/2303.12779

#computervision #deeplearning


Finding localized correspondences across different images of the same object is crucial to understand its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exists only small regions of co-visibility between image pairs (i.e. wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals - normalized object coordinates and monocular depth estimates - and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs - up to 8.6% compared to the 2D-only approach.


HEB: A Large Scale Homography Benchmark

Daniel Barath, Dmytro Mishkin, Michal Polic, Wolfgang Förstner, Jiri Matas

tl;dr: COLMAP->plane detection->challenging homography dataset for H-RANSAC evaluation.

Classics: VSAC rules, OpenCV MAGSAC++ is good

Deep prefiltering (no retraining needed):
OANet rules, AdaLAM rules.
PROSAC is VERY important speedwise, yet it is widely ignored in modern libraries.

@kornia_foss RANSAC is usable, but needs much more love.

https://arxiv.org/abs/2302.09997 #CVPR2023


We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1000 planes observed in 10 000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. The applications of the Pi3D dataset are diverse, e.g. training or evaluating monocular depth, surface normal estimation and image matching algorithms. The HEB dataset consists of 226 260 homographies and includes roughly 4M correspondences. The homographies link images that often undergo significant viewpoint and illumination changes. As applications of HEB, we perform a rigorous evaluation of a wide range of robust estimators and deep learning-based correspondence filtering methods, establishing the current state-of-the-art in robust homography estimation. We also evaluate the uncertainty of the SIFT orientations and scales w.r.t. the ground truth coming from the underlying homographies and provide codes for comparing uncertainty of custom detectors. The dataset is available at https://github.com/danini/homography-benchmark.


Interactive patch orientation estimation widget, developed by our student Ruslan Rozumnyi for Computer Vision Methods course.
You pass your orientation function as input and see how well it compensates rotation, compared to
@kornia_foss SIFT.

Course link: https://cw.fel.cvut.cz/wiki/courses/mpv/start

Visualization link: https://github.com/ducha-aiki/mpv-templates-backup/blob/master/assignment_0_3_correspondences_template/oriviz.py

#computervision #visualization #matplotlib


A Survey of Feature detection methods for localisation of plain sections of Axial Brain Magnetic Resonance Imaging

Jiří Martinů, Jan Novotný, Karel Adámek, Petr Čermák, Jiří Kozel, David Školoudík

tl;dr: #kornia HardNet works for matching brain images
https://arxiv.org/abs/2302.04173


Matching MRI brain images between patients or mapping patients' MRI slices to the simulated atlas of a brain is key to the automatic registration of MRI of a brain. The ability to match MRI images would also enable such applications as indexing and searching MRI images among multiple patients or selecting images from the region of interest. In this work, we have introduced robustness, accuracy and cumulative distance metrics and methodology that allows us to compare different techniques and approaches in matching brain MRI of different patients or matching MRI brain slice to a position in the brain atlas. To that end, we have used feature detection methods AGAST, AKAZE, BRISK, GFTT, HardNet, and ORB, which are established methods in image processing, and compared them on their resistance to image degradation and their ability to match the same brain MRI slice of different patients. We have demonstrated that some of these techniques can correctly match most of the brain MRI slices of different patients. When matching is performed with the atlas of the human brain, their performance is significantly lower. The best performing feature detection method was a combination of SIFT detector and HardNet descriptor that achieved 93% accuracy in matching images with other patients and only 52% accurately matched images when compared to atlas.


Nerfstudio: A Modular Framework for Neural Radiance Field Development

Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, Angjoo Kanazawa

https://arxiv.org/abs/2302.04264

tl;dr: a home for your NeRFs: pipeline, modular components, web viewer

#computervision #opensource #ai #PyTorch #deeplearning


Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality, while also remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.
