May's here and so are new #MLJ online-first papers: "Faster Riemannian Newton-type optimization by subsampling and cubic regularization" by Yian Deng & Tingting Mu (https://link.springer.com/article/10.1007/s10994-023-06321-0) (OA)

This work is on constrained large-scale non-convex optimization where the constraint set implies a manifold structure. Solving such problems is important in a multitude of fundamental machine learning tasks. Recent advances in Riemannian optimization have enabled the convenient recovery of solutions by adapting unconstrained optimization algorithms over manifolds. However, it remains challenging to scale up while maintaining stable convergence rates and handling saddle points. We propose a new second-order Riemannian optimization algorithm, aiming at improving the convergence rate and reducing the computational cost. It enhances the Riemannian trust-region algorithm, which exploits curvature information to escape saddle points, through a mixture of subsampling and cubic regularization techniques. We conduct rigorous analysis to study the convergence behavior of the proposed algorithm. We also perform extensive experiments to evaluate it on two general machine learning tasks using multiple datasets. The proposed algorithm exhibits improved computational speed, e.g., a speed improvement of 12% to 227%, and improved convergence behavior, e.g., an iteration-number reduction from $\mathcal{O}\left(\max\left(\epsilon_g^{-2}\epsilon_H^{-1},\epsilon_H^{-3}\right)\right)$ to $\mathcal{O}\left(\max\left(\epsilon_g^{-2},\epsilon_H^{-3}\right)\right)$, compared to a large set of state-of-the-art Riemannian optimization algorithms.
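The cubic-regularized Newton step at the heart of such methods minimizes a model $m(s) = \langle g, s\rangle + \tfrac{1}{2}\langle Hs, s\rangle + \tfrac{\sigma}{3}\|s\|^3$, with the Hessian $H$ estimated from a random subsample of the data. As a rough illustration only (a Euclidean least-squares toy, not the authors' Riemannian algorithm; all function names and parameters here are invented for the sketch):

```python
import numpy as np

def cubic_step(grad, hess, sigma, iters=100, lr=0.1):
    # Approximately minimize the cubic-regularized model
    #   m(s) = <grad, s> + 0.5 s^T hess s + (sigma/3) ||s||^3
    # by plain gradient descent on s (a simple subproblem solver).
    s = np.zeros_like(grad)
    for _ in range(iters):
        model_grad = grad + hess @ s + sigma * np.linalg.norm(s) * s
        s = s - lr * model_grad
    return s

def subsampled_cubic_newton(X, y, w, sigma=1.0, batch=32, steps=50, seed=0):
    # Toy objective: f(w) = 0.5/n * ||X w - y||^2 (Euclidean, not Riemannian).
    # The gradient is exact; the Hessian is estimated from a random subsample.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n                 # full gradient
        idx = rng.choice(n, size=min(batch, n), replace=False)
        hess = X[idx].T @ X[idx] / len(idx)          # subsampled Hessian
        w = w + cubic_step(grad, hess, sigma)
    return w
```

On a manifold, $s$ would live in the tangent space and the update would go through a retraction; the sketch omits that machinery.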

An #MLJ online-first #NewPaper on a new data set: "ROAD-R: the autonomous driving dataset with logical requirements" by Eleonora Giunchiglia, Mihaela Cătălina Stoian, Salman Khan, Fabio Cuzzolin & Thomas Lukasiewicz (https://link.springer.com/article/10.1007/s10994-023-06322-z)

Neural networks have proven to be very powerful at computer vision tasks. However, they often exhibit unexpected behaviors, acting against background knowledge about the problem at hand. This calls for models that (i) are able to learn from requirements expressing such background knowledge, and (ii) are guaranteed to comply with those requirements. Unfortunately, the development of such models is hampered by the lack of real-world datasets equipped with formally specified requirements. In this paper, we introduce the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first publicly available dataset for autonomous driving with requirements expressed as logical constraints. Given ROAD-R, we show that current state-of-the-art models often violate its logical constraints, and that the constraints can be exploited to create models that (i) achieve better performance, and (ii) are guaranteed to comply with the requirements.
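Requirements of this kind can be encoded as propositional constraints and checked against a model's predicted labels. A minimal sketch of such a check (the clause format and the example labels are hypothetical, written in the spirit of the requirements rather than taken from ROAD-R's actual annotation schema):

```python
def literal_holds(lit, labels):
    # A literal is either "Name" or "not Name"; `labels` is the set of
    # labels the model predicted as true.
    if lit.startswith("not "):
        return lit[4:] not in labels
    return lit in labels

def violated(clauses, labels):
    # Each clause is a disjunction of literals; it is violated when no
    # literal in it holds under the prediction.
    return [c for c in clauses if not any(literal_holds(l, labels) for l in c)]

# Hypothetical constraints: an agent cannot move away and towards at once;
# a detected traffic light must show one of its colours.
clauses = [
    ["not MovAway", "not MovTow"],
    ["not TrafficLight", "Red", "Green", "Amber"],
]

print(violated(clauses, {"MovAway", "MovTow"}))
# -> [['not MovAway', 'not MovTow']]
```

A constraint-compliant model would post-process or train its outputs so that `violated(...)` is always empty.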

#MLJ online-first #NewPaper: "Data driven discovery of systems of ordinary differential equations using nonconvex multitask learning" by Clément Lejeune, Josiane Mothe, Adil Soubki & Olivier Teste (https://rdcu.be/daInh)

Two #MLJ online-first #NewPaper|s on understanding models for image classification today: "Understanding CNN fragility when learning with imbalanced data" by Damien Dablain, Kristen N. Jacobson, Colin Bellinger, Mark Roberts & Nitesh V. Chawla (https://link.springer.com/article/10.1007/s10994-023-06326-9) (OA)

Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes, and their decisions are difficult to interpret. These problems are related because the mechanism by which CNNs generalize to minority classes, which requires improvement, is wrapped in a black box. To demystify CNN decisions on imbalanced data, we focus on their latent features. Although CNNs embed the pattern knowledge learned from a training set in model parameters, the effect of this knowledge is contained in feature and classification embeddings (FE and CE). These embeddings can be extracted from a trained model, and their global, per-class properties (e.g., frequency, magnitude and identity) can be analyzed. We find that important information about a neural network's ability to generalize to minority classes resides in the class top-K CE and FE. We show that a CNN learns a limited number of class top-K CE per category, and that their magnitudes vary depending on whether the same class is balanced or imbalanced. We hypothesize that latent class diversity is as important as the number of class examples, which has important implications for re-sampling and cost-sensitive methods. These methods generally focus on rebalancing model weights, class counts and margins rather than on diversifying class latent features. We also demonstrate that a CNN has difficulty generalizing to test data if the magnitudes of its top-K latent features do not match those of the training set. We use three popular image datasets and two cost-sensitive algorithms commonly employed in imbalanced learning for our experiments.
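The top-K analysis can be sketched as follows: average each class's feature embeddings and keep the K largest-magnitude dimensions as that class's profile. This is a NumPy toy on synthetic embeddings (the function name and shapes are invented for illustration, not the paper's code):

```python
import numpy as np

def class_topk_profile(features, labels, k=5):
    # For each class, average its samples' feature embeddings (FE) and
    # return the indices and magnitudes of the K largest dimensions.
    # features: (n_samples, n_dims) array, labels: (n_samples,) array.
    profiles = {}
    for c in np.unique(labels):
        mean_fe = features[labels == c].mean(axis=0)
        top = np.argsort(np.abs(mean_fe))[::-1][:k]   # top-K dimensions
        profiles[int(c)] = (top, mean_fe[top])
    return profiles
```

Comparing a test sample's magnitudes on a class's top-K indices against the training-set profile gives a crude check of the train/test mismatch the paper describes.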

Good Friday #MLJ online-first paper: "An accelerated proximal algorithm for regularized nonconvex and nonsmooth bi-level optimization" by Ziyi Chen, Bhavya Kailkhura & Yi Zhou (https://rdcu.be/c9tDx)

New #MLJ online-first paper: "Robust matrix estimations meet Frank–Wolfe algorithm" by Naimin Jing, Ethan X. Fang & Cheng Yong Tang (https://rdcu.be/c9mY5)

Another #MLJ online-first #NewPaper dropped yesterday: "Domain adversarial neural networks for domain generalization: when it works and how to improve" by Anthony Sicilia, Xingchen Zhao & Seong Jae Hwang (https://link.springer.com/article/10.1007/s10994-023-06324-x) (OA)

Theoretically, domain adaptation is a well-researched problem, and this theory has been put to good use in practice. In particular, we note the bound on target error given by Ben-David et al. (Mach Learn 79(1–2):151–175, 2010) and the well-known domain-aligning algorithm based on this work, Domain Adversarial Neural Networks (DANN), presented by Ganin and Lempitsky (in International conference on machine learning, pp 1180–1189). Recently, multiple variants of DANN have been proposed for the related problem of domain generalization, but without much discussion of the original motivating bound. In this paper, we investigate the validity of DANN for domain generalization from this perspective. We study the conditions under which applying DANN makes sense, and further consider DANN as a dynamic process during training. Our investigation suggests that applying DANN to domain generalization may not be as straightforward as it seems. To address this, we design an algorithmic extension to DANN for the domain generalization case. Our experiments validate both the theory and the algorithm.
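DANN's defining trick is the gradient reversal layer: the identity on the forward pass, with the gradient scaled by $-\lambda$ on the backward pass, so minimizing the domain classifier's loss pushes the feature extractor towards domain confusion. A framework-free sketch of just that layer (manual backward pass, invented class name):

```python
import numpy as np

class GradReverse:
    # Gradient reversal layer: forward is the identity, backward multiplies
    # the incoming gradient by -lam. Placed between the feature extractor
    # and the domain classifier, it turns gradient descent on the domain
    # loss into gradient *ascent* for the features, encouraging alignment.
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output

grl = GradReverse(lam=0.5)
x = np.array([1.0, 2.0])
print(grl.forward(x))                        # unchanged activations
print(grl.backward(np.array([0.2, -0.4])))   # sign-flipped, scaled gradient
```

In an autograd framework this would be implemented as a custom function with these two passes; the scalar `lam` is typically annealed over training.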

We've a new #MLJ online-first paper: "Imbalanced gradients: a subtle cause of overestimated adversarial robustness" by Xingjun Ma, Linxi Jiang, Hanxun Huang, Zejia Weng, James Bailey & Yu-Gang Jiang (https://rdcu.be/c8P3h)

You thought we were done with #MLJ online-first #NewPaper|s this week? Well, we're not: "PreCoF: counterfactual explanations for fairness" by Sofie Goethals, David Martens & Toon Calders (https://rdcu.be/c8Fhc)

Do you like #ML papers? Because we at #MLJ have some for you: "Generalizing universal adversarial perturbations for deep neural networks" by Yanghao Zhang, Wenjie Ruan, Fu Wang & Xiaowei Huang (https://rdcu.be/c8A0j)