| Homepage | https://samuelvaiter.com |
Unfortunately, we also show that for some pathological cases, automatic differentiation of momentum methods might diverge, even in 1D.
The proofs are based on a Banach-Picard fixed-point theorem for set-valued functions, along with some previous results of Jerome and Edouard on conservative Jacobians.
PS: This is *not* a result on implicit differentiation; that case is already covered in https://arxiv.org/abs/2106.04350
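To fix ideas, here is a minimal sketch of what piggyback (automatic) differentiation of a momentum method means, on a benign smooth 1D problem where the derivative iterates do converge; the objective, step size, and momentum parameter are made up for illustration, and the divergence mentioned above concerns pathological nonsmooth cases, not this one:

```python
# A hypothetical 1D objective f(x, theta) = 0.5 * (x - theta)**2,
# whose minimizer is x*(theta) = theta, so dx*/dtheta = 1.
theta = 2.0
alpha, beta = 0.1, 0.5          # illustrative step size and momentum

x_prev = x = 0.0                # heavy-ball iterates
d_prev = d = 0.0                # piggyback derivatives d_k = dx_k/dtheta
for _ in range(2000):
    # heavy-ball: x_{k+1} = x_k - alpha * grad f + beta * (x_k - x_{k-1})
    x_new = x - alpha * (x - theta) + beta * (x - x_prev)
    # differentiate the update wrt theta: d/dtheta (x - theta) = d - 1
    d_new = d - alpha * (d - 1.0) + beta * (d - d_prev)
    x_prev, x = x, x_new
    d_prev, d = d, d_new
# here d approaches the true sensitivity dx*/dtheta = 1
```

On this smooth, strongly convex problem the derivative recursion is a stable linear iteration, so `d` converges; the pathologies arise when the updates are nonsmooth.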
In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications such as training deep equilibrium networks, training neural nets with conic optimization layers, or hyperparameter-tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
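As a concrete illustration of the Lasso application mentioned in the abstract (a sketch on synthetic data, not the paper's code): on a stable active set, the Lasso solution satisfies a smooth optimality system, so implicit differentiation with respect to the regularization parameter reduces to a small linear solve, which can be sanity-checked against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.7                                   # illustrative regularization

def soft(u, s):
    return np.sign(u) * np.maximum(np.abs(u) - s, 0.0)

def lasso(lam, iters=5000):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by ISTA."""
    t = 1.0 / np.linalg.norm(A, 2) ** 2     # step = 1 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - t * A.T @ (A @ x - b), t * lam)
    return x

x = lasso(lam)
S = np.abs(x) > 1e-10                       # active set of the solution
# On a stable support: A_S^T (A_S x_S - b) + lam * sign(x_S) = 0,
# hence the implicit derivative dx_S/dlam = -(A_S^T A_S)^{-1} sign(x_S).
dx = np.zeros_like(x)
AS = A[:, S]
dx[S] = np.linalg.solve(AS.T @ AS, -np.sign(x[S]))

# sanity check against central finite differences of the solution map
eps = 1e-5
fd = (lasso(lam + eps) - lasso(lam - eps)) / (2 * eps)
```

The invertibility condition of the abstract corresponds here to `AS.T @ AS` being nonsingular; when the support is about to change, the finite-difference check would break down.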
This is not trivial because:
1. We consider *nonsmooth* differentiation, which requires concepts such as conservative Jacobians
2. Convergence of functions does not imply convergence of derivatives in general
3. The phenomenon might not be observable in practice
But it also works in practice! For instance, differentiating forward-backward for the Lasso or Douglas-Rachford for sparse inverse covariance selection leads to a linear rate. In fact, one can prove linear convergence of the derivatives as soon as the problem has enough regularity.
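A minimal sketch of the forward-backward case for the Lasso (synthetic data, illustrative regularization; not the paper's code): propagate the derivative with respect to the regularization parameter alongside the iterations, using the fact that soft-thresholding is differentiable away from its kinks, and compare the piggyback derivative with finite differences of the solution map.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.7
t = 1.0 / np.linalg.norm(A, 2) ** 2         # forward-backward step size

def fb(lam, iters=3000):
    """Forward-backward (ISTA) for min_x 0.5*||Ax-b||^2 + lam*||x||_1."""
    x = np.zeros(5)
    for _ in range(iters):
        u = x - t * A.T @ (A @ x - b)
        x = np.sign(u) * np.maximum(np.abs(u) - t * lam, 0.0)
    return x

# piggyback: propagate d_k = dx_k/dlam along the same iterations
x = np.zeros(5)
d = np.zeros(5)
for _ in range(3000):
    u = x - t * A.T @ (A @ x - b)
    du = d - t * A.T @ (A @ d)              # derivative of the gradient step
    active = np.abs(u) > t * lam            # soft-thresholding is smooth here
    x = np.where(active, u - t * lam * np.sign(u), 0.0)
    d = np.where(active, du - t * np.sign(u), 0.0)

# finite-difference derivative of the solution map, for comparison
eps = 1e-5
fd = (fb(lam + eps) - fb(lam - eps)) / (2 * eps)
```

Once the active set stabilizes, the derivative recursion becomes an affine contraction, which is the mechanism behind the linear rate of the derivatives.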
What happens when you use #autodiff and let your nonsmooth iterative algorithm run to convergence?
With J. Bolte & E. Pauwels, we show that under a contraction assumption, the derivatives of the algorithm converge linearly!
Preprint: https://arxiv.org/abs/2206.00457
I will present this work this week at #NEURIPS2022
Differentiation along algorithms, i.e., piggyback propagation of derivatives, is now routinely used to differentiate iterative solvers in differentiable programming. The asymptotics are well understood for many smooth problems, but the nondifferentiable case is hardly considered. Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)? Does it have any variational meaning, and can it be used effectively in machine learning? Is there a connection with classical derivatives? All these questions are addressed under appropriate nonexpansivity conditions, in the framework of conservative derivatives, which has proved useful in understanding nonsmooth AD. We characterize the attractor set of nonsmooth piggyback iterations as a set-valued fixed point which remains in the conservative framework. This has various consequences, in particular almost-everywhere convergence of classical derivatives. Our results are illustrated on parametric convex optimization problems with the forward-backward, Douglas-Rachford, and Alternating Direction Method of Multipliers (ADMM) algorithms, as well as the Heavy-Ball method.
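For instance, piggyback differentiation through Douglas-Rachford for the Lasso can be sketched as follows (synthetic data, illustrative step sizes; not the paper's code). The prox of the quadratic term is linear, so its derivative is the same linear map, and the prox of the l1 term is soft-thresholding, differentiated away from its kinks:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam, t = 0.7, 1.0                           # illustrative parameters

# prox of f(x) = 0.5*||Ax - b||^2 is linear: prox_tf(z) = M @ (z + t*A^T b)
M = np.linalg.inv(np.eye(5) + t * A.T @ A)

def dr(lam, iters=3000):
    """Douglas-Rachford for min_x 0.5*||Ax-b||^2 + lam*||x||_1."""
    z = np.zeros(5)
    for _ in range(iters):
        x = M @ (z + t * A.T @ b)
        v = 2 * x - z
        y = np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
        z = z + y - x
    return M @ (z + t * A.T @ b)            # solution = prox_tf(z*)

# piggyback derivatives wrt lam along the same iteration
z, dz = np.zeros(5), np.zeros(5)
for _ in range(3000):
    x, dx = M @ (z + t * A.T @ b), M @ dz   # linear prox: same map for d
    v, dv = 2 * x - z, 2 * dx - dz
    active = np.abs(v) > t * lam            # soft-thresholding is smooth here
    y = np.where(active, v - t * lam * np.sign(v), 0.0)
    dy = np.where(active, dv - t * np.sign(v), 0.0)
    z, dz = z + y - x, dz + dy - dx

x, dx = M @ (z + t * A.T @ b), M @ dz       # solution and its derivative

# finite-difference check of the solution map
eps = 1e-5
fd = (dr(lam + eps) - dr(lam - eps)) / (2 * eps)
```

The same pattern (differentiate each prox or gradient step, chain them along the iteration) applies to ADMM and the Heavy-Ball method from the abstract.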