https://arxiv.org/abs/2307.07619
'Stochastic dynamics and the Polchinski equation: an introduction'
- Roland Bauerschmidt, Thierry Bodineau, Benoit Dagallier

Proving that probability measures satisfy nice concentration properties via functional inequalities is easiest to accomplish when studying log-concave measures. However, for many applications, this class of measures is insufficient for handling interesting problems. For example, in the large-scale spin systems which arise in statistical physics, convexity is very much the exception rather than the rule. This survey exposits some exciting new mathematical techniques which have been developed to address such applications, and details connections to various related constructions in the literature on functional inequalities.

#sparxivdigest

Stochastic dynamics and the Polchinski equation: an introduction

This introduction surveys a renormalisation group perspective on log-Sobolev inequalities and related properties of stochastic dynamics. We also explain the relationship of this approach to related recent and less recent developments such as Eldan's stochastic localisation and the Föllmer process, the Boué--Dupuis variational formula and the Barashkov--Gubinelli approach, the transportation of measure perspective, and the classical analogues of these ideas for Hamilton--Jacobi equations which arise in mean-field limits.
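For orientation: given a decomposition of the Gaussian covariance $C = \int_0^\infty \dot C_t \, dt$, the renormalised potential $V_t$ solves the Polchinski equation, which (schematically, in the notation common to this renormalisation group literature; conventions and signs vary between references) reads

$$\partial_t V_t = \frac{1}{2} \Delta_{\dot C_t} V_t - \frac{1}{2} \left(\nabla V_t\right)^2_{\dot C_t},$$

where $\Delta_{\dot C_t}$ and $(\nabla \cdot)^2_{\dot C_t}$ denote the Laplacian and the squared gradient weighted by the covariance increment $\dot C_t$.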

arXiv.org

https://arxiv.org/abs/2307.07131
'Parallelising Glauber dynamics'
- Holden Lee

Glauber dynamics (also 'Gibbs sampling') is an MCMC technique for sampling from certain classes of probability distribution, often those conforming to a suitable graphical structure. When the underlying graph is sparse, various parallelisation strategies have been proposed, with a goal of accelerating real-time convergence of the algorithm. This work demonstrates that for suitably well-behaved **dense** graphical models, it can still be possible to make effective use of parallelisation strategies.

#sparxivdigest

Parallelising Glauber dynamics

For distributions over discrete product spaces $\prod_{i=1}^n \Omega_i$, Glauber dynamics is a Markov chain that at each step resamples a random coordinate conditioned on the other coordinates. We show that $k$-Glauber dynamics, which resamples a random subset of $k$ coordinates, mixes $k$ times faster in $\chi^2$-divergence, and, assuming approximate tensorization of entropy, mixes $k$ times faster in KL-divergence. We apply this to obtain parallel algorithms in two settings: (1) For the Ising model $\mu_{J,h}(x)\propto \exp(\frac12\left\langle x,Jx \right\rangle + \langle h,x\rangle)$ with $\|J\|<1-c$ (the regime where fast mixing is known), we show that we can implement each step of $\widetilde\Theta(n/\|J\|_F)$-Glauber dynamics efficiently with a parallel algorithm, resulting in a parallel algorithm with running time $\widetilde O(\|J\|_F) = \widetilde O(\sqrt n)$. (2) For the mixed $p$-spin model at high enough temperature, we show that with high probability we can implement each step of $\widetilde\Theta(\sqrt n)$-Glauber dynamics efficiently and obtain running time $\widetilde O(\sqrt n)$.
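As a concrete reference point, here is a minimal sketch (ours, not the paper's implementation) of ordinary single-site Glauber dynamics for the Ising model above:

```python
import numpy as np

def glauber_step(x, J, h, rng):
    """One step of single-site Glauber dynamics for the Ising model
    mu(x) ∝ exp(0.5 <x, Jx> + <h, x>) on {-1,+1}^n: pick a uniformly
    random coordinate and resample it from its conditional distribution
    given the others."""
    i = rng.integers(len(x))
    field = J[i] @ x - J[i, i] * x[i] + h[i]     # local field at site i
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(x_i = +1 | x_{-i})
    x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x
```

The $k$-Glauber chain studied in the paper instead resamples a random subset of $k$ coordinates jointly; the point of the analysis is that this block update mixes $k$ times faster and, for the models above, each block update can itself be implemented by a fast parallel algorithm.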


https://arxiv.org/abs/2307.07086
'Value-Gradient Iteration with Quadratic Approximate Value Functions'
- Alan Yang, Stephen Boyd

Certain approaches to reinforcement learning and control focus on learning, approximating, and optimising the value function. A particularly simple approach is to model the value function as quadratic, as this greatly simplifies the optimisation setup for both parameter tuning and real-time control. This work studies a version of this approach based on fitting the gradient of the value function.

#sparxivdigest

Value-Gradient Iteration with Quadratic Approximate Value Functions

We propose a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the true value function. Evaluating the associated control policy involves solving a convex problem, typically a quadratic program, which can be carried out reliably in real-time. Such policies often perform well even when the approximate value function is not a particularly good approximation of the true value function. We propose value-gradient iteration, which fits the gradient of the value function, with regularization that can include constraints reflecting known bounds on the true value function. Our value-gradient iteration method can yield a good approximate value function with few samples and little hyperparameter tuning. We find that the method can find a good policy with computational effort comparable to that required to just evaluate a control policy via simulation.
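To make the setup concrete, here is a hedged sketch (the names and the restriction to a purely quadratic stage cost are ours, not the paper's) of how a quadratic approximate value function $V(x) = x^T P x$ induces a policy for linear dynamics $x^+ = Ax + Bu$:

```python
import numpy as np

def quadratic_adp_policy(P, A, B, R):
    """Policy induced by the quadratic approximate value function
    V(x) = x' P x: evaluate u*(x) = argmin_u  u' R u + V(A x + B u),
    which for this quadratic stage cost reduces to linear feedback
    u*(x) = -K x with K = (R + B' P B)^{-1} B' P A."""
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return lambda x: -K @ x
```

In the paper's more general setting (random dynamics, general convex stage costs, constraints) the minimisation is a small convex program solved at each step rather than a fixed gain.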


https://arxiv.org/abs/2307.04456
'Invex Programs: First Order Algorithms and Their Convergence'
- Adarsh Barik, Suvrit Sra, Jean Honorio

There is substantial recent interest in identifying classes of non-convex functions for which optimisation problems are nevertheless tractable in some sense. This work studies such a class ("invex" functions), proposes an algorithm which is adapted to solving optimisation problems from this class, and carries out a theoretical analysis of this algorithm. In particular, this algorithm is able to perform well in scenarios in which gradient descent may run into problems.

#sparxivdigest

Invex Programs: First Order Algorithms and Their Convergence

Invex programs are a special class of non-convex problems which attain global minima at every stationary point. While classical first-order gradient descent methods can solve them, they converge very slowly. In this paper, we propose new first-order algorithms to solve the general class of invex problems. We identify sufficient conditions for convergence of our algorithms and provide rates of convergence. Furthermore, we go beyond unconstrained problems and provide a novel projected gradient method for constrained invex programs with convergence rate guarantees. We compare and contrast our results with existing first-order algorithms for a variety of unconstrained and constrained invex problems. To the best of our knowledge, our proposed algorithm is the first algorithm to solve constrained invex programs.
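For readers who want the baseline being improved on: plain projected gradient descent (not the paper's specialised algorithm) looks like this, and the invexity assumption is what upgrades "converges to a stationary point" into "converges to a global minimum":

```python
import numpy as np

def projected_gradient(grad, project, x0, step=0.1, iters=500):
    """Plain projected gradient descent on a constrained problem.
    For invex objectives, any stationary point of the projected
    dynamics is a global minimum, so stationarity guarantees are
    already global ones."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project(x - step * grad(x))
    return x
```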


https://arxiv.org/abs/2306.14445
'Hybrid unadjusted Langevin methods for high-dimensional latent variable models'
- Ruben Loaiza-Maya, Didier Nibbering, Dan Zhu

Unadjusted MCMC methods have thus far been most widely adopted for "fully-observed" statistical models, with relatively less attention paid to models with latent structure. This work develops a strategy for approximating "marginal" MCMC in latent variable models, using samples from the conditional law of the latent variables to approximate the drift of the ideal marginal sampler, and reports impressive numerical results.

#sparxivdigest

Hybrid unadjusted Langevin methods for high-dimensional latent variable models

The exact estimation of latent variable models with big data is known to be challenging. The latents have to be integrated out numerically, and the dimension of the latent variables increases with the sample size. This paper develops a novel approximate Bayesian method based on the Langevin diffusion process. The method employs the Fisher identity to integrate out the latent variables, which makes it accurate and computationally feasible when applied to big data. In contrast to other approximate estimation methods, it does not require the choice of a parametric distribution for the unknowns, which often leads to inaccuracies. In an empirical discrete choice example with a million observations, the proposed method accurately estimates the posterior choice probabilities using only 2% of the computation time of exact MCMC.
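Schematically (function names and signatures here are illustrative, not the paper's), one step of such a hybrid unadjusted Langevin sampler uses Fisher's identity, $\nabla_\theta \log p(y\mid\theta) = \mathbb{E}[\nabla_\theta \log p(y,z\mid\theta) \mid y, \theta]$, replacing the expectation over the latents $z$ by a Monte Carlo average:

```python
import numpy as np

def hybrid_ula_step(theta, y, grad_joint, sample_latents, step, rng, m=10):
    """One unadjusted Langevin step targeting the marginal posterior of
    theta. The intractable marginal score is estimated via Fisher's
    identity by averaging grad_theta log p(y, z | theta) over m draws
    of the latents z from (an approximation of) p(z | y, theta)."""
    zs = [sample_latents(theta, y, rng) for _ in range(m)]
    score = np.mean([grad_joint(theta, z, y) for z in zs], axis=0)
    noise = rng.standard_normal(np.shape(theta))
    return theta + 0.5 * step * score + np.sqrt(step) * noise
```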


https://arxiv.org/abs/2306.07262
'Tight skew adjustment to the Laplace approximation in high dimensions'
- Anya Katsevich

Mode-centred Gaussian posterior approximations are a powerful tool for approximate Bayesian inference in the data-rich regime. Relative to approximations based on more global principles, they can degrade noticeably in the presence of skewness and asymmetry. This work proposes a simple skewness correction to such approximations, and develops theory showing that it can offer substantial improvements over the baseline approximation.

#sparxivdigest

The Laplace approximation accuracy in high dimensions: a refined analysis and new skew adjustment

In Bayesian inference, making deductions about a parameter of interest requires one to sample from or compute an integral against a posterior distribution. A popular method to make these computations cheaper in high-dimensional settings is to replace the posterior with its Laplace approximation (LA), a Gaussian distribution. In this work, we derive a leading order decomposition of the LA error, a powerful technique to analyze the accuracy of the approximation more precisely than was possible before. It allows us to derive the first ever skew correction to the LA which provably improves its accuracy by an order of magnitude in the high-dimensional regime. Our approach also enables us to prove both tighter upper bounds on the standard LA and the first ever lower bounds in high dimensions. In particular, we prove that $d^2\ll n$ is in general necessary for accuracy of the LA, where $d$ is dimension and $n$ is sample size. Finally, we apply our theory in two example models: a Dirichlet posterior arising from a multinomial observation, and logistic regression with Gaussian design. In the latter setting, we prove high probability bounds on the accuracy of the LA and skew-corrected LA in powers of $d/\sqrt n$ alone.
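As a reminder of the object being corrected, the Laplace approximation is the Gaussian centred at the posterior mode with covariance the inverse Hessian of the negative log-posterior there; a minimal Newton-based sketch:

```python
import numpy as np

def laplace_approximation(grad, hess, x0, iters=50):
    """Laplace approximation to a posterior with negative log-density f:
    Newton iteration to the mode x*, then the Gaussian N(x*, H(x*)^{-1})
    where H is the Hessian of f (grad and hess are those of f)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x, np.linalg.inv(hess(x))  # LA mean and covariance
```

The paper's skew correction perturbs this Gaussian using third-derivative information; this sketch is only the uncorrected baseline.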


https://arxiv.org/abs/2306.06059
'Semiparametric posterior corrections'
- Andrew Yiu, Edwin Fong, Chris Holmes, Judith Rousseau

A peculiarity of Bayesian inference which can surprise newcomers is that even when a posterior distribution supplies high-quality estimates of a given quantity, its estimates of derived quantities may nevertheless be quite poor: for example, estimating a density well but estimating integrals against that density poorly. This work proposes corrective strategies for post-processing posterior information to obtain improved estimators of relevant semiparametric estimands.

#sparxivdigest

Semiparametric posterior corrections

We present a new approach to semiparametric inference using corrected posterior distributions. The method allows us to leverage the adaptivity, regularization and predictive power of nonparametric Bayesian procedures to estimate low-dimensional functionals of interest without being restricted by the holistic Bayesian formalism. Starting from a conventional nonparametric posterior, we target the functional of interest by transforming the entire distribution with a Bayesian bootstrap correction. We provide conditions for the resulting $\textit{one-step posterior}$ to possess calibrated frequentist properties and specialize the results for several canonical examples: the integrated squared density, the mean of a missing-at-random outcome, and the average causal treatment effect on the treated. The procedure is computationally attractive, requiring only a simple, efficient post-processing step that can be attached onto any arbitrary posterior sampling algorithm. Using the ACIC 2016 causal data analysis competition, we illustrate that our approach can outperform the existing state-of-the-art through the propagation of Bayesian uncertainty.
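The basic ingredient of the correction, the Bayesian bootstrap, is easy to sketch: each posterior draw for a functional reweights the observed data with flat Dirichlet weights. For the mean functional (the paper's actual one-step posterior additionally shifts draws by an efficient-influence-function term):

```python
import numpy as np

def bayesian_bootstrap_mean(y, n_draws, rng):
    """Bayesian bootstrap draws of the mean functional: each draw is
    sum_i w_i y_i with weights w ~ Dirichlet(1, ..., 1) over the n
    observations."""
    w = rng.dirichlet(np.ones(len(y)), size=n_draws)
    return w @ y
```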


https://arxiv.org/abs/2306.05026
'An introduction to the analysis of gradients systems'
- Alexander Mielke

Gradient flows arise naturally in various contexts, particularly in applied problems involving optimisation and sampling. These lecture notes give an in-depth mathematical treatment of the topic at a powerful level of generality.

#sparxivdigest

An introduction to the analysis of gradients systems

The present notes provide an extended version of a small lecture course (of 36 hours) given at the Humboldt-Universität zu Berlin in the Winter Term 2022/23. The material starting in Section 5.4 was added afterwards. The aim of these notes is to give an introductory overview of the analytical approaches for gradient-flow equations in Hilbert spaces, Banach spaces, and metric spaces, and to show that at an introductory level these theories have a lot in common. The theories and their specific setups are illustrated by suitable examples and counterexamples.
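The simplest instance of the objects these notes study, a gradient flow $\dot x = -\nabla E(x)$ in $\mathbb{R}^n$, can be sketched by explicit Euler time-stepping (the metric-space theory in the notes generalises this via minimising-movement schemes):

```python
import numpy as np

def gradient_flow(grad_E, x0, tau=1e-2, steps=1000):
    """Explicit Euler discretisation x_{k+1} = x_k - tau * grad E(x_k)
    of the gradient flow x' = -grad E(x); for convex E the iterates
    descend the energy and approach a minimiser."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - tau * grad_E(x)
    return x
```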


https://arxiv.org/abs/2306.03955
'Kernel Quadrature with Randomly Pivoted Cholesky'
- Ethan N. Epperly, Elvira Moreno

Kernel quadrature methods can become bottlenecked by the task of selecting quadrature nodes, which is often approached through the (approximate) solution of a difficult non-convex optimisation problem. This work proposes instead using a simple randomised strategy for node selection, motivated by recent developments in randomised numerical linear algebra, and which shows promising empirical performance.

#sparxivdigest

Kernel quadrature with randomly pivoted Cholesky

This paper presents new quadrature rules for functions in a reproducing kernel Hilbert space using nodes drawn by a sampling algorithm known as randomly pivoted Cholesky. The resulting computational procedure compares favorably to previous kernel quadrature methods, which either achieve low accuracy or require solving a computationally challenging sampling problem. Theoretical and numerical results show that randomly pivoted Cholesky is fast and achieves comparable quadrature error rates to more computationally expensive quadrature schemes based on continuous volume sampling, thinning, and recombination. Randomly pivoted Cholesky is easily adapted to complicated geometries with arbitrary kernels, unlocking new potential for kernel quadrature.
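The node-selection routine itself is short; a sketch of randomly pivoted Cholesky on a precomputed kernel matrix (the quadrature weights are then obtained separately from the selected nodes):

```python
import numpy as np

def rp_cholesky(K, k, rng):
    """Randomly pivoted Cholesky: greedily selects k pivots (quadrature
    nodes), sampling each pivot with probability proportional to the
    diagonal of the current residual kernel matrix, and returns the
    pivots together with a rank-k factor F satisfying K ≈ F F^T."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()
    F = np.zeros((n, k))
    pivots = []
    for j in range(k):
        i = rng.choice(n, p=d / d.sum())       # pivot ∝ residual diagonal
        pivots.append(i)
        col = K[:, i] - F[:, :j] @ F[i, :j]    # residual column at pivot
        F[:, j] = col / np.sqrt(col[i])
        d = np.maximum(d - F[:, j] ** 2, 0.0)  # update residual diagonal
    return pivots, F
```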


https://arxiv.org/abs/2306.02066
'Variational Gaussian Process Diffusion Processes'
- Prakhar Verma, Vincent Adam, Arno Solin

One popular approach to approximate inference in partially-observed diffusion processes is variational inference, wherein the smoothing distribution is often approximated by a Gauss-Markov process. This work revisits this approach and presents a modified algorithm with improved practical convergence properties, and easier interfacing with parameter learning routines.

#sparxivdigest

Variational Gaussian Process Diffusion Processes

Diffusion processes are a class of stochastic differential equations (SDEs) providing a rich family of expressive models that arise naturally in dynamic modelling tasks. Probabilistic inference and learning under generative models with latent processes endowed with a non-linear diffusion process prior are intractable problems. We build upon work within variational inference, approximating the posterior process as a linear diffusion process, and point out pathologies in the approach. We propose an alternative parameterization of the Gaussian variational process using a site-based exponential family description. This allows us to trade a slow inference algorithm with fixed-point iterations for a fast algorithm for convex optimization akin to natural gradient descent, which also provides a better objective for learning model parameters.
