Arthur Gretton

461 Followers
37 Following
3 Posts
Professor, Gatsby Computational Neuroscience Unit
Turn your HSIC dependence statistic into a conditional dependence statistic with CIRCE!
Learn NN features that are independent of distractors/protected attributes, conditioned on labels.
Used for domain-invariant learning and fairness with equalized odds.
https://arxiv.org/abs/2212.08645
Efficient Conditionally Invariant Representation Learning

We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural features $\varphi(X)$ of data $X$ to estimate a target $Y$, while being conditionally independent of a distractor $Z$ given $Y$. Both $Z$ and $Y$ are assumed to be continuous-valued but relatively low dimensional, whereas $X$ and its features may be complex and high dimensional. Relevant settings include domain-invariant learning, fairness, and causal learning. The procedure requires just a single ridge regression from $Y$ to kernelized features of $Z$, which can be done in advance. It is then only necessary to enforce independence of $\varphi(X)$ from residuals of this regression, which is possible with attractive estimation properties and consistency guarantees. By contrast, earlier measures of conditional feature dependence require multiple regressions for each step of feature learning, resulting in more severe bias and variance, and greater computational cost. When sufficiently rich features are used, we establish that CIRCE is zero if and only if $\varphi(X) \perp \!\!\! \perp Z \mid Y$. In experiments, we show superior performance to previous methods on challenging benchmarks, including learning conditionally invariant image features.

arXiv.org
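
To make the single-regression idea concrete, here is a minimal NumPy sketch of a CIRCE-style penalty. It is an assumption-laden simplification, not the paper's implementation: it uses small explicit feature maps in place of the paper's kernelized features of $Z$, and a plain ridge fit in place of kernel ridge regression; the names `ridge_fit` and `circe_penalty` and the toy feature maps are illustrative only.

```python
import numpy as np

def ridge_fit(F_y, Psi_z, lam=1e-3):
    """Ridge regression from features of Y to features of Z (can be done in advance)."""
    n, d = F_y.shape
    W = np.linalg.solve(F_y.T @ F_y + lam * n * np.eye(d), F_y.T @ Psi_z)
    return W  # predicts E[psi(Z) | Y] as F_y @ W

def circe_penalty(phi_X, F_y, Psi_z, W):
    """Squared cross-covariance between phi(X) and the residual psi(Z) - E^[psi(Z) | Y]:
    small when phi(X) carries no information about Z beyond what Y already explains."""
    n = phi_X.shape[0]
    resid = Psi_z - F_y @ W                             # residuals of the Y -> psi(Z) regression
    phi_c = phi_X - phi_X.mean(axis=0, keepdims=True)   # centre the learned features
    C = phi_c.T @ resid / n                             # empirical cross-covariance matrix
    return np.sum(C ** 2)                               # squared Frobenius (Hilbert-Schmidt) norm

# Toy usage: a distractor Z strongly tied to the label Y, and features phi(X)
# that only track Y, should incur a small penalty.
rng = np.random.default_rng(0)
Y = rng.normal(size=(512, 1))
Z = Y + 0.1 * rng.normal(size=(512, 1))        # distractor, correlated with Y
phi_X = Y + 0.5 * rng.normal(size=(512, 4))    # hypothetical learned features

F_y = np.hstack([Y, Y ** 2, np.ones_like(Y)])  # simple explicit feature map for Y
Psi_z = np.hstack([Z, Z ** 2])                 # simple explicit feature map for Z
W = ridge_fit(F_y, Psi_z)
print(circe_penalty(phi_X, F_y, Psi_z, W))
```

In training, a penalty of this form would be added to the prediction loss so that the network is steered towards features of $X$ that carry no information about $Z$ once $Y$ is accounted for.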

Optimal Rates for Regularized Conditional Mean Embedding Learning
"what it says on the tin"😁

Oral presentation #NeurIPS22

arXiv: https://arxiv.org/abs/2208.01711
short video: https://youtu.be/Pl8OM2sckwA
Poster #838, Hall J, Thursday 01 Dec, 4:30
Zhu Li, Dimitri Meunier, Mattes Mollenhauer

Optimal Rates for Regularized Conditional Mean Embedding Learning

We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.

arXiv.org
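
As a rough illustration of what the empirical CME estimator looks like in practice, here is a small NumPy sketch: kernel ridge regression on $X$ gives weights $\beta(x) = (K_X + n\lambda I)^{-1} k_X(x)$, the embedding is $\mu_{Y|X=x} \approx \sum_i \beta_i(x)\, k_Y(y_i, \cdot)$, and a conditional expectation of a target-RKHS function $f$ is then $\sum_i \beta_i(x) f(y_i)$. The function names and the toy check (with $f$ taken as the identity, purely for illustration) are assumptions for this sketch, not code from the paper.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def cme_weights(X, x_new, lam=1e-3, gamma=1.0):
    """Kernel-ridge weights beta(x), so that mu_{Y|X=x} ~ sum_i beta_i(x) k_Y(y_i, .)."""
    n = X.shape[0]
    K = rbf(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), rbf(X, x_new, gamma))

def conditional_expectation(f_at_train_Y, beta):
    """<f, mu_{Y|X=x}> ~ sum_i beta_i(x) f(y_i) for a function f in the target RKHS."""
    return f_at_train_Y @ beta

# Toy check: Y = sin(X) + noise, so the conditional mean at x should be close to sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

x_new = np.array([[1.0]])
beta = cme_weights(X, x_new, lam=1e-3, gamma=2.0)
print(conditional_expectation(Y[:, 0], beta))  # compare with the true conditional mean sin(1)
```

The paper's results concern how fast estimators of this form converge, including in the misspecified regime where the true CME only lies in an interpolation space rather than in the hypothesis class itself.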