Interested in representation learning and Conditional Neural Processes (CNPs)?

Together with Victor Prokhorov, Siddharth N and Ivan Titov, we propose the Partial Pixel Space Variational Autoencoder (PPS-VAE), an amortised variational framework that casts CNP context points as latent variables.

Given an image (y), an inference network learns a distribution over context-point positions (xM), and the CNP serves as the generative network.

A latent variable (a) acts as an abstraction of the context points, providing control over their arrangement and pixel values (yM).
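To make the pipeline concrete, here is a toy numpy sketch of the two stages described above: a stand-in "inference network" that picks context-point positions from the image, and a stand-in "CNP" that reconstructs the image from those points. The actual model uses learned neural networks and a latent variable (a); the heuristics and function names below are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of the PPS-VAE pipeline (illustrative only; the real model
# uses learned networks -- the heuristics here are assumptions).
import numpy as np

rng = np.random.default_rng(0)

def infer_context(y, k):
    """'Inference network' stand-in: sample k context-point positions xM,
    with probability proportional to local pixel intensity."""
    h, w = y.shape
    probs = y.flatten() + 1e-6
    probs /= probs.sum()
    idx = rng.choice(h * w, size=k, replace=False, p=probs)
    xM = np.stack(np.unravel_index(idx, (h, w)), axis=1)  # (k, 2) positions
    yM = y[xM[:, 0], xM[:, 1]]                            # (k,) pixel values
    return xM, yM

def cnp_predict(xM, yM, shape):
    """'CNP' stand-in: predict every pixel as a distance-weighted average
    of the observed context values (a simple kernel regressor)."""
    h, w = shape
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), -1)
    d2 = ((grid[:, :, None, :] - xM[None, None, :, :]) ** 2).sum(-1)  # (h,w,k)
    wgt = np.exp(-d2 / 10.0)
    return (wgt * yM).sum(-1) / (wgt.sum(-1) + 1e-9)

y = rng.random((16, 16))              # toy "image"
xM, yM = infer_context(y, k=32)       # choose context points from the image
y_hat = cnp_predict(xM, yM, y.shape)  # reconstruct from context alone
print(y_hat.shape)  # (16, 16)
```

In PPS-VAE both stages are trained jointly, so the choice of context points itself is optimised for reconstruction rather than fixed heuristically as above.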

In our analysis we find that, under fairly simple conditions, PPS-VAE learns to choose context points that:

- provide useful semantic information for downstream tasks in both in-distribution and out-of-distribution settings
- enable CNPs that fit the data better

We also highlight the scalability of PPS-VAE via simple post-hoc augmentations, such as dynamically resizing the context set and reconfiguring context points as tiles (see figure below).

Want to learn more? Check out:
[paper: https://arxiv.org/abs/2305.18485]
[code: https://github.com/exlab-research/pps-vae]

If you find this work interesting, please reach out. I will be presenting it at #ICML2024. Hope to see many of you!

Autoencoding Conditional Neural Processes for Representation Learning

Conditional neural processes (CNPs) are a flexible and efficient family of models that learn to learn a stochastic process from data. They have seen particular application in contextual image completion - observing pixel values at some locations to predict a distribution over values at other unobserved locations. However, the choice of pixels in learning CNPs is typically either random or derived from a simple statistical measure (e.g. pixel variance). Here, we turn the problem on its head and ask: which pixels would a CNP like to observe - do they facilitate fitting better CNPs, and do such pixels tell us something meaningful about the underlying image? To this end we develop the Partial Pixel Space Variational Autoencoder (PPS-VAE), an amortised variational framework that casts CNP context as latent variables learnt simultaneously with the CNP. We evaluate PPS-VAE over a number of tasks across different visual data, and find that not only can it facilitate better-fit CNPs, but also that the spatial arrangement and values meaningfully characterise image information - evaluated through the lens of classification on both within and out-of-data distributions. Our model additionally allows for dynamic adaption of context-set size and the ability to scale-up to larger images, providing a promising avenue to explore learning meaningful and effective visual representations.
