Perceptual grounding and explainable AI (#xAI) lab at Informatics at the University of Edinburgh.
Banyan stays competitive, often even managing to outperform the baselines, despite being a much smaller model. 7/🧵
Where this really shines is in the low-resource setting, where embeddings still play a critical role but scale just isn't available. That's what we evaluate next, and this time we compare to LLMs in the 100M-7B range as well as to supervised embedding models. 6/🧵
Banyan is a special type of autoencoder called a Self-StrAE (see fig). Given a sequence, it needs to learn which elements to merge with each other, and in what order, to get the best compression. This means its representations model compositional semantics. 2/🧵
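The merge-and-compress idea can be sketched as a toy greedy agglomerative pass. Everything here is an illustrative stand-in, not the model's learned machinery: cosine similarity as the merge score and a plain mean as the composition function are assumptions for the sketch.

```python
import numpy as np

def self_strae_merge(embeddings):
    """Greedily merge the most similar adjacent pair until one root remains.

    `embeddings` is a list of 1-D vectors, one per token. The mean-based
    merge below is a placeholder for the learned composition function.
    """
    nodes = [np.asarray(e, dtype=float) for e in embeddings]
    tree = list(range(len(nodes)))  # leaves are token indices
    while len(nodes) > 1:
        # cosine similarity of each adjacent pair
        sims = [
            np.dot(nodes[i], nodes[i + 1])
            / (np.linalg.norm(nodes[i]) * np.linalg.norm(nodes[i + 1]) + 1e-8)
            for i in range(len(nodes) - 1)
        ]
        i = int(np.argmax(sims))                # most similar adjacent pair
        merged = (nodes[i] + nodes[i + 1]) / 2  # placeholder composition
        nodes[i : i + 2] = [merged]
        tree[i : i + 2] = [(tree[i], tree[i + 1])]
    return nodes[0], tree[0]
```

Running it on three toy token vectors yields both the root embedding and the induced binary tree over token indices.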
We also highlight the scalability of PPS-VAE using simple post-hoc augmentations, such as dynamically resizing the set of context points and reconfiguring context points as tiles (see figure below).

Given an image (y), an inference network learns a distribution over context-point positions (xM), and the CNPs are used as a generative network.

The latent variable (a) acts as an abstraction of the context points, providing control over different arrangements and pixel values (yM).

Interested in representation learning and Conditional Neural Processes (CNPs)?

Together with Victor Prokhorov, Siddharth N, and Ivan Titov, we propose the Pixel Space Variational Autoencoder (PPS-VAE), an amortised variational framework that casts CNP context points as latent variables.
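The context-point idea can be caricatured in a few lines. Every component below is a hand-written heuristic stand-in (the names and mechanics are assumptions for illustration; in PPS-VAE the inference network and CNP decoder are learned):

```python
import numpy as np

rng = np.random.default_rng(0)

def infer_positions(y, m):
    """Stand-in for the inference network over xM: pick the m pixels that
    deviate most from the image mean as (row, col) context-point positions."""
    scores = np.abs(y - y.mean()).ravel()
    idx = np.argsort(scores)[-m:]
    return np.stack(np.unravel_index(idx, y.shape), axis=1)

def cnp_generate(positions, values, shape):
    """Stand-in for the generative network: a CNP would condition on the
    context set; here each pixel just copies its nearest context point."""
    rows, cols = np.indices(shape)
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1)
    dists = np.linalg.norm(grid[:, None, :] - positions[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return values[nearest].reshape(shape)

y = rng.random((8, 8))             # toy image
xM = infer_positions(y, m=6)       # context-point positions
yM = y[xM[:, 0], xM[:, 1]]         # context-point values
recon = cnp_generate(xM, yM, y.shape)
```

Note how the reconstruction is pinned exactly at the chosen context points: the rest of the image is filled in from them, which is the intuition behind treating context points as a compact latent description of the image.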

Finally, we wondered what happens if we let the model choose its own order of composition (basically defining its own tree), so we modified StrAE so it can do just that!

This variant, called Self-StrAE, produces a tree and representations using a simple agglomerative clustering algorithm; no parser required!

We pre-trained StrAE through a novel application of the contrastive loss to structure. This basically means that the decoder has to reconstruct the encoder's node embeddings at all levels, not just the input words (the lowest level). Using a contrastive loss lets us do this without having to softmax over a huge vocabulary.
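A minimal sketch of what contrastive reconstruction of node embeddings could look like, assuming an InfoNCE-style objective in which the decoder and encoder embeddings of the same tree node form a positive pair and all other nodes act as negatives (the exact StrAE loss may differ in detail):

```python
import numpy as np

def contrastive_structure_loss(enc_nodes, dec_nodes, temperature=0.1):
    """InfoNCE-style loss: each decoder node embedding should match the
    encoder embedding of the same tree node against all other nodes.

    enc_nodes, dec_nodes: (N, d) arrays, row i of each belonging to the
    same node in the tree.
    """
    # L2-normalise so dot products are cosine similarities
    enc = enc_nodes / np.linalg.norm(enc_nodes, axis=1, keepdims=True)
    dec = dec_nodes / np.linalg.norm(dec_nodes, axis=1, keepdims=True)
    logits = dec @ enc.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal
```

Because the targets are the encoder's own continuous embeddings rather than word identities, no vocabulary-sized softmax appears anywhere; the normalisation runs only over the nodes in the batch.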

To do this we introduce StrAE, a simple recursive autoencoder that learns representations going up a tree (encoder) and then, using the root as the bottleneck, learns another set of embeddings going back down (decoder).

StrAE is very simple: it doesn't have a cell state or skip-connections, so every level in the tree forces a strict bottleneck. This means that StrAE has to conform to the compression order dictated by whatever structure it is given as input. We call this faithfulness.
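A toy recursive autoencoder in this spirit, assuming random untrained weights and a fixed binary tree (illustrative only: one composition matrix going up, one decomposition matrix going down, with the root as the sole bottleneck):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# Hypothetical, untrained weights for illustration
W_up = rng.standard_normal((D, 2 * D)) * 0.1    # merges two children into a parent
W_down = rng.standard_normal((2 * D, D)) * 0.1  # splits a parent back into two children

def encode(tree, leaf_embs):
    """Walk up the tree: leaves are indices into leaf_embs, internal
    nodes are (left, right) pairs composed through W_up."""
    if isinstance(tree, int):
        return leaf_embs[tree]
    left, right = encode(tree[0], leaf_embs), encode(tree[1], leaf_embs)
    return np.tanh(W_up @ np.concatenate([left, right]))

def decode(tree, parent_emb):
    """Walk back down: split the parent embedding through W_down, so every
    level passes through the same strict D-dimensional bottleneck."""
    if isinstance(tree, int):
        return {tree: parent_emb}
    children = W_down @ parent_emb
    out = decode(tree[0], children[:D])
    out.update(decode(tree[1], children[D:]))
    return out
```

Because there is no cell state or skip path, the only route from a leaf to its reconstruction is through every intermediate node, which is exactly what forces the model to respect the input tree's compression order.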

[4/4]

We show that DooD generalises across varied data such as numbers, characters, and doodles by performing effective zero-shot transfer from one dataset (e.g. MNIST) to others (e.g. QuickDraw, Omniglot, etc.), clearly outperforming baselines.

DooD also generalises across tasks: evaluated on three of the Omniglot challenge tasks (unconditional generation, conditional generation, and one-shot classification), DooD typically matches or outperforms SotA methods.