Perceptual grounding and explainable AI (#xAI) lab at Informatics at the University of Edinburgh.
Banyan stays competitive, often even managing to outperform the baselines, despite being a much smaller model. 7/🧵
Where this really shines is in the low-resource setting, where embeddings still play a critical role but scale just isn't available. That's what we evaluate next, and this time we compare to LLMs in the 100M-7B range as well as to supervised embedding models. 6/🧵
Banyan is a special type of autoencoder called a Self-StrAE (see fig). Given a sequence, it needs to learn which elements to merge with each other, and in what order, to get the best compression. This means its representations model compositional semantics. 2/🧵
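The merge-and-compress idea can be sketched as a toy greedy agglomerative pass. Everything here is an illustrative stand-in, not the model's learned machinery: cosine similarity as the merge score and a plain mean as the composition function are assumptions for the sketch.

```python
import numpy as np

def self_strae_merge(embeddings):
    """Greedily merge the most similar adjacent pair until one root remains.

    `embeddings` is a list of 1-D vectors, one per token. The mean-based
    merge below is a placeholder for the learned composition function.
    """
    nodes = [np.asarray(e, dtype=float) for e in embeddings]
    tree = list(range(len(nodes)))  # leaves are token indices
    while len(nodes) > 1:
        # cosine similarity of each adjacent pair
        sims = [
            np.dot(nodes[i], nodes[i + 1])
            / (np.linalg.norm(nodes[i]) * np.linalg.norm(nodes[i + 1]) + 1e-8)
            for i in range(len(nodes) - 1)
        ]
        i = int(np.argmax(sims))                # most similar adjacent pair
        merged = (nodes[i] + nodes[i + 1]) / 2  # placeholder composition
        nodes[i : i + 2] = [merged]
        tree[i : i + 2] = [(tree[i], tree[i + 1])]
    return nodes[0], tree[0]
```

Running it on three toy token vectors yields both the root embedding and the induced binary tree over token indices.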
We also highlight the scalability of PPS-VAE using simple post-hoc augmentations, such as dynamically resizing the set of context points and reconfiguring context points as tiles (see figure below).

Given an image (y), an inference network learns a distribution over context-point positions (xM), and the CNPs are used as a generative network.

The latent variable (a) acts as an abstraction of the context points, providing control over different arrangements and pixel values (yM).

Interested in representation learning and Conditional Neural Processes (CNPs)?

Together with Victor Prokhorov, Siddharth N, and Ivan Titov, we propose the Pixel Space Variational Autoencoder (PPS-VAE), an amortised variational framework that casts CNP context points as latent variables.
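The context-point idea can be caricatured in a few lines. Every component below is a hand-written heuristic stand-in (the names and mechanics are assumptions for illustration; in PPS-VAE the inference network and CNP decoder are learned):

```python
import numpy as np

rng = np.random.default_rng(0)

def infer_positions(y, m):
    """Stand-in for the inference network over xM: pick the m pixels that
    deviate most from the image mean as (row, col) context-point positions."""
    scores = np.abs(y - y.mean()).ravel()
    idx = np.argsort(scores)[-m:]
    return np.stack(np.unravel_index(idx, y.shape), axis=1)

def cnp_generate(positions, values, shape):
    """Stand-in for the generative network: a CNP would condition on the
    context set; here each pixel just copies its nearest context point."""
    rows, cols = np.indices(shape)
    grid = np.stack([rows.ravel(), cols.ravel()], axis=1)
    dists = np.linalg.norm(grid[:, None, :] - positions[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return values[nearest].reshape(shape)

y = rng.random((8, 8))             # toy image
xM = infer_positions(y, m=6)       # context-point positions
yM = y[xM[:, 0], xM[:, 1]]         # context-point values
recon = cnp_generate(xM, yM, y.shape)
```

Note how the reconstruction is pinned exactly at the chosen context points: the rest of the image is filled in from them, which is the intuition behind treating context points as a compact latent description of the image.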

Finally, we wondered what happens if we let the model choose its own order of composition (basically defining its own tree), so we modified StrAE so it can do just that!

This variant, called Self-StrAE, produces a tree and representations using a simple agglomerative clustering algorithm; no parser required!

We pre-trained StrAE through a novel application of the contrastive loss to structure. This basically means that the decoder has to reconstruct the encoder's node embeddings at all levels, not just the input words (the lowest level). Using a contrastive loss lets us do this without having to softmax over a huge vocabulary.
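A minimal sketch of what contrastive reconstruction of node embeddings could look like, assuming an InfoNCE-style objective in which the decoder and encoder embeddings of the same tree node form a positive pair and all other nodes act as negatives (the exact StrAE loss may differ in detail):

```python
import numpy as np

def contrastive_structure_loss(enc_nodes, dec_nodes, temperature=0.1):
    """InfoNCE-style loss: each decoder node embedding should match the
    encoder embedding of the same tree node against all other nodes.

    enc_nodes, dec_nodes: (N, d) arrays, row i of each belonging to the
    same node in the tree.
    """
    # L2-normalise so dot products are cosine similarities
    enc = enc_nodes / np.linalg.norm(enc_nodes, axis=1, keepdims=True)
    dec = dec_nodes / np.linalg.norm(dec_nodes, axis=1, keepdims=True)
    logits = dec @ enc.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal
```

Because the targets are the encoder's own continuous embeddings rather than word identities, no vocabulary-sized softmax appears anywhere; the normalisation runs only over the nodes in the batch.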

To do this we introduce StrAE, a simple recursive autoencoder that learns representations going up a tree (encoder) and then, using the root as the bottleneck, learns another set of embeddings going back down (decoder).

StrAE is very simple: it doesn't have a cell state or skip-connections, so every level in the tree forces a strict bottleneck. This means that StrAE has to conform to the compression order dictated by whatever structure it is given as input. We call this faithfulness.
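A toy recursive autoencoder in this spirit, assuming random untrained weights and a fixed binary tree (illustrative only: one composition matrix going up, one decomposition matrix going down, with the root as the sole bottleneck):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# Hypothetical, untrained weights for illustration
W_up = rng.standard_normal((D, 2 * D)) * 0.1    # merges two children into a parent
W_down = rng.standard_normal((2 * D, D)) * 0.1  # splits a parent back into two children

def encode(tree, leaf_embs):
    """Walk up the tree: leaves are indices into leaf_embs, internal
    nodes are (left, right) pairs composed through W_up."""
    if isinstance(tree, int):
        return leaf_embs[tree]
    left, right = encode(tree[0], leaf_embs), encode(tree[1], leaf_embs)
    return np.tanh(W_up @ np.concatenate([left, right]))

def decode(tree, parent_emb):
    """Walk back down: split the parent embedding through W_down, so every
    level passes through the same strict D-dimensional bottleneck."""
    if isinstance(tree, int):
        return {tree: parent_emb}
    children = W_down @ parent_emb
    out = decode(tree[0], children[:D])
    out.update(decode(tree[1], children[D:]))
    return out
```

Because there is no cell state or skip path, the only route from a leaf to its reconstruction is through every intermediate node, which is exactly what forces the model to respect the input tree's compression order.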

[4/4]

We show that DooD generalises across varied data such as numbers, characters, and doodles by performing effective zero-shot transfer from one dataset (e.g. MNIST) to others (e.g. QuickDraw, Omniglot, etc.), clearly outperforming baselines.

DooD also generalises across tasks: evaluated on three of the Omniglot challenge tasks (unconditional generation, conditional generation, and one-shot classification), DooD typically matches or outperforms SotA methods.