Perceptual grounding and explainable AI (#xAI) lab at Informatics at the University of Edinburgh.

We are advertising a postdoc position to work on #generative #models, #structure #induction, and MI #estimation with Michael Gutmann as part of GenAI (@genaihub)!

https://elxw.fa.em3.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1001/job/13930

Get in touch! (#ML #AI)
👉 homepages.inf.ed.ac.uk/snaraya3/
👉 michaelgutmann.github.io
👉 genai.ac.uk

Research Associate

We invite applications for a Postdoctoral Research Associate in machine learning based in the School of Informatics, University of Edinburgh. The postholder will also be formally affiliated with the EPSRC-funded Hub in Generative AI and work with Drs Siddharth N. and Michael Gutmann as part of the Hub. This is an outstanding opportunity to conduct methodological research at the frontier of machine learning and to collaborate across a vibrant national network of leading universities and industry partners.

We will be advertising for a postdoc position soon, to work on #generative #models #structure #induction and #uncertainty with Michael Gutmann as part of the GenAI Hub (@genaihub)!

Keep an eye out, and get in touch! (#ML #AI #ICML2025)

👉 https://homepages.inf.ed.ac.uk/snaraya3/
👉 https://michaelgutmann.github.io/
👉 https://www.genai.ac.uk/


Interested?

If you want to see all the experiments and find out why it works in the first place, you can check out the paper here:
https://www.arxiv.org/abs/2407.17771

Our code is available with the paper :)

Banyan: Improved Representation Learning with Explicit Structure

We present Banyan, a model that efficiently learns semantic representations by leveraging explicit hierarchical structure. While transformers excel at scale, they struggle in low-resource settings. Conversely, recent structured models have shown promise as efficient learners, but lag in performance. Banyan bridges this gap with two key innovations: an entangled hierarchical tree structure and diagonalized message passing, enabling it to outperform larger transformer models with just 14 non-embedding parameters. It excels in low-resource settings, offering a viable alternative for under-represented languages and highlighting its potential for efficient, interpretable NLP in resource-constrained environments.

Banyan stays competitive, often even managing to outperform the baselines. This is despite the fact that it is a much, much smaller model 7/🧵
Where this really shines is in the low-resource setting, where embeddings still play a critical role, but scale just isn't available. That's what we evaluate next, and this time we compare to LLMs in the 100M–7B range as well as supervised embedding models 6/🧵
Banyan turns out to be a pretty efficient learner! Its embeddings outperform our prior recursive net, as well as RoBERTa medium (a few-million-parameter encoder) and several word embedding baselines trained on 10x more data 5/🧵

2) We change our parameterization to a diagonal mechanism inspired by SSMs, which lets us reduce parameters by 10x while massively increasing performance 💪
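A minimal sketch of what this parameter saving looks like (my own illustration, not the paper's code; the dimension, weight shapes, and tanh nonlinearity are assumptions): a full-matrix composition of two children costs 2·d·d parameters, while a diagonal (elementwise) one costs only 2·d.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (hypothetical)

# Full-matrix composition: merging two children costs 2*d*d parameters.
W_left = rng.standard_normal((d, d))
W_right = rng.standard_normal((d, d))

def compose_full(left, right):
    return np.tanh(W_left @ left + W_right @ right)

# Diagonal composition (SSM-inspired): elementwise gates, only 2*d parameters.
g_left = rng.standard_normal(d)
g_right = rng.standard_normal(d)

def compose_diag(left, right):
    return np.tanh(g_left * left + g_right * right)

left, right = rng.standard_normal(d), rng.standard_normal(d)
parent_full = compose_full(left, right)  # shape (d,)
parent_diag = compose_diag(left, right)  # shape (d,)
print(parent_full.shape, parent_diag.shape)
```

Both produce a parent vector of the same shape; the diagonal version just shrinks the parameter count from 2·d² to 2·d, a factor of d.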

For our initial benchmarks we pre-train Banyan on 10M tokens of English and test STS, retrieval and classification... 4/🧵

We can make this setup much more powerful with two changes:

1) Entangling: whenever any instance of the encoder merges the same span, we reconstruct it from every possible context it can occur in, learning the global connective structure of our pre-training corpus 3/🧵
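A toy illustration of the entangling idea (my own sketch with a made-up three-sentence corpus, not the paper's implementation): identical spans across the corpus share one node, so that node collects every context it occurs in and can be reconstructed from each of them.

```python
from collections import defaultdict

# Hypothetical mini-corpus: the span ("black", "cat") occurs in all three.
corpus = [
    ("the", "black", "cat"),
    ("a", "black", "cat"),
    ("black", "cat", "naps"),
]

# Map each adjacent bigram span to the set of (left, right) contexts it
# appears in; a shared span accumulates contexts from the whole corpus.
contexts = defaultdict(set)
for sent in corpus:
    for i in range(len(sent) - 1):
        span = (sent[i], sent[i + 1])
        left = sent[i - 1] if i > 0 else "<s>"
        right = sent[i + 2] if i + 2 < len(sent) else "</s>"
        contexts[span].add((left, right))

shared = contexts[("black", "cat")]
print(shared)  # three distinct contexts for the one shared span node
```

Because ("black", "cat") is one shared node rather than three separate subtrees, reconstructing it from all three contexts is what lets the model learn the corpus's global connective structure.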

Banyan is a special type of AutoEncoder, called a Self-StrAE (see fig). Given a sequence, it needs to learn which elements to merge with each other, and in what order, to get the best compression. This means its representations model compositional semantics 2/🧵
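To make the "learn which elements to merge, and in what order" step concrete, here is a toy sketch (my own, not the paper's code): cosine similarity stands in for the learned merge scores, and averaging stands in for the learned composition function; the loop greedily merges the most similar adjacent pair until one root remains.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def greedy_merge_order(embs):
    """Repeatedly merge the most similar adjacent pair of nodes.

    Cosine similarity is a stand-in for learned merge decisions, and the
    mean is a placeholder composition; returns the merge positions that
    define the induced tree.
    """
    nodes = [np.asarray(e) for e in embs]
    order = []
    while len(nodes) > 1:
        sims = [cosine(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
        i = int(np.argmax(sims))
        order.append(i)
        merged = (nodes[i] + nodes[i + 1]) / 2  # placeholder composition
        nodes = nodes[:i] + [merged] + nodes[i + 2:]
    return order

tokens = rng.standard_normal((4, 8))  # 4 toy token embeddings
order = greedy_merge_order(tokens)
print(order)  # three merge positions for four leaves
```

A sequence of n leaves always yields n-1 merges; the positions chosen define the binary tree whose reconstructions drive the autoencoding objective.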

Are you compositionally curious? 🤓

Want to know how to learn embeddings using 🌲?

In our new #ICML2025 paper, we present Banyan:
A recursive net that you can train super efficiently for any language or domain, and get embeddings competitive with much, much larger LLMs 1/🧵

#embeddings #structure #nlp #semantics #efficient #lowresource