Sander Dieleman

159 Followers
75 Following
40 Posts
Research Scientist at DeepMind. Deep learning (research + software), music, generative models (personal account).

RT @NeurIPSConf
We are happy to announce the #NeurIPS2023 Creative AI Track. This track will invite ML researchers to showcase their work during the conference in the form of visual, language, musical, and performing arts. Proposals are due by *June 15*.

https://blog.neurips.cc/2023/05/02/call-for-neurips-creative-ai-track/


RT @DrJimFan
In the age of foundation models, designing the data pipeline is actually more important than tweaking the model architecture.

Here's a fun competition: curate an image-text dataset that yields high performance on downstream tasks, while keeping CLIP model & training *fixed*.

RT @du_yilun
Check out our work on recycling diffusion models!

We show how to combine different diffusion models together, to form new probability distributions which can solve a variety of new tasks– without any need for training!

Webpage:
https://shorturl.at/nrQ39

(1/7)

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

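One way to see why such composition can work without retraining: in score-based diffusion, the score of an (unnormalised) product of distributions is just the sum of the individual scores. A toy 1-D Gaussian sketch of that identity — illustrative only, not the paper's energy-based MCMC procedure:

```python
def gaussian_score(x, mu, sigma):
    # d/dx log N(x; mu, sigma^2)
    return -(x - mu) / sigma**2

def product_score(x, params):
    # Score of the unnormalised product of Gaussians = sum of their scores.
    return sum(gaussian_score(x, mu, s) for mu, s in params)

# The product of N(0, 1) and N(2, 1) is proportional to N(1, 1/2),
# so the composed score should match the analytic product score.
x = 0.5
combined = product_score(x, [(0.0, 1.0), (2.0, 1.0)])
analytic = -(x - 1.0) / 0.5  # score of N(1, sigma^2 = 1/2)
```

The same additivity is what lets pretrained diffusion models be combined at sampling time.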

RT @harmdevries77
Surprised by the loss of LLaMA-7B still going down after 1 trillion tokens?

In a new blogpost, I explain why you shouldn't be and argue we haven't reached the limit of the recent trend of training smaller LLMs for longer:
https://www.harmdevries.com/post/model-size-vs-compute-overhead/

Analysis in 🧵👇
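The accounting behind the "train smaller models for longer" trend can be sketched with two standard rules of thumb from the scaling-laws literature (not from the linked post): training compute C ≈ 6ND for N parameters and D tokens, and a Chinchilla-style compute-optimal token budget of roughly D ≈ 20N. With illustrative numbers for LLaMA-7B at 1 trillion tokens:

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: ~6 FLOPs per parameter per training token.
    return 6.0 * n_params * n_tokens

N = 7e9            # LLaMA-7B parameter count
D = 1e12           # tokens actually trained on
D_opt = 20.0 * N   # rough Chinchilla-style compute-optimal token count

print(f"tokens vs. compute-optimal: {D / D_opt:.1f}x")
print(f"training FLOPs: {train_flops(N, D):.2e}")
```

On these rough numbers, 1T tokens is several times past the compute-optimal point — which is exactly the regime the post argues still pays off at inference time.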

Very cool work on diffusion models for constrained data. The fact that the inverse of a reflected diffusion is another reflected diffusion seems magical 🤯. This enables DDIM sampling with thresholding "baked in", which is awesome.
---
RT @aaron_lou
Presenting Reflected Diffusion Models w/ @StefanoErmon!

Diffusion models should reverse an SDE, but common hacks break this. We provide a fix through a general framework.

Arxiv: http://arxiv.or…
https://twitter.com/aaron_lou/status/1646528998594482176

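The "baked in" thresholding described in the thread comes from reflecting the process at the domain boundary rather than clipping to it. A toy sketch of the difference for a step on [0, 1] — illustrative only, the paper works with reflected SDEs, not this discrete update:

```python
def reflect(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    # Fold x back into [lo, hi] by mirroring at the boundaries.
    width = hi - lo
    x = (x - lo) % (2.0 * width)
    return lo + (width - abs(x - width))

def clip(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    # Naive alternative: project onto the boundary.
    return max(lo, min(hi, x))

# A step that overshoots the upper boundary:
x, step = 0.9, 0.25
print(reflect(x + step))  # ~0.85: the overshoot is mirrored back inside
print(clip(x + step))     # 1.0: probability mass piles up at the boundary
```

Reflection keeps the process inside the domain without concentrating mass on the boundary, which is what clipping-style hacks do.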

RT @karpathy
Common Q: Can you train language model w diffusion?
Favorite A: read this post (the whole blog is excellent)

(Roughly speaking, state-of-the-art generative AI is either trained autoregressively or with diffusion. The underlying neural net is usually a Transformer.) https://twitter.com/sedielem/status/1612459398005235716

Feels like just yesterday that any work on generative modelling had to be justified through "representation learning", or some sort of downstream task, to be taken seriously. Times have changed!
---
RT @karpathy
Around 5 years ago we were very proud of these state of the art results in image generation, trained on 32x32 "images" of CIFAR-10. You can kind of make out little wheel shapes, car/plane parts, and organic structures and textures. Pretty cool right
https://twitter.com/karpathy/status/1642682172116172801
Some thoughts on non-AR language models, and what it might take to dethrone autoregression: https://sander.ai/2023/01/09/diffusion-language.html
Diffusion language models

Diffusion models have completely taken over generative modelling of perceptual signals -- why is autoregression still the name of the game for language modelling? Can we do anything about that?
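The structural difference between the two paradigms can be stated in a few lines: autoregression spends one network call per token, left to right, while diffusion spends a fixed number of refinement passes over all positions at once. A schematic sketch — `ar_model` and `denoise` are placeholder functions standing in for real networks:

```python
def ar_model(prefix):
    # Placeholder: a real model would sample the next token given the prefix.
    return len(prefix)

def denoise(tokens, step):
    # Placeholder: a real model would partially denoise every position.
    return [t + 1 for t in tokens]

def sample_autoregressive(n_tokens):
    seq = []
    for _ in range(n_tokens):      # n_tokens sequential model calls
        seq.append(ar_model(seq))
    return seq

def sample_diffusion(n_tokens, n_steps):
    seq = [0] * n_tokens           # start from "pure noise"
    for step in range(n_steps):    # n_steps calls, each over all positions
        seq = denoise(seq, step)
    return seq
```

The sampling cost of the first loop grows with sequence length; the second is fixed by the number of denoising steps, which is part of diffusion's appeal for long sequences.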

Sander Dieleman

This is definitely a problem with AR waveform models, which produce very long sequences (~10^6 steps) and are prone to "going off the rails".

It's clearly not been much of an issue with language models so far, but I suppose it could be in the long run!

Diffusion it is, then?😁
---
RT @ylecun
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes.
Here is the argument:
Let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1-e)^n 1/
https://twitter.com/ylecun/status/1640122342570336267
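The argument is easy to check numerically: with a small per-token exit probability e, the chance that an n-token answer stays inside the tree of correct continuations decays exponentially in n. A minimal sketch with illustrative values of e and n:

```python
def p_correct(e: float, n: int) -> float:
    # Probability that an n-token answer stays "correct" when each token
    # independently exits the correct set with probability e.
    return (1.0 - e) ** n

for e in (0.001, 0.01, 0.05):
    for n in (10, 100, 1000):
        print(f"e={e}, n={n}: {p_correct(e, n):.4f}")
```

Even e = 0.01 drives the success probability below 40% by n = 100, which is the exponential divergence the tweet refers to.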
RT @D_Berthelot_ML
New paper TRACT - Faster diffusion model sampling
- Single-step diffusion SotA for CIFAR10 and ImageNet64 with L2 loss without architecture changes
- Up to 2.4x FID improvement
https://arxiv.org/abs/2303.04248
TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Denoising Diffusion models have demonstrated their proficiency for generative sampling. However, generating good samples often requires many iterations. Consequently, techniques such as binary time-distillation (BTD) have been proposed to reduce the number of network calls for a fixed architecture. In this paper, we introduce TRAnsitive Closure Time-distillation (TRACT), a new method that extends BTD. For single-step diffusion, TRACT improves FID by up to 2.4x on the same architecture, and achieves new single-step Denoising Diffusion Implicit Models (DDIM) state-of-the-art FID (7.4 for ImageNet64, 3.8 for CIFAR10). Finally, we tease apart the method through extended ablations. The PyTorch implementation will be released soon.
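The idea behind binary time-distillation, which TRACT extends, is that a student's single step is trained to match two steps of the teacher. A toy scalar sketch of that target — schematic BTD in general, not the TRACT algorithm (which distils across longer step ranges via a transitive closure), with the composed target written in closed form instead of learned:

```python
def teacher_step(x: float, dt: float) -> float:
    # Toy "teacher" update: one Euler step of dx/dt = -x.
    return x - x * dt

def distilled_step(x: float, dt: float) -> float:
    # BTD's training target: the student's single step should reproduce
    # two composed teacher steps (here computed exactly, not learned).
    return teacher_step(teacher_step(x, dt), dt)

x, dt = 1.0, 0.1
two_teacher = teacher_step(teacher_step(x, dt), dt)
one_student = distilled_step(x, dt)
```

Applying this halving recursively is how distillation collapses many sampling steps into few; TRACT's contribution is distilling across larger spans at once.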

arXiv.org