Sander Dieleman

159 Followers
75 Following
40 Posts
Research Scientist at DeepMind. Deep learning (research + software), music, generative models (personal account).

RT @NeurIPSConf
We are happy to announce the #NeurIPS2023 Creative AI Track. This track will invite ML researchers to showcase their work during the conference in the form of visual, language, musical, and performing arts. Proposals are due by *June 15*.

https://blog.neurips.cc/2023/05/02/call-for-neurips-creative-ai-track/


RT @DrJimFan
In the age of foundation models, designing the data pipeline is actually more important than tweaking the model architecture.

Here's a fun competition: curate an image-text dataset that yields high performance on downstream tasks, while keeping CLIP model & training *fixed*.

RT @du_yilun
Check out our work on recycling diffusion models!

We show how to combine different diffusion models together, to form new probability distributions which can solve a variety of new tasks– without any need for training!

Webpage:
https://shorturl.at/nrQ39

(1/7)

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

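One way to see why such composition can work without retraining: in score-based diffusion, the score of an (unnormalised) product of distributions is just the sum of the individual scores. A toy 1-D Gaussian sketch of that identity — illustrative only, not the paper's energy-based MCMC procedure:

```python
def gaussian_score(x, mu, sigma):
    # d/dx log N(x; mu, sigma^2)
    return -(x - mu) / sigma**2

def product_score(x, params):
    # Score of the unnormalised product of Gaussians = sum of their scores.
    return sum(gaussian_score(x, mu, s) for mu, s in params)

# The product of N(0, 1) and N(2, 1) is proportional to N(1, 1/2),
# so the composed score should match the analytic product score.
x = 0.5
combined = product_score(x, [(0.0, 1.0), (2.0, 1.0)])
analytic = -(x - 1.0) / 0.5  # score of N(1, sigma^2 = 1/2)
```

The same additivity is what lets pretrained diffusion models be combined at sampling time.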

RT @harmdevries77
Surprised by the loss of LLaMA-7B still going down after 1 trillion tokens?

In a new blogpost, I explain why you shouldn't be and argue we haven't reached the limit of the recent trend of training smaller LLMs for longer:
https://www.harmdevries.com/post/model-size-vs-compute-overhead/

Analysis in 🧵👇
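The accounting behind the "train smaller models for longer" trend can be sketched with two standard rules of thumb from the scaling-laws literature (not from the linked post): training compute C ≈ 6ND for N parameters and D tokens, and a Chinchilla-style compute-optimal token budget of roughly D ≈ 20N. With illustrative numbers for LLaMA-7B at 1 trillion tokens:

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    # Rule of thumb: ~6 FLOPs per parameter per training token.
    return 6.0 * n_params * n_tokens

N = 7e9            # LLaMA-7B parameter count
D = 1e12           # tokens actually trained on
D_opt = 20.0 * N   # rough Chinchilla-style compute-optimal token count

print(f"tokens vs. compute-optimal: {D / D_opt:.1f}x")
print(f"training FLOPs: {train_flops(N, D):.2e}")
```

On these rough numbers, 1T tokens is several times past the compute-optimal point — which is exactly the regime the post argues still pays off at inference time.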

Very cool work on diffusion models for constrained data. The fact that the inverse of a reflected diffusion is another reflected diffusion seems magical 🤯. This enables DDIM sampling with thresholding "baked in", which is awesome.
---
RT @aaron_lou
Presenting Reflected Diffusion Models w/ @StefanoErmon!

Diffusion models should reverse an SDE, but common hacks break this. We provide a fix through a general framework.

Arxiv: http://arxiv.or…
https://twitter.com/aaron_lou/status/1646528998594482176

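The "baked in" thresholding described in the thread comes from reflecting the process at the domain boundary rather than clipping to it. A toy sketch of the difference for a step on [0, 1] — illustrative only, the paper works with reflected SDEs, not this discrete update:

```python
def reflect(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    # Fold x back into [lo, hi] by mirroring at the boundaries.
    width = hi - lo
    x = (x - lo) % (2.0 * width)
    return lo + (width - abs(x - width))

def clip(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    # Naive alternative: project onto the boundary.
    return max(lo, min(hi, x))

# A step that overshoots the upper boundary:
x, step = 0.9, 0.25
print(reflect(x + step))  # ~0.85: the overshoot is mirrored back inside
print(clip(x + step))     # 1.0: probability mass piles up at the boundary
```

Reflection keeps the process inside the domain without concentrating mass on the boundary, which is what clipping-style hacks do.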

RT @karpathy
Common Q: Can you train language model w diffusion?
Favorite A: read this post (the whole blog is excellent)

(Roughly speaking, state-of-the-art generative AI is either trained autoregressively or with diffusion. The underlying neural net is usually a Transformer.) https://twitter.com/sedielem/status/1612459398005235716

Feels like just yesterday that any work on generative modelling had to be justified through "representation learning", or some sort of downstream task, to be taken seriously. Times have changed!
---
RT @karpathy
Around 5 years ago we were very proud of these state of the art results in image generation, trained on 32x32 "images" of CIFAR-10. You can kind of make out little wheel shapes, car/plane parts, and organic structures and textures. Pretty cool right
https://twitter.com/karpathy/status/1642682172116172801
Some thoughts on non-AR language models, and what it might take to dethrone autoregression: https://sander.ai/2023/01/09/diffusion-language.html
Diffusion language models

Diffusion models have completely taken over generative modelling of perceptual signals -- why is autoregression still the name of the game for language modelling? Can we do anything about that?
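The structural difference between the two paradigms can be stated in a few lines: autoregression spends one network call per token, left to right, while diffusion spends a fixed number of refinement passes over all positions at once. A schematic sketch — `ar_model` and `denoise` are placeholder functions standing in for real networks:

```python
def ar_model(prefix):
    # Placeholder: a real model would sample the next token given the prefix.
    return len(prefix)

def denoise(tokens, step):
    # Placeholder: a real model would partially denoise every position.
    return [t + 1 for t in tokens]

def sample_autoregressive(n_tokens):
    seq = []
    for _ in range(n_tokens):      # n_tokens sequential model calls
        seq.append(ar_model(seq))
    return seq

def sample_diffusion(n_tokens, n_steps):
    seq = [0] * n_tokens           # start from "pure noise"
    for step in range(n_steps):    # n_steps calls, each over all positions
        seq = denoise(seq, step)
    return seq
```

The sampling cost of the first loop grows with sequence length; the second is fixed by the number of denoising steps, which is part of diffusion's appeal for long sequences.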

Sander Dieleman

This is definitely a problem with AR waveform models, which produce very long sequences (~10^6 steps) and are prone to "going off the rails".

It's clearly not been much of an issue with language models so far, but I suppose it could be in the long run!

Diffusion it is, then?😁
---
RT @ylecun
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes.
Here is the argument:
Let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1-e)^n 1/
https://twitter.com/ylecun/status/1640122342570336267
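The argument is easy to check numerically: with a small per-token exit probability e, the chance that an n-token answer stays inside the tree of correct continuations decays exponentially in n. A minimal sketch with illustrative values of e and n:

```python
def p_correct(e: float, n: int) -> float:
    # Probability that an n-token answer stays "correct" when each token
    # independently exits the correct set with probability e.
    return (1.0 - e) ** n

for e in (0.001, 0.01, 0.05):
    for n in (10, 100, 1000):
        print(f"e={e}, n={n}: {p_correct(e, n):.4f}")
```

Even e = 0.01 drives the success probability below 40% by n = 100, which is the exponential divergence the tweet refers to.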
RT @D_Berthelot_ML
New paper TRACT - Faster diffusion model sampling
- Single-step diffusion SotA for CIFAR10 and ImageNet64 with L2 loss without architecture changes
- Up to 2.4x FID improvement
https://arxiv.org/abs/2303.04248
TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Denoising Diffusion models have demonstrated their proficiency for generative sampling. However, generating good samples often requires many iterations. Consequently, techniques such as binary time-distillation (BTD) have been proposed to reduce the number of network calls for a fixed architecture. In this paper, we introduce TRAnsitive Closure Time-distillation (TRACT), a new method that extends BTD. For single-step diffusion, TRACT improves FID by up to 2.4x on the same architecture, and achieves new single-step Denoising Diffusion Implicit Models (DDIM) state-of-the-art FID (7.4 for ImageNet64, 3.8 for CIFAR10). Finally, we tease apart the method through extended ablations. The PyTorch implementation will be released soon.
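The idea behind binary time-distillation, which TRACT extends, is that a student's single step is trained to match two steps of the teacher. A toy scalar sketch of that target — schematic BTD in general, not the TRACT algorithm (which distils across longer step ranges via a transitive closure), with the composed target written in closed form instead of learned:

```python
def teacher_step(x: float, dt: float) -> float:
    # Toy "teacher" update: one Euler step of dx/dt = -x.
    return x - x * dt

def distilled_step(x: float, dt: float) -> float:
    # BTD's training target: the student's single step should reproduce
    # two composed teacher steps (here computed exactly, not learned).
    return teacher_step(teacher_step(x, dt), dt)

x, dt = 1.0, 0.1
two_teacher = teacher_step(teacher_step(x, dt), dt)
one_student = distilled_step(x, dt)
```

Applying this halving recursively is how distillation collapses many sampling steps into few; TRACT's contribution is distilling across larger spans at once.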

arXiv.org