Tom Goldstein

311 Followers
39 Following
39 Posts
Associate Professor at Maryland. AI safety, security, and privacy. Fundamental machine learning theory and science. Optimization, applied math, stupid puns.
https://www.cs.umd.edu/~tomg
Finally, Galactica is not the first model that can produce technical content, yet older models haven’t caused major problems to date. Here’s an example of text generated by GPT-Neo on the Hugging Face Hub. I ran this generation once for each prompt. No cherry picking.
Here's an algorithmic reasoning problem where standard nets fail. We train resnet18 to solve little 13x13 mazes. It accepts a 2D image of a maze and spits out a 2D image of the solution. Resnet18 gets 100% test acc on unseen mazes of the same size. But something is wrong…
Neural algorithm synthesis is done by giving models a human-crafted programming language and millions of sample programs. Recently, my lab looked at whether neural networks can synthesize algorithms on their own without these crutches. They can, with the right architecture. 🧵
Emojis that represent writing implements are not widely used. 🖍 and 🖊 have to stay as raw bytes. But the neural net recognizes their byte sequences and associates them with artistic styles. In fact, you can control the style of an image by placing one in your prompt.
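A minimal sketch of what "staying as raw bytes" means: an emoji outside the vocabulary gets split into its individual UTF-8 bytes, each handled as its own token. The pen emoji, for instance, is four bytes long:

```python
# The pen emoji (U+1F58A), written as an escape to avoid encoding issues.
pen = "\U0001F58A"  # 🖊
raw = pen.encode("utf-8")
print(list(raw))  # [240, 159, 150, 138]
print(len(raw))   # 4 -- up to four byte-level tokens instead of one "word"
```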

Let's look at some of the rejects. Unlike most emojis, 🏯 and 📜 are not commonly used enough to be part of the 49K-word vocabulary. The closest conventional word to 🏯 in embedding space is "yin" (as in "yin and yang"). The closest word to 📜 is "news".
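The "closest word in embedding space" lookup can be sketched as a cosine-similarity search over the embedding table. The tiny vocabulary and random vectors below are made up for illustration; the real table is 49408 rows of learned weights.

```python
import numpy as np

# Toy nearest-word lookup: cosine similarity between one query vector
# and every row of a (hypothetical) embedding table.
rng = np.random.default_rng(0)
vocab = ["yin", "news", "castle", "fire", "dragon"]
E = rng.normal(size=(5, 512))               # one embedding row per word
query = E[2] + 0.1 * rng.normal(size=512)   # a vector sitting near "castle"

sims = E @ query / (np.linalg.norm(E, axis=1) * np.linalg.norm(query))
print(vocab[int(np.argmax(sims))])  # castle
```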

Here's "🏯πŸ”₯πŸ‰ πŸ“œ"

All emojis are represented by fairly similar embedding vectors, as are math symbols. In fact, most emojis lie closer to unrelated emojis than to any English word. Still, the model understands the unique meaning of these symbols.

Prompt: e=mc^2

Common English words, symbols, and *most* emojis are known to SD as a single "word."
Next, each "word" is replaced with a 512-dimensional "embedding vector". The result is a list of at most 77 such embedding vectors, which are fed into a big neural net that makes an image.
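The lookup-and-pad stage described above can be sketched like this. The vocabulary size (49408), embedding width (512), and 77-slot context come from the text; the random table and toy token ids are stand-ins for the real learned weights.

```python
import numpy as np

# Sketch of the embedding stage: each token id indexes one row of a
# big table, and the sequence is padded out to a fixed 77-slot context.
VOCAB, DIM, CONTEXT = 49408, 512, 77
rng = np.random.default_rng(0)
table = rng.normal(size=(VOCAB, DIM))   # one vector per "word" (random stand-in)

token_ids = [320, 1929, 525]            # toy ids from the tokenizer
embeds = table[token_ids]               # (3, 512): one row per token
padded = np.zeros((CONTEXT, DIM))       # the fixed-length list the image model sees
padded[: len(token_ids)] = embeds
print(padded.shape)  # (77, 512)
```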
You might think 49408 is a lot. Well, it's not. Here's the first 1200 words in the vocabulary. They don't get you very far. The words are auto-selected by a simple algorithm and half are junk. And what are the weird "�" entries? We'll get back to them later...
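The "simple algorithm" that auto-selects the vocabulary is byte-pair encoding training: keep adding whichever adjacent pair occurs most often in the training text. A toy round of it, on a made-up corpus:

```python
from collections import Counter

# Toy BPE vocabulary building: grow the vocab by repeatedly merging
# the most frequent adjacent pair. Corpus and round count are made up.
corpus = list("aaabdaaabac")
vocab = set(corpus)
for _ in range(2):                       # two merge rounds for the demo
    counts = Counter(zip(corpus, corpus[1:]))
    (a, b), _ = counts.most_common(1)[0]
    vocab.add(a + b)                     # the merged pair becomes a new "word"
    out, i = [], 0
    while i < len(corpus):
        if i < len(corpus) - 1 and (corpus[i], corpus[i + 1]) == (a, b):
            out.append(a + b)
            i += 2
        else:
            out.append(corpus[i])
            i += 1
    corpus = out
print(sorted(vocab))  # 'aa' and 'aaa' have been added to the single bytes
```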

Prompts are fed to stable diffusion as binary code, with each letter/symbol represented as several bytes. Then a "tokenizer" looks for commonly occurring spans of adjacent bytes and groups them into a single known "word". Stable diffusion only knows 49408 words.

Here's "πŸ§›πŸ¦‡πŸ—‘οΈ"

I always thought #StableDiffusion prompts needed the right combination of words. But byte-pair encoding can represent anything you can type, including math formulas and emojis. Turns out you don't need any words at all! Here's how and why this works...🧵

Prompt: πŸ§‘β€πŸš€πŸ‘½πŸ›°οΈπŸŒŒ πŸ”₯πŸ„