Ok with a nice cost-saving PR into nanoGPT as of this morning, finally managed to fit the largest GPT-2 model (gpt2-xl) with vanilla DDP on my 8XA100 40GB node, with batch size 1 and no tricks (e.g. gradient checkpointing etc.). So with a batch size of ~1M tokens, 300B tokens = 300K iters; at ~17 sec/iter that's... checks notes... 59 days to reproduce GPT-2. Cool.
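
Sanity-checking the arithmetic, a quick sketch (the ~1M-token batch and ~17 sec/iter figures are the ones above):

total_tokens = 300e9      # GPT-2 was trained on ~300B tokens
tokens_per_iter = 1e6     # batch size of ~1M tokens per iteration
sec_per_iter = 17         # measured wall-clock time per iteration

iters = total_tokens / tokens_per_iter        # 300,000 iterations
days = iters * sec_per_iter / (60 * 60 * 24)  # ~59.0 days
print(f"{iters:,.0f} iters, {days:.1f} days")
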
I hope it's allowed to cross-link to Twitter. I'm not getting banned, right?
Not me awkwardly cross-posting my tweets to Mastodon about my new GPT video:
https://twitter.com/karpathy/status/1615398117683388417

Andrej Karpathy on Twitter
“🔥 New (1h56m) video lecture: "Let's build GPT: from scratch, in code, spelled out."
https://t.co/2pKsvgi3dE
We build and train a Transformer following the "Attention Is All You Need" paper in the language modeling setting and end up with the core of nanoGPT.”

When I find a song I like I just put it on repeat for the entire day. Today's (re-)find is Zonnestraal (MÖWE Remix) https://www.youtube.com/watch?v=5HJr3Sxtsvg
This is the coding music to my upcoming best ever tutorial on GPT that I am working on today.
About a year ago I stumbled across a Twitch channel that played this song on repeat next to a video of a cat jamming and singing. I love the internet.

De Hofnar : Zonnestraal (MÖWE Remix)

Even 1 night of GPT training on 1 little GPU-that-could can produce a decent poet:
And all the land shall be bright
With the looks of a golden light;
And the willows beyond the stream
Shall quiver with the delight
Of music and of winds that gleam
On the dark waters of night;
The original Transformer paper used no biases in the self-attention's Linear layers (producing q,k,v), but the OpenAI gpt2 models seem to have them in the checkpoint. Anyone aware of best practices around the presence of these?
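
For reference, the difference is just the bias flag on the fused q,k,v projection; a minimal PyTorch sketch (module and flag names here are illustrative, loosely nanoGPT-style, not pulled from any checkpoint):

import torch
import torch.nn as nn

class SelfAttentionProjections(nn.Module):
    """Illustrative q,k,v projections with a configurable bias flag."""
    def __init__(self, n_embd: int, bias: bool):
        super().__init__()
        # bias=False matches the original "Attention Is All You Need" setup;
        # bias=True matches what the OpenAI GPT-2 checkpoints appear to contain.
        self.c_attn = nn.Linear(n_embd, 3 * n_embd, bias=bias)

    def forward(self, x: torch.Tensor):
        q, k, v = self.c_attn(x).split(x.size(-1), dim=-1)
        return q, k, v

# e.g. GPT-2-checkpoint-compatible:
proj = SelfAttentionProjections(n_embd=768, bias=True)
q, k, v = proj(torch.randn(1, 8, 768))

So bias=False reproduces the paper's setup, while bias=True is what loading the OpenAI checkpoints would require.
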
When I read the announcement I immediately thought of this tweet, one of my favorites. Judging by the overwhelming sentiment among friends, things look dark atm, but my unpopular (?) hope/belief is that Twitter can over time keep the good changes, iterate on the bad ones, and converge towards a global optimum. I hope I'm right, let's see. TLDR: in the Denial stage of grief :)