Sebastian Raschka

@SebRaschka
2.3K Followers · 198 Following · 224 Posts
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (amzn.to/4fqvn0D). Blogging about AI research at magazine.sebastianraschka.com.
Website: https://sebastianraschka.com
Blog: https://magazine.sebastianraschka.com
GitHub: https://github.com/rasbt
The next tutorial in my “Build A Large Language Model From Scratch” series is now live (https://www.youtube.com/watch?v=341Rb8fJxY0)
- Tokenizing raw text and converting tokens into token IDs
- Applying byte pair encoding
- Setting up data loaders in PyTorch for efficient training
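As a rough illustration of the first step (this is a minimal stdlib-only sketch, not the tutorial's actual code, which uses byte pair encoding via tiktoken and PyTorch data loaders):

```python
import re

# Hypothetical sample text for illustration only
raw_text = "Hello, world. Is this-- a test?"

# Step 1: tokenize raw text by splitting on punctuation and whitespace
tokens = re.split(r'([,.:;?_!"()\']|--|\s)', raw_text)
tokens = [t.strip() for t in tokens if t.strip()]

# Step 2: build a vocabulary mapping each unique token to an integer ID
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

# Step 3: convert tokens into token IDs
token_ids = [vocab[t] for t in tokens]

print(tokens)     # ['Hello', ',', 'world', '.', 'Is', 'this', '--', 'a', 'test', '?']
print(token_ids)
```

A real LLM pipeline would use a learned subword tokenizer (BPE) instead of this word-level split, so it can handle words that never appeared in the training text.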
Build an LLM from Scratch 2: Working with text data

It's been another wild month in AI & deep learning research.
I curated and summarized noteworthy papers here:

https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-2a1/

Ranging from new optimizers for LLMs to new scaling laws for vision transformers.

AI Research Highlights In 3 Sentences Or Less (May-June 2023)

This article is a compilation of 23 AI research highlights, handpicked and summarized. A lot of exciting developments are currently happening in the fields of natural language processing and computer vision! In addition, if you are curious about last month's highlights, you can find them here:

Ahead of AI
@SebRaschka finished his excellent free deep learning fundamentals course…highly recommended to watch: https://lightning.ai/pages/courses/deep-learning-fundamentals/unit-1/
Welcome to Machine Learning and Deep Learning | Unit 1

@alenowak Glad you liked it!!
@hwaseem04 Assuming your code is correct, the val_loss could look like this due to random fluctuation. Given the scale (the zoomed-in y-axis), it may just be exaggerated and look like a trend when in truth there is none.
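To illustrate the point with made-up numbers (not the poster's data): tiny random fluctuations around a flat loss can dominate a y-axis that is zoomed to the min/max of the values.

```python
import random

random.seed(0)

# Hypothetical validation losses: a flat 0.30 baseline plus tiny random noise
val_losses = [0.30 + random.uniform(-0.002, 0.002) for _ in range(20)]

spread = max(val_losses) - min(val_losses)
mean = sum(val_losses) / len(val_losses)

# The fluctuation is under a percent of the loss itself, but a plot whose
# y-axis spans only [min, max] stretches this noise to fill the whole figure.
print(f"mean={mean:.4f}, spread={spread:.4f}, relative={spread / mean:.2%}")
```

Plotting the same values with the y-axis starting at 0 usually makes the "trend" disappear.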

A new Ahead of AI issue is out, covering the latest research highlights on LLM tuning and dataset efficiency:

https://magazine.sebastianraschka.com/p/ahead-of-ai-9-llm-tuning-and-dataset/

Ahead of AI #9: LLM Tuning & Dataset Perspectives

In the last couple of months, we have seen a lot of people and companies sharing and open-sourcing various kinds of LLMs and datasets, which is awesome. From a research perspective, however, it felt more like a race to be out there first (which is understandable) than an exercise in principled analysis.

I just saw that my Ahead of AI magazine crossed the 20k subscriber mark!

https://magazine.sebastianraschka.com

I am incredibly grateful for all the support. Knowing that so many people find my writings useful is very, very motivating!

And stay tuned for the next article featuring the most recent research on finetuning LLMs with less data and LLM evaluation pitfalls.

And beyond LLMs, I am also excited to talk about the most recent efficient computer vision transformers!

Ahead of AI | Sebastian Raschka, PhD | Substack

Ahead of AI specializes in Machine Learning & AI research and is read by tens of thousands of researchers and practitioners who want to stay ahead in the ever-evolving field. Click to read Ahead of AI, by Sebastian Raschka, PhD, a Substack publication.

@edrogers I must say I kind of forgot about it for a while 😅
@hwaseem04 Thanks, glad you like them. And I might! But I have a few other topics I wanted to cover first. Stay tuned I'd say 😅

Just put together a list of papers to highlight 4 interesting things about transformers & LLMs.

Including a discussion on why the original transformer architecture figure is wrong, and a related approach published in 1991!

https://magazine.sebastianraschka.com/p/why-the-original-transformer-figure

About LayerNorm Variants in the Original Transformer Paper, and Some Other Interesting Historical Tidbits About LLMs
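The LayerNorm-placement difference the title refers to (Post-LN as drawn in the original transformer figure vs. the Pre-LN variant used by GPT-2 and many later LLMs) can be sketched in plain Python. This is an illustration only: `sublayer` is a stand-in for attention or feed-forward layers, and real LayerNorm also has learnable scale and shift parameters.

```python
import statistics

def layer_norm(x, eps=1e-5):
    # normalize a vector to zero mean and (roughly) unit variance
    mu = statistics.fmean(x)
    var = statistics.pvariance(x)
    return [(v - mu) / (var + eps) ** 0.5 for v in x]

def sublayer(x):
    # stand-in for an attention / feed-forward sublayer
    return [2 * v for v in x]

def post_ln(x):
    # Post-LN, as drawn in the original transformer figure:
    # add the residual first, then normalize the sum
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_ln(x):
    # Pre-LN: normalize first, then add the sublayer output as a residual;
    # the residual path itself stays un-normalized
    ln_x = layer_norm(x)
    return [a + b for a, b in zip(x, sublayer(ln_x))]

x = [1.0, 2.0, 3.0, 4.0]
print(post_ln(x))  # output is normalized (zero mean)
print(pre_ln(x))   # residual path preserves the input's scale
```

The practical upshot discussed in the article: keeping the residual path un-normalized (Pre-LN) tends to give better-behaved gradients and more stable training without learning-rate warmup.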

A few months ago, I shared the article, Understanding Large Language Models: A Cross-Section of the Most Relevant Literature To Get Up to Speed, and the positive feedback was very motivating! So, I also added a few papers here and there to keep the list fresh and relevant.