Mehrdad Yazdani

Checking in on my AI character
AI-generated image of a popular meme in the style of medieval art: https://glif.app/@fab1an/glifs/cm128fz6e000312o77ai2ae97
glif - Weird Medieval Style by fab1an

glif
"Attention Is All You Need" gets so much, ahem, attention that Google had to put this on top of the PDF.
Mixture of experts, when implemented right, is so impressive: https://arxiv.org/abs/2409.02060
OLMoE: Open Mixture-of-Experts Language Models

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B. We present various experiments on MoE training, analyze routing in our model showing high specialization, and open-source all aspects of our work: model weights, training data, code, and logs.

arXiv.org
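
For anyone wondering how a model can have 7B parameters but use only about 1B per token: a router scores the experts for each token and only the top few actually run. Below is a toy sketch of that top-k routing idea in plain Python. It is not OLMoE's implementation; the scalar "experts", the router scores, and the names `route_top_k` and `moe_forward` are all made up for illustration, and whether the selected weights get renormalized is a design choice this sketch skips.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in top]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts only.

    Experts that are not selected are never called, which is where the
    compute savings of a sparse MoE layer come from.
    """
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out


# Toy setup (hypothetical): 4 "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one token

selected = route_top_k(router_logits, k=2)
print([i for i, _ in selected])  # → [1, 3]
print(moe_forward(1.0, experts, router_logits, k=2))
```

With `k=2` out of 4 experts, only half of the expert parameters are touched for this token, which is the same trade the abstract describes at a much larger scale.
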
Big Bear early in the morning on Labor Day weekend.
Adapting to the switch from Jupyter notebooks to VSCode. I still like the look and feel of Jupyter more, but the additional bells and whistles VSCode has are extra nice.
Man, I love this thing

Neat free tool to generate directory tree structures in ASCII: https://ascii-tree-generator.com/

Pretty useful when brainstorming and trying to figure out the structure for a project. Curious how the ASCII renders here:

```
my_project/
├─ models/
│ ├─ model_1.py
│ ├─ model_2.py
├─ data/
│ ├─ data_prep.py
│ ├─ meta_data.json
├─ results/
├─ checkpoints/
trainer.py
runner.py
```

ASCII Tree Generator

Online interactive folder structure generator. Easily create and visualise your development tree for your new projects and your documentations.
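
If you would rather script it than use the web tool, a tree like the one above can be rendered from a nested dict in a few lines. This is a minimal sketch, not how the ASCII Tree Generator site works; `render_tree` and the `project` layout are hypothetical, and it assumes `trainer.py` and `runner.py` sit at the top level of `my_project/`.

```python
def render_tree(tree, prefix=""):
    """Render a nested dict as a list of tree lines.

    Keys are entry names; a dict value is a directory (possibly empty),
    and None marks a file.
    """
    lines = []
    entries = list(tree.items())
    for i, (name, children) in enumerate(entries):
        last = i == len(entries) - 1
        lines.append(prefix + ("└─ " if last else "├─ ") + name)
        if children:  # non-empty dict: recurse into the directory
            extension = "   " if last else "│  "
            lines.extend(render_tree(children, prefix + extension))
    return lines


# Hypothetical layout mirroring the example project.
project = {
    "models/": {"model_1.py": None, "model_2.py": None},
    "data/": {"data_prep.py": None, "meta_data.json": None},
    "results/": {},
    "checkpoints/": {},
    "trainer.py": None,
    "runner.py": None,
}

print("my_project/")
print("\n".join(render_tree(project)))
```

The last entry at each level gets `└─` instead of `├─`, so the vertical bars stop where a directory's listing ends.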

OK, OK, I may be turning into a wandb user. Have to admit, this feature of showing the run history as ASCII art is particularly neat.

Pretty neat transformer visualization project. Wonder if this could help with debugging 😜

https://poloclub.github.io/transformer-explainer/

Transformer Explainer: LLM Transformer Model Visually Explained

An interactive visualization tool showing you how transformer models work in large language models (LLM) like GPT.