I just published "A brief history of the satellite-image-deep-learning GitHub repository"
https://link.medium.com/LsxZDLK3Jwb
🐦🔗: https://twitter.com/robmarkcole/status/1616406634011426816
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Presents VALL-E, a language modeling (LM) approach to TTS that significantly outperforms the SotA zero-shot TTS system in speech naturalness and speaker similarity.
proj: https://valle-demo.github.io/
abs: https://arxiv.org/abs/2301.02111
🐦🔗: https://twitter.com/arankomatsuzaki/status/1611174058699395072
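For intuition, a minimal sketch (not the authors' code) of the core idea: compress audio into discrete neural-codec tokens, then train a decoder-only transformer to predict those tokens conditioned on the text, so TTS becomes next-token prediction. All layer sizes and names below are illustrative.

```python
import torch
import torch.nn as nn

class CodecLM(nn.Module):
    """Decoder-only LM over discrete audio-codec tokens, conditioned on text."""
    def __init__(self, text_vocab=256, codec_vocab=1024, d_model=512, n_layers=6):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.codec_emb = nn.Embedding(codec_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, codec_vocab)

    def forward(self, text_ids, codec_ids):
        # One sequence: text prompt followed by the audio-token prefix.
        x = torch.cat([self.text_emb(text_ids), self.codec_emb(codec_ids)], dim=1)
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=mask)
        # Logits over the codec vocabulary at each audio position.
        return self.head(h[:, text_ids.size(1):])
```

Zero-shot voice cloning then amounts to prompting this LM with codec tokens from a few seconds of the target speaker's audio.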
Cool new Transformer Circuits paper on toy models of memorisation! IMO, the most exciting part is buried at the end - some fascinating exploration from @[email protected] on MNIST, showing that looking at how many "dimensions" a data point takes up can identify memorisation on a real model! https://twitter.com/AnthropicAI/status/1611045993516249088
🐦🔗: https://twitter.com/NeelNanda5/status/1611102356766347265
"We have little mechanistic understanding of how deep learning models overfit to their training data, despite it being a central problem. Here we extend our previous work on toy models to shed light on how models generalize beyond their training data. https://t.co/0bYUToop3m"
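The "dimensionality" measure can be made concrete. A hedged sketch, assuming the per-feature dimensionality formula from the earlier toy-models work applied per data point (my reading of the idea, not Anthropic's released code):

```python
import torch

def datapoint_dimensionality(H: torch.Tensor) -> torch.Tensor:
    """H: (n_examples, d_hidden) hidden vectors, one row per data point.
    Returns D_i = ||h_i||^4 / sum_j (h_i . h_j)^2 for each example i."""
    G = H @ H.T                    # Gram matrix of pairwise dot products
    norms4 = G.diagonal() ** 2     # ||h_i||^4
    return norms4 / (G ** 2).sum(dim=1)
```

A point that gets a hidden direction almost to itself scores near 1, a hint of memorisation; points sharing directions in superposition score much lower.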
#ChatGPT cannot scrape the web and has limited knowledge of the world after 2021.
Introducing `WebChatGPT`, a mighty Chrome extension that augments your prompts with relevant results from the web! 🤯
See my demo video below 👇 and install it here:
👉 https://chrome.google.com/webstore/detail/web-chatgpt/lpfemeioodjbpieminkklglpmhlngfcn
🐦🔗: https://twitter.com/DataChaz/status/1610556519531089921
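The trick is plain prompt augmentation. A hedged Python sketch of the idea, where `search_web` is a hypothetical helper standing in for whatever search backend the extension actually calls:

```python
def augment_prompt(question: str, search_web) -> str:
    """Prepend fresh web results to the user's question before it is
    sent to the model. `search_web` is a hypothetical search client."""
    results = search_web(question, num_results=3)
    context = "\n".join(
        f"[{i + 1}] {r['title']}: {r['snippet']}" for i, r in enumerate(results)
    )
    return (f"Web results:\n{context}\n\n"
            f"Using the web results above, answer: {question}")
```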
The architecture of GPT3.
http://jalammar.github.io/how-gpt3-works-visualizations-animations/
🐦🔗: https://twitter.com/Grady_Booch/status/1610156904042594310
The tech world is abuzz with GPT3 hype. Massive language models (like GPT3) are starting to surprise us with their abilities. While not yet completely reliable for most businesses to put in front of their customers, these models are showing sparks of cleverness that are sure to accelerate the march of automation and the possibilities of intelligent computer systems. Let's remove the aura of mystery around GPT3 and learn how it's trained and how it works. A trained language model generates text. We can optionally pass it some text as input, which influences its output. The output is generated from what the model "learned" during its training period where it scanned vast amounts of text.
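Those last two sentences are easy to see in code. A minimal sketch using GPT-2, a small public stand-in for GPT-3, via the Hugging Face `transformers` library:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The prompt influences, but does not fully determine, the continuation.
inputs = tokenizer("The tech world is abuzz with", return_tensors="pt")
# Autoregressive decoding: repeatedly predict the next token given
# everything generated so far.
out = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```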
Can we compress large language models for better perf?
"SparseGPT: Massive Language Models can be Accurately Pruned in One Shot"
Eliminates the need to use/store 50% of weights for a 175B param model with no significant sacrifice in perf
https://arxiv.org/pdf/2301.00774.pdf
Here's how 👇
🐦🔗: https://twitter.com/mathemagic1an/status/1610159526598311936
Muse: Text-To-Image Generation via Masked Generative Transformers
Presents Muse, a text-to-image Transformer model that achieves SotA image generation perf while being far more efficient than diffusion or AR models.
proj: https://muse-model.github.io/
abs: https://arxiv.org/abs/2301.00704
🐦🔗: https://twitter.com/arankomatsuzaki/status/1610088296922718208
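The efficiency comes from decoding many image tokens in parallel rather than one at a time. A rough, illustrative sketch of that masked parallel-decoding loop, simplified from MaskGIT-style decoding (all names and the schedule here are assumptions; the real model also conditions on T5 text embeddings and decodes VQ image tokens):

```python
import torch

def parallel_decode(model, seq_len=256, steps=8, mask_id=0):
    """Start fully masked; each step, commit the most confident predictions."""
    tokens = torch.full((1, seq_len), mask_id)
    for step in range(steps):
        logits = model(tokens)                        # (1, seq_len, vocab)
        probs, preds = logits.softmax(-1).max(-1)     # confidence + argmax
        masked = tokens == mask_id
        # Spread the remaining masked slots evenly over the remaining steps.
        n_unmask = int(masked.sum()) // (steps - step)
        conf = probs.masked_fill(~masked, -1.0)       # only masked slots compete
        idx = conf.topk(n_unmask, dim=-1).indices
        tokens.scatter_(1, idx, preds.gather(1, idx))
    return tokens
```

A handful of such steps replaces the hundreds of sequential sampling steps a diffusion or AR decoder would need.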
Deep Learning with PyTorch - University of Amsterdam (UvA)
A fantastic series of tutorials covering a wide array of topics, from PyTorch basics and neural-network fundamentals to architectures (CNNs, transformers, GNNs), generative networks, and contrastive learning.
https://uvadlc-notebooks.readthedocs.io/en/latest/
🐦🔗: https://twitter.com/Jeande_d/status/1609999660177059840
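As a taste of where the tutorials start, a self-contained PyTorch training step (illustrative only, not taken from the notebooks):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 784)             # dummy batch of flattened images
y = torch.randint(0, 10, (32,))      # dummy class labels
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                      # backprop
opt.step()                           # one gradient update
print(f"loss: {loss.item():.3f}")
```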
Final day of a lovely trip to Bangalore. Looking forward to many future visits, especially to visit a dynamic incoming NLP prof at @[email protected], my dear friend and former student @[email protected].
(Attn prospective graduate students: get in touch w Danish!)
🐦🔗: https://twitter.com/zacharylipton/status/1608653472685260801
We are running out of a vital resource: words!
There are "only" 5 to 10 trillion high-quality words (papers, books, code) on the internet. Our AI models will have used all of that for training by 2026. Low-quality data (tweets, fanfic) will last to 2040. https://arxiv.org/pdf/2211.04325.pdf
🐦🔗: https://twitter.com/emollick/status/1605756428941246466
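The claim is simple compounding arithmetic. A back-of-envelope sketch with illustrative numbers (the stock, starting point, and growth rate below are round assumptions of mine, not the paper's exact estimates):

```python
stock = 9e12        # ~9T high-quality words, near the top of the 5-10T range
used = 1e12         # assumed cumulative words used for training as of 2022
growth = 2.0        # assumed: training datasets roughly double each year
year = 2022
while used < stock:
    year += 1
    used *= growth
print(year)         # -> 2026 under these toy assumptions
```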