Just ran llama.cpp - with Facebook's LLaMA 7B large language model - on my M2 64GB MacBook Pro!

https://github.com/ggerganov/llama.cpp


It's now possible to run a genuinely interesting large language model on a consumer laptop

I thought it would be at least another year or two before we got there, if not longer

Here are detailed notes on how I got it to work, plus some examples of prompts and their responses https://til.simonwillison.net/llms/llama-7b-m2

Thanks to an update from llama.cpp author Georgi Gerganov I have now successfully run the 13B model on my machine too! That's the one Facebook's research claims is competitive with the original GPT-3 on benchmarks. Notes on how I did that here: https://til.simonwillison.net/llms/llama-7b-m2#user-content-running-13b

@simon If I understand it correctly it's only the conversion and quantization step that needs a lot of RAM? Would it be possible for people to share those converted models? Are these platform dependent?
@simon thanks for posting about this! I just got the 7B model working on my i9 (after an update that fixed some issues with Intel processors, hah)
@simon Beavers Are Friends But They Are Not Friends With Cat Coffee might not have been what you were after but it is exactly what I was after
@simon I’m just diving back into AI/ML. How RAM constrained are these models? I’ve a M1 Pro Mac with 32 GB RAM and a Linux machine with 32 GB as well. Wondering whether I should even attempt this.
@is Running the 7B model only seems to use about 4GB of RAM on my M2 MacBook Pro, 32GB should be easily enough
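For context, the rough memory math behind that ~4GB figure can be sketched like this — assuming llama.cpp's 4-bit quantization and ignoring the per-block scale factors the real ggml files also store, so actual sizes run a little larger:

```python
# Back-of-envelope memory estimate for a 4-bit quantized model.
# n_params: number of model weights; bits_per_weight: quantized width.
def quantized_size_gb(n_params, bits_per_weight):
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(quantized_size_gb(7e9, 4))   # roughly 3.5 GB for LLaMA 7B
print(quantized_size_gb(13e9, 4))  # roughly 6.5 GB for LLaMA 13B
```

That's why even a 32GB machine has plenty of headroom for the 7B and 13B models once they're quantized.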
@simon So cool. Keeping my eye on this for sure.
@simon Thanks for this. I’ll be trying out your recipe on my M1 Pro Max.

@simon It lives!

"The first president of the USA was 38 years old when he became a “citizen” and then later a President.
He had been raised in New York, but his father had moved to Virginia before George Washington’s birth (1746) because there were too few farms for all the children they wanted – …”

@simon how is it at writing code?

@numist Pretty impressive considering this is the smallest of the LLaMA models (I'm running 7B but they also released 13B, 30B and 65B)

Got this result for a prompt of "def open_and_return_content(filename):"
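For reference, a straightforward hand-written version of what that prompt is fishing for (my own sketch for comparison, not the model's actual completion) would be:

```python
def open_and_return_content(filename):
    # Open the file, read its full contents, and return them as a string.
    with open(filename) as f:
        return f.read()
```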