@adingbatponder Exactly. My explorations of LLMs have been exclusively focused on local use. One use I really like: all photos uploaded to the family Nextcloud get automatically tagged and given descriptions based on their content. All local.
@adingbatponder there is the basic problem that LLMs are by definition a big-tech technology. It's ludicrously expensive to train models from the ground up, and it requires incredible amounts of data that is hard to come by unless you disregard all copyright.
That's why the majority of "open weight" models are just modified versions of base models released by big tech companies.
That is not true on all counts. You can pretrain a local LLM from scratch on a GPU as modest as an RTX 3090, and on even lower-end cards if you are willing to wait. As for the amount of data you need, that depends on whether you are building a general-purpose LLM or a domain-specific one; the quality of the data matters more than the quantity. You would train the model's weights with a tool such as PyTorch and then apply quantization to make it runnable on consumer-grade hardware.

I actually do this; I am not just repeating what I read in a social media post. I am currently training an LLM on consumer-grade hardware to provide the intelligence for NPCs in RPG games. All of its training data is about a made-up world in the game and cannot be found anywhere besides in my head.

It appears your ideas about LLMs and/or AI are not fully developed, and more research would be prudent.
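To make the "train with a tool such as PyTorch, then quantize" step concrete, here is the core idea of next-token training shrunk down to a bigram model in NumPy. This is a sketch of the training loop only, not PyTorch's API, and the toy corpus stands in for a real tokenized dataset; a real LLM stacks transformer layers and uses autograd instead of the hand-written gradient below.

```python
import numpy as np

# Toy "domain corpus": a repeating sequence of token ids. In practice
# you would tokenize your own text (e.g. game lore); this is a sketch.
vocab_size = 8
corpus = np.arange(500) % vocab_size

# Bigram LM: W[prev] gives the logits over the next token.
W = np.zeros((vocab_size, vocab_size))
lr = 0.5
xs, ys = corpus[:-1], corpus[1:]  # (input token, next token) pairs

def mean_loss(W):
    """Average cross-entropy of predicting ys from xs."""
    logits = W[xs]
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(ys)), ys]).mean()

initial = mean_loss(W)
for step in range(200):
    logits = W[xs]
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Gradient of softmax cross-entropy: (probs - one_hot) / N
    grad_logits = probs
    grad_logits[np.arange(len(ys)), ys] -= 1.0
    grad_logits /= len(ys)
    grad_W = np.zeros_like(W)
    np.add.at(grad_W, xs, grad_logits)  # accumulate per input token
    W -= lr * grad_W                    # plain gradient descent step

final = mean_loss(W)
```

Scaled up, that same loop (forward pass, cross-entropy loss, backward pass, weight update) is what PyTorch runs over a transformer for days on your 3090.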
Nothing you said is true of training an LLM today; a few years ago you might have had a point. Where did you get this information? Take a look at the Hugging Face docs and the docs for Llama.cpp for details on how you can actually train an LLM. A modern gaming machine has more than enough consumer-grade power to train one.
Please also understand that not all LLMs are made just for chatbots. LLM technology is used for many things, including modern hobbyist robotics, often running on a TPU and not even a GPU. In fact, an LLM can run on CPU alone on a modest device like your cell phone or a Raspberry Pi, with the help of tools like Llama.cpp.
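A big part of why CPU-only inference on phones and Raspberry Pis is practical is the quantization step mentioned earlier. Here is a minimal sketch of symmetric 8-bit weight quantization in NumPy; Llama.cpp's actual GGUF formats use more sophisticated block-wise schemes, so treat this as the simplest illustration of the idea, not their implementation.

```python
import numpy as np

# A fake float32 weight matrix standing in for one layer of a model.
rng = np.random.default_rng(1)
weights = rng.normal(0, 0.02, size=(256, 256)).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto int8 [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)   # 1 byte per weight
dequant = q.astype(np.float32) * scale          # approximate original

error = np.abs(weights - dequant).max()         # bounded by scale / 2
shrink = weights.nbytes / q.nbytes              # 4x smaller than float32
```

The weights take a quarter of the memory (and memory bandwidth) at a tiny per-weight error, which is exactly the trade that lets a model too big for a small device suddenly fit.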
If you have $7.00 USD (Kindle), $13.00 USD (paperback), or a Kindle Unlimited account, then this book could also be of great service to you: The Local LLM Handbook: Ollama vs Llama.cpp for Engineers.
That is a great choice. The unified memory of the Mac gives you some real advantages, along with a few caveats, but it is a great option for training and running larger LLMs on consumer hardware for local AI. The primary advantage is that you can load much larger models without the penalty of swapping data between system RAM and GPU vRAM. The caveat is memory bandwidth: a discrete GPU reads its own vRAM much faster than Apple silicon reads its unified memory, so a model that fits entirely in a GPU's vRAM will run faster there than on the Mac. The Mac wins when the model is too big to fit in vRAM in the first place.
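That speed trade-off can be put in back-of-envelope numbers: at generation time, every token effectively streams the model's weights through the memory system once, so decode speed is roughly bandwidth divided by model size. The bandwidth and size figures below are illustrative assumptions, not measurements of any particular machine.

```python
# Rough rule of thumb: tokens/sec ~ memory bandwidth / model size in bytes,
# because each generated token reads all the weights once.
def rough_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 40.0       # assumed: a ~70B model quantized to ~4 bits
gpu_vram_bw = 1000.0  # assumed: high-end discrete GPU vRAM, GB/s
unified_bw = 400.0    # assumed: Apple-silicon unified memory, GB/s

gpu_speed = rough_tokens_per_sec(model_gb, gpu_vram_bw)  # 25.0 tok/s, but 40 GB won't fit in 24 GB of vRAM
mac_speed = rough_tokens_per_sec(model_gb, unified_bw)   # 10.0 tok/s, and the model actually fits
```

So the discrete GPU would be faster in principle, but only the Mac can hold the model at all, which is the whole point of the unified-memory trade.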