New blog post: https://bartwronski.com/2024/01/22/how-i-use-chatgpt-daily-scientist-coder-perspective/
"How I use ChatGPT daily (scientist/coder perspective)."

I recommend it to anyone working with technology, especially if you think LLMs are "useless" but are open-minded enough to see how they can be helpful, delightful, and even playful.


@BartWronski If you have the GPU for it, there are a few quite good 7B/13B models for use at home. The main advantages are that it's open-source work, the models are far less likely to generate "I'm sorry but" responses, you're in full control of the system prompt, and privacy is perfectly maintained.

Here's a starting point https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

Look for 4-bit / GGUF-quantized models in particular; these fit almost entirely into 8-12 GB of VRAM.
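As a rough sanity check on those sizes (the formula below is a back-of-the-envelope estimate; real GGUF files add overhead for embeddings, the KV cache, and scratch buffers, and the 4.5 bits/weight figure is an illustrative assumption for a typical 4-bit quant):

```python
def quantized_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only footprint of a quantized model, in decimal GB.

    Ignores KV cache and per-tensor overhead, so real VRAM usage is higher.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 13B model at ~4.5 bits/weight comes out to roughly 7.3 GB of weights,
# which is why it squeezes into an 8-12 GB card with room for the KV cache.
print(f"7B:  {quantized_size_gb(7, 4.5):.1f} GB")
print(f"13B: {quantized_size_gb(13, 4.5):.1f} GB")
```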

@lritter Thanks, I definitely want to give them a try. The only reason I didn't do it earlier was the initial setup.

@BartWronski The best frontend I've encountered so far is https://github.com/oobabooga/text-generation-webui/ which I can very much recommend.

As a bonus, it can emulate the OpenAI web API on port 5000, so anything that interfaces with OpenAI and supports custom servers can connect to it.
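To illustrate the idea: an OpenAI-compatible server accepts the same chat-completions payload as the real API, just at a local URL. A minimal sketch using only the standard library (the localhost URL, port, and `"local"` model name are assumptions for this example; the actual network call is left commented out since it needs the server running):

```python
import json
import urllib.request

# Assumed local endpoint for an OpenAI-compatible server on port 5000.
BASE_URL = "http://localhost:5000/v1"

def build_chat_request(prompt: str) -> tuple[str, bytes]:
    """Assemble an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": "local",  # local backends typically ignore this field
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(payload).encode()

url, body = build_chat_request("Explain GGUF quantization in one sentence.")
# To actually send it (requires the local server to be up):
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# resp = json.loads(urllib.request.urlopen(req).read())
# print(resp["choices"][0]["message"]["content"])
```

Because the request shape matches OpenAI's, official client libraries that let you override the base URL work against it unchanged.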

@lritter @BartWronski 7B/13B is a bit lacking in the long run. A 33B model at 4-bit fits into 24 GB of VRAM. If you want to try 70B, or more quantization bits, or have less VRAM, you can split the layers, processing some of them on the GPU and the rest on the CPU.
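A back-of-the-envelope way to pick how many layers to offload (the numbers below are illustrative assumptions, and the helper is hypothetical; tools like llama.cpp expose the actual knob as a GPU-layers option):

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 1.5) -> int:
    """Estimate how many transformer layers fit on the GPU.

    Assumes layers are roughly equal in size and reserves some VRAM
    for the KV cache and scratch buffers.
    """
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# E.g. a ~39 GB 4-bit 70B model with 80 layers on a 24 GB card: roughly
# half the layers go to the GPU, and the remainder runs on the CPU.
print(gpu_layers_that_fit(39.0, 80, 24.0))
```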
@wolfpld @lritter I actually have an A6000 with 48 GB of VRAM in my PC now, so I can try even larger models. (The work machine has a 4090, so "only" 24 GB, but maybe I could play with local LLMs for work coding; understandably, I could not use any external service so far, given internal code and company policies.)