Guide on setting up a local GGML model?

https://lemmy.world/post/1361417

I’ve been messing around with GPTQ models with ExLlama in ooba, and have gotten 33b models @ 3k running smoothly, but was looking to try something bigger than my VRAM can hold. However, I’m clearly doing something wrong, and the koboldcpp.exe documentation isn’t clear to me. Does anyone have a good setup guide?

What’s the problem you’re having with kobold? It doesn’t really require any setup. Download the exe, click on it, select model in the window, click launch. The webui should open in your default browser.

Note this is koboldcpp.exe and not KoboldAI.

The GitHub page describes arguments for GPU acceleration, but it's fuzzy on what the arguments do and doesn't explain what their values mean. I understand the --gpulayers arg, but the two ints after --useclblast are lost on me. It seems to be completely ignoring GPU acceleration, and I'm clueless where the problem lies. I figured it would be easier to ask for a guide and start my GGML setup from scratch.

GitHub - LostRuins/koboldcpp: Run GGUF models easily with a KoboldAI UI. One File. Zero Install.


Those are OpenCL platform and device identifiers; you can use clinfo to find out which numbers are what on your system.
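For example, something like this (the platform/device numbers and model filename here are placeholders; substitute whatever clinfo reports and whichever GGML model you downloaded):

```shell
# List OpenCL platforms and devices; the indices printed here map to
# the two integers --useclblast expects (platform first, then device).
clinfo -l

# Hypothetical launch: platform 0, device 0, 40 layers offloaded to GPU.
# Adjust the numbers to match your clinfo output and your VRAM.
koboldcpp.exe --useclblast 0 0 --gpulayers 40 model.ggmlv3.q4_0.bin
```

If layers are actually being offloaded, you should see the GPU mentioned in the loading output; if it falls back silently, the platform/device pair is usually the culprit.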

Also note that if you're building koboldcpp yourself, you need to build with LLAMA_CLBLAST=1 for OpenCL support to exist in the first place, or LLAMA_CUBLAS=1 for CUDA.
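Roughly like this (a sketch of a from-source build; only needed if you aren't using the prebuilt exe):

```shell
# Clone and build with OpenCL (CLBlast) support compiled in:
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make LLAMA_CLBLAST=1

# Or, for NVIDIA cards, build with cuBLAS instead:
make LLAMA_CUBLAS=1
```

Without one of those flags, the --useclblast / --usecublas options have nothing to enable, which also looks like "GPU acceleration is being ignored".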