#OpenCode is working. Check when the Nvidia GPU is used for it.
⎿ [Image #1]
● I'll check how #Ollama uses the #Nvidia GPU.
● Bash(nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu --format=csv,noheader 2>/dev/null)
⎿ NVIDIA GeForce RTX 2060, 952 MiB, 6144 MiB, 9 %
● Bash(ollama ps 2>/dev/null)
⎿ NAME                  ID            SIZE    PROCESSOR        UNTIL
  qwen2.5-coder:7b-16k  0533dd01f237  7.2 GB  30%/70% CPU/GPU  4 minutes from now
● The model is currently loaded and partly offloaded to the GPU. Here's the breakdown:
- Model: #qwen2.5-coder:7b-16k — 7.2 GB total
- Split: 30% CPU / 70% GPU — the model is split across RAM and VRAM because it doesn't fit entirely in the RTX 2060's 6 GB VRAM
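- Current #GPU #VRAM usage: 952 MiB used of 6144 MiB at the time of the nvidia-smi query. That is well below the roughly 5 GB that a 70% share of 7.2 GB would occupy, so the two readings do not line up; rerunning nvidia-smi while a request is in flight would confirm how much of the model is actually resident in VRAM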
- GPU compute utilization: 9% right now (idle between requests)
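As a rough sanity check on the split above, here is a minimal sketch of the arithmetic. It assumes the percentages in the PROCESSOR column apply to the 7.2 GB SIZE value and that Ollama reports decimal gigabytes; the function names `split_sizes_gb` and `mib_to_gb` are hypothetical helpers for illustration, not part of Ollama or nvidia-smi.

```python
def split_sizes_gb(total_gb: float, gpu_percent: float) -> tuple[float, float]:
    """Return (gpu_gb, cpu_gb) for a model split by percentage.

    Assumption: the percentage applies directly to the reported model size.
    """
    gpu_gb = total_gb * gpu_percent / 100
    return gpu_gb, total_gb - gpu_gb


def mib_to_gb(mib: float) -> float:
    """Convert mebibytes (as printed by nvidia-smi) to decimal gigabytes."""
    return mib * 1024**2 / 1e9


# Values copied from the ollama ps and nvidia-smi output above.
gpu_gb, cpu_gb = split_sizes_gb(7.2, 70)
used_gb = mib_to_gb(952)
total_vram_gb = mib_to_gb(6144)

print(f"Expected GPU share: {gpu_gb:.2f} GB, CPU share: {cpu_gb:.2f} GB")
print(f"VRAM in use per nvidia-smi: {used_gb:.2f} GB of {total_vram_gb:.2f} GB")
```

If the reported VRAM usage is far below the expected GPU share, the nvidia-smi reading was probably not taken while the model weights were resident.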
The GPU is used whenever OpenCode sends a prompt to Ollama: during inference, roughly 70% of the model runs on the #RTX2060 and 30% on the #CPU. The model stays loaded for about 4 more minutes of idle time before Ollama unloads it, and each new request resets that timer.
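To repeat this check without reading the output by eye, the two command outputs can be parsed with a short script. This is only a sketch: it assumes the nvidia-smi CSV line has exactly the four fields requested above, and that the PROCESSOR column looks like `30%/70% CPU/GPU`, `100% GPU`, or `100% CPU`. The function names are hypothetical.

```python
import re


def parse_nvidia_smi_line(line: str) -> dict:
    """Parse one line of
    nvidia-smi --query-gpu=name,memory.used,memory.total,utilization.gpu
               --format=csv,noheader
    """
    name, used, total, util = [field.strip() for field in line.split(",")]
    return {
        "name": name,
        "used_mib": int(used.split()[0]),      # "952 MiB" -> 952
        "total_mib": int(total.split()[0]),    # "6144 MiB" -> 6144
        "util_percent": int(util.split()[0]),  # "9 %" -> 9
    }


def parse_processor(field: str) -> dict:
    """Parse the PROCESSOR column of `ollama ps` into CPU/GPU percentages."""
    field = field.strip()
    split = re.fullmatch(r"(\d+)%/(\d+)% CPU/GPU", field)
    if split:
        return {"cpu": int(split.group(1)), "gpu": int(split.group(2))}
    single = re.fullmatch(r"(\d+)% (CPU|GPU)", field)
    if single:
        percent, device = int(single.group(1)), single.group(2).lower()
        other = "cpu" if device == "gpu" else "gpu"
        return {device: percent, other: 0}
    raise ValueError(f"unrecognized PROCESSOR value: {field!r}")


# Values copied from the transcript above.
gpu = parse_nvidia_smi_line("NVIDIA GeForce RTX 2060, 952 MiB, 6144 MiB, 9 %")
split = parse_processor("30%/70% CPU/GPU")
print(gpu)
print(split)
```

Running both commands in a loop while OpenCode is generating, and feeding each line through these parsers, would show GPU utilization rising during inference and falling back between requests.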







