Running #llama (latest as of now) on my Intel #Arc750 GPU with the new qwen35-9B model using SYCL (Intel's thing):
- Needs a special environment to compile (oneAPI setvars.sh, icx/icpx compilers, ...)
- Only reaches ~80% GPU compute utilization (per nvtop)
- Produces gibberish output
- Bench: pp=131t/s tg=9.4t/s
So I tried using the independent Vulkan backend:
- No special env required
- Reaches 100% GPU compute utilization
- Reasonable output produced
- Bench: pp=393t/s tg=9.5t/s
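
To put the two bench lines side by side, here is a quick sketch. The dict layout and the `speedup` helper are just my own illustration; only the four numbers come from the runs above.

```python
# Throughput figures copied from the two bench lines above (tokens/second).
bench = {
    "sycl": {"pp": 131.0, "tg": 9.4},
    "vulkan": {"pp": 393.0, "tg": 9.5},
}


def speedup(metric: str) -> float:
    """Vulkan throughput divided by SYCL throughput for one metric."""
    return bench["vulkan"][metric] / bench["sycl"][metric]


pp_ratio = speedup("pp")  # prompt processing
tg_ratio = speedup("tg")  # token generation
print(f"pp: {pp_ratio:.2f}x  tg: {tg_ratio:.2f}x")
# prints "pp: 3.00x  tg: 1.01x"
```

So Vulkan is about 3x faster at prompt processing, while token generation is essentially a tie.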
What gives?


