TurboQuant model weight compression support added to Llamacpp
https://github.com/TheTom/llama-cpp-turboquant/pull/45
#HackerNews #TurboQuant #Llamacpp #model #weight #compression #AI #optimization #machinelearning

feat: TQ4_1S weight compression (Metal only, needs CUDA port) by TheTom · Pull Request #45 · TheTom/llama-cpp-turboquant
Summary: TQ3_1S (3-bit, 4.0 BPW) and TQ4_1S (4-bit, 5.0 BPW) weight quantization using WHT rotation + Lloyd-Max centroids. V2.1 fused Metal kernel: zero threadgroup memory, cooperative SIMD rotation...
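The summary names the two building blocks of the scheme: a Walsh-Hadamard transform (WHT) rotation to spread outlier weights across a block, followed by Lloyd-Max centroid fitting to pick the quantization levels. A minimal NumPy sketch of that pipeline is below; it is a generic illustration of the two techniques, not the PR's fused Metal kernel, and all function names (`hadamard`, `lloyd_max`, `quantize_block`) are made up for this example.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of a normalized Hadamard matrix.
    # n must be a power of two; rows are orthonormal.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max(x, n_levels, iters=50):
    # 1-D Lloyd-Max quantizer design: alternate nearest-centroid
    # assignment and centroid update (k-means on scalars).
    centroids = np.quantile(x, np.linspace(0.0, 1.0, n_levels))
    idx = np.zeros(x.size, dtype=np.int64)
    for _ in range(iters):
        idx = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                centroids[k] = x[idx == k].mean()
    return centroids, idx

def quantize_block(w, bits=4):
    # Rotate the block with the WHT to flatten outliers, then fit
    # 2**bits Lloyd-Max centroids in the rotated domain.
    H = hadamard(w.size)
    rotated = H @ w.flatten()
    centroids, idx = lloyd_max(rotated, 2 ** bits)
    return centroids, idx.astype(np.uint8), H

def dequantize_block(centroids, idx, H, shape):
    # H is orthogonal, so H.T exactly inverts the rotation.
    return (H.T @ centroids[idx]).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
c, idx, H = quantize_block(w, bits=4)
w_hat = dequantize_block(c, idx, H, w.shape)
```

Storing 4-bit indices per weight plus the per-block centroid table is what pushes the effective rate above 4 bits, consistent with the 5.0 BPW figure quoted for TQ4_1S (and likewise 4.0 BPW for the 3-bit TQ3_1S).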
