I'm fascinated by this section in an Apple paper about how they're using ASTC to compress models to 4-bit, then using the hardware decoder to decompress with no overhead. I don't understand how ASTC could ever be even remotely close to 4-bit quantization in terms of NRMSE though…
ASTC was made for images and doesn't really generalize to model weights very well, at least in my experiments. They extracted significant vectors into a LoRA adapter first, but even with a custom ASTC encoder tuned with weight-specific heuristics I still got nowhere near int4 quality.
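
For context, here is a minimal sketch of the kind of int4 baseline such a comparison would be measured against. Everything in it is an assumption for illustration, not taken from the paper or from the experiments above: symmetric round-to-nearest quantization with one scale per group of 32 weights, synthetic Gaussian weights, and NRMSE defined as RMS error divided by the RMS of the original weights. The names `quantize_int4_groupwise` and `nrmse` are hypothetical helpers.

```python
import math
import random


def quantize_int4_groupwise(weights, group_size=32):
    """Quantize to signed 4-bit levels with one scale per group, then dequantize.

    Assumed scheme: symmetric round-to-nearest, levels clamped to [-8, 7].
    """
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to level 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        for w in group:
            level = max(-8, min(7, round(w / scale)))
            dequantized.append(level * scale)
    return dequantized


def nrmse(original, reconstructed):
    """RMS error normalized by the RMS of the original values (assumed definition)."""
    err = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    ref = sum(a ** 2 for a in original)
    return math.sqrt(err / ref)


random.seed(0)
# Synthetic stand-in for a weight tensor; real weights have outliers and structure.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
dequantized = quantize_int4_groupwise(weights)
baseline = nrmse(weights, dequantized)
print(f"int4 group-wise NRMSE: {baseline:.4f}")
```

An ASTC round trip would be scored by passing its decoded weights through the same `nrmse` function. Note that the per-group scales add storage overhead on top of the 4 bits per weight, so a fair comparison should match total bits, not just the nominal bit width.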