Speculative KV coding: losslessly compressing the KV cache by up to ~4× with a predictor model

A cheaper predictor model supplies the probabilities that drive an arithmetic coder, compressing a target model's KV cache losslessly by up to ~4×.
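The summary's core mechanism is a predictor's probabilities driving an arithmetic coder. It can be illustrated with a minimal sketch of model-driven arithmetic coding. The sketch makes three assumptions. First, the KV cache has already been turned into a sequence of discrete symbols, here a hypothetical 16-symbol alphabet. Second, the encoder and decoder run the same deterministic predictor. Third, the decoder is told the symbol count. `toy_predictor` is a hypothetical stand-in, not the predictor model described here: it simply bets that each symbol repeats the previous one. Exact `Fraction` arithmetic keeps the coder short and easy to verify as lossless.

```python
from fractions import Fraction
from math import ceil
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor interface: given the symbols coded so far, return a
# positive integer frequency for each of the K possible next symbols.
Predictor = Callable[[Sequence[int]], List[int]]

K = 16  # hypothetical alphabet size, e.g. 4-bit quantized KV entries


def cumulative(freqs: List[int]) -> List[int]:
    """Prefix sums: symbol s owns the range [cum[s], cum[s + 1])."""
    cum = [0]
    for f in freqs:
        cum.append(cum[-1] + f)
    return cum


def encode(symbols: Sequence[int], predict: Predictor) -> Tuple[int, int]:
    """Arithmetic-code `symbols`; return (k, n), the n-bit fraction k / 2**n."""
    low, width = Fraction(0), Fraction(1)
    for i, s in enumerate(symbols):
        cum = cumulative(predict(symbols[:i]))
        total = cum[-1]
        # Narrow [low, low + width) to the sub-interval assigned to s.
        low += width * Fraction(cum[s], total)
        width *= Fraction(cum[s + 1] - cum[s], total)
    # Shortest binary fraction inside the final interval; n never exceeds
    # ceil(-log2(width)), i.e. the model's ideal code length rounded up.
    n = 0
    while True:
        k = ceil(low * 2**n)
        if Fraction(k, 2**n) < low + width:
            return k, n
        n += 1


def decode(k: int, n: int, length: int, predict: Predictor) -> List[int]:
    """Invert encode(); requires the same deterministic predictor."""
    x = Fraction(k, 2**n)
    low, width = Fraction(0), Fraction(1)
    out: List[int] = []
    for _ in range(length):
        cum = cumulative(predict(out))
        total = cum[-1]
        # Locate x within the current interval, scaled to [0, total).
        target = (x - low) / width * total
        s = next(j for j in range(len(cum) - 1) if cum[j + 1] > target)
        out.append(s)
        low += width * Fraction(cum[s], total)
        width *= Fraction(cum[s + 1] - cum[s], total)
    return out


def toy_predictor(context: Sequence[int]) -> List[int]:
    """Hypothetical stand-in for the cheap predictor: bet on a repeat."""
    freqs = [1] * K
    if context:
        freqs[context[-1]] = 100
    return freqs


data = [3] * 20 + [7] * 20 + [3] * 20
k, n = encode(data, toy_predictor)
assert decode(k, n, len(data), toy_predictor) == data  # lossless round trip
print(f"{n} coded bits vs {len(data) * 4} raw bits")
```

The encoder and decoder call the predictor on identical prefixes, so they partition the interval identically. That shared partition is what makes the round trip exact. The closer the predictor's probabilities are to the true next-symbol distribution, the shorter the code: a symbol predicted with probability p costs about -log2 p bits. When the predictor guesses badly, the output can be larger than the raw encoding. A practical coder would replace the exact `Fraction` arithmetic, whose cost grows with sequence length, with fixed-precision integer ranges and renormalization.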