After A LOT of time studying BLAS internals, I've finally opened my PR to the gemm crate: it introduces mixed-precision BF16 matmuls (well suited to use cases like small models doing autoregressive decoding on CPU).
https://github.com/sarah-quinones/gemm/pull/40
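
For anyone wondering what "mixed-precision BF16 matmul" usually means: a minimal sketch, assuming the common scheme where inputs are stored as BF16 and products are accumulated in f32. This is not the PR's code or the gemm crate's API. The function names are hypothetical, BF16 values are passed as raw `u16` bits, and the loop is a naive reference version rather than an optimized kernel.

```rust
/// Widen a BF16 value (given as its raw 16 bits) to f32.
/// BF16 is the upper 16 bits of an IEEE-754 f32, so this conversion is exact.
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Narrow an f32 to BF16 bits by truncating the low 16 mantissa bits.
/// (Assumption for brevity: production code typically rounds to nearest even.)
fn f32_to_bf16(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Naive row-major reference: C (m x n, f32) = A (m x k, BF16) * B (k x n, BF16).
/// Inputs stay in 16-bit storage; every product is accumulated in f32.
fn matmul_bf16_f32(a: &[u16], b: &[u16], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            // Widen each A element once, then reuse it across the row of B.
            let a_ip = bf16_to_f32(a[i * k + p]);
            for j in 0..n {
                c[i * n + j] += a_ip * bf16_to_f32(b[p * n + j]);
            }
        }
    }
    c
}
```

Halving the storage width halves the bytes read per weight, which matters most when the matmul is memory-bound, as it tends to be when decoding one token at a time.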
#programming #rust #ai #inference #deeplearning #qwen #asr #opensource #rustlang