On the photography/art side, I love 10 bit and 12 bit colour.

But gosh darn does it suck for cache line efficiency in image processing code :(
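
To put rough numbers on that, here is a small sketch assuming a 64-byte cache line and tightly packed samples (the helper names are made up for illustration). It shows how many whole samples fit in one line, how many bits are left over, and how many samples end up straddling two lines.

```python
CACHE_LINE_BITS = 64 * 8  # assumption: the usual 64-byte cache line


def packed_layout(bits_per_sample: int, line_bits: int = CACHE_LINE_BITS):
    """Return (whole samples per line, leftover bits) for tightly packed samples."""
    return divmod(line_bits, bits_per_sample)


def straddlers(bits_per_sample: int, n_samples: int,
               line_bits: int = CACHE_LINE_BITS) -> int:
    """Count samples whose bits span two cache lines in a tightly packed buffer."""
    count = 0
    for i in range(n_samples):
        first_bit = i * bits_per_sample
        last_bit = first_bit + bits_per_sample - 1
        if first_bit // line_bits != last_bit // line_bits:
            count += 1
    return count


for bits in (8, 10, 12, 16):
    per_line, leftover = packed_layout(bits)
    print(f"{bits:2d} bit: {per_line} samples/line, {leftover} bits left over, "
          f"{straddlers(bits, 1024)} of 1024 samples straddle a line")
```

The usual workaround is to pad each sample to a uint16, which keeps everything aligned but leaves 6 of every 16 bits unused for 10 bit data.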

@fclc now i am wondering if it could be profitable to do packed math on them with all the weird masking/permute/compress insns avx-512 has

@Methylzero I had an idea last year around adding an extension to use the #FP16 FPUs as 10 bit int pipelines to save a cycle on IFMAs and I16ADD over the int16 MAC/add instructions, but it was seen as too niche (even for x86)
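
One reading of why 10 bit lines up with the FP16 hardware (an assumption, not something stated in the thread): IEEE 754 binary16 stores 10 explicit mantissa bits, so every 10 bit integer is exactly representable. A quick check using Python's `struct` half-precision format, with a hypothetical helper `through_fp16`:

```python
import struct


def through_fp16(x: float) -> float:
    """Round-trip a value through IEEE 754 binary16 storage."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Every unsigned 10 bit value survives the round trip unchanged.
ten_bit_exact = all(through_fp16(float(v)) == v for v in range(1 << 10))

# With the implicit leading 1 there are 11 bits of precision,
# so find the first integer that no longer fits exactly.
first_inexact = next(v for v in range(4096) if through_fp16(float(v)) != v)

print(ten_bit_exact, first_inexact)
```

This only shows the operand width fits the FP16 mantissa datapath; it says nothing about the cycle counts.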

There was already precedent on this sort of thing (avx512 IFMA did this for the FP64 pipes)
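
For anyone who hasn't met IFMA: below is a one-lane scalar model of the two 52 bit multiply-add instructions (VPMADD52LUQ and VPMADD52HUQ) as I understand their semantics. The function names are made up, and this is a sketch rather than a reference implementation. The 52 matches the explicit mantissa width of FP64, which is why the FP64 multiplier can be reused.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc: int, a: int, b: int) -> int:
    """acc + low 52 bits of (a[51:0] * b[51:0]), wrapped to 64 bits."""
    prod = (a & MASK52) * (b & MASK52)  # up to 104 bits wide
    return (acc + (prod & MASK52)) & MASK64


def madd52hi(acc: int, a: int, b: int) -> int:
    """acc + high 52 bits of (a[51:0] * b[51:0]), wrapped to 64 bits."""
    prod = (a & MASK52) * (b & MASK52)
    return (acc + (prod >> 52)) & MASK64


# Bits 52..63 of each multiplicand are ignored; only the accumulator is 64 bits wide.
print(madd52lo(1, 3, 5), madd52hi(0, 1 << 51, 4))
```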

The idea was saving a cycle (3.5 instead of 4.5) and saving some power (by not dealing with the extra 6 bits of a normal int16)

#simd #HPC