Thinking of revisiting the idea of changing how the significand is decomposed for SIMD (to get 16 digits that can be processed in parallel): split off the least-significant digit instead of the most-significant one. This should get rid of an extra multiplication.
It will also eliminate the annoying zero case that clz doesn't like.
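
For reference, a rough scalar sketch of the two decompositions. The names (`split_msd`, `split_lsd`, `split_result`) are made up for illustration, and it assumes the significand has already been normalized to exactly 17 decimal digits, i.e. lies in [10^16, 10^17). It only shows where the 16-digit block comes from and when that block can be zero; it does not model the SIMD path or the multiplication being saved.

```cpp
#include <cstdint>

// Hypothetical helper names; assumes 10^16 <= sig < 10^17 (17 decimal digits).
constexpr std::uint64_t pow10_16 = 10000000000000000ULL;  // 10^16

struct split_result {
  std::uint64_t block;  // the 16 digits that would go to the SIMD path
  unsigned digit;       // the single digit handled separately
};

// Current scheme: peel off the most-significant digit.
// The 16-digit block is the low part and can be zero (e.g. sig == 10^16).
constexpr split_result split_msd(std::uint64_t sig) {
  return {sig % pow10_16, static_cast<unsigned>(sig / pow10_16)};
}

// Proposed scheme: peel off the least-significant digit.
// The 16-digit block is the high part, so it is >= 10^15 and never zero
// under the 17-digit assumption above.
constexpr split_result split_lsd(std::uint64_t sig) {
  return {sig / 10, static_cast<unsigned>(sig % 10)};
}
```

With `sig == 10000000000000000` the first split yields a block of 0 (the input clz can't handle), while the second yields 1000000000000000 and a trailing digit of 0.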