@pervognsen FYI tininess detection before/after isn’t just FMA. It’s also FMUL and conversions between FP formats.
On the upside, it only affects the underflow flag unless the program has unmasked that exception (which isn’t portable anyway), or is flushing subnormals (widely supported, but non-standard). So if underflow is already set in your modeled FPSR, and you’re in default IEEE 754 mode (the norm for most programs), you can safely ignore it.
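For anyone modeling this, here is a rough sketch of the two tininess definitions for a binary32 result, using exact rationals so the host FPU doesn't get in the way. The helper names (`floor_log2`, `round_unbounded`, `tiny_before`, `tiny_after`) are made up for illustration, and only round-to-nearest-even is handled.

```python
from fractions import Fraction

P = 24                              # binary32 significand precision, in bits
MIN_NORMAL = Fraction(1, 2**126)    # smallest positive normal binary32 value

def floor_log2(x: Fraction) -> int:
    """Largest e with 2**e <= x, for x > 0."""
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:        # the bit-length estimate can be one too high
        e -= 1
    return e

def round_unbounded(x: Fraction, p: int = P) -> Fraction:
    """Round x > 0 to p significant bits (nearest-even), exponent range unbounded."""
    quantum = Fraction(2) ** (floor_log2(x) - p + 1)
    return round(x / quantum) * quantum   # round() on a Fraction is half-to-even

def tiny_before(exact: Fraction) -> bool:
    """Tininess before rounding: the exact result lies strictly inside +/-MIN_NORMAL."""
    return 0 < abs(exact) < MIN_NORMAL

def tiny_after(exact: Fraction) -> bool:
    """Tininess after rounding: the result rounded to p bits is still below MIN_NORMAL."""
    return exact != 0 and round_unbounded(abs(exact)) < MIN_NORMAL

# An exact result just under the smallest normal: it rounds up to MIN_NORMAL.
edge = MIN_NORMAL - Fraction(1, 2**152)
```

Here `edge` is tiny before rounding but not after, so the two conventions disagree only on whether underflow gets raised; the delivered result is the same either way.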
@pervognsen NaN propagation bit is slightly wrong too: 754 not only doesn’t specify “which” NaN propagates, but some HW will produce a NaN result that is neither of the NaN inputs (e.g. ARM with the DN bit set in FPCR).
And only generating one NaN generally won’t save you anyway, unless you have no mechanism to load binary data/reinterpret bit patterns.
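To make the DN case concrete, a small model on raw binary32 bit patterns. `nan_result` and its "first NaN operand, quieted" rule are assumptions for illustration only; real hardware picks its own propagation rule when DN is clear.

```python
DEFAULT_NAN = 0x7FC00000   # binary32 default NaN: sign 0, quiet bit set, zero payload
QUIET_BIT = 0x00400000

def is_nan(bits: int) -> bool:
    """True if the 32-bit pattern encodes a NaN (exponent all ones, nonzero fraction)."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0

def nan_result(a: int, b: int, default_nan: bool) -> int:
    """Result bits of a two-operand op when at least one operand is a NaN."""
    assert is_nan(a) or is_nan(b)
    if default_nan:
        # DN-style behaviour: the result ignores both input payloads.
        return DEFAULT_NAN
    # Assumed rule for this sketch: propagate the first NaN operand, quieted.
    src = a if is_nan(a) else b
    return src | QUIET_BIT

payload_nan = 0x7FC12345   # a quiet NaN with a nonzero payload, e.g. loaded from memory
one = 0x3F800000           # 1.0f
```

With `default_nan=True` the result is `DEFAULT_NAN`, which matches neither input; with it clear, the payload survives. And since `payload_nan` is just an integer reinterpreted as a float, any program that can load raw bits can get NaNs like this into the pipeline regardless of what arithmetic generates.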
@steve @pervognsen Any suggestions for fp -> int conversions in this design? Does FEAT_AFP have anything that helps with that?
Both seem like they compile to a bunch of compare-selects to handle edge cases. I guess maybe you could balance the number of compare-selects between architectures and standardise on NaN to zero, out-of-range to 0x80…00. Worst of both worlds :P
FJCVTZS comes to mind, but IIRC that wasn't cheap on x86?
@dougall @pervognsen That said, zero is kind of nice for NaN, because there are often computations that are guaranteed to produce either a value in a finite range _or_ NaN, and if you subsequently convert to integer and use that for a table lookup, sending NaN to zero avoids indexing out-of-bounds.
So on the whole, I would argue the ARM behavior is the best one, and it's not onerous to match on other HW.
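A minimal sketch of that behavior for a signed 32-bit destination (saturate on out-of-range, NaN to zero, truncate toward zero otherwise). `fcvtzs_like` and `TABLE` are hypothetical names, and this models the semantics only, not the compare-select sequence a compiler would emit on other HW.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fcvtzs_like(x: float) -> int:
    """Float to signed 32-bit int: NaN -> 0, saturate at the ends, else truncate."""
    if math.isnan(x):
        return 0
    if x >= INT32_MAX:      # also catches +inf
        return INT32_MAX
    if x <= INT32_MIN:      # also catches -inf
        return INT32_MIN
    return math.trunc(x)    # round toward zero

# The table-lookup case: the input is either inside the table's range or NaN.
TABLE = [10, 20, 30, 40]

def lookup(x: float) -> int:
    return TABLE[fcvtzs_like(x)]
```

With this convention `lookup(float("nan"))` lands on entry 0 instead of some huge or negative index, which is the property argued for above.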