@zwarich Curious about your thoughts on this, regarding making a simple 1:1 binary translator generate good code on the big three targets, since I know you did Rosetta 2. https://gist.github.com/pervognsen/9787fb384b28b2d5412e5262354729a7#executing
December Adventure 2024

@pervognsen FYI, whether tininess is detected before or after rounding isn't just an FMA issue. It also affects FMUL and conversions between FP formats.

On the upside, it only affects the underflow flag unless the program has unmasked that exception (which isn’t portable anyway), or is flushing subnormals (widely supported, but non-standard). So if underflow is already set in your modeled FPSR, and you’re in default IEEE 754 mode (the norm for most programs), you can safely ignore it.
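
Here is a minimal Python sketch that makes the before/after distinction concrete for binary64, using exact rational arithmetic (`fractions.Fraction`). The helper names (`round_to_precision`, `tiny_before_rounding`, `tiny_after_rounding`, `can_skip_tininess_mode`) are hypothetical. The skip rule only encodes the conditions described above: underflow already set, underflow trap masked, subnormals not flushed. The operands are chosen so that a plain FMUL is tiny under one convention and not under the other.

```python
import math
from fractions import Fraction

MIN_NORMAL = Fraction(2) ** -1022  # smallest positive normal binary64
PRECISION = 53                     # binary64 significand bits


def round_to_precision(x: Fraction, p: int = PRECISION) -> Fraction:
    """Round |x| to p significant bits, ties-to-even, with an unbounded exponent range."""
    x = abs(x)
    if x == 0:
        return x
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    return round(x / ulp) * ulp  # round() on a Fraction rounds ties to even


def tiny_before_rounding(exact: Fraction) -> bool:
    # Tiny if the infinitely precise result lies below the normal range.
    return 0 < abs(exact) < MIN_NORMAL


def tiny_after_rounding(exact: Fraction) -> bool:
    # Tiny if the result rounded to 53 bits (exponent unbounded) lies below the normal range.
    return 0 < round_to_precision(exact) < MIN_NORMAL


def can_skip_tininess_mode(uf_flag_set: bool, uf_trap_enabled: bool,
                           flush_subnormals: bool) -> bool:
    # In default IEEE 754 mode the convention only decides whether UF is raised,
    # so an already-set sticky UF flag makes the difference unobservable.
    return uf_flag_set and not uf_trap_enabled and not flush_subnormals


# (2**27 - 1) * (2**27 + 1) == 2**54 - 1, so the exact product is 2**-1022 - 2**-1076.
a = math.ldexp(2**27 - 1, -538)
b = math.ldexp(2**27 + 1, -538)
exact = Fraction(a) * Fraction(b)
```

The rounded product `a * b` is exactly `2**-1022`, the smallest normal, and the multiplication is inexact. So a target that detects tininess before rounding raises underflow here, while one that detects it after rounding does not.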

@pervognsen The NaN propagation bit is slightly wrong too: 754 not only doesn’t specify “which” NaN propagates, but some HW will return a NaN that is neither of the input NaNs (e.g. ARM with the DN bit set in FPCR).

And generating only one NaN generally won’t save you anyway, unless you have no mechanism to load binary data or reinterpret bit patterns.
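
A translator that wants bit-exact NaNs has to model the target's selection rule explicitly. The sketch below works on raw binary64 bit patterns held in Python ints. The policy names and `DEFAULT_NAN` value are illustrative assumptions, not a description of any particular ISA; `"default_nan"` stands in for the ARM DN behavior mentioned above. The final helpers show why emitting a single NaN is not enough: `struct` reinterprets arbitrary bits as a double, so a program can manufacture and observe any payload.

```python
import struct
from typing import Optional

EXP_MASK = 0x7FF0_0000_0000_0000
FRAC_MASK = 0x000F_FFFF_FFFF_FFFF
QUIET_BIT = 0x0008_0000_0000_0000
DEFAULT_NAN = 0x7FF8_0000_0000_0000  # assumed canonical quiet NaN for this sketch


def is_nan(bits: int) -> bool:
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_snan(bits: int) -> bool:
    return is_nan(bits) and not (bits & QUIET_BIT)


def quiet(bits: int) -> int:
    return bits | QUIET_BIT


def select_nan(a: int, b: int, policy: str) -> Optional[int]:
    """Pick the NaN result of a binary op on bit patterns a and b (None if no NaN input)."""
    if not (is_nan(a) or is_nan(b)):
        return None
    if policy == "default_nan":      # every NaN result is the same canonical NaN
        return DEFAULT_NAN
    if policy == "snan_first":       # signaling NaNs take priority, in operand order
        for x in (a, b):
            if is_snan(x):
                return quiet(x)
    if policy in ("snan_first", "first_operand"):
        for x in (a, b):             # otherwise the first NaN operand wins
            if is_nan(x):
                return quiet(x)
    raise ValueError(f"unknown policy: {policy}")


def bits_to_f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def f64_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# A program can craft a quiet NaN with an arbitrary payload and read it back.
crafted = bits_to_f64(0x7FF8_0000_0000_0123)
```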

@pervognsen The other big non-portable FP corner cases to be aware of are out-of-range FP-to-integer conversions (x86 produces 0x80…00; ARM saturates and sends NaN to zero) and whether or not FMA(0, inf, qnan) raises invalid.
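
A sketch of the two conversion behaviors for a truncating binary64 to int32 conversion, in the spirit of x86 `CVTTSD2SI` and AArch64 `FCVTZS`. The function names are hypothetical, only result values are modeled (exception flags are ignored), and the FMA(0, inf, qnan) question is left out.

```python
import math

INT32_MIN = -(2**31)    # bit pattern 0x80000000 in two's complement
INT32_MAX = 2**31 - 1


def cvt_f64_to_i32_x86(x: float) -> int:
    """x86-style: NaN and out-of-range inputs return the 'integer indefinite' 0x80000000."""
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN
    t = math.trunc(x)   # round toward zero
    return t if INT32_MIN <= t <= INT32_MAX else INT32_MIN


def cvt_f64_to_i32_arm(x: float) -> int:
    """ARM-style: saturate to the int32 range, and NaN becomes 0."""
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return INT32_MAX if x > 0 else INT32_MIN
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))
```

In-range inputs agree on both targets; only NaN and out-of-range values diverge, so a translator can keep the fast native conversion and patch up just those cases.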
@steve @pervognsen where's my hardware feature for NaN canonicalization plz (ofc this isn't really that big of a deal either way, but part of me finds it amusing how much of a deal it is vs. how few applications actually care about the particular NaN values, aside from them being consistent)
@dotstdy @pervognsen multiplication by 1 is the canonicalize operation.
@steve @dotstdy @pervognsen Or if you’re looking for an excuse to use AVX-512, VFIXUPIMMPS.
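
Without a dedicated instruction, a translator can canonicalize NaNs in software with a self-comparison and a select, since only a NaN compares unequal to itself. A minimal sketch on binary64, where `CANONICAL_NAN` and the helper names are assumptions. Multiplying by 1 quiets a signaling NaN, but on a target that propagates input payloads it does not by itself map every NaN to a single bit pattern (ARM's DN mode, mentioned earlier in the thread, does).

```python
import struct

CANONICAL_NAN = 0x7FF8_0000_0000_0000  # assumed canonical quiet NaN (positive, zero payload)


def f64_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def canonicalize(x: float) -> float:
    # x != x holds only for NaN; a translator would emit a compare plus a select here.
    return bits_to_f64(CANONICAL_NAN) if x != x else x
```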