@zwarich Curious about your thoughts on this, regarding making a simple 1:1 binary translator generate good code on the big three targets, since I know you did Rosetta 2. https://gist.github.com/pervognsen/9787fb384b28b2d5412e5262354729a7#executing

@pervognsen FYI, whether tininess is detected before or after rounding isn’t just an FMA issue. It also affects FMUL and conversions between FP formats.

On the upside, it only affects the underflow flag unless the program has unmasked that exception (which isn’t portable anyway), or is flushing subnormals (widely supported, but non-standard). So if underflow is already set in your modeled FPSR, and you’re in default IEEE 754 mode (the norm for most programs), you can safely ignore it.

@pervognsen The NaN propagation bit is slightly wrong too: 754 not only doesn’t specify “which” NaN propagates, but some HW will produce a NaN result that is neither NaN input (e.g. ARM with the DN bit set in FPCR).

And only generating one NaN generally won’t save you anyway, unless you have no mechanism to load binary data/reinterpret bit patterns.
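
To make the two points above concrete, here is a minimal C sketch. It assumes binary32 and uses hypothetical helper names (`f32_bits`, `f32_from_bits`, `default_nan_result`); `0x7FC00000` is the Arm default NaN for single precision. It models "default NaN" mode as a post-processing step on a result, and the bit-reinterpretation helpers show why that alone does not keep other NaN patterns out of a program.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret between float and its bit pattern. */
static uint32_t f32_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static float f32_from_bits(uint32_t u) {
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Model of default-NaN mode: any NaN result is replaced by one fixed
   quiet NaN, regardless of which NaN inputs were involved. */
static float default_nan_result(float result) {
    return isnan(result) ? f32_from_bits(0x7FC00000u) : result;
}
```

A guest program can still call something like `f32_from_bits(0x7FC12345u)` (or simply load such a value from memory), so arithmetic that only ever generates one NaN does not guarantee that only one NaN pattern is observable.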

@pervognsen The other big non-portable corner cases to be aware of in FP are out-of-range FP-to-integer conversions (x86 produces 0x80…00, ARM saturates and sends NaN to zero), and whether or not FMA(0, inf, qnan) raises invalid.
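
As a rough illustration of the conversion difference, the sketch below models both behaviors in portable C for a float to signed 32-bit truncating conversion. The function names are hypothetical; the x86 side is meant to mirror what `CVTTSS2SI` with a 32-bit destination returns, and the Arm side what `FCVTZS` returns. It does not model the exception flags.

```c
#include <math.h>
#include <stdint.h>

/* x86-style: NaN and every out-of-range input produce the same
   "integer indefinite" value, 0x80000000. */
static int32_t cvt_f32_i32_x86_style(float x) {
    if (isnan(x) || x >= 2147483648.0f || x < -2147483648.0f)
        return INT32_MIN;
    return (int32_t)x; /* in range: truncate toward zero */
}

/* Arm-style: NaN becomes 0, out-of-range inputs saturate by sign. */
static int32_t cvt_f32_i32_arm_style(float x) {
    if (isnan(x))
        return 0;
    if (x >= 2147483648.0f)
        return INT32_MAX;
    if (x < -2147483648.0f)
        return INT32_MIN;
    return (int32_t)x; /* in range: truncate toward zero */
}
```

The explicit range checks are also what keeps the plain C cast well defined, since converting an out-of-range or NaN float to an integer type is undefined behavior in C.
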
@steve @pervognsen where's my hardware feature for NaN canonicalization plz (ofc this is not really that big of a deal either way, but part of me finds it amusing how much of a deal it is vs. how few applications actually care about the particular NaN values aside from them being consistent)
@dotstdy @pervognsen multiplication by 1 is the canonicalize operation.
@steve @dotstdy @pervognsen Or if you’re looking for an excuse to use avx512, VFIXUPIMMPS.

@steve @pervognsen Any suggestions for fp -> int conversions in this design? Does FEAT_AFP have anything that helps with that?

Both seem like they compile to a bunch of compare-selects to handle edge cases. I guess maybe you could balance the number of compare-selects between architectures and standardise on NaN to zero, out-of-range to 0x80…00. Worst of both worlds :P

FJCVTZS comes to mind, but IIRC that wasn't cheap on x86?

@dougall @pervognsen ARM added FRINT[32,64][Z,X] in ... 8.5 (maybe?) that match the x86 behavior, but that's (arguably) the less desirable one.
@dougall @pervognsen FWIW, in numeric codes, I have never cared what happened to NaN in these conversions, but I have often wanted to preserve sign via clamping.

@dougall @pervognsen That said, zero is kind of nice for NaN, because there are often computations that are guaranteed to produce either a value in a finite range _or_ NaN, and if you subsequently convert to integer and use that for a table lookup, sending NaN to zero avoids indexing out-of-bounds.

So on the whole, I would argue the ARM behavior is the best one, and it's not onerous to match on other HW.

@steve @pervognsen Ah, right, yeah, that makes a lot of sense. (I was lost in the FCVT* instructions)