Disregard, I’m a dumbass who tried to oversimplify the base case
#compiler and #C people, this is a bug right? https://godbolt.org/z/vzjnhWGMT
AFAIK optimizing away the explicitly requested type at -O1 and up should not be the behaviour, especially when the type requested is *higher* precision than the original type.
A way around it is to separate the function that casts from the function that explicitly does the math, but that still feels like a bug.
_Float16 HSHaxpy(_Float16 a, _Float16 x, _Float16 y){
    /*
    a version of the level 1 BLAS AXPY function in mixed precision
    in this case data is received in FP16, "upgraded" to FP32,
    math is done before then being reconverted to FP16.
    */
    float A = a;
    float X = x;
    float Y = y;
    float ANS = (a*x)+y;
    _Float16 ans = ANS;
    return ans;
}

float Saxpy(float a, float x, float y){
    /*
    a version of the level 1 BLAS AXPY function in mixed precision
    in this case data is received in FP32, math is done, then returned
    */
    float A = a;
    float X = x;
    float Y = y;
    float ANS = (a*x)+y;
    return ANS;
}

_Float16 HSHaxpy_manual(_Float16 a, _Float16 x, _Float16 y){
    /*
    a version of the level 1 BLAS AXPY function in mixed precision
    in this case data is received in FP16, we call SAXPY,
    math is done before then being reconverted to FP16.
    */
    float A = a;
    float X = x;
    float Y = y;
    _Float16 ans = Saxpy(a,x,y);
    return ans;
}