Trying to get "scaled double" #maths working for the #BurningShip #fractal in #kf. The purpose of rescaling is to avoid floating-point underflow to 0.0. Something is broken, however; I suspect it has to do with glitch detection (dead-center miniships do much better).

Burning Ship iterations:
Xrn = Xr * Xr - Xi * Xi + Cr;
Xin = abs(2 * Xr * Xi) + Ci;

Perturbed Burning Ship iterations:
xrn = (2 * Xr + xr) * xr - (2 * Xi + xi) * xi + cr;
xin = 2 * diffabs(Xr * Xi, Xr * xi + xr * Xi + xr * xi) + ci;

Rescaled perturbed Burning Ship iterations (S * s = 1, actual orbit is X + x * s):
xrn = (2 * Xr + xr * s) * xr - (2 * Xi + xi * s) * xi + cr;
xin = 2 * diffabs(Xr * Xi * S, Xr * xi + xr * Xi + xi * xr * s) + ci;

diffabs(Z,z) evaluates abs(Z+z)-abs(Z) without catastrophic precision loss, by doing case analysis on the signs of Z and Z+z. Technique invented by laser blaster at fractalforums.com.

I think I fixed it!

When Xr and Xi are small, (Xr * Xi) can #underflow to 0, so ((Xr * Xi) * S) is 0 as well. Changing the multiplication order to ((Xr * S) * Xi) seems to work much better!

The risk of #overflow to infinity in (Xr * S) is probably nil, because Xr is small (less than the escape radius), but I should double-check deep zooms near the 1e600 threshold for switching from scaled double to long double.

#FloatingPoint

Be sure *not* to enable `-ffast-math` for this kind of code, as it can do optimizations that break it again, unpredictably.

Scaled-double with SIMD vector size 2 takes 10mins vs long double's 16mins (wall-clock time) at this Burning Ship location (zoom depth 1e410).

However, the scaled-double version looks a bit rougher even downscaled to thumbnail size. I'm not sure why, but it's a bit disappointing after all the effort of implementing it.

Similar issue with a Mandelbrot location at 4e533 zoom depth; this is a regression from the previous release, so it should be possible to fix it again...

Simple enough fix in the end.

The SIMD path iterates in chunks (default 64), and rolls the last chunk back when any pixel in the SIMD vector (default size 2) escapes or is detected as glitched, then finishes iterating each pixel in the vector on its own one by one.

I forgot to save/restore the 'test1' variable used for storing the squared magnitude for checking escape/glitch, so it or 'test2' (the previous value of 'test1', used for low bailout smooth colouring) contained garbage data, which sometimes made it through to the output pixels.

I think that happened when the pixel's iteration count was 1 more than an integer multiple of the chunk size, but I'm not sure.

Well, the original example in this thread still fails spectacularly with scaled-double. I seem to have only fixed it for off-axis Burning Ship locations.

I think it's underflow to 0.0 of the reference orbit imaginary part `Xi` (stored in double precision without scaling), which causes problems in perturbed pixel orbits when the product of Xr and Xi is scaled up in `diffabs(Xr * S * Xi, ...)`.

In this case I think the terms should have magnitudes roughly:
Xr ~ 1e0
S ~ 1e130
Xi ~ 1e-420 (underflows to 0)
So their product should have magnitude 1e-290, which is within the range of double.

Definition:
diffabs(X,x) := |X+x|-|X|

When the first argument is (roughly) the negative of the second, the result is negative:
diffabs(-x, x) = -|x|
If the first argument is 0 due to unwanted underflow, the result is positive with about the same magnitude:
diffabs(0, x) = |x|
This sign flip is what makes it go wrong.

Mandelbrot at this location works fine with scaled-double, because Xi does not need to be scaled up for diffabs().