
I successfully implemented approximate reciprocal (1/x) and reciprocal square root (1/sqrt(x)) for MRISC32 today.

I use a 256-entry LUT and get about 7-8 bits of precision (full precision with two Newton-Raphson iterations).
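For readers who want to see the idea in software, here is a minimal C sketch of a LUT-based reciprocal with Newton-Raphson refinement. It is not the MRISC32 hardware implementation: the names `recip_lut`, `recip_approx`, `recip_refine` and `recip_full` are made up for illustration, the table stores the reciprocal of each interval midpoint (an assumption; the real LUT contents may differ), and only positive finite inputs are handled. Each Newton-Raphson step `y' = y * (2 - x * y)` roughly doubles the number of correct bits, which is why two steps are enough to go from about 8 bits to single precision.

```c
#include <assert.h>
#include <math.h>

#define RECIP_LUT_SIZE 256

/* Hypothetical table: reciprocal of the midpoint of each mantissa interval. */
static float recip_lut[RECIP_LUT_SIZE];
static int recip_lut_ready = 0;

static void recip_lut_init(void) {
    for (int i = 0; i < RECIP_LUT_SIZE; ++i) {
        /* Interval i covers mantissas in [0.5 + i/512, 0.5 + (i+1)/512). */
        float mid = 0.5f + ((float)i + 0.5f) / 512.0f;
        recip_lut[i] = 1.0f / mid;
    }
    recip_lut_ready = 1;
}

/* Table-only estimate of 1/x (roughly 8 correct bits in this sketch). */
float recip_approx(float x) {
    assert(isfinite(x) && x > 0.0f);   /* sketch: positive finite inputs only */
    if (!recip_lut_ready) {
        recip_lut_init();
    }
    int e;
    float m = frexpf(x, &e);                /* x = m * 2^e, m in [0.5, 1) */
    int idx = (int)((m - 0.5f) * 512.0f);   /* top 8 fraction bits, 0..255 */
    return ldexpf(recip_lut[idx], -e);      /* 1/x = (1/m) * 2^-e */
}

/* One Newton-Raphson step for f(y) = 1/y - x. */
float recip_refine(float x, float y) {
    return y * (2.0f - x * y);
}

/* Table estimate followed by two refinement steps. */
float recip_full(float x) {
    float y = recip_approx(x);
    y = recip_refine(x, y);
    y = recip_refine(x, y);
    return y;
}
```

The same structure carries over to 1/sqrt(x), with a different table and the refinement step `y' = y * (1.5 - 0.5 * x * y * y)`.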

These are quite cheap instructions: single cycle with no extra latency, and less than 40 ALMs in the FPGA for 32-bit floating-point.

Useful for the Quake 3D rendering loops. I got another couple of FPS by switching from FDIV to FRECIPA.

Lo and behold, Quake 2 is working on my homebrew MRISC32 FPGA computer (after bumping the GCC version of my MRISC32 GCC toolchain)!

The frame rates are nothing to brag about (it's a 100MHz CPU after all), but it runs! 🎉
