For historical compatibility reasons, I sort of ended up rewriting a subset of a #MicroBlaze-compatible core for synthesising into tight corners of Lattice's iCE40/ECP5 #FPGA devices.

It's in #Verilog with some #LiterateProgramming preprocessing. Userspace (and I/O + interrupt support) only. There is explicit support for combining the code and data buses, optionally for 8-bit memory access (as in HyperRAM), or for synthesising instruction memory as block RAM, optionally with a secondary debug interface. There is also explicit support for resetting the core without resetting the whole FPGA. AXI-like, Wishbone-compliant, and serial I/O are supported, as are I/O-mappable interrupts. The register file can be pared down. Arithmetic can be split into chunks of a parametrically specified size, all the way down to bit-serial if need be, and slow-but-smol microcoded multiplication and division are optionally available. Some optional extensions for fixed-point transcendental calculations were originally planned, but right now, only binary logarithms and CORDIC are ready.
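To make the chunked arithmetic idea concrete, here is a rough behavioural model in Python (not the actual Verilog, and not necessarily how the core does it). It adds two words one chunk at a time, carrying between chunks, so a narrow adder can be reused over several steps. The function name, the parameters, and the requirement that the width be a multiple of the chunk size are all assumptions made for this sketch.

```python
def chunked_add(a, b, width=32, chunk=8, carry_in=0):
    """Add two width-bit values using an adder that is only `chunk` bits wide.

    Returns (sum, carry_out, steps). Each loop iteration stands in for one
    pass through the narrow adder. chunk=1 models a bit-serial datapath.
    Hypothetical helper for illustration only.
    """
    if chunk < 1 or width % chunk != 0:
        raise ValueError("sketch assumes width is a multiple of chunk")
    mask = (1 << chunk) - 1
    result = 0
    carry = carry_in
    steps = 0
    for shift in range(0, width, chunk):
        # Add one slice of each operand plus the carry from the previous slice.
        part = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (part & mask) << shift
        carry = part >> chunk  # 0 or 1, passed on to the next slice
        steps += 1
    return result, carry, steps
```

With `width=32`, a chunk size of 8 takes 4 passes and a chunk size of 1 takes 32, which is the area-versus-time trade the chunk parameter is there to expose.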

The original commercial interest in it is likely to go away in the near future. Would there be interest in a GPL release of this sort of thing?

This is not at all the sort of context that MicroBlaze was originally designed for, even in the Xilinx world, and I'm not sure that the specific backwards compatibility reasons exist outside this particular niche (=> I would probably not be doing maintenance work on the core after release without a good $€parate r€a$on), but if you have a use case that might match something like these criteria, please let me know.

(Obligatory LBNL: only deterministic automation was used in writing this code. GenAI has not touched any part of it.)

Oh! There's one more feature that was planned for and is nearly ready: support for Propeller-like multitasking, in that the processor's internal state can be cleanly separated from the logic and rotated at instruction boundaries, at a cost of zero clocks (but a pipeline flush). This way, to have multiple functional processor cores, you'd only need logic blocks for the register files (+ a few other tidbits), while the control and arithmetic subsystems would be shared.

The reason for doing it this way is that it effectively gives you the benefits of an RTOS without the costs of actually implementing a software kernel (or the instructions needed to support a modern kernel). It hard-limits the number of parallel tasks, which modern software (RT)OSes tend to frown upon, but the limit is parametric and unbounded, so it actually works quite well in gateware contexts.

The catch is, the instruction pipeline is (currently) not included in the internal state, so it will need to be fully flushed at every task switch. This can probably be improved on, but then again, the constraints of this design didn't call for aggressive pipelining, anyway. Plus, a pipeline flush is still cheaper than a software-controlled context switch.

I did some work on assigning tasks timeslices at different frequencies, but this subsystem is kind of messy and incomplete, so what I can release without significant preparation work would likely be a simple task ring, with a wee bit of optimisation for skipping over a halted (= waiting for interrupts) task. As I said, it's basically Propeller-like task sequencing.
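For a feel of what a simple task ring with halted-task skipping means, here is a small Python model. It is a sketch of the scheduling policy only. The class and method names are made up for illustration, and returning `None` when every task is halted is an assumption; the real gateware may well do something else in that case.

```python
class TaskRing:
    """Round-robin task selector that skips halted tasks (illustrative model)."""

    def __init__(self, num_tasks):
        if num_tasks < 1:
            raise ValueError("need at least one task")
        self.halted = [False] * num_tasks  # True = waiting for an interrupt
        self.current = 0

    def next_task(self):
        """Pick the task to run after the current instruction boundary.

        Walks the ring starting just after the current task and returns the
        first task that is not halted. The current task itself is checked
        last. Returns None if every task is halted (assumed behaviour).
        """
        n = len(self.halted)
        for offset in range(1, n + 1):
            candidate = (self.current + offset) % n
            if not self.halted[candidate]:
                self.current = candidate
                return candidate
        return None


ring = TaskRing(4)
ring.halted[1] = True
ring.halted[2] = True
order = [ring.next_task() for _ in range(4)]  # tasks 1 and 2 are skipped
```

In this model the task count is a constructor argument, which mirrors the point that the limit is a fixed parameter of the design and not something that changes at run time.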

EDIT: For clarity, specific interrupts can be tied to specific tasks. The intent was to allow combining what appear to be several MicroBlaze-compatible cores within one design, to bring down the logic block count, so the whole thing is supposed to behave approximately as multiple independent CPU cores sitting on common buses.
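Tying interrupts to tasks could be modelled roughly like this in Python. Everything here is assumed for illustration: the static IRQ-to-task table, the names, and the choice to deliver the lowest-numbered pending interrupt first. The only part taken from the description above is that an interrupt belongs to one task and wakes that task if it is halted.

```python
class InterruptRouter:
    """Routes each interrupt line to one fixed task (illustrative model)."""

    def __init__(self, num_tasks, irq_to_task):
        self.irq_to_task = dict(irq_to_task)  # hypothetical static mapping
        self.pending = [set() for _ in range(num_tasks)]
        self.halted = [False] * num_tasks

    def halt(self, task):
        """Task asks to wait for an interrupt. Refused if one is already pending."""
        if self.pending[task]:
            return False
        self.halted[task] = True
        return True

    def raise_irq(self, irq):
        """Mark the interrupt pending for its owning task and wake that task."""
        task = self.irq_to_task[irq]
        self.pending[task].add(irq)
        self.halted[task] = False
        return task

    def take_irq(self, task):
        """Deliver one pending interrupt to the task, or None if there is none."""
        if not self.pending[task]:
            return None
        irq = min(self.pending[task])  # assumed priority: lowest number first
        self.pending[task].remove(irq)
        return irq


router = InterruptRouter(2, {5: 0, 7: 1})
router.halt(1)               # task 1 goes to sleep
woken = router.raise_irq(7)  # IRQ 7 belongs to task 1, so task 1 wakes up
```

From the outside, each task then sees only its own interrupts, which is what lets it look like an independent core on the shared buses.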