I implemented the (unratified) Zibi extension: https://github.com/riscv/zibi/releases/tag/v0.6

With LLVM it seems to be worth about 1.2% CoreMark/MHz. The improvements are localised to the list benchmark. Unfortunately there's no GNU toolchain support for this extension yet.

@wren6991 Compare-and-branch with immediate comparand sounds _brutal_ from an encoding space perspective, versus macro-fusion of separate compare-with-immediate and branch.

@pervognsen Even for a 5-bit immediate? Also they used the remaining 1/4 of the funct3 values under the BRANCH opcode. These have been sat reserved for a long time waiting for people to come up with some more interesting branch conditions, and I think there's reluctance to put non-branch instructions under that opcode.

I'm not enthused about fusion here because:

* Either destroys the original value (c.addi; c.beqz/c.bnez) or costs more code size plus a clobbered register (c.li; beq/bne)
* For the compressed case, the branch range is ±256 B instead of ±4096 B
* More complex to implement than just some new branch conditions
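A rough way to put numbers on the first two points: this Python sketch tabulates code size and branch reach for three ways of writing "branch to L if x == K". The byte counts and offset widths come from the base ISA and the C extension (B-type branches have a 13-bit signed offset, c.beqz/c.bnez a 9-bit one); the beqi row assumes Zibi keeps the 32-bit B-type offset. The table names are illustrative only, and it leaves out the extra C-extension restriction that c.beqz/c.bnez only accept x8 to x15.

```python
# Compare three ways of writing "branch to L if x == K" (K in -1, 1..31).
# Sizes/offset widths are from RV base + C; the beqi row ASSUMES Zibi
# reuses the 13-bit B-type offset in a 32-bit instruction.


def branch_reach(offset_bits: int) -> tuple[int, int]:
    """Signed byte range of a PC-relative branch with an offset_bits-wide
    immediate whose bit 0 is implicitly zero (targets are 2-byte aligned)."""
    lo = -(1 << (offset_bits - 1))
    hi = (1 << (offset_bits - 1)) - 2
    return lo, hi


# name: (bytes, offset bits, preserves x, needs a scratch register)
SEQUENCES = {
    "beqi x, K, L (Zibi)": (4, 13, True, False),
    "c.li t, K; beq x, t, L": (6, 13, True, True),
    "c.addi x, -K; c.beqz x, L": (4, 9, False, False),
}

for name, (size, bits, keeps_x, scratch) in SEQUENCES.items():
    lo, hi = branch_reach(bits)
    print(f"{name:27} {size} B  reach {lo}..{hi}  "
          f"keeps x: {keeps_x}  scratch reg: {scratch}")
```

Only beqi gets all three properties at once: 4 bytes, the full ±4 KiB reach, and no clobbered register.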

@wren6991 Ah, 5-bit immediate comparand sounds very reasonable. I didn't realize the existing encoding had that much slack remaining.
@pervognsen Yep, the immediate is squeezed into the rs2 register specifier. Values are -1 and 1..31 (since 0 can already be achieved using the zero register and beq/bne). Here's the RTL:
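The RTL isn't reproduced in this thread, but the decoding rule it describes can be modelled in a few lines. This is a behavioural Python sketch, not the actual RTL. It assumes that the one spare 5-bit code, 0, is the one that encodes -1, since 0 itself is redundant with beq/bne against x0. The function names are hypothetical.

```python
def decode_cimm(rs2_field: int) -> int:
    """Comparand for a Zibi branch, taken from the 5-bit rs2 field.

    ASSUMPTION: field value 0 (redundant, since beq/bne with x0 already
    compare against zero) is reused to mean -1; 1..31 mean themselves.
    """
    if not 0 <= rs2_field <= 31:
        raise ValueError("rs2 field is 5 bits")
    return -1 if rs2_field == 0 else rs2_field


def zibi_branch_taken(rs1_value: int, rs2_field: int, is_bnei: bool) -> bool:
    """beqi is taken when rs1 equals the comparand, bnei when it differs.

    rs1_value is the register contents read as a signed integer, so an
    all-ones register compares equal to -1.
    """
    equal = rs1_value == decode_cimm(rs2_field)
    return not equal if is_bnei else equal
```

For example, `zibi_branch_taken(-1, 0, is_bnei=False)` is `True`, and `decode_cimm(31)` is `31`.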
@wren6991 Yeah, the rs2 aliasing makes sense; I guess I'm surprised there was still room in funct3 of the existing branch instruction class for new uses like that. Is it looking likely for ratification?
@pervognsen Kind of hard to tell. It's going through internal review at the moment and nobody seems to be objecting to it but the process is a bit opaque.
@pervognsen @wren6991 The compare-and-branch encodings use a 3-bit funct3 field and only 6 instructions were defined, so there were two "empty" slots, where putting anything but a branch wouldn't make much sense.
The immediate is just a 5-bit value occupying the 5 bits of the second GPR specifier (rs2).
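To make the encoding concrete, here is a Python sketch that packs beqi/bnei into the standard B-type layout, with the comparand in the rs2 field. The funct3 values 0b010/0b011 for beqi/bnei are an assumption about which of the two spare slots Zibi uses, and so is the -1 → 0 mapping of the comparand. Check both against the v0.6 spec before relying on them. `encode_zibi` and `BRANCH_FUNCT3` are hypothetical names.

```python
BRANCH_OPCODE = 0b1100011  # base-ISA BRANCH major opcode

# funct3 assignments under BRANCH: six from the base ISA, two left over.
# ASSUMPTION: Zibi puts beqi at 0b010 and bnei at 0b011.
BRANCH_FUNCT3 = {
    "beq": 0b000, "bne": 0b001,
    "beqi": 0b010, "bnei": 0b011,  # previously reserved slots
    "blt": 0b100, "bge": 0b101, "bltu": 0b110, "bgeu": 0b111,
}


def encode_zibi(mnemonic: str, rs1: int, comparand: int, offset: int) -> int:
    """Encode beqi/bnei as a B-type word with the comparand in the rs2 field."""
    if mnemonic not in ("beqi", "bnei"):
        raise ValueError("only beqi/bnei are handled here")
    if not 0 <= rs1 <= 31:
        raise ValueError("rs1 must be x0..x31")
    if comparand == -1:
        cimm = 0  # ASSUMED encoding of -1 (0 itself is beq/bne with x0)
    elif 1 <= comparand <= 31:
        cimm = comparand
    else:
        raise ValueError("comparand must be -1 or 1..31")
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("offset must be even and within -4096..4094")
    imm = offset & 0x1FFF  # 13-bit two's complement; bit 0 is never stored
    return (((imm >> 12) & 1) << 31        # imm[12]
            | ((imm >> 5) & 0x3F) << 25    # imm[10:5]
            | cimm << 20                   # rs2 field reused as the immediate
            | rs1 << 15
            | BRANCH_FUNCT3[mnemonic] << 12
            | ((imm >> 1) & 0xF) << 8      # imm[4:1]
            | ((imm >> 11) & 1) << 7       # imm[11]
            | BRANCH_OPCODE)
```

Apart from funct3 and the reinterpretation of rs2, the word is bit-for-bit a normal branch. For instance, `encode_zibi("beqi", 0, -1, -4)` differs from the well-known `beq x0, x0, -4` (0xfe000ee3) only in the funct3 bits. That reuse of the existing layout is why this is cheap to decode.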

@wren6991 Maybe you can share your results on the mailing list: https://lists.riscv.org/g/sig-scalar-efficiency/topics

BTW, will you implement P once it's ready?

@camelcdr I do plan to implement P once there is a single spec with all of the instructions in it 😅
@camelcdr And yeah, I was going to try Embench too, but a lot of those benchmarks are heavily stdlib-dependent and I'm building my libraries with GCC (just using scripts from riscv-gnu-toolchain with --enable-llvm), so it's going to make things look worse than they are. I guess I can try those SiFive GCC patches for Zibi.
@camelcdr Zibi is not looking so hot in Embench actually (compilation only, LLVM main):