New paper on the GLS254 curve: https://eprint.iacr.org/2023/1688

TL;DR: applied complete formulas to it, made it faster, and also defined proper signatures (short ones: 48 bytes).
- x86 (Skylake): raw ECDH in 31615 cycles (new record), sign in 18374, verify in 27376. That's four times faster than Ed25519 (and twice as fast as RSA verification, btw).
- ARMv8 (Cortex-A55): raw ECDH in 77435 cycles (new record), sign in 55526, verify in 68649.
- Also wrote code for architectures without a carryless multiplication instruction (RISC-V SiFive-U74, ARM Cortex-M4). Performance is poorer but not abysmal (about 2.5x slower than Ed25519; no record here, but usable).
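
For anyone wondering what "carryless multiplication" means: it is polynomial multiplication over GF(2), i.e. an ordinary shift-and-add multiply where the additions are XORs, so no carries propagate. Binary curves like GLS254 do their field arithmetic with it, which is why a hardware instruction for it matters so much. Here is a minimal portable sketch in Rust, only to illustrate the operation; `clmul32` is a made-up name, and this naive bit-by-bit loop is not the technique used in the paper's code.

```rust
/// Portable carryless multiplication of two 32-bit binary polynomials.
/// Bit i of each input is the coefficient of x^i; the result is the
/// 64-bit product in GF(2)[x] (XOR replaces addition, so no carries).
/// `clmul32` is an illustrative name, not taken from the paper's code.
pub fn clmul32(a: u32, b: u32) -> u64 {
    let a = a as u64;
    let mut r = 0u64;
    for i in 0..32 {
        // All-ones mask if bit i of b is set, zero otherwise.
        // Selecting with a mask avoids branching on the input bits.
        let mask = (((b >> i) & 1) as u64).wrapping_neg();
        r ^= (a << i) & mask;
    }
    r
}
```

For example, `clmul32(0b11, 0b11)` returns `0b101`, because (x + 1)^2 = x^2 + 1 over GF(2): the middle term 2x vanishes.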

Fancy++: the x86, ARMv8 and RISC-V code is entirely in Rust. (All constant-time, of course; that should go without saying.)

Faster Complete Formulas for the GLS254 Binary Curve