Open-Source RISC-V: Energy Efficiency of Superscalar, Out-of-Order Execution

https://arxiv.org/abs/2505.24363

#HackerNews #OpenSource #RISC-V #EnergyEfficiency #Superscalar #OutOfOrderExecution

Ramping Up Open-Source RISC-V Cores: Assessing the Energy Efficiency of Superscalar, Out-of-Order Execution

Open-source RISC-V cores are increasingly demanded in domains like automotive and space, where achieving high instructions per cycle (IPC) through superscalar and out-of-order (OoO) execution is crucial. However, high-performance open-source RISC-V cores face adoption challenges: some (e.g. BOOM, Xiangshan) are developed in Chisel with limited support from industrial electronic design automation (EDA) tools. Others, like the XuanTie C910 core, use proprietary interfaces and protocols, including non-standard AXI protocol extensions, interrupts, and debug support. In this work, we present a modified version of the OoO C910 core to achieve full RISC-V standard compliance in its debug, interrupt, and memory interfaces. We also introduce CVA6S+, an enhanced version of the dual-issue, industry-supported open-source CVA6 core. CVA6S+ achieves 34.4% performance improvement over CVA6 core. We conduct a detailed performance, area, power, and energy analysis on the superscalar out-of-order C910, superscalar in-order CVA6S+ and vanilla, single-issue in-order CVA6, all implemented in a 22nm technology and integrated into Cheshire, an open-source modular SoC. We examine the performance and efficiency of different microarchitectures using the same ISA, SoC, and implementation with identical technology, tools, and methodologies. The area and performance rankings of CVA6, CVA6S+, and C910 follow expected trends: compared to the scalar CVA6, CVA6S+ shows an area increase of 6% and an IPC improvement of 34.4%, while C910 exhibits a 75% increase in area and a 119.5% improvement in IPC. However, efficiency analysis reveals that CVA6S+ leads in area efficiency (GOPS/mm2), while the C910 is highly competitive in energy efficiency (GOPS/W). This challenges the common belief that high performance in superscalar and out-of-order cores inherently comes at a significant cost in area and energy efficiency.

arXiv.org
Clockhands For Faster CPU Execution

When you design your first homebrew CPU, you probably are happy if it works and you don’t worry as much about performance. But, eventually, you’ll start trying to think about how to mak…

Hackaday

H. Xiao and S. Ainsworth, "Hacky Racers: Exploiting Instruction-Level Parallelism to Generate Stealthy Fine-Grained Timers"¹

Side-channel attacks pose serious threats to many security models, especially sandbox-based browsers. While transient-execution side channels in out-of-order processors have previously been blamed for vulnerabilities such as Spectre and Meltdown, we show that in fact, the capability of out-of-order execution itself to cause mayhem is far more general.
We develop Hacky Racers, a new type of timing gadget that uses instruction-level parallelism, another key feature of out-of-order execution, to measure arbitrary fine-grained timing differences, even in the presence of highly restricted JavaScript sandbox environments. While such environments try to mitigate timing side channels by reducing timer precision and removing language features such as SharedArrayBuffer that can be used to indirectly generate timers via thread-level parallelism, no such restrictions can be designed to limit Hacky Racers. We also design versions of Hacky Racers that require no misspeculation whatsoever, demonstrating that transient execution is not the only threat to security from modern microarchitectural performance optimization.
We use Hacky Racers to construct novel backwards-in-time Spectre gadgets, which break many hardware countermeasures in the literature by leaking secrets before misspeculation is discovered. We also use them to generate the first known last-level cache eviction set generator in JavaScript that does not require SharedArrayBuffer support.

#arXiv #ResearchPapers #OutOfOrderExecution #Spectre #Meltdown #SpeculativeExecution

__
¹ https://arxiv.org/abs/2211.14647

Hacky Racers: Exploiting Instruction-Level Parallelism to Generate Stealthy Fine-Grained Timers

Side-channel attacks pose serious threats to many security models, especially sandbox-based browsers. While transient-execution side channels in out-of-order processors have previously been blamed for vulnerabilities such as Spectre and Meltdown, we show that in fact, the capability of out-of-order execution \emph{itself} to cause mayhem is far more general. We develop Hacky Racers, a new type of timing gadget that uses instruction-level parallelism, another key feature of out-of-order execution, to measure arbitrary fine-grained timing differences, even in the presence of highly restricted JavaScript sandbox environments. While such environments try to mitigate timing side channels by reducing timer precision and removing language features such as \textit{SharedArrayBuffer} that can be used to indirectly generate timers via thread-level parallelism, no such restrictions can be designed to limit Hacky Racers. We also design versions of Hacky Racers that require no misspeculation whatsoever, demonstrating that transient execution is not the only threat to security from modern microarchitectural performance optimization. We use Hacky Racers to construct novel \textit{backwards-in-time} Spectre gadgets, which break many hardware countermeasures in the literature by leaking secrets before misspeculation is discovered. We also use them to generate the first known last-level cache eviction set generator in JavaScript that does not require \textit{SharedArrayBuffer} support.

arXiv.org