This is surprising: It looks like the Cortex-M7 may be able to execute a three-operand addition in a single cycle, at least under narrow circumstances.
Specifically, it appears that a two-instruction sequence of the form
ADD.W R1, R0, R3
ADD.W R1, R1, R2
...can successfully pair and dual-issue, performing the equivalent of
R1 = R0 + R2 + R3
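in one cycle. This isn't impossible: the pair reads only three registers (R0, R3 and R2), and the register file has four read ports to support the fancy MAC instructions. The notable part is that the second ADD consumes the R1 the first one produces, so the two adds would have to be chained within the same cycle.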
(Other forms of add might also qualify; my script only tests 32-bit instruction encodings for now, for simplicity.)
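To make the pairing condition concrete, here is a toy Python model of the two ADD.W instructions. It is a sketch under stated assumptions, not the script used for the measurement: all function names are hypothetical, it models only the architectural result (register form, no shift, flags untouched), and it says nothing about timing. It checks that the pair computes R0 + R2 + R3 modulo 2^32, that the second instruction depends on the first, and that only three distinct registers have to come from the register file.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide; ADD.W wraps modulo 2**32


def add_w(regs, rd, rn, rm):
    """Hypothetical model of ADD.W Rd, Rn, Rm (register form, no shift, no flags)."""
    regs[rd] = (regs[rn] + regs[rm]) & MASK32


def run_pair(r0, r2, r3):
    """Execute the two-instruction sequence in program order and return R1."""
    regs = [0] * 16
    regs[0], regs[2], regs[3] = r0, r2, r3
    add_w(regs, 1, 0, 3)  # ADD.W R1, R0, R3
    add_w(regs, 1, 1, 2)  # ADD.W R1, R1, R2 (consumes the R1 just written)
    return regs[1]


def depends_on(first, second):
    """True if `second` reads the destination of `first` (read-after-write).

    Each instruction is a (rd, rn, rm) tuple of register numbers.
    """
    rd, _, _ = first
    _, rn, rm = second
    return rd in (rn, rm)


def regfile_reads(first, second):
    """Distinct registers the pair must read from the register file.

    A source of `second` that `first` produces is excluded, because for
    same-cycle issue it would have to be forwarded rather than read.
    """
    rd1, rn1, rm1 = first
    _, rn2, rm2 = second
    reads = {rn1, rm1}
    reads |= {r for r in (rn2, rm2) if r != rd1}
    return reads


if __name__ == "__main__":
    first = (1, 0, 3)   # ADD.W R1, R0, R3
    second = (1, 1, 2)  # ADD.W R1, R1, R2
    print(run_pair(5, 7, 11))              # R0 + R2 + R3
    print(depends_on(first, second))       # the pair is dependent
    print(sorted(regfile_reads(first, second)))
```

For the sequence above, `regfile_reads` yields R0, R2 and R3, which fits within four read ports, while `depends_on` reports the R1 dependency that makes single-cycle execution of the pair interesting in the first place.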


