Some more random observations on dual-issue restrictions on Cortex-M7:
- Only one "shifted operand" instruction can be issued per cycle (e.g. ADD R0, R1, R2, LSL #4)
- Bitfield manipulation operations count as having shifted operands
- Sign and zero extension operations _also_ count as having shifted operands
- Immediates that don't fit entirely in the bottom 8 bits also count as shifted operands (e.g. AND R0, #0x7e0)
- DSP/SIMD operations can only dual-issue when they are the _lower-address_ instruction of a pair, even if there's no data dependency -- this includes the things you'd expect but also REV for swapping bytes in a word!
- The input for a shifted operand has to be available one cycle earlier than other inputs or you take a stall (this includes bitfield operations)
- Recall that this is an in-order processor despite being dual-issue, so if the instruction at the lower address can't issue due to a stall, neither issues.
Some of these observations appear to be novel and contradict what some other reverse engineers have reported, but I'm very confident in my tests.
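
For anyone who wants to sanity-check an instruction schedule against the pairing rules above, here is a rough Python sketch of them. It is a toy model, not a description of the actual hardware: the `Insn` type, the `uses_shifter` and `can_dual_issue` helpers, and the mnemonic sets are all made up for illustration and are nowhere near exhaustive. It only covers the structural restrictions (one shifted operand per cycle, DSP/SIMD only in the lower-address slot) and ignores data dependencies and the early-operand stall.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative, non-exhaustive mnemonic sets (assumptions for this sketch).
BITFIELD = {"BFI", "BFC", "UBFX", "SBFX"}
EXTEND = {"SXTB", "SXTH", "UXTB", "UXTH"}
DSP_SIMD = {"REV", "SADD16", "UADD8", "SMLAD", "QADD"}


@dataclass
class Insn:
    mnemonic: str
    shifted_reg: bool = False   # e.g. ADD R0, R1, R2, LSL #4
    imm: Optional[int] = None   # immediate operand, if any


def uses_shifter(insn: Insn) -> bool:
    """True if the instruction counts as a 'shifted operand' instruction."""
    # An immediate counts if any bit above the bottom 8 is set.
    wide_imm = insn.imm is not None and (insn.imm & ~0xFF) != 0
    return (
        insn.shifted_reg
        or wide_imm
        or insn.mnemonic in BITFIELD
        or insn.mnemonic in EXTEND
    )


def can_dual_issue(lower: Insn, upper: Insn) -> bool:
    """Check a pair: `lower` is at the lower address, `upper` follows it."""
    # Only one shifted-operand instruction per cycle.
    if uses_shifter(lower) and uses_shifter(upper):
        return False
    # DSP/SIMD instructions (including REV) only pair from the lower address.
    if upper.mnemonic in DSP_SIMD:
        return False
    return True
```

For example, `can_dual_issue(Insn("REV"), Insn("ADD"))` passes the check while the reverse order does not, and `Insn("AND", imm=0x7e0)` paired with `Insn("SXTB")` is rejected because both count as shifted operands.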


