Today I learned that x86 has a floating point version of the `nop` instruction, called `fnop`. It does nothing, but unlike the regular `nop` it uses the floating point unit to do nothing.
@acqrel please tell me there are simd variants too
@ethan Not that I'm aware of. But the regular `nop` can take a memory operand like `nop dword ptr [rax]` if you want a `nop` that does nothing with the address.

@acqrel @ethan https://elixir.bootlin.com/linux/v6.13.7/source/arch/x86/include/asm/nops.h#L41 has a collection of nops by length, from 1 byte up to 11 bytes

and Linux has code for optimizing adjacent nops into bigger nops: https://elixir.bootlin.com/linux/v6.13.7/source/arch/x86/kernel/alternative.c#L223
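
For anyone curious what "nops by length" and "merging adjacent nops" look like in practice, here is a minimal Python sketch. The byte table is the recommended multi-byte NOP list from the Intel manual (lengths 1 to 9), not a copy of the kernel's table, and `pad_with_nops` is a hypothetical helper made up for illustration, not the kernel's actual routine.

```python
# Recommended multi-byte NOP encodings (Intel SDM), keyed by length in bytes.
NOPS = {
    1: bytes.fromhex("90"),                  # nop
    2: bytes.fromhex("6690"),                # 66-prefixed nop
    3: bytes.fromhex("0f1f00"),              # nop dword ptr [rax]
    4: bytes.fromhex("0f1f4000"),            # nop dword ptr [rax+disp8]
    5: bytes.fromhex("0f1f440000"),          # nop dword ptr [rax+rax+disp8]
    6: bytes.fromhex("660f1f440000"),        # 66 prefix + the 5-byte form
    7: bytes.fromhex("0f1f8000000000"),      # nop dword ptr [rax+disp32]
    8: bytes.fromhex("0f1f840000000000"),    # nop dword ptr [rax+rax+disp32]
    9: bytes.fromhex("660f1f840000000000"),  # 66 prefix + the 8-byte form
}


def pad_with_nops(length):
    """Cover `length` bytes of padding with as few NOP instructions as possible.

    Hypothetical helper: greedily picks the longest available NOP each time,
    which is optimal here because every length from 1 to 9 is available.
    """
    instructions = []
    longest = max(NOPS)
    while length > 0:
        n = min(length, longest)
        instructions.append(NOPS[n])
        length -= n
    return instructions
```

With this table, eleven single-byte nops would collapse into two instructions: `pad_with_nops(11)` returns the 9-byte form followed by the 2-byte form.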

@acqrel @ethan Does this mean one can smuggle data inside nop operands/variants?
@squeakable @acqrel I would be super interested if there's been a practical instance of nop steganography
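
As a rough illustration of the idea (not a known real-world technique), the 7-byte form `nop dword ptr [rax+disp32]` is encoded as `0F 1F 80` followed by a 4-byte displacement that the CPU never uses, so those 4 bytes are free to carry data. The `hide` and `reveal` helpers below are hypothetical names made up for this sketch, written in Python.

```python
LONG_NOP_PREFIX = bytes.fromhex("0f1f80")  # nop dword ptr [rax+disp32]
INSN_LEN = len(LONG_NOP_PREFIX) + 4        # opcode + modrm + 4-byte displacement


def hide(payload: bytes) -> bytes:
    """Pack the payload into the disp32 fields of 7-byte NOPs, 4 bytes each."""
    if len(payload) % 4:
        raise ValueError("this sketch only handles payloads that are a multiple of 4 bytes")
    return b"".join(
        LONG_NOP_PREFIX + payload[i:i + 4] for i in range(0, len(payload), 4)
    )


def reveal(code: bytes) -> bytes:
    """Recover the payload from a run of 7-byte NOPs produced by hide()."""
    out = bytearray()
    for i in range(0, len(code), INSN_LEN):
        insn = code[i:i + INSN_LEN]
        if len(insn) != INSN_LEN or insn[:3] != LONG_NOP_PREFIX:
            raise ValueError("not a run of 7-byte NOPs")
        out += insn[3:]
    return bytes(out)
```

Each hidden 4 bytes costs 7 bytes of code, and a disassembler would show the odd displacements plainly, so this is more of a curiosity than a stealthy channel.
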
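@acqrel @ethan There's also `lock cmpxchg8b eax` if you want to crash some old systems (the F00F bug on the original Pentium). It's an invalid encoding that should do nothing except raise an invalid-opcode exception, but the lock prefix makes the CPU lock the memory bus first. The bus was only supposed to be unlocked once the instruction did its thing, and since it never does, the bus stays locked and the machine hangs.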
@acqrel Leftover from when FPU was on a separate (optional) chip?
@acqrel Things you didn't know existed... I guess it's a remnant from the x86 chip being physically distinct from the x87 (FPU)?
What's the difference between the x86 NOP and FNOP instructions?

I was reading the Intel instruction manual and noticed there is a 'NOP' instruction that does nothing on the main CPU, and a 'FNOP' instruction that does nothing on the FPU. Why are there two separ...

Stack Overflow
@etchedpixels @acqrel Very interesting! Thanks for the link!
@acqrel basically me

does extra steps and achieves
nothing
@acqrel you might be delighted to learn that itanium has five different nops, one for each kind of execution unit. they need to be bundled in proper batches for optimal parallelism in nothing-doing, but a good assembler will do that for you
@acqrel As someone who writes amd64 assembly, this is the tip of the iceberg
That ISA runs on pure nightmare fuel
@acqrel This is cursed and funny and now I'm curious what the point is.
@acqrel is this a side effect of when the fpu was a separate chip or?
@acqrel @darkphoenix unlike nop, which is technically an `xchg ax, ax`, fnop is a real fnop IIRC. Which is floating pointless.
@acqrel am confused. Need someone to make a tier list of NOPs
@acqrel well, I have several different ways of doing nothing, too. Why should a CPU not have the same? 🤣

@acqrel that reminds me of the famous IBM mainframe utility IEFBR14. It did nothing, but was useful to e.g. allocate space for files with no other side effects. It famously consisted of one one-byte instruction (BR 14, meaning: jump back).

Nevertheless, over the years, several updates to the program have been rolled out 😆

@acqrel fnop absolutely does something though. It can cause a waiting fp exception to be delivered to the cpu, so invoking fnop and asserting that nothing weird will happen will eventually cause stuff to blow up. :D

@acqrel just for the fun of it, i made a simple assembly program that runs nop/fnop a hundred thousand times, then loops ten thousand times, for a total of one billion nops.

on my system (fedora 41 amd64, i7-13700HX), fnop is a whopping five times slower than nop. additionally, the nop binary comes out to 102 KiB, while fnop is 200 KiB. nop is a one byte instruction and fnop is two bytes, so that makes sense.
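
A quick sanity check of those numbers, as a Python sketch. It assumes the loop body is simply the instruction repeated 100,000 times (the original program isn't shown, so the constant names here are made up) and uses the standard encodings `90` for nop and `D9 D0` for fnop.

```python
NOP = bytes.fromhex("90")     # nop is a single byte
FNOP = bytes.fromhex("d9d0")  # fnop is two bytes

UNROLLED = 100_000   # instructions in the loop body (assumed layout)
ITERATIONS = 10_000  # how many times the body runs


def body_size_kib(insn: bytes) -> float:
    """Size in KiB of the unrolled loop body for the given instruction."""
    return len(insn) * UNROLLED / 1024


total_executed = UNROLLED * ITERATIONS  # one billion instructions
nop_kib = body_size_kib(NOP)            # about 97.7 KiB
fnop_kib = body_size_kib(FNOP)          # about 195.3 KiB
```

That lines up with the reported 102 KiB and 200 KiB if the remaining few KiB in each binary are headers and loop control, which is a guess.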

under qemu-x86_64-static, the nop binary ran in only 10.8ms (faster than native!), while fnop took over twenty-four seconds! clearly qemu needs to better optimise the extremely important use case of running billions of floating point no-ops.

@acqrel The FPU is really a co-processor. In the ancient days, the 80387 was literally a different chip, which did its own bus mastering, etc. so if you wanted to do something like hit a slow IO port (from the FPU... cause reasons), then you might need to fnop it. But more usefully, you might also need the fnop in there so you have extra bytes to patch code. A really interesting way in which patching occurred years ago: https://www.youtube.com/watch?v=-vW21ziRsLk
Why were Inky and Sue's AI not updated for Ms. Pac-Man?

YouTube

@acqrel that dates back to the times when the 80x87 family of Math Coprocessors was a thing.

https://www.lo-tech.co.uk/wiki/80x87_Math_Coprocessors

My first PC had an AM386 SX-25 CPU from AMD and I later got an IIT 3C87-25 FPU for it.

Those days ended with the Intel 486DX, which came with the FPU integrated, and it's been in every x86 ever since.

#history #fpu

@acqrel @http Now we just need AVX NOP as well
@acqrel My favourite NOP (what, you don’t have a favourite NOP? Weirdo) is the MIPS SSNOP. On MIPS, most privileged-mode things were managed by coprocessor 0, which held the MMU and so on. MIPS was designed to minimise hardware synchronisation (hence branch delay slots, needing to read %hi at least two instructions after a multiply, and so on) and most coprocessor 0 operations took multiple cycles to complete. The OS programmer was required to put a bunch of NOPs after most of these operations (in theory you could execute other instructions as long as they were fine seeing either state, but definitely don’t access memory until the MMU is updated). Then came superscalar implementations. Suddenly, NOPs (which were actually encoded as a shift by 0) went to the ALU pipeline but loads and stores could access memory in parallel. How do you fix that? You introduce a superscalar NOP, which does nothing in all pipelines for one cycle.

But is there also some kind of negative `fnop`, @acqrel ?

What about nan `fnop`? Negative-NaN-fnop? Next-float-after-fnop-by-bitwise-value-fnop?

@acqrel so... it does nothing to a certain degree of accuracy? Sounds about right 😁 - but given the history of x86 where the fpu was a distinct chip, it might make sense...
@acqrel @vulpescubile Sometimes the floating point unit needs a break too!
@acqrel ah but the difference is that one is a part of the actual CPU and the other is the 8087 FPU "do nothing" instruction lol

deeply unserious architecture
@acqrel
Floating NOP: that is total zen, or nirvana