I used this to create JSON CPU tests for the 8088 and V20 CPUs. You can find them here:
http://github.com/singlestepTests/8088/
http://github.com/singlestepTests/V20
These test suites include 10,000 executions of each opcode, including undefined opcodes, capturing the entire bus state of each instruction.
These can be used to data-mine statistics about instruction execution and flag usage, or to verify an emulator's accuracy.
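As a rough sketch of the emulator-verification use case, the Python below runs each test through an emulator callback and reports any differences in the final state. The field names (`name`, `bytes`, `initial`, `final`, `regs`, `ram`) and the `emulate` callback are assumptions made for illustration; check the repositories' READMEs for the actual schema.

```python
def compare_final_state(expected, actual):
    """Return a list of human-readable mismatches between two state dicts.

    Assumed layout (hypothetical): {"regs": {name: value}, "ram": [[addr, value], ...]}
    """
    mismatches = []
    actual_regs = actual.get("regs", {})
    for reg, want in expected.get("regs", {}).items():
        got = actual_regs.get(reg)
        if got != want:
            mismatches.append(f"reg {reg}: expected {want}, got {got}")
    actual_ram = {addr: val for addr, val in actual.get("ram", [])}
    for addr, want in expected.get("ram", []):
        got = actual_ram.get(addr)
        if got != want:
            mismatches.append(f"ram[{addr:#x}]: expected {want}, got {got}")
    return mismatches


def run_suite(tests, emulate):
    """Run every test through `emulate(initial_state, code_bytes)`.

    `emulate` is a hypothetical adapter around your emulator that returns a
    final-state dict in the same layout as the test's "final" entry.
    Returns {test name: [mismatch, ...]} for the failing tests only.
    """
    failures = {}
    for test in tests:
        actual = emulate(test["initial"], test["bytes"])
        problems = compare_final_state(test["final"], actual)
        if problems:
            failures[test["name"]] = problems
    return failures
```

With a test file loaded through `json.load`, an empty result from `run_suite(tests, my_emulator)` would mean every final register and memory value matched.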
The only reason this works is that the 8087 was designed as a full coprocessor for the 8086. The 8087 maintains an identical copy of the CPU's prefetch instruction queue.
To do so, the CPU must tell the 8087 when it reads from the queue. It has a 2-bit queue status value (the QS0 and QS1 pins) for this purpose.
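To make the queue-tracking idea concrete, here is a minimal Python sketch of a mirror queue driven by the 2-bit status value. The encoding follows the 8086/8088 datasheet (00 = no operation, 01 = first byte of an opcode, 10 = queue flushed, 11 = subsequent byte). The `QueueMirror` class and its method names are made up for this example and are not taken from my code.

```python
from collections import deque

# Queue status encoding, as (QS1 << 1) | QS0
QS_IDLE = 0b00        # no queue operation this cycle
QS_FIRST = 0b01       # first byte of an opcode read from the queue
QS_FLUSH = 0b10       # queue emptied (e.g. after a jump)
QS_SUBSEQUENT = 0b11  # subsequent instruction byte read from the queue


class QueueMirror:
    """Hypothetical external copy of the CPU's prefetch queue."""

    def __init__(self, capacity=4):
        # The 8088 queue holds 4 bytes; the 8086 queue holds 6.
        self.capacity = capacity
        self.queue = deque()

    def fetch(self, byte):
        """Record a byte the CPU fetched from memory into its queue."""
        if len(self.queue) >= self.capacity:
            raise OverflowError("mirror queue is full; tracking is out of sync")
        self.queue.append(byte)

    def on_status(self, qs):
        """Apply one cycle's queue status; return (kind, byte) or None."""
        if qs == QS_IDLE:
            return None
        if qs == QS_FLUSH:
            self.queue.clear()
            return ("flush", None)
        byte = self.queue.popleft()
        kind = "first" if qs == QS_FIRST else "subsequent"
        return (kind, byte)
```

Each `("first", byte)` result marks the start of a new instruction, which is what lets an external observer find instruction boundaries on the bus.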
Here's what that looks like.
Here we perform 1 + 2 and then PUSH PSW to capture the result and flags.
If you look carefully at the end, you will see the accumulator pushed to the 8080's stack via <- w 03.
I found this very convenient import for NASM that allows us to write 8080 code, so I didn't really have to do anything except import it.
I hope to be able to add tests for the V20's 8080 instruction set to my V20 test suite.
It will be a bit awkward, as you'll have to analyze the bus states to check the flags, but maybe I can do a little post-processing magic to extract them for you and add them to the JSON.
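The post-processing could look something like the Python sketch below, which decodes the flags byte written by PUSH PSW using the standard 8080 flag layout (S in bit 7, Z in bit 6, AC in bit 4, P in bit 2, CY in bit 0). The function names are hypothetical, and I'm assuming the V20's 8080 mode pushes the same layout as a real 8080.

```python
# Bit positions of the 8080 flags within the PSW low byte
FLAG_BITS = {"S": 7, "Z": 6, "AC": 4, "P": 2, "CY": 0}


def decode_8080_flags(flags_byte):
    """Turn the flags byte captured from the bus into named booleans."""
    return {name: bool((flags_byte >> bit) & 1) for name, bit in FLAG_BITS.items()}


def expected_add_flags(a, b):
    """Flags an 8080 ADD of two 8-bit values should produce."""
    total = a + b
    result = total & 0xFF
    return {
        "S": bool(result & 0x80),                  # sign: bit 7 of the result
        "Z": result == 0,                          # zero
        "AC": ((a & 0x0F) + (b & 0x0F)) > 0x0F,    # carry out of bit 3
        "P": bin(result).count("1") % 2 == 0,      # set on even parity
        "CY": total > 0xFF,                        # carry out of bit 7
    }
```

For the 1 + 2 example, the result 3 has two bits set, so only the parity flag should be set. Together with the always-set bit 1, a real 8080 would push a flags byte of 0x06, and `decode_8080_flags(0x06)` agrees with `expected_add_flags(1, 2)`.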
I still have some ideas - if the first thing we do after resetting the 286 is set the trap flag, then we should be able to detect execution of the trap handler as our instruction boundary.
This will of course slow things down incredibly, but I'm not sure what other option there is.
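As a sketch of how the trap-flag idea might work on the capture side: in real mode the INT 1 (single-step) vector lives at addresses 0x0004-0x0007, so a memory read of that vector signals that the previous instruction has finished. The bus-cycle format here, a list of `(kind, address)` tuples, is invented for illustration, and the code assumes the interrupt table is still at its reset location.

```python
TRAP_VECTOR_ADDR = 0x0004  # real-mode IVT entry for INT 1 starts here


def find_trap_entries(cycles):
    """Indices of memory reads that hit the start of the INT 1 vector.

    `cycles` is a hypothetical list of (kind, address) tuples, where kind
    is a string such as "fetch", "read" or "write".
    """
    return [
        i
        for i, (kind, addr) in enumerate(cycles)
        if kind == "read" and addr == TRAP_VECTOR_ADDR
    ]


def split_at_traps(cycles):
    """Split a bus capture into chunks, starting a new chunk at each trap entry."""
    chunks = []
    start = 0
    for i in find_trap_entries(cycles):
        chunks.append(cycles[start:i])
        start = i
    chunks.append(cycles[start:])
    return chunks
```

Everything in a chunk before the next vector read would then be the handler's overhead plus exactly one instruction under test, which still needs the handler's own cycles filtered out.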
But if that method works, we could in theory even do this with a 386. There was a CMOS 386 with a 16-bit data bus designed for embedded devices, the 386 EX.
There are just barely enough pins on an Arduino GIGA to connect everything.