Thinking about moving more of the 100BASE-T1 decode pipeline to the GPU. This is definitely going to end up being one of the more heavily accelerated end-to-end protocol decodes in the library, at least for now.

OK yeah, it's definitely going to happen and I have a plan.

After some data shuffling that currently happens on the CPU but will probably move to GPU long term, I'll run one GPU thread per detected packet (packet start search already happens on GPU) and decode the rest of the packet out to timestamps and data bytes.
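
Rough sketch of the one-thread-per-packet idea, written as plain C++ so it runs anywhere. This is not the actual decode: the names (`DecodeOnePacket`, `DecodeOutput`, `kMaxBytesPerPacket`, `kEndMarker`) are made up for illustration, the input is assumed to already be a clean stream of 4-bit values with a placeholder end marker, and the low-nibble-first packing is an assumption too. The only point is the layout: each packet gets its own fixed-size output slot, so every thread can write bytes and timestamps without coordinating with any other thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical limits and markers, chosen only for this sketch.
constexpr size_t kMaxBytesPerPacket = 2048;  // fixed output slot size per packet
constexpr uint8_t kEndMarker = 0xFF;         // placeholder, not a real line code

// Flat, preallocated output: slot p starts at index p * kMaxBytesPerPacket.
struct DecodeOutput
{
    std::vector<uint8_t> bytes;       // decoded data bytes
    std::vector<int64_t> timestamps;  // input index where each byte started
    std::vector<uint32_t> counts;     // number of valid bytes in each slot
};

// Work done by one "thread": decode a single packet into its own slot.
void DecodeOnePacket(
    size_t packetIndex,
    const std::vector<uint8_t>& nibbles,
    const std::vector<uint32_t>& starts,
    DecodeOutput& out)
{
    assert(packetIndex < starts.size());

    const size_t base = packetIndex * kMaxBytesPerPacket;
    size_t i = starts[packetIndex];
    uint32_t n = 0;

    // Consume two nibbles per output byte until the end marker or a full slot.
    while ((i + 1 < nibbles.size()) && (n < kMaxBytesPerPacket))
    {
        const uint8_t lo = nibbles[i];
        const uint8_t hi = nibbles[i + 1];
        if ((lo == kEndMarker) || (hi == kEndMarker))
            break;

        out.bytes[base + n] = static_cast<uint8_t>((hi << 4) | lo);
        out.timestamps[base + n] = static_cast<int64_t>(i);
        n++;
        i += 2;
    }

    out.counts[packetIndex] = n;
}

// On a GPU this loop would be the dispatch, with packetIndex as the thread ID.
DecodeOutput DecodeAllPackets(
    const std::vector<uint8_t>& nibbles,
    const std::vector<uint32_t>& starts)
{
    DecodeOutput out;
    out.bytes.resize(starts.size() * kMaxBytesPerPacket);
    out.timestamps.resize(starts.size() * kMaxBytesPerPacket);
    out.counts.resize(starts.size());

    for (size_t p = 0; p < starts.size(); p++)
        DecodeOnePacket(p, nibbles, starts, out);

    return out;
}
```

Calling `DecodeAllPackets(nibbles, starts)` with the packet start indices from the earlier search gives back flat byte and timestamp arrays plus a per-packet count. The tradeoff is memory: fixed slots waste space on short packets, but they avoid needing a prefix sum or atomics to figure out where each thread should write.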

Then I'll call out to the existing CPU-based logic (for now) to go from a set of bytes + timestamps to Ethernet waveform segments and packets. That last bit can't move to the GPU until/unless I change the data model for packets, which probably does need to happen but is going to take some refactoring.