Unexpected Discovery:
Training a #RetNet model on the CPU isn't *nearly* as slow as I'd expected.
Yes, it's slower than the GPU, but probably by less than a factor of 10, not hundreds of times slower.
Which means I can realistically move my model to the CPU from time to time and train on nice *long* sequence lengths.
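For concreteness, here's a minimal sketch of what that periodic CPU excursion could look like. Everything here is assumed rather than taken from the post: `TinyLM` is a stand-in for the actual RetNet, the data is random tokens, and the cadence and sequence lengths are made up. The one real gotcha it illustrates is that `model.to(device)` moves parameters in place but leaves optimizer state (e.g. Adam moments) on the old device, so that has to be moved too.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in for the RetNet model; any nn.Module trains the same way here."""
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, x):                   # x: (batch, seq) of token ids
        return self.proj(self.emb(x))       # (batch, seq, vocab) logits

def optimizer_to(opt, device):
    """Move optimizer state tensors to the model's new device.

    model.to(device) swaps parameter storage in place, but state tensors
    like Adam's moments stay on the old device unless moved explicitly.
    """
    for state in opt.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)

def random_batch(seq_len, batch=4, vocab=256):
    """Hypothetical data source: a random next-token-prediction batch."""
    toks = torch.randint(vocab, (batch, seq_len + 1))
    return toks[:, :-1], toks[:, 1:]

def train_steps(model, opt, device, seq_len, steps):
    model.to(device)
    optimizer_to(opt, device)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = random_batch(seq_len)
        x, y = x.to(device), y.to(device)
        logits = model(x)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
fast = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Mostly short sequences on the GPU, with an occasional long-sequence
# excursion on the CPU (the cadence and lengths here are hypothetical).
for epoch in range(10):
    train_steps(model, opt, fast, seq_len=512, steps=100)
    if epoch % 5 == 4:
        train_steps(model, opt, torch.device("cpu"), seq_len=4096, steps=5)
```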
My expectations were warped by jumping from an Intel Atom netbook in 2016 straight to NVIDIA EC2 spot instances for playing with simple FFNs.
I'd never tried running or training on the CPU since.