cuDF is even worse on matrix multiplication despite being GPU-accelerated. I still do not understand what is going on (the exact same workload takes 15,000 ms vs 80 ms???).
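One way to narrow down a gap like this is to time the first call separately from repeated calls, since one-off setup costs can dominate a single measurement. Below is a minimal sketch of such a harness, not the benchmark used for the numbers above. `benchmark` and `matmul_workload` are hypothetical names, and the pure-Python matrix product is only a stand-in for the real workload.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeats=5):
    """Return (first_call_ms, median_ms) for a zero-argument callable."""
    # Time the very first call on its own: one-off setup costs land here.
    start = time.perf_counter()
    fn()
    first_call_ms = (time.perf_counter() - start) * 1000.0

    # Extra untimed warm-up calls before the steady-state measurement.
    for _ in range(warmup):
        fn()

    # Timed steady-state runs; the median is less sensitive to outliers.
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return first_call_ms, statistics.median(timings_ms)


def matmul_workload(n=30):
    """Hypothetical stand-in workload: pure-Python n x n matrix product."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


first_ms, median_ms = benchmark(matmul_workload)
print(f"first call: {first_ms:.1f} ms, steady state (median): {median_ms:.1f} ms")
```

To compare libraries, the same `benchmark` call would wrap each library's version of the workload. For a GPU library the timed callable would also have to wait for the device to finish before returning, which this sketch does not attempt.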
| WordPress | https://felixquinihildebet.wordpress.com/ |
| Threads | https://www.threads.net/@mean.absolute.error |