I see lots of confusion around why Polars & DuckDB are faster than Pandas. It’s a combination of three things:
1) Polars & DuckDB are multithreaded, compiled libraries, while Pandas is a mostly single-threaded mix of compiled NumPy & Python code.
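2) Polars & DuckDB both have query optimizers that look at the whole query before running it, so they can reorder steps and skip unneeded work (e.g. filter rows or drop unused columns as early as possible). In Polars this applies to lazy queries. Pandas just does the steps you tell it to, in the order you tell it.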
3) Polars & DuckDB can “stream” data from disk in chunks instead of loading it all up front, meaning they can operate on more data than will fit in RAM.
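
To make the streaming idea concrete, here is a toy sketch in plain Python. It is not how Polars or DuckDB are implemented (they work on columnar batches across multiple threads), and the names `SAMPLE_CSV`, `stream_rows`, and `total_by_city` are made up for illustration. It only shows the principle: read one row at a time, filter as rows arrive, and keep nothing in memory except the running totals.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for a file too large to load at once.
SAMPLE_CSV = """city,amount
Oslo,10
Lima,5
Oslo,7
Lima,20
Oslo,1
"""


def stream_rows(handle):
    """Yield one parsed row at a time instead of materializing the whole file."""
    for row in csv.DictReader(handle):
        yield row["city"], int(row["amount"])


def total_by_city(handle, min_amount):
    """Filter and aggregate in a single pass; only the running totals stay in memory."""
    totals = defaultdict(int)
    for city, amount in stream_rows(handle):
        if amount >= min_amount:  # filter applied as rows arrive, before aggregating
            totals[city] += amount
    return dict(totals)


# io.StringIO stands in for an open file handle here.
result = total_by_city(io.StringIO(SAMPLE_CSV), min_amount=5)
print(result)  # {'Oslo': 17, 'Lima': 25}
```

Memory use here depends on the number of distinct cities, not the number of rows, which is why the same approach keeps working when the file is bigger than RAM.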