this is such a good debugging story ("Rust std fs slower than Python!? No, it's hardware!") https://xuanwo.io/2023/04-rust-std-fs-slower-than-python/

An infrastructure engineer, focused on distributed storage system

@b0rk Excellent in-depth dive into how to investigate these types of issues! strace helped my team identify that Python was segfaulting due to UCS-4 string use attempting to allocate space for a >57KiB regular expression.

(Switch the internal representation to UCS-2, the problem goes away. Only happened in dev/test, 'cause only in dev/test do we enable every single route in the entire application, resulting in that monster regex. Thanks, Django.)

strace is love.
strace is life.

@b0rk But this also touches on a peculiarity of Python I've mentioned before. There are edge cases where Python is legitimately faster than equivalent C code. Memory pre-allocation including no-memmove array/list growth, complex dead code removal, and so forth.
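The pre-allocation point can be observed from Python itself. A minimal sketch, assuming CPython, where `sys.getsizeof` reflects a list's currently allocated capacity (an implementation detail, not a language guarantee); `count_resizes` is a name made up for this illustration:

```python
import sys


def count_resizes(n_appends):
    """Count how often a list's allocated size changes while appending."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n_appends):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The reported size only changes when the list grows its buffer.
            resizes += 1
            last_size = size
    return resizes


resizes = count_resizes(10_000)
print(f"10000 appends, {resizes} buffer growths")
```

Because the list over-allocates each time it grows, most appends should find spare capacity already waiting, so the number of growths stays far below the number of appends.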

Python cheats.
As much as it can possibly get away with.

We joke that Python's hash table implementation powering sets, dictionaries, &c. couldn't be optimized further without risking the creation of a singularity.

"Complex dead code removal" being, for example, the deletion of the entirety of this function, becoming a no-op.

def do():
    n = 0
    for i in range(27*10**42):
        n += 2

n is never used, so the loop is irrelevant and just goes away. (PyPy JIT to blame for this one.)

C… would iterate 27 tredecillion times to do nothing.

@alice C compilers (when optimisation is turned on) can be even smarter, e.g. https://godbolt.org/z/Y7q8PE485 shows GCC turning something similar to your loop into just a constant return. Clang can do even crazier stuff like turning sum over integers into n(n+1)/2 (slightly different to prevent overflow): https://godbolt.org/z/PPKsn1Y8q. More similar stuff here: https://www.youtube.com/watch?v=bSkpMdDe4g4
Compiler Explorer - C (x86-64 gcc (trunk))

int do_stuff(void){
    int n = 0;
    for (int i = 0; i < (1 << 29); i+=3){
        n+=2;
    }
    return n;
}
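To see why a compiler can replace loops like these with arithmetic, here is a rough Python sketch of the two closed forms mentioned above. The function names are invented for illustration, and `sum_closed_form` uses the textbook n(n+1)/2 rather than whatever overflow-safe variant Clang actually emits:

```python
def do_stuff_loop(limit, step=3):
    """Direct translation of the C loop: add 2 once per iteration."""
    n = 0
    i = 0
    while i < limit:
        n += 2
        i += step
    return n


def do_stuff_folded(limit, step=3):
    """Same result with no loop: 2 * ceil(limit / step)."""
    iterations = (limit + step - 1) // step
    return 2 * iterations


def sum_loop(n):
    """Sum the integers 1..n the slow way."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_closed_form(n):
    """Sum the integers 1..n via n(n+1)/2."""
    return n * (n + 1) // 2


# Check the closed forms against the loops on small inputs only,
# since the full 1 << 29 bound would be slow to iterate in Python.
for limit in (0, 1, 3, 4, 1000, 1 << 16):
    assert do_stuff_loop(limit) == do_stuff_folded(limit)
for n in (0, 1, 10, 12345):
    assert sum_loop(n) == sum_closed_form(n)

print(do_stuff_folded(1 << 29))  # 357913942
```

The loop bound and step are known at compile time, so the trip count (and therefore the result) is just a number the optimizer can compute once.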

@Smoljaguar Indeed; compiler tech (especially LLVM-derived stuff) has come a long way since the original "PyPy faster than C (for a carefully crafted example)" articles. I'm old. 😜 "Constant expression" optimization was a fun thing to dabble with in my own esolang. (In my case, helped by expression dependency graphing; if all dependencies are known at compile time, that expression is a constant.)
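The dependency rule in that last parenthetical can be sketched in a few lines. This is a toy illustration, not the esolang's actual implementation: the tuple-based expression format, the `OPS` table, and `fold` are all assumptions made up here.

```python
import operator

# Hypothetical operator table for this toy expression format.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, known):
    """Fold expr to a number if all of its dependencies are known.

    expr is a number, a variable name (str), or a tuple (op, left, right).
    known maps variable names to values available at compile time.
    Returns a number when fully constant, otherwise a simplified expression.
    """
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        # A variable is constant only if its value is already known.
        return known.get(expr, expr)
    op, left, right = expr
    left, right = fold(left, known), fold(right, known)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        # Every dependency resolved to a constant, so this node is one too.
        return OPS[op](left, right)
    return (op, left, right)


print(fold(("+", ("*", 2, "a"), 3), {"a": 4}))  # 11
print(fold(("+", "x", ("*", 2, 3)), {}))        # ('+', 'x', 6)
```

The second call shows partial folding: `x` is unknown, so the outer expression survives, but the constant subexpression still collapses to 6.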