Claire Huang wrote an undergraduate honor's thesis, supervised by @steveblackburn and @caizixian https://www.steveblackburn.org/pubs/theses/huang-2025.pdf

She uses sampling PEBS counters and data linear addressing (DLA) on Intel chips to attempt to understand the structure and attribution of load latencies in MMTk.

After identifying L1 misses in the trace loop as a significant overhead, she adds prefetching and reduces GC time by 10% or so across a range of benchmarks, and more on Zen4.

Lots of fun details in Huang's thesis: static vs dynamic prefetching (static is fine), computation of how much one could gain if cache-miss latency were eliminated, what the GC time would be if hardware prefetchers were disabled (20-80% slower; see attached figure); mutator time without prefetchers (sometimes it's better??!?); how to use "perf mem"; all good stuff!

I don't know what's in the water at ANU but they have been doing lots of great work at all levels recently

@wingo Thanks Andy.

What’s in the water? Just a wonderfully talented and intellectually generous bunch of students. In particular there’s a great culture of generosity in the lab that @caizixian and other students have built. That culture encourages talented undergrads like Claire to join and flourish. I’m very grateful.

@steveblackburn @caizixian sincere congrats on building such an environment. as a faraway « consumer » of y’all’s work i appreciate whatever it is that went into the making; the generosity shines through!

@wingo @caizixian

It did not come from nowhere. @xiyang set the tone, passing the torch to the current students. He epitomised technical excellence + intellectual generosity. Thank you, Xi 🙏