Wade Brainerd

Programmer in Portland, ME
Website: https://wadeb.com/
GitHub: https://github.com/wadetb
A thing happened that I had been vaguely worried about - my kid read my website 😬

New article! What do you do when profiling your code shows the slowdown isn't in your code, but deep in the kernel? Why, you grab the kernel source and go spelunking.

How a routine profiling session turned into a Linux kernel patch:

https://rovarma.com/articles/from-profiling-to-kernel-patch-the-journey-to-an-ebpf-performance-fix/

From profiling to kernel patch: the journey to an eBPF performance fix | Ritesh Oedayrajsingh Varma

A story about how an innocent profiling session led to a change to the Linux kernel that makes eBPF map-in-map updates much faster.

HEY EVERYBODY! We're doing REAC again and we are announcing call for submissions! Please come and talk about interesting problems! From the call:

"Talks about pipelines, workflows, culture, hard-earned lessons from the experience of engine-making, experiments and failures - all the topics that are fundamental to our craft, but hardly feature in traditional computer-graphics venues, are welcome."

We look forward to hearing from you!

https://enginearchitecture.org/2025.htm

REAC: 2025 Conference.

Google's LLM assistant really struggles. They should be able to make this kind of interaction bulletproof, and I don't understand why it's like this. I asked for a simple reminder and it messed up the time math, off by roughly 12 hours.
@TomF @mtothevizzah @dotstdy @archo The main goal here is to use RAM and I/O bandwidth more efficiently, but it depends on low latency streaming - ideally a few frames.
I/O bandwidth and RAM aren't our worst problems on that kind of platform as long as we're still shipping COD on PS4, and we'd still need to ship all that texture data to players somehow.
@TomF @mtothevizzah @dotstdy @archo
Ideally, register hits only when the pixel shader detects _magnification_ into the top 2 mips, so the atomic adds only fire when there are unstreamed-yet-needed pages. This will oscillate if on-screen needs exceed RAM, but that's extremely unlikely, and would be hidden by TAA anyway. But it'd eliminate all pixel shader cost in stable regions of the frame.
@TomF @mtothevizzah @dotstdy @archo The pixel shader registers hits w/atomic adds to the buffer, for only X% (1/1000?) of pixels chosen randomly each frame, and only when the top 2 mips of a streamed texture are sampled - identified via texcoord derivative + shared across all texture channels - NOT hw tracking.
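
A rough CPU-side sketch of that pixel shader logic, in Python for readability. Everything named here (`mip_level`, `register_hit`, the dict standing in for the GPU buffer) is hypothetical, "top 2 mips" is assumed to mean levels 0 and 1, and the mip estimate is the usual log2 of the texel-space footprint rather than anything confirmed for this design:

```python
import math
import random

def mip_level(duv_dx, duv_dy, tex_size):
    """Estimate the sampled mip from texcoord derivatives (no hw feedback).

    duv_dx / duv_dy are (du, dv) per screen pixel; tex_size is (width, height).
    """
    lx = math.hypot(duv_dx[0] * tex_size[0], duv_dx[1] * tex_size[1])
    ly = math.hypot(duv_dy[0] * tex_size[0], duv_dy[1] * tex_size[1])
    rho = max(lx, ly)  # texels covered per pixel along the longer axis
    if rho <= 0.0:
        return 0.0
    return max(0.0, math.log2(rho))  # magnification clamps to mip 0

def register_hit(counters, page, mip, rng, sample_rate=1.0 / 1000.0, top_mips=2):
    """Bump the page's counter for a random subset of pixels in the top mips.

    `counters` is a dict standing in for the GPU buffer; on the GPU the
    increment would be an atomic add.
    """
    if mip >= top_mips:
        return False  # lower-resolution mips are assumed always resident
    if rng.random() >= sample_rate:
        return False  # this pixel was not picked this frame
    counters[page] = counters.get(page, 0) + 1
    return True
```

Because the mip comes from one derivative computation, a single decision can be shared by every texture channel that uses the same texcoords.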
@TomF @mtothevizzah @dotstdy @archo The CPU's job is just to keep the top-hit-pages resident.
On PS5 a compute shader could do this and probably write out the IO command buffer too :)
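
One way the "keep the top-hit pages resident" step could look, as a sketch only. `choose_resident`, `plan_streaming`, and the fixed page budget are made-up names and assumptions, not part of the original design:

```python
import heapq

def choose_resident(hit_counts, budget_pages):
    """Pick the pages with the highest hit counts, up to the RAM budget."""
    ranked = heapq.nlargest(
        budget_pages,
        ((count, page) for page, count in hit_counts.items() if count > 0),
    )
    return {page for _count, page in ranked}

def plan_streaming(hit_counts, resident, budget_pages):
    """Return (pages_to_load, pages_to_evict) for this frame."""
    wanted = choose_resident(hit_counts, budget_pages)
    return sorted(wanted - resident), sorted(resident - wanted)
```

For example, with counts `{1: 5, 2: 0, 3: 9, 4: 2}`, pages 2 and 4 resident, and a budget of 2, this loads pages 1 and 3 and evicts 2 and 4. A real version would probably add hysteresis so pages near the cutoff don't thrash.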
@TomF @mtothevizzah @dotstdy @archo Ok, speaking roughly again, you'd maintain a stochastic hit counter for each page and decay it each frame - 16GB of RAM equals 16MB of u32-per-page atomic hit counters, and that could be made smaller by exploiting higher level knowledge.
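
The 16GB to 16MB figure works out if you assume 4 KiB pages (that page size is inferred from the numbers, it isn't stated above). A minimal sketch of the counter table with per-frame decay, where `HitCounters` and the halving decay are illustrative choices:

```python
PAGE_SIZE = 4 * 1024   # assumed page size in bytes, inferred from 16 GB -> 16 MB
COUNTER_SIZE = 4       # one u32 per page

def counter_table_bytes(ram_bytes, page_size=PAGE_SIZE, counter_size=COUNTER_SIZE):
    """Memory needed for one hit counter per page of RAM."""
    return (ram_bytes // page_size) * counter_size

class HitCounters:
    """Per-page hit counters that decay every frame (hypothetical helper)."""

    U32_MAX = 0xFFFFFFFF

    def __init__(self, num_pages, decay_shift=1):
        self.counts = [0] * num_pages
        self.decay_shift = decay_shift  # 1 halves every counter each frame

    def add_hit(self, page):
        # Stand-in for a GPU atomic add, saturating at the u32 limit.
        self.counts[page] = min(self.counts[page] + 1, self.U32_MAX)

    def decay(self):
        # Call once per frame so pages that stop being sampled fade out.
        self.counts = [c >> self.decay_shift for c in self.counts]
```

With 16 GiB of RAM, `counter_table_bytes(16 * 2**30)` gives 16 MiB. Larger pages shrink the table proportionally.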
@mtothevizzah @TomF @dotstdy @archo It'd maximize the memory-saving-precision of sampler feedback, while limiting worst case blurriness from camera cuts.
Aki and I had a design for an intern to try on PS5 a few years ago but it didn't get done.