Was thinking again a while ago about what a waste PBR textures can be under most lighting.

Kind of weird to pay a 4x texture memory increase - assuming BC1-5 and no alpha/metalness, e.g. BC1 base color + BC5 normal map & BC4 roughness - for detail that only shows up under specific lighting conditions and appears flat everywhere else.
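(For concreteness, the 4x follows directly from the bits-per-pixel of those block-compressed formats; a quick sanity check:)

```python
# Bits per pixel for the relevant BCn formats (fixed by the formats themselves):
# BC1 and BC4 are 4 bpp, BC5 is 8 bpp.
BPP = {"BC1": 4, "BC4": 4, "BC5": 8}

flat = BPP["BC1"]                              # base color only
pbr = BPP["BC1"] + BPP["BC5"] + BPP["BC4"]     # + normal map + roughness

print(pbr / flat)  # 4.0
```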

Though doubling texture resolution in both dimensions is also a 4x increase that might never show up (esp. with upscaling) so all things considered, maybe 4x isn't that bad.

#gamedev

@archo Yeah but at sensible resolutions, the higher-rez textures will never be loaded. So all they're wasting is disk space and their own production time. Whereas PBR is burning my precious DRAM for minor LSB differences. Boooooo.

(I say this with honest love to all my PBR shader writers)

(it's a joke. This is a bit)

(or is it)

@TomF Texture loading time from disk and download time/bandwidth would be wasted as well, which seems relevant with modern 150 GB games. (Also not a lot of such games fit on 1 TB consoles.)

But I suspect for high res textures there could be a way to stream the biggest mips on-demand from the CDN to the GPU based on what the GPU needs which AFAIK nobody's doing yet. PBR seems uniquely disadvantaged in this regard (but at least the number of parameters doesn't seem to be growing infinitely).

@archo @TomF cod already does network streaming for high res mips :')
On Demand Texture streaming - How we made all our Cod's fit on one PS4

@dotstdy @TomF IIRC there was also some software that used a virtual file system to download Steam game files on-demand (I forgot the name), which together with local mip streaming from separate files would work similarly.

The part I was specifically thinking nobody's done yet was detecting which mips are actually being requested by the pixel shader. It seems this wasn't mentioned in the video but I'm curious if they've been experimenting with something like that as well.

@archo @dotstdy It's called "virtual texturing" and it's been tried many ways over the years. My favourite talk is this one by Sean Barrett: https://www.youtube.com/watch?v=MejJL87yNgI

Also called "megatextures" and implemented in the "id Tech 5" engine and used in a bunch of games, most notably Rage.

I don't remember if it's persisted in newer engines like UE5. There's a certain overhead to using it, and I think most of the time analytic solutions are just as effective and cheaper e.g. https://tomforsyth1000.github.io/blog.wiki.html#%5B%5BKnowing%20which%20mipmap%20levels%20are%20needed%5D%5D
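The analytic idea boils down to estimating projected texel density on the CPU, with no GPU feedback at all. A rough single-axis Python sketch of the math (the function name and the simplifications are mine, not from the linked post):

```python
import math

def needed_mip(texture_size: int, object_screen_px: float,
               uv_extent: float = 1.0) -> int:
    """Estimate the finest mip a draw needs, purely on the CPU.

    If the object covers object_screen_px pixels across and samples
    uv_extent of the texture, one screen pixel covers roughly
    texture_size * uv_extent / object_screen_px texels; the mip level
    is ~log2 of that ratio, clamped to the mip chain.
    """
    texels_per_pixel = texture_size * uv_extent / max(object_screen_px, 1.0)
    mip = math.log2(max(texels_per_pixel, 1.0))
    max_mip = int(math.log2(texture_size))
    return min(int(mip), max_mip)

# A 4096 texture on an object ~512 px wide: 8 texels per pixel, so mip 3.
print(needed_mip(4096, 512.0))  # 3
```

A real estimator also folds in anisotropy and the actual UV mapping, but this is the shape of the heuristic.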

Virtual Textures (aka Megatextures) talk (2008)

@archo @dotstdy Hardware implementations go as far back as 1998 when we showed a Permedia3 streaming a giant dataset, all demand-paged in only when the pixel shader requested the data.

The problem with all these demand-paged approaches is you get this gigantic stall in the middle of rendering your scene.

@archo @dotstdy So you need to solve this with two things:

1. Have a fallback, i.e. let the pixel shader use the mipmap level it has, not the one it wants, and fetch the desired data async.

2. Predict what the shader will want and prefetch aggressively. OK, but how do you do that? You use the analytic methods. But that reduces the advantage of the pixel-shader method, while its costs are still there.
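Those two points could be sketched roughly like this (hypothetical names, Python standing in for engine code; mip 0 is the finest level):

```python
class StreamedTexture:
    """Toy sketch: sampling falls back to a resident mip and queues the
    desired one asynchronously; prefetch() is the predictive path."""

    def __init__(self, mip_count: int):
        self.resident = {mip_count - 1}   # coarsest mip always in memory
        self.pending = set()              # fetches in flight

    def sample(self, desired_mip: int) -> int:
        """Point 1: use what's resident now, fetch the rest async."""
        if desired_mip not in self.resident:
            self.pending.add(desired_mip)     # request, but don't stall
        # fall back to the finest resident mip that is still >= desired
        return min(m for m in self.resident if m >= desired_mip)

    def prefetch(self, predicted_mip: int) -> None:
        """Point 2: predictive path, e.g. driven by an analytic estimate."""
        if predicted_mip not in self.resident:
            self.pending.add(predicted_mip)

    def on_fetch_complete(self, mip: int) -> None:
        self.pending.discard(mip)
        self.resident.add(mip)

tex = StreamedTexture(8)
print(tex.sample(2))      # 7 -- blurry fallback this frame
tex.on_fetch_complete(2)
print(tex.sample(2))      # 2 -- sharp once the fetch lands
```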

@TomF @dotstdy Oh yeah I'm aware of VT, was specifically talking about detecting shader demand in the context of streaming content from the internet (not even as granularly as VT impls do it, whole mips would be fine), sorry for the confusion.

It seems as if the solutions have been implemented 90% from one end (megatextures going from disk to GPU) and 90% from the other end (mips downloaded from a CDN depending on upcoming content/possibly draw calls), but I haven't seen an end-to-end solution.

@archo @dotstdy Got it. Xbox games can download in the background, and will stop if you hit a chunk that hasn't arrived yet, but doing mip-by-mip would indeed be pretty aggressive!

I bet Flight Simulator does this. It streams the entire world, so it kinda has to, right?

@TomF @dotstdy I suspect that for Flight Simulator it's easy to determine how far each model is from the camera, so they likely wouldn't have to measure at pixel level to start downloading some chunk of the world. But I haven't looked into how their streaming works.

@archo @TomF tbh i'm not sure many titles use sampler feedback for their streaming. it seems kinda not that effective in practice, and the existing huge piles of heuristics work quite well even if it's a bit of a nightmare.

@archo @TomF but there's a difference there between two problems, one being "how do i decide which mips to load", where the choices are between "feedback from the gpu sampling" and "cpu heuristics". And then the second problem is "where do i pull those mips from", where games like cod and flight simulator are pulling (some of) them dynamically from the network, and most games just pull them from the disk package (maybe with a "high res texture dlc" for the stupid mip levels).

@archo @TomF In practice with sampler feedback (at least with the feature, ymmv if you're doing something bespoke with software paging) it seems that you really can't pull a large amount of data without compromising performance. So in order to use it you're stochastically sampling sampler feedback at an extremely low rate. Plus, I think titles just don't need to use it when they already have a highly tuned, working, texture streamer, so there's not necessarily a huge demand to change it up.

@dotstdy @archo Also if you rely only on sampler feedback, as the camera turns, the edge of the screen (which is where your eyes are naturally looking) are always wrong. I like the term "Just Too Late" rendering instead of Just In Time. It looks kinda crappy.

So as Josh says, you always need good PREDICTIVE heuristics anyway, and if you do then why even use sampler feedback?

@TomF @dotstdy @archo Turnkey sampler feedback failed the "no implicit behavior" test of inserting bits into your shaders, as well. But the primary piece is as you note, that your predictive heuristics are needed and Good Enough(tm). The most interesting uses for sampler feedback are actually for offline analysis in tooling to build offline CPU-side predictive guidance hierarchies/etc. for your streamer.

@wadeb is on here if you want to ask specifics about COD on-demand texture streaming.

@mtothevizzah @TomF @dotstdy @archo I feel like there's still space for stochastic sampler feedback - but not as an alternative to traditional streaming, and not as a side-channel hw feature.

@mtothevizzah @TomF @dotstdy @archo Roughly, the usual predictive streamer manages through mip N-2, then the largest mips are managed page-by-page via stochastic sampler feedback.

@mtothevizzah @TomF @dotstdy @archo It'd maximize the memory-saving precision of sampler feedback, while limiting worst-case blurriness from camera cuts.
Aki and I had a design for an intern to try on PS5 a few years ago but it didn't get done.
@wadeb @mtothevizzah @dotstdy @archo The real problem is eviction. If you only get data back to the CPU about page faults, how does it know what it can safely evict? You need some sort of frame counter on each page, and then the GPU has to check those and tell the CPU which pages are LRU, and it's all getting excitingly complex and expensive again. I mean - I realise this is all how standard CPU VM works - it's really really ugly and "how does this function at all" and so on, but...
@TomF @mtothevizzah @dotstdy @archo Ok, speaking roughly again, you'd maintain a stochastic hit counter for each page and decay it each frame - 16GB of RAM equals 16MB of u32-per-page atomic hit counters, and that could be made smaller by exploiting higher-level knowledge.

@TomF @mtothevizzah @dotstdy @archo The CPU's job is just to keep the top-hit pages resident.
On PS5 a compute shader could do this and probably write out the IO command buffer too :)

@TomF @mtothevizzah @dotstdy @archo The pixel shader registers hits w/atomic adds to the buffer, for only X% (1/1000?) of pixels chosen randomly each frame, and only when the top 2 mips of a streamed texture are sampled - identified via texcoord derivative + shared across all texture channels - NOT hw tracking.

@TomF @mtothevizzah @dotstdy @archo Ideally, register hits only when the pixel shader detects _magnification_ into the top 2 mips, so the atomic adds only fire when there are unstreamed-yet-needed pages. This will oscillate if on-screen needs exceed RAM, but that's extremely unlikely, and would be hidden by TAA anyway. But it'd eliminate all pixel shader cost in stable regions of the frame.
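A toy single-threaded model of that decayed-hit-counter scheme, just to make the eviction answer concrete (plain adds standing in for the GPU atomics, and all names are mine):

```python
import random

class PageResidencyTracker:
    """Sketch of the scheme above: one counter per page, halved each
    frame, with only a random slice of pixels reporting hits. On the GPU
    the adds would be atomics in the pixel shader, and the per-frame
    decay plus top-K selection could run in a compute shader."""

    def __init__(self, page_count: int, sample_rate: float = 0.001):
        self.hits = [0] * page_count
        self.sample_rate = sample_rate

    def report_hit(self, page: int) -> None:
        # Only ~sample_rate of pixels report, so the shader cost stays tiny.
        if random.random() < self.sample_rate:
            self.hits[page] += 1

    def end_frame(self, budget_pages: int) -> set:
        """Decay counters and return the top-hit pages to keep resident.
        Everything outside the returned set is safe to evict: its decayed
        counter says it hasn't been sampled much lately (an LRU-ish answer
        without per-page frame timestamps)."""
        keep = set(sorted(range(len(self.hits)),
                          key=lambda p: self.hits[p], reverse=True)[:budget_pages])
        self.hits = [h >> 1 for h in self.hits]  # per-frame decay
        return keep
```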
@TomF @mtothevizzah @dotstdy @archo The main goal here is to use RAM and I/O bandwidth more efficiently, but it depends on low latency streaming - ideally a few frames.
I/O bandwidth and RAM aren't our worst problems on that kind of platform as long as we're still shipping COD on PS4, and we'd still need to ship all that texture data to players somehow.
@TomF @dotstdy @archo On PC, definitely. Not only do you have to contend with the latency of readback, and potentially spinning platters, but the API also requires all the shaders to be instrumented. Bang for buck is awful.
On consoles with shared memory and dedicated IO HW though, you can get the just too late time down to 1-2 frames after the first texel is requested, and at that point you can probably get away with some _really_ dumb heuristics for fallbacks, and only predicting on camera cuts.

@TomF @dotstdy I would agree with "just too late" looking crappy but it's also ubiquitous and happens in ways that are far more noticeable (e.g. culling).

I also specifically recall GR:Wildlands (2017) doing "just too late" mip streaming, most noticeably with road textures (but probably at draw call granularity and going all the way between low and high LOD).

I agree with SF not adding much though. It at most removes the need for reducing detail in graphics options manually.

@archo @dotstdy Yeah I really hate just-too-late pixel-feedback culling especially. It's really obvious. Pop pop pop.

In general the problem with occlusion culling is it doesn't help the worst case, and although it helps framerate in the normal case, it also adds artifacts. Makes me very hesitant to add it to any engine - very poor bang-for-the-buck.