GDC 2015: “Enhancing Your Unity Mobile Games (Presented by ARM)” by Carl Callewaert of Unity Technologies, Roberto Lopez Mendez of ARM, Tony Prosser of Realtime UK, and Angelo Theodorou (@encelo?) of ARM https://gdcvault.com/play/1022408/Enhancing-Your-Unity-Mobile-Games

I thought this presentation was mostly useless.

The first presenter’s section was so high-level that I didn’t get anything from it. They covered topics like “mobile devices use batteries” and “when you optimize, focus on the performance bottleneck.”

1/6

And “mobile devices don’t have as much bandwidth as desktop devices, so consider using texture compression.”

They did say one thing I thought was interesting: 4x MSAA is free on Mali GPUs. Perhaps they have some kind of lossless compression support that they use under the hood? Such support is quite common; I wish they had talked more about that.

The next presenter described adjusting reflection cubemaps so they align better, and then applying this principle to shadows.

2/6
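
The post doesn't name the technique. One common way to make a reflection cubemap line up with the room it was captured in is a box-projected ("local") cubemap lookup, so here is a minimal sketch assuming that is roughly what was shown. The function name, the axis-aligned bounding box, and every parameter name are illustrative assumptions, not taken from the talk.

```python
import math

def local_corrected_direction(pos, refl_dir, box_min, box_max, cubemap_pos):
    """Return the vector to use for the cubemap lookup.

    Assumption: the scene is approximated by an axis-aligned box and
    `pos` (the shaded point) lies inside it. All names are hypothetical.
    """
    # Slab method: find where the reflection ray leaves the box.
    t_exit = math.inf
    for p, d, lo, hi in zip(pos, refl_dir, box_min, box_max):
        if d > 0:
            t_exit = min(t_exit, (hi - p) / d)
        elif d < 0:
            t_exit = min(t_exit, (lo - p) / d)
        # d == 0: the ray is parallel to this pair of faces, so skip it.
    if math.isinf(t_exit):
        raise ValueError("reflection direction must be non-zero")

    # Point on the box that the reflection ray actually hits.
    hit = [p + t_exit * d for p, d in zip(pos, refl_dir)]

    # Look up the cubemap along the vector from where it was captured
    # to the hit point, instead of along the raw reflection vector.
    return [h - c for h, c in zip(hit, cubemap_pos)]

# A point one unit off-centre, reflecting straight along +z,
# inside a 4x4x4 box whose cubemap was captured at the origin.
corrected = local_corrected_direction(
    pos=(1, 0, 0), refl_dir=(0, 0, 1),
    box_min=(-2, -2, -2), box_max=(2, 2, 2),
    cubemap_pos=(0, 0, 0),
)
print(corrected)
```

The raw reflection vector here is (0, 0, 1), but the corrected lookup vector leans toward +x because the shaded point is off-centre. That offset is what makes the reflection line up with the geometry.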

@GDCPresoReviews 4x MSAA is almost free on Mali GPUs because you can keep your multisampled data in tile memory and only write back the resolved final output. This works when you keep your data in tile memory throughout your main scene render pass. Then, before you start your postprocessing, your store op writes out to main memory, and the GPU resolves your MSAA samples before the tile goes back to main memory.
@Biovf right, the reason I mentioned compression is that, when the tile memory is finally flushed to main memory, without compression, there would be 4x the amount of data, which doesn’t sound “free” to me
@Biovf from what I understand, all GPUs implement multisampling using something like a run-length encoding scheme, which gets unpacked at resolve time. Which is interesting; I would have enjoyed a section discussing this in a hardware-specific way in this presentation
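
To put rough numbers on the "4x the amount of data" point above, here is a back-of-the-envelope sketch. The 1920x1080 resolution, the 4-bytes-per-pixel color format, and the function name are assumptions chosen for illustration; it ignores depth, compression, and everything else a real GPU does.

```python
def color_write_bytes(width, height, bytes_per_pixel=4, samples=1):
    """Bytes written to main memory for one full-screen color target.

    Hypothetical helper: assumes every sample is stored uncompressed.
    """
    return width * height * bytes_per_pixel * samples

# Resolve on-chip first, then write one color per pixel.
resolved = color_write_bytes(1920, 1080)

# Write all four samples out and resolve later in a separate pass.
unresolved = color_write_bytes(1920, 1080, samples=4)

print(f"resolved in tile:   {resolved / 2**20:.1f} MiB per frame")
print(f"unresolved samples: {unresolved / 2**20:.1f} MiB per frame")
```

Under these assumptions the multisampled write costs four times the bandwidth of the resolved one, and a later resolve pass would have to read all of it back in again.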

@Biovf

@TomF explained in https://mastodon.gamedev.place/@TomF/115810687409793919 that the multisampling itself isn’t actually faster on tile-based GPUs than immediate GPUs, but the *resolve* is faster if it’s done in the same render pass that produced the samples. (Because there’s no round-trip to memory)

Tom Forsyth (@TomF@mastodon.gamedev.place)

@GDCPresoReviews @sol_hsa Because the whole tile is rendered on-chip, then the MSAA is resolved to a single colour per pixel, then that is written to memory. None of the subpixel colour/depth data ever goes to DRAM. Whereas a conventional GPU might compress & cache that MSAA data, but it can't cache the ENTIRE screen, so some of it is going to leak out to real DRAM.

@GDCPresoReviews Sure, that makes complete sense. Perhaps I wasn't clear but that was what I intended to express with my messages.
I did mention that you only write back the resolved output to DRAM, and I also gave an example of this working within a single render pass to highlight how it becomes beneficial. I do admit that I didn't explicitly bridge it to immediate-mode GPUs, so that's on me for failing to do that.

@Biovf

I suppose I may be the only person in the universe that didn’t assume that “multisampling” meant “multisampling and resolving within the same render pass” 😅

@GDCPresoReviews it's all good, this stuff could be explained better anyway.
On mobile GPUs it is always the same lesson: bandwidth is lava, so minimise it at all costs. Moving data in and out of tile memory is what kills your perf in 90% of cases. Not only that, but ALU throughput has grown by orders of magnitude over the years whereas memory bandwidth has barely had linear growth, so in real-world use cases bandwidth ends up being your main limiting factor.
@GDCPresoReviews When you're designing your solutions for mobile, that means you need to think very differently from desktop. It is not just about speed: the GPU architecture will influence how you should author your solutions and algorithms. Most folks ignore that and then complain that everything was fast on desktop but sucks on mobile.

@Biovf

(Hence: amazing-looking Switch games)