GDC 2015: “Enhancing Your Unity Mobile Games (Presented by ARM)” by Carl Callewaert of Unity Technologies, Roberto Lopez Mendez of ARM, Tony Prosser of Realtime UK, and Angelo Theodorou (@encelo?) of ARM https://gdcvault.com/play/1022408/Enhancing-Your-Unity-Mobile-Games

I thought this presentation was mostly useless.

The first presenter’s section was so high-level that I didn’t get anything from it. They covered topics like “mobile devices use batteries” and “when you optimize, focus on the performance bottleneck.”

1/6

And “mobile devices don’t have as much bandwidth as desktop devices, so consider using texture compression.”

They did say one thing I thought was interesting: 4x MSAA is free on Mali GPUs. Perhaps they have some kind of lossless compression support that they use under the hood? Such support is quite common; I wish they had talked more about that.

The next presenter described adjusting reflection cubemaps so they align better, and then applying this principle to shadows.

2/6

I’ve heard about this technique before, but I’ve never studied it, so I appreciated the explanation. They described that you actually use ray tracing to figure out where to sample the cube map. You use proxy geometry, and shoot a reflection ray from the point of reflection into the proxy geometry. Depending on where it hits, you sample a different location in the cube map.
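A minimal sketch of that lookup, assuming the proxy geometry is a single axis-aligned box and the shaded point lies inside it. The function and parameter names are made up for illustration and are not from the talk. With one box, finding the hit point takes only a few comparisons per axis.

```python
import math


def local_corrected_dir(pos, refl, box_min, box_max, cube_center):
    """Return the direction to use for the cube map lookup.

    pos:         point being shaded (assumed to be inside the box)
    refl:        reflection direction (assumed non-zero, need not be normalized)
    box_min/max: corners of the axis-aligned proxy box (hypothetical proxy geometry)
    cube_center: position the cube map was baked from
    """
    t_exit = math.inf
    for axis in range(3):
        d = refl[axis]
        if d > 0.0:
            t = (box_max[axis] - pos[axis]) / d  # distance to the "max" wall
        elif d < 0.0:
            t = (box_min[axis] - pos[axis]) / d  # distance to the "min" wall
        else:
            continue  # ray is parallel to this pair of walls
        t_exit = min(t_exit, t)

    # Where the reflection ray leaves the box.
    hit = tuple(pos[i] + refl[i] * t_exit for i in range(3))
    # Sample the cube map along the vector from its bake position to the hit.
    return tuple(hit[i] - cube_center[i] for i in range(3))


room_min, room_max = (0, 0, 0), (10, 4, 10)
bake_pos = (5, 2, 5)
lookup = local_corrected_dir((2, 1, 5), (1, 0, 0), room_min, room_max, bake_pos)
```

Without the correction, the lookup would just use the reflection direction `(1, 0, 0)`. With it, the lookup direction depends on where the shaded point sits in the room.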

(This explanation didn’t make sense to me, though: wouldn’t you need a BVH for the proxy geometry?)

3/6

They then described how to use this for shadows: you bake the cubemaps with transmittance values in the alpha channel. When you sample from the cube map, the alpha channel tells you whether your pixel is lit or not. And for soft shadows, you can just sample a lower-res mipmap of the cube map. So that’s neat.
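A toy illustration of why a lower mip level gives a softer edge, assuming alpha 1.0 means lit and 0.0 means shadowed, a box-filtered mip chain, and nearest-texel lookups on one cube map face. Real hardware would filter between texels and levels, and how the talk's demo chose the mip level is not something I can state, so `level` is just a parameter here.

```python
def downsample(face):
    """Box-filter a square 2^n x 2^n grid of alpha values to half resolution."""
    n = len(face) // 2
    return [
        [
            (face[2 * y][2 * x] + face[2 * y][2 * x + 1]
             + face[2 * y + 1][2 * x] + face[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_mips(face):
    """Return [level 0, level 1, ...] down to a single texel."""
    mips = [face]
    while len(mips[-1]) > 1:
        mips.append(downsample(mips[-1]))
    return mips


def sample(mips, u, v, level):
    """Nearest-texel lookup at (u, v) in [0, 1) on the given mip level."""
    grid = mips[level]
    size = len(grid)
    x = min(int(u * size), size - 1)
    y = min(int(v * size), size - 1)
    return grid[y][x]


# One 4x4 face: only the rightmost column lets light through.
face = [[0.0, 0.0, 0.0, 1.0] for _ in range(4)]
mips = build_mips(face)

hard = sample(mips, 0.6, 0.5, level=0)    # fully shadowed texel
soft = sample(mips, 0.6, 0.5, level=1)    # half lit: the edge has been blurred
softer = sample(mips, 0.6, 0.5, level=2)  # whole face averaged
```

The same lookup position goes from fully shadowed at level 0 to partially lit at the coarser levels, which is the soft-shadow effect.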

The next section was from a company that makes offline-rendered demo reels. And I was like “um, cool?” They made the assets for a real-time demo.

4/6

They were like “our triangle budget for the whole scene was the same number of triangles we usually use in just a character’s fingernail” and my reaction was like “well what did you expect?” The worst part was that their company’s name is, no joke, “Realtime UK.”

The last section was a Unity guy showing off lighting in Unity, but it had nothing to do with ARM whatsoever. He was like “We use Enlighten! It updates as you move stuff around! So cool!” and I was like “why are you here”

5/6

Review: 3/10 they get some points for the cubemap stuff, but that’s pretty much it
@GDCPresoReviews well, considering that ARM bought Geomerics in December 2013, doing a bit of Enlighten marketing in early 2015 was a normal part of a sponsored session. 😄
@encelo Huh! I didn’t know that. 👍
@GDCPresoReviews depending on the simplicity of the proxy geo you don’t need a BVH. like, if it’s just a cube, six ifs will do the trick

@rrika

Ah, got it 👍

I had a feeling it was something like that, because the geo in their demo was really simple

@GDCPresoReviews 4x MSAA is almost free on Mali GPUs because you can keep your multisampled data in tile memory and only write back the resolved final output. This works when you do your main scene render pass and keep your data in tile memory throughout. Before you start your postprocessing, you do your store op out to main memory, but have the GPU resolve your MSAA samples before the tile goes back to main memory.
@Biovf right, the reason I mentioned compression is that, when the tile memory is finally flushed to main memory, without compression, there would be 4x the amount of data, which doesn’t sound “free” to me
@Biovf from what I understand, all GPUs implement multisampling using something like a run-length encoding scheme, which gets unpacked at resolve time. Which is interesting; I would have enjoyed a section discussing this in a hardware-specific way in this presentation

@Biovf

@TomF explained in https://mastodon.gamedev.place/@TomF/115810687409793919 that the multisampling itself isn’t actually faster on tile-based GPUs than immediate GPUs, but the *resolve* is faster if it’s done in the same render pass that produced the samples. (Because there’s no round-trip to memory)
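To put rough numbers on that round trip, here is back-of-the-envelope arithmetic. The resolution, pixel format, and the "no compression at all" worst case are my assumptions, not figures from the talk or the replies, and depth traffic is ignored.

```python
def color_traffic_bytes(width, height, bytes_per_pixel, samples, resolve_on_tile):
    """Estimate DRAM color traffic for one multisampled render pass.

    Assumes uncompressed data, so the off-tile case is an upper bound.
    """
    resolved = width * height * bytes_per_pixel
    if resolve_on_tile:
        # Only the resolved image ever leaves the tile.
        return resolved
    multisampled = resolved * samples
    # Store every sample, read every sample back, then write the resolved image.
    return multisampled + multisampled + resolved


# Hypothetical target: 1920x1080, 4 bytes per pixel (e.g. RGBA8), 4x MSAA.
on_tile = color_traffic_bytes(1920, 1080, 4, 4, resolve_on_tile=True)
round_trip = color_traffic_bytes(1920, 1080, 4, 4, resolve_on_tile=False)
ratio = round_trip / on_tile
```

Under these assumptions the on-tile resolve moves about 8.3 MB per frame versus about 74.6 MB for the uncompressed round trip, a 9x difference. Compression on an immediate-mode GPU would shrink that gap, but by how much depends on the hardware and the content.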

Tom Forsyth (@TomF@mastodon.gamedev.place)

@GDCPresoReviews @sol_hsa Because the whole tile is rendered on-chip, then the MSAA is resolved to a single colour per pixel, then that is written to memory. None of the subpixel colour/depth data ever goes to DRAM. Whereas a conventional GPU might compress & cache that MSAA data, but it can't cache the ENTIRE screen, so some of it is going to leak out to real DRAM.

@GDCPresoReviews Sure, that makes complete sense. Perhaps I wasn't clear but that was what I intended to express with my messages.
I did mention that you only write back to DRAM the resolved output, and also gave an example of this working within a single render pass to highlight how it becomes beneficial. I do admit that I didn't explicitly bridge it to immediate mode GPUs, so that's on me for failing to do that.

@Biovf

I suppose I may be the only person in the universe that didn’t assume that “multisampling” meant “multisampling and resolving within the same render pass” 😅

@GDCPresoReviews it's all good, this stuff could be explained better anyway.
On mobile GPUs it is always the same lesson: bandwidth is lava, so minimise it at all costs. Reading in and out of tile memory is what kills your perf in 90% of cases. Not only that, but ALU throughput has grown by orders of magnitude over the years whereas memory bandwidth has barely had linear growth, so in real-world use cases bandwidth ends up being your main limiting factor.
@GDCPresoReviews When you're designing your solutions for mobile that means you need to think very differently from desktop because it is not just about speed but the GPU arch will influence how you should author your solutions and algorithms. Most folks ignore that and then complain that everything was fast on desktop but sucks on mobile

@Biovf

(Hence: amazing-looking Switch games)

@GDCPresoReviews you got me. 😄 The stand-alone presentation I gave at Unite 2013 about Unity mobile optimizations was, I think, a bit more in depth (https://argos.vu/wp-content/uploads/2016/06/Unite_2013-Optimizing_Unity_Games_for_Mobile_Platforms.pdf). But you are still looking at a sponsored session that has to sell the product a bit. 😉

@encelo

Oh, thanks for the link to the slides; I’ll check them out 👍

Understood about the sponsored session. I’m trying to rate them on the same scale as I rate all other presentations. The sponsored ones usually don’t end up rating as highly, in general…

@encelo 2011 is the first year they have sponsored sessions in the vault