We brought significant architectural advancements and feature-set improvements to the A19, M5, and M5 Pro/Max GPUs:
- Scalable GPU Neural Accelerators
- 2nd Generation Dynamic Caching Shader Architecture
- 3rd Generation Ray Tracing Acceleration
- The best GPU-driven pipeline architecture, made even better
- Many new graphics features that make games and Pro Apps even better
- Rate increases and new performance features throughout the design

We released two tech talks today going over how to take advantage of the new architecture, features and associated developer tools.

Accelerate your machine learning workloads with the M5 and A19 GPUs

https://developer.apple.com/videos/play/tech-talks/111432/

https://www.youtube.com/watch?v=wgJX1HndGl0

Boost your graphics performance with the M5 and A19 GPUs

https://developer.apple.com/videos/play/tech-talks/111431/

https://www.youtube.com/watch?v=_5yEcJfB6nk

@gavkar btw, none of the links work for me :)
@castano thanks. Copy paste to Mastodon deleted some line breaks.
I fixed the original post. Thanks for letting me know
@gavkar Are there any changes to the internal lossy texture encoding? I’m writing an article comparing the different implementations (PBRIC4, AFRC). Wondering if there are any differences between Apple generations or whether I should expect the same results.
@castano there are some changes we had to make for scattered writes. If you are asking whether the compression ratios have changed for the same content: off the top of my head, there should not be much difference.

@gavkar Ah, I’ll have to get an M5 to benchmark it then. I spent some time reverse engineering the format, will have to revisit that. I imagine smaller block sizes may help implement scattered writes.

BTW, what’s the use case? What applications/algorithms benefit from this feature?

@castano @gavkar M5 Max here happy to run benchmarks.
@schwa @gavkar Thanks for the offer, the app and assets are not really designed for redistribution, but I can clean it up and package it if you are willing to run some unsigned code locally.
The output is some html with tables like this:
@castano @gavkar Happy to.

@schwa @gavkar Thanks for sharing the results!

That's exactly the same quality results as the M4, so I'm guessing it's exactly the same format (unless the changes only occur when enabling compute stores).

@castano almost all games use compute shaders. When developers marked a texture "compute write", we disabled lossless compression; they could make it compressed again via "optimizeForGPU".
On M5 we support scattered writes as well, which means even textures marked "compute write" get lossless compression.

Lossy compression (I am assuming you don’t mean ASTC, BC) is on the same HW path.

If I misunderstood you and you meant ASTC or BC then you should expect no changes.
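The pre-M5 pattern described above can be sketched in Metal. This is a minimal, assumption-laden harness, not a complete renderer: `optimizeContentsForGPUAccess(texture:)` is the real `MTLBlitCommandEncoder` method, while the texture setup and the function name are illustrative.

```swift
import Metal

// Sketch of the pre-M5 pattern: a texture marked for compute writes loses
// lossless compression, and a blit pass can restore the GPU-optimal layout.
// The setup here is illustrative; only optimizeContentsForGPUAccess is the
// actual API being discussed.
func recompressAfterComputeWrites(device: MTLDevice, queue: MTLCommandQueue) {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm, width: 256, height: 256, mipmapped: false)
    desc.usage = [.shaderRead, .shaderWrite]  // "compute write" usage:
                                              // disables lossless compression
                                              // on pre-M5 GPUs
    guard let texture = device.makeTexture(descriptor: desc),
          let cmd = queue.makeCommandBuffer(),
          let blit = cmd.makeBlitCommandEncoder() else { return }
    // ... a compute pass writing `texture` would go here ...
    blit.optimizeContentsForGPUAccess(texture: texture)  // restore the
    blit.endEncoding()                                   // compressed layout
    cmd.commit()
}
```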

@gavkar @castano If MTLComputeEncoder::optimizeContentsForGPUAccess recompresses a texture, how does the driver know when to decompress it again for upcoming compute writes, considering that Metal doesn't have image layouts?
@k0bin @gavkar I imagine that if the texture will be written to, the block is decompressed when loaded to local mem, and compressed on the fly when evicted.

@gavkar No, not ASTC or BC; the internal compression is what I meant. I always hoped that tile shaders would prove useful to compress on tile eviction, even though in practice it did not perform well. But the compute use case is something I don't see how you could replicate with a custom codec.

I imagine you have to keep the block contents in local memory/registers and compress when evicted. Maybe the new dynamic scheduler is good enough to handle that sort of workload?
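The compress-on-eviction guess above can be modeled as a toy on the CPU: blocks stay uncompressed while resident in a small "local memory" cache, take scattered writes there, and are compressed on eviction. Everything here is illustrative (run-length encoding stands in for whatever codec the hardware actually uses; none of this is Apple's scheme).

```swift
import Foundation

// Toy stand-in codec: (count, value) run-length pairs.
func rleCompress(_ data: [UInt8]) -> [UInt8] {
    var out: [UInt8] = []
    var i = 0
    while i < data.count {
        var run = 1
        while i + run < data.count, data[i + run] == data[i], run < 255 { run += 1 }
        out.append(UInt8(run))
        out.append(data[i])
        i += run
    }
    return out
}

func rleDecompress(_ data: [UInt8]) -> [UInt8] {
    var out: [UInt8] = []
    var i = 0
    while i + 1 < data.count {
        out.append(contentsOf: Array(repeating: data[i + 1], count: Int(data[i])))
        i += 2
    }
    return out
}

struct Block { var texels: [UInt8] }

// Holds at most `capacity` uncompressed blocks; scattered writes land on
// resident blocks, and eviction compresses the victim to backing store.
final class BlockCache {
    private var resident: [Int: Block] = [:]   // uncompressed, "local memory"
    private var backing: [Int: [UInt8]] = [:]  // compressed
    private let capacity: Int
    init(capacity: Int) { self.capacity = capacity }

    func write(block id: Int, texel: Int, value: UInt8, blockSize: Int = 16) {
        if resident[id] == nil {
            // Decompress (or zero-fill) the block before the scattered write.
            let texels = backing[id].map(rleDecompress)
                ?? [UInt8](repeating: 0, count: blockSize)
            if resident.count == capacity, let victim = resident.keys.first {
                backing[victim] = rleCompress(resident[victim]!.texels)
                resident[victim] = nil
            }
            resident[id] = Block(texels: texels)
        }
        resident[id]!.texels[texel] = value
    }

    func flush() {
        for (id, block) in resident { backing[id] = rleCompress(block.texels) }
        resident.removeAll()
    }

    func read(block id: Int) -> [UInt8]? { backing[id].map(rleDecompress) }
}
```

A round trip through eviction keeps the written texels intact, which is the property the real hardware would need to preserve for compute writes.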

@gavkar Will you also publish recordings of the GDC talks?
@k0bin I am not sure what happens with those. I will check and let you know when I get a response.