We brought significant architectural advancements and feature-set improvements to the A19, M5, and M5 Pro/Max GPUs:
- Scalable GPU Neural Accelerators
- 2nd Generation Dynamic Caching Shader Architecture
- 3rd Generation Ray Tracing Acceleration
- The best GPU-driven pipeline architecture, made even better
- Many new graphics features that make games and Pro Apps even better
- Rate increases and new performance features throughout the design

We released two tech talks today going over how to take advantage of the new architecture, features and associated developer tools.

Accelerate your machine learning workloads with the M5 and A19 GPUs

https://developer.apple.com/videos/play/tech-talks/111432/

https://www.youtube.com/watch?v=wgJX1HndGl0

Boost your graphics performance with the M5 and A19 GPUs

https://developer.apple.com/videos/play/tech-talks/111431/

https://www.youtube.com/watch?v=_5yEcJfB6nk

@gavkar btw, none of the links work for me :)
@castano thanks. Copy paste to Mastodon deleted some line breaks.
I fixed the original post. Thanks for letting me know
@gavkar Are there any changes to the internal lossy texture encoding? I’m writing an article comparing the different implementations (PBRIC4, AFRC). Wondering if there are any differences between Apple generations or whether I should expect the same results.
@castano there are some changes we had to make for scattered writes. If you are asking whether the compression ratios have changed for the same content: off the top of my head, there should not be much difference.

@gavkar Ah, I’ll have to get an M5 to benchmark it then. I spent some time reverse engineering the format, will have to revisit that. I imagine smaller block sizes may help implement scattered writes.

BTW, what’s the use case? What applications/algorithms benefit from this feature?

@castano @gavkar M5 Max here happy to run benchmarks.
@schwa @gavkar Thanks for the offer, the app and assets are not really designed for redistribution, but I can clean it up and package it if you are willing to run some unsigned code locally.
The output is some html with tables like this:
@castano @gavkar Happy to.

@schwa @gavkar Thanks for sharing the results!

That's exactly the same quality results as the M4, so I'm guessing it's exactly the same format (unless the changes only occur when enabling compute stores).

@castano almost all games use compute shaders. When developers marked a texture "compute write", we disabled lossless compression; they could make it compressed again via "optimizeForGPU".
On M5 we support scattered writes as well, which means even textures marked "compute write" get lossless compression.

Lossy compression (I am assuming you don’t mean ASTC, BC) is on the same HW path.

If I misunderstood you and you meant ASTC or BC then you should expect no changes.
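The pre-M5 pattern described above can be sketched in Metal. This is a minimal, assumption-laden harness, not a complete renderer: `optimizeContentsForGPUAccess(texture:)` is the real `MTLBlitCommandEncoder` method, while the texture setup and the function name are illustrative.

```swift
import Metal

// Sketch of the pre-M5 pattern: a texture marked for compute writes loses
// lossless compression, and a blit pass can restore the GPU-optimal layout.
// The setup here is illustrative; only optimizeContentsForGPUAccess is the
// actual API being discussed.
func recompressAfterComputeWrites(device: MTLDevice, queue: MTLCommandQueue) {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm, width: 256, height: 256, mipmapped: false)
    desc.usage = [.shaderRead, .shaderWrite]  // "compute write" usage:
                                              // disables lossless compression
                                              // on pre-M5 GPUs
    guard let texture = device.makeTexture(descriptor: desc),
          let cmd = queue.makeCommandBuffer(),
          let blit = cmd.makeBlitCommandEncoder() else { return }
    // ... a compute pass writing `texture` would go here ...
    blit.optimizeContentsForGPUAccess(texture: texture)  // restore the
    blit.endEncoding()                                   // compressed layout
    cmd.commit()
}
```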

@gavkar @castano If MTLComputeEncoder::optimizeContentsForGPUAccess recompresses a texture, how does the driver know when to decompress it again for upcoming compute writes, considering that Metal doesn't have image layouts?
@k0bin @gavkar I imagine that if the texture will be written to, the block is decompressed when loaded to local mem, and compressed on the fly when evicted.

@gavkar No, not ASTC or BC; the internal compression is what I meant. I always hoped that tile shaders would prove useful to compress on tile eviction, even though in practice it did not perform well. But the compute use case is something I don't see how you could replicate with a custom codec.

I imagine you have to keep the block contents in local memory/registers and compress when evicted. Maybe the new dynamic scheduler is good enough to handle that sort of workload?
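The compress-on-eviction guess above can be modeled as a toy on the CPU: blocks stay uncompressed while resident in a small "local memory" cache, take scattered writes there, and are compressed on eviction. Everything here is illustrative (run-length encoding stands in for whatever codec the hardware actually uses; none of this is Apple's scheme).

```swift
import Foundation

// Toy stand-in codec: (count, value) run-length pairs.
func rleCompress(_ data: [UInt8]) -> [UInt8] {
    var out: [UInt8] = []
    var i = 0
    while i < data.count {
        var run = 1
        while i + run < data.count, data[i + run] == data[i], run < 255 { run += 1 }
        out.append(UInt8(run))
        out.append(data[i])
        i += run
    }
    return out
}

func rleDecompress(_ data: [UInt8]) -> [UInt8] {
    var out: [UInt8] = []
    var i = 0
    while i + 1 < data.count {
        out.append(contentsOf: Array(repeating: data[i + 1], count: Int(data[i])))
        i += 2
    }
    return out
}

struct Block { var texels: [UInt8] }

// Holds at most `capacity` uncompressed blocks; scattered writes land on
// resident blocks, and eviction compresses the victim to backing store.
final class BlockCache {
    private var resident: [Int: Block] = [:]   // uncompressed, "local memory"
    private var backing: [Int: [UInt8]] = [:]  // compressed
    private let capacity: Int
    init(capacity: Int) { self.capacity = capacity }

    func write(block id: Int, texel: Int, value: UInt8, blockSize: Int = 16) {
        if resident[id] == nil {
            // Decompress (or zero-fill) the block before the scattered write.
            let texels = backing[id].map(rleDecompress)
                ?? [UInt8](repeating: 0, count: blockSize)
            if resident.count == capacity, let victim = resident.keys.first {
                backing[victim] = rleCompress(resident[victim]!.texels)
                resident[victim] = nil
            }
            resident[id] = Block(texels: texels)
        }
        resident[id]!.texels[texel] = value
    }

    func flush() {
        for (id, block) in resident { backing[id] = rleCompress(block.texels) }
        resident.removeAll()
    }

    func read(block id: Int) -> [UInt8]? { backing[id].map(rleDecompress) }
}
```

A round trip through eviction keeps the written texels intact, which is the property the real hardware would need to preserve for compute writes.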

@gavkar Will you also publish recordings of the GDC talks?
@k0bin I am not sure what happens with those. I will check and let you know when I get a response.