FFmpeg 8.1 just got released! I wrote a piece about the Vulkan compute codecs for the Khronos blog:
https://www.khronos.org/blog/video-encoding-and-decoding-with-vulkan-compute-shaders-in-ffmpeg
Video Encoding and Decoding with Vulkan Compute Shaders in FFmpeg

In this blog we explore how FFmpeg uses Vulkan Compute to seamlessly accelerate encoding and decoding of even professional-grade video on consumer GPUs — unlocking GPU compute parallelism at scale, without specialized hardware. This approach complements Vulkan Video's fixed-function codec support, extending acceleration to formats and workflows it doesn't cover.

The Khronos Group

@lynne Really cool stuff, thanks for your work!

I'd like to make a short comment about the "Compromises" section, specifically
> The lesson is clear: to be consistently fast, maintainable, and widely adopted, compute-based codec implementations need to be fully GPU-resident — no CPU hand-offs.

as there has been some recent movement on that front that could be of interest to you - I'm still planning to write a blog post about it.

1/3

@lynne In #gstreamer 1.28 we added a (Linux only) feature to use udmabuf to allocate buffers in a way that lets them be imported directly by GPUs and display engines. The goal here was zero-copy video playback in e.g. the #Gnome video player, allowing buffers from libav, dav1d etc. to be passed through directly to rendering engines and possibly further to #Wayland compositors (and again further to the display engine).

2/3
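
For readers who have not used udmabuf, here is a minimal sketch of the kernel interface the post refers to. It is not GStreamer's code: the function names are made up for illustration, and the ioctl number is computed by hand on the assumption of the generic Linux ioctl encoding (as on x86 and arm64) and the `struct udmabuf_create` layout from `<linux/udmabuf.h>`. It needs Linux, Python 3.8 or newer, and access to `/dev/udmabuf`.

```python
import fcntl
import mmap
import os
import struct

# Assumption: UDMABUF_CREATE is _IOW('u', 0x42, struct udmabuf_create), where the
# struct is { u32 memfd; u32 flags; u64 offset; u64 size; } (24 bytes), encoded
# with the generic Linux ioctl layout.
UDMABUF_CREATE_FMT = "=IIQQ"
UDMABUF_CREATE = (
    (1 << 30) | (struct.calcsize(UDMABUF_CREATE_FMT) << 16) | (ord("u") << 8) | 0x42
)
UDMABUF_FLAGS_CLOEXEC = 0x01


def round_up_to_page(size: int) -> int:
    """udmabuf only accepts page-aligned offsets and sizes."""
    page = mmap.PAGESIZE
    return (size + page - 1) // page * page


def alloc_udmabuf(size: int):
    """Hypothetical helper: return (memfd, dmabuf_fd).

    dmabuf_fd is None when /dev/udmabuf is missing or the ioctl is refused.
    """
    size = round_up_to_page(size)
    # Ordinary, CPU-writable memory backed by a memfd.
    memfd = os.memfd_create("frame", os.MFD_ALLOW_SEALING)
    os.ftruncate(memfd, size)
    # The kernel requires F_SEAL_SHRINK so the backing pages cannot disappear.
    fcntl.fcntl(memfd, fcntl.F_ADD_SEALS, fcntl.F_SEAL_SHRINK)
    try:
        dev = os.open("/dev/udmabuf", os.O_RDWR)
    except OSError:
        return memfd, None
    try:
        # A mutable buffer makes fcntl.ioctl return the ioctl's return code,
        # which for UDMABUF_CREATE is the new dma-buf file descriptor.
        req = bytearray(
            struct.pack(UDMABUF_CREATE_FMT, memfd, UDMABUF_FLAGS_CLOEXEC, 0, size)
        )
        dmabuf_fd = fcntl.ioctl(dev, UDMABUF_CREATE, req)
    except OSError:
        dmabuf_fd = None
    finally:
        os.close(dev)
    return memfd, dmabuf_fd
```

A software decoder would write frames through an `mmap` of `memfd`, while `dmabuf_fd` is what gets handed to the GPU or compositor. The caller is responsible for closing both descriptors.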

@lynne
On devices with unified memory, that essentially reduces the cost of the handover to a CPU cache flush - and AFAICS unified memory appears to be becoming more common, even for workstations.

For video playback this can improve performance with sw decoding quite a bit. I *think* this approach could also work for / help with hybrid de-/encoders. So for codecs where a pure GPU approach is not feasible, this *could* be something to look into going forward, helping from a different angle :)

3/3

@rmader I know very well about mapping RAM data into the GPU memory space to use it without copying in Vulkan. I practically wrote the book on it, and was the first one to figure out you can map any address, even unaligned addresses.
We even let the GPU use the compressed codec packets directly from RAM with no copies.
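
To make the "unaligned addresses" point concrete: Vulkan's `VK_EXT_external_memory_host` extension requires the imported host pointer and the allocation size to be multiples of `minImportedHostPointerAlignment`. One plausible way to cope with an arbitrary address is to import the enclosing aligned range and remember the offset of the real data inside it. The sketch below shows only that arithmetic; the names are hypothetical, and it is not claimed to be what FFmpeg does.

```python
from typing import NamedTuple


class HostImport(NamedTuple):
    base: int    # aligned address handed to the import call
    size: int    # aligned size of the imported range
    offset: int  # where the caller's data starts inside that range


def aligned_import_range(addr: int, length: int, alignment: int) -> HostImport:
    """Hypothetical helper: widen [addr, addr + length) to aligned bounds."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    mask = alignment - 1
    base = addr & ~mask                      # round the start down
    offset = addr - base                     # data position within the import
    size = (offset + length + mask) & ~mask  # round the end up
    return HostImport(base, size, offset)
```

For instance, `aligned_import_range(0x1003, 10, 0x1000)` gives a base of `0x1000`, a size of `0x1000` and an offset of `3`. This assumes the widened range is itself mapped in the process, which holds when the alignment equals the page size.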

But hybrid decoders are rubbish, and will always be rubbish, except on rubbish devices where they may be slightly faster, but rubbish devices are rubbish anyway. I know that because I know a developer who spent a year forcing GPU onto dav1d, only for it to end in disappointment.

@lynne Oh, ok...thanks, that's good to know and sad to hear :(

I was under the impression that it was at least partially an adoption problem - i.e. drivers/APIs etc. not being ready - but I'll trust your judgement if that has already been tried extensively.

FTR: the "rubbish" devices where I'm still hoping to see improvements are e.g. #linuxmobile phones or older laptops, where hardware codecs are often hard to get working well or are missing for modern codecs like AV1.