if you think about it, vulkan logical device implies the existence of vulkan illogical device

I'm slowly working through the vulkan spec writing a compute-only vulkan program from scratch that doesn't render anything, and it's going pretty well because the spec is really well written and I already know more or less exactly what I want to do anyway, but I just want to say how silly (fun) it feels to write a program like this, because you get to just skip over large swaths of the API.

Like, I'm working from the spec because the tutorials all make it more complicated.

also the tutorials I reviewed all did the annoying thing where the tutorial squirrels away the stuff you're trying to learn or reference into abstractions that only serve the needs of the tutorial writer. given that my goal is very specifically to *not draw anything*, there's really not much of a point to any of them lol. I'm really not the intended audience here though :3

I think it's cute that practically every vulkan command has one or more optional args to let you enter Hard Mode

(sorry for the double post, I added this to the wrong thread)

ok even with just the compute-only subset vulkan is a slog D:

I wonder how many people have actually managed to knuckle down and write a complete, useful vulkan program from scratch (no copy pasting from tutorials and stack overflow, no offloading significant parts to 3rd party libraries like VMA)

To think if I power through and get this thing working I could potentially be like the 20th person to bother

oh, update on my little vulkan compute project, last night I got as far as repeatedly dispatching an empty compute shader and allocating some memory 😎 I'm in the home stretch! I think I just need to figure out the resources / resource binding stuff and then I'll be able to start on my DSP experiment :3

which mostly means the next things are figuring out the least effort way of getting audio data into C++ (probably stb_vorbis?) and writing even more boilerplate for alsa...

Success! I got the vulkan compute shader cranking out the fibonacci series and reading it back to the CPU through an 8-byte persistently mapped buffer. Should be smooth sailing from here.
ok *whew* I finally did it! I implemented convolution reverb as a vulkan compute shader, and the results seem to be correct. I have it convolving the audio up front at the moment, but it seems to be reasonably fast. I'm using SDL3 to verify the output. It doesn't look like it'll be too crazy to rework it such that the stream is generated live.
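(for anyone curious, the heart of convolution reverb is plain direct convolution; here's a rough CPU reference sketch of what the shader computes, not the actual shader code:)

```cpp
#include <cstddef>
#include <vector>

// Direct (time-domain) convolution: each input sample scatters a scaled
// copy of the impulse response (the kernel) into the output. This is the
// O(N*K) formulation the compute shader parallelizes.
std::vector<float> Convolve(const std::vector<float>& Signal,
                            const std::vector<float>& Kernel)
{
    std::vector<float> Output(Signal.size() + Kernel.size() - 1, 0.0f);
    for (std::size_t i = 0; i < Signal.size(); ++i)
    {
        for (std::size_t j = 0; j < Kernel.size(); ++j)
        {
            Output[i + j] += Signal[i] * Kernel[j];
        }
    }
    return Output;
}
```

convolving with a single-sample kernel of {1.0f} returns the signal unchanged; swapping in a recorded impulse response as the kernel is what gives you the reverb.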
it turns out the main difficulty working with vulkan is accidentally breaking your laptop in half

I reworked it so the convolution shader processes the audio in tandem with playback, so I'm *very* close to getting this working with live audio streams.

But more importantly, I used this to convolve my song "strange birds" with a choir-ish fanfare sound effect from a game I used to play as a kid and the result is like the grand cosmos opened up before me and I'm awash in the radiant light of the universe. Absolutely incredible.

I want to power through and get this into a state where I can use it with live instruments, but I am completely exhausted 😴
I reworked some things and now my audio convolving compute shader can convolve ~11 milliseconds worth of audio samples with an average processing time of ~7 milliseconds. That's with one channel at a sample rate of 22050 Hz. When the sample rate is 44100 Hz, the average processing time is a paltry ~8 milliseconds.
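(quick sanity math on those figures: a block of samples only has to finish processing in less wall-clock time than the audio it covers; a sketch using the numbers quoted above:)

```cpp
// Real-time budget check: a block of SampleCount samples at SampleRate Hz
// covers (SampleCount / SampleRate) seconds of audio. As long as the GPU
// processes the block in less wall-clock time than that, the convolver can
// keep up with a live stream.
double BlockDurationMs(int SampleCount, double SampleRate)
{
    return 1000.0 * SampleCount / SampleRate;
}

// Positive headroom means we're keeping up; negative means we're falling
// behind and will underrun.
double RealtimeHeadroomMs(double BlockMs, double ProcessingMs)
{
    return BlockMs - ProcessingMs;
}
```

~11 ms at 22050 Hz is about 243 samples, so ~7 ms of processing leaves roughly 4 ms of headroom per block.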
also sometime in the last week I made it so it can operate entirely on a live input stream from SDL3 rather than a wave file, so in theory I can incorporate this into a modular setup now, but the results are higher latency than I'd like, and SDL3 doesn't give you much control over audio latency.
Apparently my best frame time can get as low as 3 ms. I think vulkan should let me VK_QUEUE_GLOBAL_PRIORITY_REALTIME this program, but sadly vulkan is being a coward about it.

ok the problem I'm having with latency now is that the audio latency in the system grows over time and I'm not sure why. like it starts snappy and after running for a short while it gets super laggy :/

I'm guessing it's because SDL3 can and will resize buffers as it wants to, whereas I'd rather it just go crazy if it underruns.

What I want to do is have a fixed size buffer for input and output, enough that I can have the output double or triple buffered to smooth over hitches caused by linux. if my program can't keep up I don't want it to quietly allocate more runway, I want it to scream at me LOUDLY and HORRIBLY, but it won't do that because I'll rejigger my program until it is perfect.

What actually happens is (sdl? poopwire?) just infinitybuffers so it never hitches and I get a second of latency after a little bit

I like that pipewire has an option to not be terrible ("pro audio" mode) and it doesn't work
99% of audio problems on linux these days are just programmers refusing to just fucking use alsa. I'm part of the problem, because I'm using SDL3 instead because the API is simple. SDL3 is part of the problem because when I tell it to just fucking use alsa it uses pipewire instead! and pipewire is part of the problem because it's just completely terrible. like, wayland terrible.
want to have low latency audio on linux? we have a tool for it, it's called STOP PILING LAYERS OF BOILERPLATE ON TOP OF ALSA YOU IDIOTS YOU ABSOLUTE FOOLS

I'm like 30% sure SDL3 is not the problem or at least not the only problem because I tried resetting the streams every frame with SDL_ClearAudioStream and it still accumulates latency (in addition to also now sounding atrocious due to missing samples).

I've also seen this happen with pipewire before in other situations, and it was resolved by bypassing pipewire.

*spaces out* so anyways, this is usually the point where I'd try to cut this down to a simple loopback with as few layers as possible and gradually build back towards my program until I either find where the fault is or have something working properly. That would mean targeting ALSA directly, except that appears to not be possible without uninstalling pipewire-alsa, which I can't do without uninstalling Steam :/
so abnormally, this means starting with a pipewire loopback instead and seeing if all you brave defenders of the status quo are flickering my lights or not.
this makes me unhappy, but the single silver lining here is pipewire's API docs seem to be a little more newbie friendly than ALSA's

ok I did it. I've got a program that writes a pipewire stream of F64 audio samples where each sample is the total elapsed time since the first frame, expressed in minutes.

I've got a second program that reads that pipewire stream, and checks the offset against its own elapsed time since the first sample processed. This program prints out the calculated drift every second.

The results are interesting.

In the first version of this, both programs just measured the time using std::chrono::steady_clock::time_point. This resulted in an oscillating drift that was well under a millisecond at its peak and nothing to be concerned about.

This is good! That means there's no place whatsoever within pipewire on my computer for this specific audio setup where any intermediary buffers might be growing and adding more latency as the programs run.

This is not the interesting case.

In the second version, I changed the first program to instead calculate elapsed time as the frame number * the sampling interval, and left the second program alone.

In this version, the calculated drift is essentially the difference between the progress through the stream vs the amount of time that actually passed from the perspective of the observer. In this version, the amount of drift rises gradually. It seems the stream is advancing just a touch faster than it should.

The samples in the stream are reporting that more time has elapsed in the "recording" than actually has transpired according to the clock. The amount of drift accumulated seems to be a millisecond every few minutes.
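(the drift check itself is tiny; a sketch of the idea rather than the actual slowtime code:)

```cpp
#include <cstdint>

// Drift between the stream's own notion of time (frames consumed at a
// nominal sample rate) and wall-clock time measured by the observer.
// A positive result means the stream claims more time has elapsed than
// actually has, i.e. the stream is advancing slightly fast.
double DriftSeconds(std::uint64_t FramesProcessed, double SampleRate,
                    double WallClockElapsedSeconds)
{
    const double StreamElapsed = FramesProcessed / SampleRate;
    return StreamElapsed - WallClockElapsedSeconds;
}
```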

I'm honestly not sure what to make of that.

anyways, for the curious, I put the source code for the experiment up here https://github.com/Aeva/slowtime
also interesting is the drift is faster if I have the second program's monitor pin hooked up to my sound card, but there's still drift either way.

I think my conclusions from this are

1. the latency drift I observed with my experiments with pipewire today is probably inconsequential.

2. there is probably nothing sinister about pipewire.

3. if you have a chain of nodes that are a mix of push or pull driven and have different buffering strategies, you are in the Cool Zone

4. my program is probably going to have to handle "leap samples" in some situations. I admit I wasn't expecting that, but it feels obvious in retrospect.

5. the unplayable latency accumulation in my convolution experiment is problematic, but it is unrelated to the latency drift I observed today. This is probably going to be solved by stripping out all the SDL3 audio stuff and replacing it with using pipewire directly. this is thankfully only a minor inconvenience for me.
nice, pipewire has some special case stuff for filters
holy smokes I got it working :O!! I got my audio convolver working using the pipewire API directly!! and the latency seems to be very adequate for real time play :D
my revised opinion on pipewire is that I like that the API is wizards only. I'm a wizard, so that makes me feel special.

that or I'm just good at creating wizard problems for myself. either way I'm in a good mood.

https://github.com/Aeva/convolver/blob/c5d1ca8ec8a4aafd640def16d68e1c84bbc6b240/src/convolver.cpp#L509

god damn this thing is so fucking cool. I've got it hooked up to my drum machine right now and the fm drum in particular is pretty good at isolating parts of the impulse response sample. I'm using a short sample from the Nier Automata song "Alien Manifestation" to convolve the drum machine and it sounds *amazing*. It's a shame I can't modulate the drum parameters on this machine, or I'd be doing some really wild stuff with this right now.

some small problems with this system:

1. I've had to turn down the sampling rate so I can convolve longer samples. 22050 Hz works out ok though for what I've been messing with so far, so maybe it's not that big a deal. longer samples kinda make things muddy anyway

2. now I want to do multiple convolutions at once and layer things and that's probably not happening on this hardware XD
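(point 1 falls out of the math: direct convolution cost scales with the *square* of the sample rate, since both the output samples per second and the taps in a fixed-duration impulse response grow with it, so halving the rate quarters the work:)

```cpp
// Multiply-accumulate count per second of output audio for direct
// convolution: SampleRate output samples per second, each a dot product
// against (KernelSeconds * SampleRate) impulse response taps.
double MultipliesPerSecond(double SampleRate, double KernelSeconds)
{
    return SampleRate * (KernelSeconds * SampleRate);
}
```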

I'll probably have to switch to an FFT-based system for non-realtime convolution to make this practical for designing dynamic sound tracks for games that can run on a variety of hardware; otherwise I'll probably have to opt for actually recording my songs and stitching them together by some more conventional means
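(the FFT trick, for reference: by the convolution theorem, pointwise multiplication in the frequency domain equals circular convolution in the time domain. the sketch below uses a naive O(N²) DFT just to demonstrate the equivalence; a real version would use an actual FFT library, plus overlap-add to handle streaming:)

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive O(N^2) discrete Fourier transform. A real implementation would use
// an FFT (O(N log N)), which is where the speedup over direct convolution
// comes from; this version just demonstrates the math.
std::vector<Complex> Dft(const std::vector<Complex>& In, bool Inverse)
{
    const std::size_t N = In.size();
    const double Pi = std::acos(-1.0);
    const double Sign = Inverse ? 1.0 : -1.0;
    std::vector<Complex> Out(N);
    for (std::size_t K = 0; K < N; ++K)
    {
        Complex Sum = 0.0;
        for (std::size_t T = 0; T < N; ++T)
        {
            const double Angle = Sign * 2.0 * Pi * double(K) * double(T) / double(N);
            Sum += In[T] * Complex(std::cos(Angle), std::sin(Angle));
        }
        Out[K] = Inverse ? Sum / double(N) : Sum;
    }
    return Out;
}

// Circular convolution via the convolution theorem: transform both inputs,
// multiply pointwise, transform back. Assumes A and B are the same length.
std::vector<double> CircularConvolve(const std::vector<double>& A,
                                     const std::vector<double>& B)
{
    const std::size_t N = A.size();
    std::vector<Complex> Fa(A.begin(), A.end());
    std::vector<Complex> Fb(B.begin(), B.end());
    Fa = Dft(Fa, false);
    Fb = Dft(Fb, false);
    for (std::size_t I = 0; I < N; ++I) Fa[I] *= Fb[I];
    Fa = Dft(Fa, true);
    std::vector<double> Result(N);
    for (std::size_t I = 0; I < N; ++I) Result[I] = Fa[I].real();
    return Result;
}
```

convolving with a one-sample-delayed delta rotates the signal by one sample, which is an easy way to check the round trip is correct.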
this thing is also really good at warming up my laptop XD
idk if I'm done playing around with this prototype yet, but I'd like to explore granular synthesis a bit soon. I think there's probably a lot of cool ways it can be combined with convolution, like having the kernel morph over time.
probably first is reworking this program so I can change out the convolution kernel without restarting it, or at least make it so I don't have to recompile it each time
anyways I highly recommend building your own bespoke audio synthesis pipeline from scratch, it's a lot of fun
It occurred to me just now that I might be able to make this faster by rewriting it as a pixel shader. Each pixel in the output is an audio sample. Each PS thread reads a sample from the impulse response and the audio stream, multiplies them together, and writes out the result. To perform the dot product, the draw is instanced, and the add blend op is used to combine the results. I've also got a few ideas for variations that might be worthwhile.
Like, having the vertex shader or a mesh shader read the sample from the audio stream, have the PS read the impulse response, and stagger the draw rect. Main snag there is the render target might have to be 512x1 or something like that, or I'll have to do counter swizzling or something.
Also FP32 RGBA render targets would probably just batch 4 samples together for the sake of keeping the dimensions lower I guess.
I think this is likely to be a lot faster, because I've made a 2D convolution kernel a lot slower by rewriting it as compute in the past 😎 but if any of y'all happen to have inside knowledge on whether IHVs are letting raster ops wither and die because AAA graphics programmers think rasterization is passé now or something absurd like that, do let me know.

I figure I should probably start recording my convolution experiments for reference, and this thread seems as good a place as any to post them.

Tonight's first experiment: An excerpt from a The King In Yellow audio book convolved with a short clip from the Chrono Cross OST (Chronopolis)

Tonight's second convolution experiment: The same audio book excerpt, but convolved with a frog instead.

Recordings of speech seem to convolve really well with music and weird samples like this, but it really depends on the voice and what you pick as a kernel.

I should remember to try the inverse of the first experiment later (but not tonight)
I had a really great thing going with the chronopolis sample as the impulse response, and using my drum machine to drive it yesterday. The FM drum is really great for isolating specific sounds from the impulse response. I did try to record it, but I recorded the unfiltered line in by accident instead, so I'll have to redo it later
Experiment 3: Impulse response is a clip from the audio book where the guy is dramatically saying the word "Carcosa". I got a pretty trippy dark ambiance out of it with the drum machine earlier, but I didn't feel like recreating it, so I ran a bunch of songs through it instead and Fire Coming Out Of A Monkey's Head sounded the most interesting with it. The Chrono Cross songs I tried didn't feel distorted enough to bother posting, and this one kinda doesn't either but it's interesting.
Experiment 4: Same impulse response as the previous one, it's the clip from the audio book where the guy is saying "Carcosa", but this time I'm convolving it with VCV Rack. I've got a feedback loop of two sine wave oscillators that are modulating each other's frequency. the output of the one that is functioning as the carrier is being feathered by a pair of low frequency oscillators before applying an envelope.
I'm really blown away by what I can do with fm synthesis + convolution.
Experiment 4a: here's another with that same impulse response and nearly the same vcvrack patch, but this time it sounds like a Cello or something instead
Experiment 5: in which a clever internet person gives me a home made sound to play with https://mastodon.gamedev.place/@aeva/114531222098257965
aeva (@[email protected])

Attached: 1 audio @[email protected] and here it is with the drum machine. doesn't change the sound all that much, but it's pleasant imo. makes it sound a bit more fruity, especially with the fm drum.

Gamedev Mastodon

Experiment 6: "snowmeltdown" aka lowfi sounds to show them and show them all to

(noodling around with the fm drum on the drum machine, and a short clip of rain or snow melting as the impulse response)

it's ten minutes long but it's ten really good minutes long imo
@aeva what's your convolution thing sound like with this impulse response i generated (wav file is very loud and bright on its own be careful) https://cancel.fm/stuff/share/gen%20IR%20for%20aeva.wav
@cancel I'll give it a try this evening. Is there a particular sort of song or recording you'd like me to convolve?
@aeva i think anything that isn't tonal would work
@cancel should I leave the little bit of leading silence in the sample?
@aeva wild and spooky. now hook it all up to geo nodes in B !
@aeva I swear I heard this in a horror movie but I forgot which

@aeva The actual reason for that was almost certainly memory access patterns. Thread invocations in PS waves are generally launched and packed to have nice memory access patterns (as much as possible), compute waves and invocations launch more or less in order and micro-managing memory access is _your_ problem.

This really matters for 2D because there's lots of land mines there wrt ordering, but for 1D, not so much.

@aeva To give a concrete example: suppose you're doing some simple compute shader where all you're doing is

cur_pixel = img.load(x, y)
processed = f(cur_pixel, x, y)
img.store(x, y, processed)

and you're dispatching 16x16 thread groups, (x,y) = DispatchThreadID, yada yada, all totally vanilla, right?

@aeva well, suppose we're working in 32-thread waves internally (totally hypothetical number)

now those 32 invocations get (in the very first thread group) x=0,...,15 for y=0 and then y=1.

Say the image is R8G8B8A8 pixels and the internal image layout stores aligned groups of 4 texels next to each other and then goes to the next y, and the next 4-wide strip of texels is actually stored something like 256 bytes away or whatever.

@aeva so, x=0,..,3 y=0 are all good, these are all adjacent, straight shot, read 16 consecutive bytes, great.

x=0,...,3 y=1 in threads 16..19 are also good, these are the next 16 bytes in memory.

But if we have 256-byte cache lines (another Totally Hypothetical Number), well, those 32 bytes are all we get.

x=4,..,7 for y=0 and 1 are in the cache line at offset 256, x=8,...,11 for y=0,1 at offset 512, x=12,...,15 at offset 768.

@aeva And caches are usually built to have multiple "banks" that each handle a fraction of a cache line. Let's say our hypothetical cache has 16 16-byte banks to cover each 256B cache line.

Well, all the requests we get from that nice sequential load go into the first 2 banks and the rest gets nothing.

So that's lopsided and causes problems, and will often mean you lose a lot of your potential cache bandwidth because you only actually get that if your requests are nicely distributed over mem.
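(those Totally Hypothetical Numbers can be sanity-checked with a quick simulation; every constant below is the made-up one from the example above, not any real hardware:)

```cpp
#include <cstdint>
#include <set>

// Hypothetical tiled image layout from the example above: RGBA8 texels
// stored in 4-texel-wide column strips. Within a strip, consecutive rows
// are 16 bytes apart, so 16 rows of one strip fill a 256-byte cache line,
// and the next strip starts 256 bytes later.
std::uint64_t TexelAddress(int X, int Y)
{
    return std::uint64_t(X / 4) * 256 + std::uint64_t(Y) * 16
         + std::uint64_t(X % 4) * 4;
}

// A 32-thread wave covers the first two rows (y = 0, 1) of a row-major
// 16x16 thread group. Count how many 256-byte cache lines and how many
// of the 16 16-byte cache banks the wave's loads actually touch.
void CountWaveTraffic(int& LinesTouched, int& BanksTouched)
{
    std::set<std::uint64_t> Lines;
    std::set<int> Banks;
    for (int Thread = 0; Thread < 32; ++Thread)
    {
        const std::uint64_t Address = TexelAddress(Thread % 16, Thread / 16);
        Lines.insert(Address / 256);
        Banks.insert(int((Address % 256) / 16));
    }
    LinesTouched = int(Lines.size());
    BanksTouched = int(Banks.size());
}
```

exactly as described: the wave spreads across 4 cache lines but piles all its requests onto only 2 of the 16 banks.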

@aeva long story short, this whole thing with your thread groups being a row-major array of 16x16 pixels can kind of screw you over, if the underlying image layout is Not Like That.

This happens all the time.

Ordering and packing of PS invocations into waves is specifically set up by the GPU vendor to play nice with whatever memory pipeline, caches, and texture/surface layouts it has.

In CS, all of that is Your Job, generally given no information about the real memory layout.

Good luck!

@aeva If you do know what the real memory layout is, you can make sure consecutive invocations have nice memory access patterns, but outside consoles (where you often get those docs), eh, good luck with that.

The good news is that with 1D, this problem doesn't exist, because 1D data is sequential everywhere.

So as long as you're making sure adjacent invocations grab adjacent indices, your memory access patterns are generally fine.

(Once you do strided, you're back in the danger zone.)

@aeva also I want to emphasize that this Purely Hypothetical Example with row-major invocation layout in CS vs. a column-heavy layout in the HW is of course entirely hypothetical and in no way inspired by real events such as https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/
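(the gist of the swizzling trick from that post: remap the linear group index so consecutive groups fill narrow column tiles instead of sweeping whole rows, keeping the in-flight memory footprint smaller and more cache-friendly. a simplified sketch; the tile and grid parameters are illustrative, and the grid width is assumed to be a multiple of the tile width:)

```cpp
// Thread-group ID swizzling sketch: map a linear dispatch index to a 2D
// group coordinate so that consecutive indices fill a TileWidth-wide column
// tile row by row before moving on to the next tile, instead of sweeping
// entire rows of the grid.
void SwizzleGroupId(int LinearIndex, int GridHeight, int TileWidth,
                    int& OutX, int& OutY)
{
    const int GroupsPerTile = TileWidth * GridHeight;
    const int Tile = LinearIndex / GroupsPerTile;
    const int Within = LinearIndex % GroupsPerTile;
    OutX = Tile * TileWidth + Within % TileWidth;
    OutY = Within / TileWidth;
}
```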
@rygorous @aeva
On the CPU it's generally best to organize structured data as separate contiguous arrays for each element. But with the graphics pedigree of GPUs does that still hold, or does it handle interleaved data better?
@jannem @rygorous It depends on the IHV whether SoA or AoS is better and in what situations. Usually there will be a document outlining recommendations somewhere.
@rygorous that sounds likely. I don't think I accounted for memory layout of the texture. I assume this is also why Epic seems to be so fond of putting everything in scan line order these days?
@rygorous so, my program as written is two linear memory reads, some basic arithmetic, and some wave ops. I think it should be pretty cache efficient, or at least I don't have any obvious ideas for making it moreso. I would think all the extra raster pipeline stuff would not be worth it, but the opportunity to move one of the loads into an earlier shader stage to effectively make it invariant across the wave and make use of the ROP to implement most of the dot product seems maybe worthwhile?
@rygorous the ROP is, like, free math, right?
@aeva Not really. The "math" may be free but the queue spots are not, and you'll likely end up waiting longer in the shader to get to emit your output than you would've spent just doing the math directly
@aeva Looking at the shader you posted yesterday (?) at https://github.com/Aeva/convolver/blob/excelsior/src/convolver.cs.glsl, you're in the Danger Zone(tm)
@aeva the issue is SliceStart is derived from LaneIndex (Subgroup invocation ID) which is then multiplied by Slice
@aeva for 1D there's not much way to go wrong honestly, it's mainly a 2D (and up) problem
@aeva we've been meaning to, tbh
@ireneista it's very satisfying to make sounds

@aeva I built my own audio system and hate every time I have to work on it, so I guess different strokes and all that.

(fwiw:
https://shirakumo.github.io/libmixed/
https://shirakumo.github.io/cl-libmixed/
https://shirakumo.github.io/harmony/ )

@shinmera mine rewards me with magnificent sounds every time i play with it 😌
@aeva Mine frequently rewards me with ear-destroying noise and incomprehensible bugs

@shinmera @aeva [i know nothing about audio processing so i'm like 99.9% sure that there's a good reason why the following doesn't make sense; asking the following out of curiosity]

can the ear-destruction be avoided by like... doing some kind of analysis/checks on the final sample before sending it to the audio device...? (e.g. checking & asserting that its amplitude is less than some upper bound?)

[but if it were that easy, it probably would have been the first thing anyone would try, so]

@JamesWidman @shinmera I just try to remember to turn the volume down before testing new changes
@aeva @shinmera if i ever do audio programming, i will try to remember to make my program start with a giant ASCII-art splash screen that asks if the volume is set correctly before proceeding and makes me type "yes", because i would definitely forget sometimes (:

@JamesWidman @aeva Doing that kind of analysis would be quite difficult, since you can't just check individual samples, and it's not immediately obvious what is a symphonic sequence and what is erroneous noise. A lot of the time what causes horrendous noise is also not in the data, but in the way the data is sent (buffer over or underruns).

As aeva said, usually making sure the volume is low enough is a good enough fix.

@aeva Agreed! My DSP project is the most coding fun I've had in years, with bonus fun sounds too 🥳

The Graphics Programmer to Audio Programmer pipeline is real 😂

Frankly @aeva ? I'd love to if I understood where to start and the involved math, it'd be a pleasure to suck at it as I do with rendering!
@aeva something about scrolling up through this thread and the length of it makes me somewhat doubt that statement…