Cursed project end of day 1: clinfo runs
kinda start of day 2: I got the first buffer copies working 🙃
It's a bit surprising, but I guess I'll have to start writing a compiler already on day 2 🙃

At the conf I was like "two weeks and I'll have something pretty functional" and here I am at day 2 already having something functional 🙃

Though I'm sure I'll waste 5 days just on image support...

yo, got the first kernel launching, but like not doing much, but at least there are binaries uploaded and the binaries aren't making the GPU angy!
got the first test launching a kernel passing! It's not doing much, just writing the result of sizeof into a buffer, but it is working!
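For flavor, a kernel of that shape looks roughly like this (a hypothetical sketch, not the actual test; the OpenCL C qualifiers are stubbed out so it compiles as plain C):

```c
#include <stdint.h>

/* Stub out OpenCL C qualifiers so this sketch builds as plain C. */
#define __kernel
#define __global
typedef uint32_t cl_uint_t; /* stand-in for OpenCL C's uint */

/* Hypothetical smoke-test kernel: write a sizeof result into a buffer. */
__kernel void write_sizeof(__global cl_uint_t *out)
{
    out[0] = (cl_uint_t)sizeof(uint64_t); /* OpenCL C 'long' is 64-bit */
}
```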

I hoped I could get something substantial done today, but instead I was dealing with synchronization issues.

With that out of the way I can finally get my first fma kernel to run successfully...

anyway...

1: add fp32................Wimp pass 0.00 @ {0x0p+0, 0x0p+0}
add passed
PASSED sub-test.
PASSED test.

let's run the CTS, I mean, how bad could it be?

Pass 2486 Fails 115 Crashes 1284

yeah... crashes are mostly just unsupported NIR instructions.

also half the passes are just "not supported" things 🙃

commonfs: "PASSED 18 of 18 tests."

getting there.

passing this required me to wire up boolean comparisons, predicates and stuff..

So the biggest item left in terms of general shader generation is control flow, which I needed boolean predicates for as well 🙃

Pass 2628 Fails 132 Crashes 1125

Yeah.. I should wire up control flow 🙃

Implemented basic control flow:

Pass 3280 Fails 142 Crashes 463
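For context, "control flow" here means kernels along these lines now compile, with the comparisons feeding predicates; a toy sketch compiled as plain C with the OpenCL bits stubbed out (names are mine, not the driver's):

```c
#include <stddef.h>

/* Stub out OpenCL C so this sketch builds as plain C. */
#define __kernel
#define __global
static size_t fake_gid; /* host-side stand-in for the real work-item id */
static size_t get_global_id(int dim) { (void)dim; return fake_gid; }

/* Toy kernel exercising a boolean comparison (the bounds check),
 * a branch, and a select -- the pieces mentioned above. */
__kernel void clamp_negative(__global const int *in, __global int *out, int n)
{
    size_t i = get_global_id(0);
    if ((int)i < n) {           /* predicate + branch */
        int v = in[i];
        out[i] = v < 0 ? 0 : v; /* predicate + select */
    }
}
```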

yeah soo.. is it day 5? I think it's day 5. Which is a bit weird because it feels like day 4. But maybe also because I started like almost at the end of the day? Maybe that doesn't count? Does it even matter? No, but anyway....

Status at the end of day 5:

Pass 3577 Fails 179 Crashes 129

What's missing?
- Image support
- Atomics
- Scratch
- Some math is failing validation.
- buffer synchronization issues, still.. I honestly don't know.
- optional gallium/nir stuff

I guess that's uhm... well... I guess?
oh shoo.. I'm like 10% slower than Nvidia's implementation 🙃
heh.. but I validate more pixels correctly, that's funny
you all don't want to know the most cursed part about this 🙃
@karolherbst we do actually, and you know we do 8-D

Pass 3769 Fails 112 Crashes 4 Timeouts 0

Something something atomics... and a few other random things, should be like 3 or 4 bugs in total...

okay... I found one of the atomic bugs.. it's when two kernels are launched back to back and apparently they can interact weirdly with each other. When I force a flush+wait between them those fails go away... curious
or maybe it's an ordering issue? mhh

Pass 3871 Fails 10 Crashes 4 Timeouts 0

I think that's good enough for an initial MR 🙃

Here it is: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37831

Hopefully that's not too much of a shock.

I don't have a blog, so my MR is the blog post I guess?

add a new gallium compute only driver for nvidia hardware, named nocl (!37831) · Merge requests · Mesa / mesa · GitLab

Depends on: !37169 (only a few commits...

okay.. got fp16 working, that wasn't even hard, took me like 20 minutes 🙃
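fp16 in this context is mostly a storage format: a binary16 value expands to binary32 and back around the actual math. The expansion is mechanical; a self-contained converter (purely illustrative, not driver code):

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE binary16 bit pattern to float,
 * handling normals, subnormals, zeros, inf and nan. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1f;
    uint32_t man  = h & 0x3ff;
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign; /* +/- zero */
        } else {
            /* subnormal: shift the mantissa up until it's normalized */
            int e = -1;
            do { man <<= 1; e++; } while (!(man & 0x400));
            man &= 0x3ff;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (man << 13);
        }
    } else if (exp == 0x1f) {
        bits = sign | 0x7f800000 | (man << 13); /* inf / nan */
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```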

oh wow.. apparently I designed the internal SVM APIs in mesa like what Nvidia added in CUDA 10.2 with cuMemAddressReserve and the likes...

so I guess it should be trivial to wire up SVM as well.

@karolherbst Or maybe USM?
@bashbaug yeah that as well, but I haven't implemented USM yet
@karolherbst I guess this is technically PoCL competition
@DenJohn oh they also implement on top of CUDA? I didn't know 🙃

@karolherbst @DenJohn I was headed here to point towards PoCL too.

But this looks awesome! I was wondering if this can be used on Windows as an OpenCL driver.

@karolherbst interesting! Pocl does something similar too, to the same effect, BTW.

And if you can fix typos, there's a cl_gl_shring that needs an a ;-)

@karolherbst evil. Very evil.
The only possible good thing it can do is to convince people to leave cuda.
@karolherbst how would two kernels affect each other, unless they stomp on each other's memory? Does the hardware support concurrent kernel execution?
@oblomov either that or just executed in a different order
@karolherbst there's hardware whose queues are OOO?
@karolherbst Welcome to the jungle...!

@karolherbst PASSED 19 of 18 tests

oh the remaining one must be an off-by-one error

*gets out*