I've got a new blog article: "Training an LLM in Swift, Part 1: Taking matrix multiplication from Gflop/s to Tflop/s"

Clunky title, I know. Also, fair warning: there's nearly as much assembly, C and Metal as Swift in the article. So the title is clunky *and* misleading.

But I had fun writing it (the code, not the title).

https://www.cocoawithlove.com/blog/matrix-multiplications-swift.html

@cocoawithlove hey Matt, love the detail as always ❤️

Can you elaborate on your "Why not Swift concurrency?" pull? I'm not sure I follow what DispatchQueue is doing differently to TaskGroup based on what you've written.

@tonyarnold TaskGroup is async, which means the closure needs to escape, which conflicts with the non-escaping `Span` (which is how I'm safely passing arrays around without copying them, avoiding both copies and ref counts).
@cocoawithlove ahhh, yep, gotcha - thanks for the clarification.
@cocoawithlove actually so cool, can’t wait to read

@cocoawithlove Lovely read! I have no clue about anything but enjoy these glimpses into Metal in particular. Felt like I could follow along.

Here's a typo for you in return :)

-which allows C t use fused-multiply-addition
+which allows C to use fused-multiply-addition