I've got a new blog article: "Training an LLM in Swift, Part 1: Taking matrix multiplication from Gflop/s to Tflop/s"

Clunky title, I know. Also, fair warning: there's nearly as much assembly, C and Metal as Swift in the article. So the title is clunky *and* misleading.

But I had fun writing it (the code, not the title).

https://www.cocoawithlove.com/blog/matrix-multiplications-swift.html

@cocoawithlove hey Matt, love the detail as always ❤️

Can you elaborate on your "Why not Swift concurrency?" pull? I'm not sure I follow what DispatchQueue is doing differently to TaskGroup based on what you've written.

@tonyarnold TaskGroup is async, which means the closure passed to `addTask` needs to escape, which conflicts with the non-escaping `Span` (which is how I'm safely passing arrays around while avoiding both copies and ref counts).

@cocoawithlove ahhh, yep, gotcha - thanks for the clarification.
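
For anyone following along, here is a minimal sketch of the distinction discussed above. It is not code from the article: `rowSums` is a hypothetical helper, and unsafe buffer pointers stand in for `Span` as the borrowed, non-copied view (unlike `Span`, the compiler does not stop them from escaping, so treat this as an illustration only). It relies on `DispatchQueue.concurrentPerform(iterations:execute:)` taking a non-escaping closure and blocking until every iteration has finished, whereas `TaskGroup.addTask` takes an `@escaping @Sendable` closure.

```swift
import Dispatch

// Hypothetical helper: sums each row of a row-major matrix in parallel.
func rowSums(_ matrix: [Float], rows: Int, columns: Int) -> [Float] {
    precondition(matrix.count == rows * columns)
    var sums = [Float](repeating: 0, count: rows)
    matrix.withUnsafeBufferPointer { input in
        sums.withUnsafeMutableBufferPointer { buffer in
            let output = buffer
            // The closure below is non-escaping and concurrentPerform does not
            // return until all iterations are done, so the borrowed views
            // `input` and `output` never outlive this scope.
            DispatchQueue.concurrentPerform(iterations: rows) { row in
                var total: Float = 0
                for column in 0..<columns {
                    total += input[row * columns + column]
                }
                // Each iteration writes to a distinct index, so there is no overlap.
                output[row] = total
            }
            // By contrast, `group.addTask { ... }` requires an @escaping closure,
            // which a non-escapable value such as `Span` cannot be captured by.
        }
    }
    return sums
}
```

Calling `rowSums([1, 2, 3, 4, 5, 6], rows: 2, columns: 3)` returns `[6.0, 15.0]`.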