NVIDIA introduces CuTe DSL in CUTLASS 4 – Python closes in on C++ performance

Can Python catch up with C++ in performance – without magic, without sugar-coating, and without weeks of waiting for compilation? NVIDIA says yes: the new CuTe DSL in CUTLASS 4 promises "C++-level" Tensor Core performance with the convenience of Python APIs.

Read more:
https://pressmind.org/nvidia-wprowadza-cute-dsl-w-cutlass-4-python-zbliza-sie-do-c-w-wydajnosci/

#PressMindLabs #cutedsl #cutlass #gemm #nvidia #pythonjit

#cutlass is a short, broad sabre or slashing sword with a straight or slightly curved blade sharpened on the cutting edge and a hilt often featuring a solid cupped or basket-shaped guard
#Cutlass #JetFire This was the first car in history with a factory turbo (Video) https://www.zougla.gr/automoto/automoto-news/afto-itan-to-proto-aftokinito-me-ergostasiako-turbo-stin-istoria-vinteo/?utm_source=dlvr.it&utm_medium=mastodon
Someone discovered that slapping the word "cutlass" on a #kernel magically boosts #performance by 100 tflops! ⚡🔪 Meanwhile, #GitHub is busy throwing #AI #buzzwords around like confetti, because who needs actual substance when you have Sparkly New Features™? 🙄🎉
https://github.com/triton-lang/triton/pull/7298 #cutlass #boost #tech #news #optimization #HackerNews #ngated
[Gluon][Tutorial] Persistent attention by Mogball · Pull Request #7298 · triton-lang/triton

Rewrite the attention kernel to be persistent. This gives better performance at low-contexts. However, fp16 at large context has suffered a bit due to a ptxas instruction scheduling issue in the so...

GitHub
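The PR above rewrites the Triton attention tutorial as a *persistent* kernel: instead of launching one block per output tile, a fixed number of blocks (roughly one per SM) stay resident and loop over tiles until the work runs out, which helps at low context lengths where launch overhead dominates. A minimal plain-Python sketch of that scheduling idea (names like `NUM_SMS` and the tile loop are illustrative, not the PR's actual code):

```python
# Sketch of the persistent-kernel scheduling pattern: a fixed set of
# resident blocks grid-strides over all output tiles. In a real GPU
# kernel each iteration would load K/V tiles and compute attention;
# here we only model which block processes which tile.

NUM_SMS = 4      # pretend the GPU has 4 SMs; one persistent block each
NUM_TILES = 10   # total output tiles the kernel must cover

def persistent_block(block_id: int, num_blocks: int, num_tiles: int) -> list[int]:
    """Tile ids handled by one persistent block (a grid-stride loop)."""
    return list(range(block_id, num_tiles, num_blocks))

# Block 0 takes tiles 0, 4, 8; block 1 takes 1, 5, 9; and so on.
schedule = [persistent_block(b, NUM_SMS, NUM_TILES) for b in range(NUM_SMS)]
covered = sorted(t for tiles in schedule for t in tiles)
print(covered)  # every tile covered exactly once: [0, 1, ..., 9]
```

The payoff is that tile count no longer dictates launch size: the same `NUM_SMS` blocks amortize their setup cost across all tiles, at the price of the scheduling loop the PR notes can interact badly with ptxas instruction scheduling for fp16 at large context.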
Hackernews post title something like "Nvidia software runs significantly faster when kernel name has 'cutlass' in it."

WHAT?

Hackernews commenter replies "The Volkswagen emissions testing model"

AH, I SEE 😂😂😂

https://github.com/triton-lang/triton/pull/7298
https://news.ycombinator.com/item?id=44530581

#hn #nvidia #github #programming #cutlass

😂 Ah, the classic tale of tech sorcery where simply naming your kernel "cutlass" magically unlocks 100 #tflops of speed! Meanwhile, x.com is still busy booting you off your browser faster than you can say "incompatibility." 🏴‍☠️🔗📉
https://twitter.com/cis_female/status/1943069934332055912 #techhumor #cutlass #xcom #incompatibility #HackerNews #ngated
sophia (@cis_female) on X

> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms https://t.co/KpZjwSAkrM

X (formerly Twitter)