๐ŸŽ‰๐ŸŒˆ Behold, the NumKong 2000โ€”a mind-boggling parade of mixed-precision #kernels, designed to make your head spin faster than a washing machine on hyperdrive! ๐Ÿคฏ๐ŸŒ€ With a dazzling array of formats from Float6 to #Float118 across 7 languages, it's the Swiss Army knife of numericsโ€”but only if you have 48 spare minutes and a PhD in deciphering technobabble. ๐Ÿ“š๐Ÿ”
https://ashvardanian.com/posts/numkong/ #NumKong2000 #MixedPrecision #TechInnovation #Numerics #HackerNews #ngated
NumKong: 2'000 Mixed Precision Kernels For All ๐Ÿฆ

Over 2'000 SIMD kernels for mixed-precision BLAS-like numerics across 7 languages โ€” from Float6 to Float118, on RISC-V, Intel AMX, and Apple SME, in 5 MB.

Ash's Blog
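The post above is about mixed-precision BLAS-like kernels. As a rough illustration of the core idea (not NumKong's actual API): inputs stay in a narrow floating-point type while the accumulator uses a wider one, which keeps long reductions from drifting. A minimal NumPy sketch, assuming float16 inputs and a float32 accumulator:

```python
import numpy as np

def dot_mixed(a, b):
    # Mixed precision: upcast per-element products to float32 before
    # summing, so the accumulator has ~7 decimal digits of precision.
    return np.float32(np.sum(a.astype(np.float32) * b.astype(np.float32)))

def dot_naive(a, b):
    # Accumulate entirely in float16: each partial sum rounds to ~3
    # decimal digits, so errors compound over long reductions.
    acc = np.float16(0)
    for x, y in zip(a, b):
        acc = np.float16(acc + x * y)
    return acc

rng = np.random.default_rng(0)
a = rng.standard_normal(10_000).astype(np.float16)
b = rng.standard_normal(10_000).astype(np.float16)
print(dot_mixed(a, b), dot_naive(a, b))  # the float16-only result drifts
```

Real SIMD kernels do the same widening inside vector registers (or via instructions like Intel AMX tiles), but the numerical trade-off is identical.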
๐ŸŽ‰ Behold, the magical #AI that promises to turn your sleepy dreams into high-performance #GPU #kernels while you snore! ๐Ÿ˜ดโœจ Just toss in any #PyTorch model, and this overhyped digital fairy godmother will allegedly transform it into something useful by morning. Because clearly, we all needed yet another excuse to nap on the job. ๐Ÿ’ค๐Ÿ–ฅ๏ธ
https://github.com/RightNow-AI/autokernel #Magic #SleepyDreams #Productivity #HackerNews #ngated
GitHub - RightNow-AI/autokernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels. - RightNow-AI/autokernel

GitHub

Yuchen Jin (@Yuchenj_UW)

์ž‘์„ฑ์ž๋Š” ๋ชจ๋ธ์—๊ฒŒ B200s์šฉ ์ปค๋„์„ FlashAttention-4๋ณด๋‹ค ๋” ์ž˜ ์ž‘์„ฑํ•˜๊ฒŒ ํ•˜๊ฑฐ๋‚˜, NanoGPT๋ฅผ ๋” ๋น ๋ฅด๊ฒŒ ๋งŒ๋“ค๊ธฐ ์œ„ํ•œ ์ƒˆ๋กœ์šด ์—ฐ๊ตฌ ์•„์ด๋””์–ด๋ฅผ ๋‚ด๊ฒŒ ํ•˜๋Š” ๋“ฑ ์‹คํ—˜์ ยท๊ฐœ๋ฐœ์ž์šฉ ํ™œ์šฉ ์‚ฌ๋ก€๋ฅผ ์–ธ๊ธ‰ํ•˜๋ฉฐ ๊ณง ํ…Œ์ŠคํŠธํ•˜๊ฒ ๋‹ค๊ณ  ๋ฐํ˜”์Šต๋‹ˆ๋‹ค.

https://x.com/Yuchenj_UW/status/2029642799277318503

#nanogpt #flashattention #gpu #kernels

Yuchen Jin (@Yuchenj_UW) on X

@DeryaTR_ @_overment ๐Ÿซก I have some too, like asking it to write kernels on B200s better than FlashAttention-4, or come up with new research ideas to make nanogpt faster, will test today

X (formerly Twitter)
Corn known as #flintcorn or #Indiancorn: heirloom varieties of maize characterized by their multicolored #kernels.
Elliot Arledge

Systems engineer and educator. Building and teaching GPU programming, CUDA, and low-level ML systems.

Vim / Neovim: 53.3%
Emacs: 9.7%
Nano: 29.5%
Micro: 1.8%
Other (comment): 5.6%
Poll ended.