πŸŽ‰πŸŒˆ Behold, the NumKong 2000β€”a mind-boggling parade of mixed precision #kernels, designed to make your head spin faster than a washing machine on hyperdrive! πŸ€―πŸŒ€ With a dazzling array of Float6 to #Float118 across 7 languages, it's the Swiss Army knife of numericsβ€”but only if you have 48 spare minutes and a PhD in deciphering technobabble. πŸ“šπŸ”
https://ashvardanian.com/posts/numkong/ #NumKong2000 #MixedPrecision #TechInnovation #Numerics #HackerNews #ngated
NumKong: 2'000 Mixed Precision Kernels For All 🦍

Over 2'000 SIMD kernels for mixed-precision BLAS-like numerics across 7 languages β€” from Float6 to Float118, on RISC-V, Intel AMX, and Apple SME, in 5 MB.

Ash's Blog
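The post above name-drops "mixed-precision BLAS-like numerics" without unpacking it. As a rough illustration of the idea (low-precision storage, higher-precision accumulation), here is a minimal NumPy sketch; this is a generic example of the technique, not NumKong's actual API, and the function name `dot_f16_f32` is made up for illustration.

```python
import numpy as np

def dot_f16_f32(a: np.ndarray, b: np.ndarray) -> np.float32:
    """Mixed-precision dot product: half-precision inputs, single-precision math.

    Storing vectors as float16 halves memory traffic; upcasting once and
    accumulating in float32 avoids the rounding drift a pure-float16 sum
    would suffer as the accumulator grows.
    """
    assert a.dtype == np.float16 and b.dtype == np.float16
    return np.dot(a.astype(np.float32), b.astype(np.float32))

# 4096 terms of ~0.001 each: the true sum is ~4.1, comfortably
# representable in float32 but near the edge of float16's step size.
x = np.ones(4096, dtype=np.float16)
y = np.full(4096, 0.001, dtype=np.float16)
mixed = dot_f16_f32(x, y)
```

Real libraries push the same pattern down into SIMD registers (and, per the post, tile engines like Intel AMX and Apple SME) instead of materializing the upcast arrays.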
πŸŽ‰ Behold, the magical #AI that promises to turn your sleepy dreams into high-performance #GPU #kernels while you snore! 😴✨ Just toss in any #PyTorch model, and this overhyped digital fairy godmother will allegedly transform it into something useful by morning. Because clearly, we all needed yet another excuse to nap on the job. πŸ’€πŸ–₯️
https://github.com/RightNow-AI/autokernel #Magic #SleepyDreams #Productivity #HackerNews #ngated
GitHub - RightNow-AI/autokernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels. - RightNow-AI/autokernel

GitHub

Yuchen Jin (@Yuchenj_UW)

μž‘μ„±μžλŠ” λͺ¨λΈμ—κ²Œ B200s용 컀널을 FlashAttention-4보닀 더 잘 μž‘μ„±ν•˜κ²Œ ν•˜κ±°λ‚˜, NanoGPTλ₯Ό 더 λΉ λ₯΄κ²Œ λ§Œλ“€κΈ° μœ„ν•œ μƒˆλ‘œμš΄ 연ꡬ 아이디어λ₯Ό λ‚΄κ²Œ ν•˜λŠ” λ“± μ‹€ν—˜μ Β·κ°œλ°œμžμš© ν™œμš© 사둀λ₯Ό μ–ΈκΈ‰ν•˜λ©° κ³§ ν…ŒμŠ€νŠΈν•˜κ² λ‹€κ³  λ°ν˜”μŠ΅λ‹ˆλ‹€.

https://x.com/Yuchenj_UW/status/2029642799277318503

#nanogpt #flashattention #gpu #kernels

Yuchen Jin (@Yuchenj_UW) on X

@DeryaTR_ @_overment 🫑 I have some too, like asking it to write kernels on B200s better than FlashAttention-4, or come up with new research ideas to make nanogpt faster, will test today

X (formerly Twitter)
Corn, specifically #flintcorn or #Indiancorn, heirloom varieties of maize characterized by their multicolored #kernels
Elliot Arledge

Systems engineer and educator. Building and teaching GPU programming, CUDA, and low-level ML systems.

Vim / Neovim: 53.3%
Emacs: 9.7%
Nano: 29.5%
Micro: 1.8%
Other (comment): 5.6%
Poll ended.