A new benchmark shows that larger CUDA tile sizes can cut Flash Attention throughput by 18–43% across sequence lengths. The study digs into kernel design, the resulting TFLOPS loss, and what it means for transformer efficiency on NVIDIA GPUs. Open-source researchers can use these insights to tune their kernels and reclaim the lost performance. #FlashAttention #CUDATiles #GPUPerformance #TFLOPS
🔗 https://aidailypost.com/news/large-cuda-tiles-reduce-flash-attention-tflops-by-1843-across
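To make the "tile size" knob concrete: below is a minimal CUDA sketch, not the benchmark's actual kernel, showing how a compile-time tile parameter changes a kernel's shared-memory footprint and block shape. The kernel name `qk_scores`, the QK^T-scores formulation, the 64-wide head dimension cap, and the launch shapes are all illustrative assumptions; larger tiles raising shared-memory and register pressure (and thus lowering occupancy) is one plausible mechanism for the reported slowdown, not a claim from the article.

```cuda
// Hypothetical sketch: tile size as a compile-time tuning knob.
// Larger TILE => bigger shared-memory footprint per block and more
// threads per block, which can reduce occupancy on the SM.
#include <cuda_runtime.h>
#include <cstdio>

template <int TILE>
__global__ void qk_scores(const float* Q, const float* K, float* S,
                          int seq_len, int head_dim) {
    // Shared-memory tiles; footprint grows linearly with TILE.
    // Assumes head_dim <= 64 and TILE <= 32 (1024 threads/block max).
    __shared__ float q_tile[TILE][64];
    __shared__ float k_tile[TILE][64];

    int row = blockIdx.y * TILE + threadIdx.y;  // query index
    int col = blockIdx.x * TILE + threadIdx.x;  // key index

    // Cooperative load of a Q-row slice and a K-row slice into shared memory.
    for (int d = threadIdx.x; d < head_dim; d += TILE)
        q_tile[threadIdx.y][d] = (row < seq_len) ? Q[row * head_dim + d] : 0.f;
    for (int d = threadIdx.y; d < head_dim; d += TILE)
        k_tile[threadIdx.x][d] = (col < seq_len) ? K[col * head_dim + d] : 0.f;
    __syncthreads();

    // One attention score (unscaled dot product) per thread.
    if (row < seq_len && col < seq_len) {
        float acc = 0.f;
        for (int d = 0; d < head_dim; ++d)
            acc += q_tile[threadIdx.y][d] * k_tile[threadIdx.x][d];
        S[row * seq_len + col] = acc;
    }
}

int main() {
    const int seq_len = 1024, head_dim = 64;
    size_t qk_bytes = (size_t)seq_len * head_dim * sizeof(float);
    size_t s_bytes  = (size_t)seq_len * seq_len * sizeof(float);
    float *Q, *K, *S;
    cudaMalloc(&Q, qk_bytes);
    cudaMalloc(&K, qk_bytes);
    cudaMalloc(&S, s_bytes);

    // Launch the same kernel at two tile sizes; timing omitted for brevity.
    dim3 block16(16, 16), grid16(seq_len / 16, seq_len / 16);
    qk_scores<16><<<grid16, block16>>>(Q, K, S, seq_len, head_dim);

    dim3 block32(32, 32), grid32(seq_len / 32, seq_len / 32);
    qk_scores<32><<<grid32, block32>>>(Q, K, S, seq_len, head_dim);

    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(Q); cudaFree(K); cudaFree(S);
    return 0;
}
```

Timing the two instantiations (e.g. with `nvprof`/Nsight or CUDA events) is the kind of A/B comparison the linked study performs at much larger scale.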


