Mixed-precision numerics in scientific applications: survey and perspectives

#GPU #MixedPrecision #Review

https://hgpu.org/?p=30704

The explosive demand for artificial intelligence (AI) workloads has led to a significant increase in silicon area dedicated to lower-precision computations on recent high-performance computing hard…

🎉🌈 Behold, the NumKong 2000—a mind-boggling parade of mixed precision #kernels, designed to make your head spin faster than a washing machine on hyperdrive! 🤯🌀 With a dazzling array of Float6 to #Float118 across 7 languages, it's the Swiss Army knife of numerics—but only if you have 48 spare minutes and a PhD in deciphering technobabble. 📚🔍
https://ashvardanian.com/posts/numkong/ #NumKong2000 #MixedPrecision #TechInnovation #Numerics #HackerNews #ngated
NumKong: 2'000 Mixed Precision Kernels For All 🦍

Around 2'000 SIMD kernels for mixed-precision BLAS-like numerics — dot products, batched GEMMs, distances, geospatial, ColBERT MaxSim, and mesh alignment — from Float6 to Float118, leveraging RISC-V, Intel AMX, Arm SME, and WebAssembly Relaxed SIMD, in 7 languages and 5 MB.

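To make the "mixed-precision" part concrete: the usual trick behind kernels like these is to keep storage narrow and accumulate wide. Below is a minimal NumPy sketch of that idea (float16 inputs, float32 accumulation); it is a generic illustration, not NumKong's actual API.

import numpy as np

def mixed_precision_dot(a, b):
    # Store the operands in float16, but form the products and the running
    # sum in float32, so the accumulation error stays small.
    a16 = np.asarray(a, dtype=np.float16)
    b16 = np.asarray(b, dtype=np.float16)
    return np.dot(a16.astype(np.float32), b16.astype(np.float32))

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y = rng.standard_normal(10_000)
print(mixed_precision_dot(x, y), np.dot(x, y))  # compare against full float64

Real SIMD kernels do the widening per lane in registers; the NumPy version only mimics the numerics.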
I missed the fact that #hpgmp (High-performance GMRES mixed-precision) is now a separate project (finally). https://github.com/hpg-mxp/hpg-mxp #mixedprecision #hpc
GitHub - hpg-mxp/hpg-mxp


Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration

#Intel #AVX #MixedPrecision #FEM #Package

https://hgpu.org/?p=29481

In this paper we develop the first fine-grained rounding error analysis of finite element (FE) cell kernels and assembly. The theory includes mixed-precision implementations and accounts for hardwa…

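For readers new to the topic, the mixed-precision pattern the abstract alludes to is typically: evaluate the per-cell kernel in a low precision and accumulate the global system in a higher one. A toy 1D sketch of that split (a simplified illustration, not the paper's code):

import numpy as np

def assemble_1d_stiffness(nodes, low=np.float32, high=np.float64):
    # Toy P1 stiffness assembly: local 2x2 cell matrices computed in `low`,
    # global accumulation carried out in `high`.
    nodes = np.asarray(nodes, dtype=low)
    n = len(nodes)
    K = np.zeros((n, n), dtype=high)
    for e, h in enumerate(np.diff(nodes)):
        k_loc = np.array([[1.0, -1.0], [-1.0, 1.0]], dtype=low) / h
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += k_loc.astype(high)
    return K

print(assemble_1d_stiffness(np.linspace(0.0, 1.0, 5)))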
@freemin7 working in both, I'd say #1. Gamedev has less computational physics and, since the death of SLI, no distributed-computing component, but rendering itself is a flavour of computing, and the brutal runtime and memory ceilings make it nothing less than #HPC. After all, a modern gaming #GPU has the same oomph as an entire supercomputer from two decades ago. There are plenty of optimization techniques originating in gamedev that made it into HPC and vice versa. A prime example is #mixedprecision. 🖖🧐

Are you a European scientist working in climate and weather? Then you may want to check out this hackathon that we are organizing in Amsterdam. We want to help you improve the performance and energy efficiency of your code using Graphics Processing Units, auto-tuning, and mixed-precision techniques!

#Climate #Weather #HPC #GPU #EnergyEfficiency #AutoTuning #MixedPrecision

Help me by reposting this (if you can)

https://www.esiwace.eu/events/2nd-esiwace3-hackathon

2nd ESiWACE3 Hackathon

ESiWACE3 Hackathon on Optimisation and Tuning of Earth-System Models


Oh hey! #mixedprecision! That’s my thing!

What is it, Jensen? OCP? FP8? MXfloat? Death to TF32?

How distributed training works in PyTorch: distributed data-parallel and mixed-precision training - The Triangle Agency

In this tutorial, we will learn how to use nn.parallel.DistributedDataParallel for training our models on multiple GPUs. We will take a minimal example of training an image classifier and see how we can speed up the training. Let's […]

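For reference, the pattern that tutorial walks through (DistributedDataParallel for multi-GPU data parallelism plus torch.cuda.amp for mixed precision) condenses to roughly the following. This is a generic sketch with a dummy model and random data, not the tutorial's exact code.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(local_rank: int):
    # One process per GPU, launched e.g. with torchrun, which sets the
    # RANK / WORLD_SIZE / MASTER_ADDR environment variables.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    scaler = torch.cuda.amp.GradScaler()   # rescales the loss so fp16 gradients don't underflow
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        data = torch.randn(64, 512, device=f"cuda:{local_rank}")           # stand-in for a real batch
        target = torch.randint(0, 10, (64,), device=f"cuda:{local_rank}")
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():     # forward pass runs in mixed precision
            loss = loss_fn(model(data), target)
        scaler.scale(loss).backward()       # DDP all-reduces gradients during backward
        scaler.step(optimizer)
        scaler.update()

    dist.destroy_process_group()

if __name__ == "__main__":
    train(int(os.environ.get("LOCAL_RANK", 0)))

Launched with something like: torchrun --nproc_per_node=4 train.py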
In addition to the direct integration with NumPy, the machine-learning framework introduces a new method for asynchronous parallel model training.
Machine Learning: TensorFlow 2.4 computes with NumPy APIs
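If you are curious what that NumPy integration looks like in practice, here is a minimal sketch using the tf.experimental.numpy module that shipped with TensorFlow 2.4 (assuming a TF 2.4+ install):

import tensorflow.experimental.numpy as tnp

# NumPy-style calls that produce TensorFlow tensors, so the same code can
# run on GPU and mix freely with regular tf ops.
x = tnp.ones((2, 3))
y = tnp.matmul(x, tnp.transpose(x))
print(y)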