Enhancing Transformer Performance and Portability through Auto-tuning Frameworks

#CUDA #LLM #AutoTuning #PerformancePortability #Package

https://hgpu.org/?p=30329

Transformer-based models such as BERT and GPT2 have become the foundation of many modern applications, yet their execution requires substantial computational and memory resources. To addre…
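
At its simplest, auto-tuning means searching a space of kernel parameters and keeping the configuration with the lowest measured cost. Below is a minimal Python sketch of that loop as an exhaustive search. The parameter names (`block_size`, `tile`) and the `toy_cost` model are hypothetical stand-ins for real kernel timings, and this is not the tuning strategy used in the paper.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]

def autotune(space: Dict[str, List[int]],
             cost: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Evaluate every configuration in `space` and return the cheapest one."""
    names = list(space)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    # itertools.product enumerates the Cartesian product of all parameter values
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        c = cost(cfg)  # a real tuner would compile, launch and time the kernel here
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost

def toy_cost(cfg: Config) -> float:
    # Hypothetical analytic model standing in for a measured runtime:
    # penalizes distance from an assumed sweet spot of 256 threads and tile 32.
    return abs(cfg["block_size"] - 256) / 256 + abs(cfg["tile"] - 32) / 32

if __name__ == "__main__":
    space = {"block_size": [64, 128, 256, 512, 1024], "tile": [8, 16, 32, 64]}
    print(*autotune(space, toy_cost))  # → {'block_size': 256, 'tile': 32} 0.0
```

Exhaustive search is only practical for small spaces; real auto-tuning frameworks typically prune the space or use smarter search strategies.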

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

#CUDA #LLM #Compilers #AI #PerformancePortability #Package

https://hgpu.org/?p=29940

The rapid growth of deep learning has driven exponential increases in model parameters and computational demands. NVIDIA GPUs and their CUDA-based software ecosystem provide robust support for para…

🧪 Curious about high performance across GPUs? Our new paper benchmarks a parallel FSI code implemented in CUDA, SYCL & OpenMP on leading systems. See Aristotle Martin present it at #ISC2025 on June 11, 10:45 in Hamburg! #HPC #GPUcomputing #PerformancePortability

Thesis: Acceleration as a Service (XaaS) Source Containers

#HPC #MPI #PerformancePortability #LLM #Package

https://hgpu.org/?p=29925

In this thesis, we address the challenge of performance portability in heterogeneous computing environments. Performance portability refers to the ability of an application to maintain high perform…
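
One common way to put a number on this definition is the metric of Pennycook, Sewall and Lee: the harmonic mean of an application's efficiency e_i over a platform set H, PP = |H| / Σ_{i∈H} (1 / e_i), taken as 0 if any platform in H is unsupported. Choosing this metric is an assumption made here for illustration; the thesis may quantify portability differently. A minimal Python sketch, with made-up efficiency values:

```python
from typing import Mapping, Optional

def performance_portability(eff: Mapping[str, Optional[float]]) -> float:
    """Harmonic-mean performance portability over a platform set H.

    `eff` maps each platform in H to the application's efficiency there,
    in (0, 1]; None marks a platform where the application does not run.
    PP = |H| / sum(1 / e_i) if every platform is supported, else 0.
    """
    if not eff:
        raise ValueError("platform set H must be non-empty")
    if any(e is None or e <= 0 for e in eff.values()):
        return 0.0  # unsupported on at least one platform
    return len(eff) / sum(1.0 / e for e in eff.values())

# Hypothetical efficiencies, for illustration only
print(performance_portability({"gpu_a": 0.8, "gpu_b": 0.5, "cpu": 0.4}))
print(performance_portability({"gpu_a": 0.8, "gpu_b": None}))  # → 0.0
```

The harmonic mean is dominated by the worst platform, so a code that is fast on one GPU but poor elsewhere scores low, which matches the intent of "maintaining high performance" across environments.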

Exploring SYCL for batched kernels with memory allocations

#SYCL #CUDA #PerformancePortability #Package

https://hgpu.org/?p=29911

Batched kernels with memory allocations is a common pattern in HPC, appearing in multi-dimensional FFTs, neural network processing, or split computation of numerical operators. Its efficient suppo…
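
Multi-dimensional FFTs show the pattern well: a 2-D transform decomposes into a batch of 1-D transforms over the rows followed by a batch over the columns, and each batch item needs its own buffer. The Python sketch below uses a naive O(n²) DFT as a stand-in for an FFT kernel and allocates all per-item buffers once, before the batch runs, instead of inside each "kernel" call. The function names are hypothetical, and the sketch says nothing about how the paper handles allocations in SYCL.

```python
import cmath
from typing import List

Matrix = List[List[complex]]

def dft_into(x: List[complex], out: List[complex]) -> None:
    """Naive 1-D DFT of `x`, written into the caller-provided buffer `out`."""
    n = len(x)
    for k in range(n):
        out[k] = sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))

def batched_dft_rows(a: Matrix) -> Matrix:
    """Apply the 1-D DFT to every row of `a` as one batch.

    Every per-item output buffer is allocated up front, mimicking a batched
    kernel that reserves its memory once rather than per batch item.
    """
    buffers = [[0j] * len(row) for row in a]  # one buffer per batch item
    for row, buf in zip(a, buffers):
        dft_into(row, buf)
    return buffers

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def dft2(a: Matrix) -> Matrix:
    """2-D DFT as two batched 1-D passes: rows first, then columns."""
    rows_done = batched_dft_rows(a)
    return transpose(batched_dft_rows(transpose(rows_done)))

print(dft2([[1, 0], [0, 0]]))  # a unit impulse transforms to all ones
```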

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems

#SYCL #TaskScheduling #PerformancePortability #HPC #Package

https://hgpu.org/?p=29823

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing …
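
To make "the runtime decides work assignment" concrete, here is a minimal Python sketch of one classic heuristic a runtime could apply: longest-processing-time-first list scheduling, which hands each task to the GPU that becomes free earliest. The task names and costs are hypothetical, and this is not the scheduler proposed in the paper.

```python
import heapq
from typing import Dict, List

def greedy_schedule(task_costs: Dict[str, float], n_gpus: int) -> Dict[int, List[str]]:
    """Assign tasks, longest first, to whichever GPU becomes free first (LPT)."""
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    # Min-heap of (time at which the GPU becomes free, GPU id)
    free_at = [(0.0, g) for g in range(n_gpus)]
    heapq.heapify(free_at)
    plan: Dict[int, List[str]] = {g: [] for g in range(n_gpus)}
    for name, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        t, g = heapq.heappop(free_at)  # earliest-available GPU
        plan[g].append(name)
        heapq.heappush(free_at, (t + cost, g))
    return plan

# Hypothetical task costs (arbitrary time units)
print(greedy_schedule({"gemm": 4.0, "fft": 3.0, "stencil": 2.0, "reduce": 1.0}, 2))
# → {0: ['gemm', 'reduce'], 1: ['fft', 'stencil']}
```

Here the programmer only supplies tasks and cost estimates; placement falls out of the runtime's policy, which is the shift of responsibility the abstract describes.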

Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications

#Kokkos #CUDA #HIP #OpenMP #PerformancePortability #Package

https://hgpu.org/?p=29747

OpenMP provides a cross-vendor API for GPU offload that can serve as an implementation layer under performance portability frameworks like the Kokkos C++ library. However, recent work identified so…

CPU-GPU co-execution through the exploitation of hybrid technologies via SYCL

#SYCL #OpenCL #CUDA #LLVM #PerformancePortability #LoadBalancing #HybridComputing

https://hgpu.org/?p=29717

The performance and energy efficiency offered by heterogeneous systems are highly useful for modern C++ applications, but the technological variety demands adequate portability and programmability.…
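
A simple form of CPU-GPU co-execution splits one iteration space between devices in proportion to their measured throughput, so that both finish at roughly the same time. The Python sketch below shows that static split; the device names and throughput figures are hypothetical, and the paper's own load-balancing schedulers (which may be dynamic) are not reproduced here.

```python
from typing import Dict, Tuple

def proportional_split(n_items: int,
                       throughput: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Split [0, n_items) into contiguous chunks sized by relative throughput.

    Returns {device: (begin, end)}; the chunks are disjoint and cover the range.
    The last device absorbs any rounding remainder.
    """
    if n_items < 0 or not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("need n_items >= 0 and positive throughputs")
    total = sum(throughput.values())
    devices = list(throughput)
    ranges: Dict[str, Tuple[int, int]] = {}
    begin = 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            end = n_items  # last device takes whatever is left
        else:
            end = begin + int(n_items * throughput[dev] / total)
        ranges[dev] = (begin, end)
        begin = end
    return ranges

# Hypothetical: the GPU processes items 3x faster than the CPU
print(proportional_split(1000, {"gpu": 3.0, "cpu": 1.0}))
# → {'gpu': (0, 750), 'cpu': (750, 1000)}
```

A static split like this works when throughputs are stable and known in advance; when they vary, a dynamic scheme that hands out smaller chunks on demand usually balances better.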

In Computer Science we're used to leaning on children's books, for example Gulliver's big-endians vs little-endians. Back at Supercomputing #SC24, I spoke at the #Intel booth about open standards, performance portability, and the journey up the Yellow Brick Road to see the Wizard of Oz. Check out the video of the talk on YouTube:
https://youtu.be/xO8FGAOScpo?si=_BnVilvTBa0Ns6dX
#performanceportability #OpenMP #SYCL
University of Bristol: The Role of Open Standard Programming Models for HPC | Intel Software

Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search

#SYCL #oneAPI #Bioinformatics #Databases #HPC #PerformancePortability #Package

https://hgpu.org/?p=29596

The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study…
