Enhancing Transformer Performance and Portability through Auto-tuning Frameworks
HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration
The rapid growth of deep learning has driven exponential increases in model parameters and computational demands. NVIDIA GPUs and their CUDA-based software ecosystem provide robust support for para…
Thesis: Acceleration as a Service (XaaS) Source Containers
Exploring SYCL for batched kernels with memory allocations
Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems
CPU-GPU co-execution through the exploitation of hybrid technologies via SYCL
#SYCL #OpenCL #CUDA #LLVM #PerformancePortability #LoadBalancing #HybridComputing
Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with Protein Database Search
#SYCL #oneAPI #Bioinformatics #Databases #HPC #PerformancePortability #Package
The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This comprehensive study…