Scalable GPU-Based Integrity Verification for Large Machine Learning Models

#SYCL #oneAPI #Rust #Security #Package

https://hgpu.org/?p=30327

Scalable GPU-Based Integrity Verification for Large Machine Learning Models

We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms and significantly reducing verification overheads. …

hgpu.org

SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching

#CUDA #oneAPI #MPI #Chemistry #Biology #Package

https://hgpu.org/?p=30085

SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching

Subgraph isomorphism is a fundamental graph problem with applications in diverse domains from biology to social network analysis. Of particular interest is molecular matching, which uses a subgraph…

hgpu.org

I got the vanilla binary #Blender (4.5.0) to render via #OneAPI on the Intel Arc GPU of the notebook APU under Fedora Linux. \o/

APU: https://www.intel.de/content/www/de/de/products/sku/240958/intel-core-ultra-7-processor-268v-12m-cache-up-to-5-00-ghz/specifications.html

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

#SYCL #OpenCL #OpenMP #OneAPI #Benchmarking #Package

https://hgpu.org/?p=29924

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

As multicore vector processors improve in computational and memory performance, running SIMT (Single Instruction Multiple Threads) programs on CPUs has become increasingly appealing, potentially el…

hgpu.org
Update on my #retroHPCcomputing project: it seems the first PCIe slot is dead, but the X99 mobo has 7 slots, so this is not a big deal. Also shown: the #XeonPhi, the old #Nvidia Tesla C2075, the GT960 with its riser cable (prior to vertical installation in the Enthoo 719 case) & the RAID array.
I opted for CentOS 7.3 as an initial choice to compile the Phi stack (I will likely move to Alma 8, as I found instructions to compile the stack on CentOS 8 relatives). Hopefully I still have #OneApi 2021.2 somewhere.

Thesis: Hardware-Assisted Software Testing and Debugging for Heterogeneous Computing

#oneAPI #FPGA #Python

https://hgpu.org/?p=29840

Hardware-Assisted Software Testing and Debugging for Heterogeneous Computing

There is a growing interest in the computer architecture community to incorporate heterogeneity and specialization to improve performance. Developers can write heterogeneous applications that consi…

hgpu.org

ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming

#SYCL #CUDA #oneAPI #AI #Triton #Compilers #Intel

https://hgpu.org/?p=29825

ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming

In the era of LLMs, dense operations such as GEMM and MHA are critical components. These operations are well-suited for parallel execution using a tile-based approach. While traditional GPU programm…

hgpu.org

Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>

This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder whether it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and is tightly integrated with the rest of it (including its support for multi-GPU), and refactoring it into a library like cuBLAS is

a. too much effort
b. probably not worth it.

Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time consuming than trying to wrangle your way through an API that may or may not fit your needs.
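To illustrate the "roll your own" point: the BLAS-level building blocks a Krylov solver like BiCGSTAB actually needs (a dot product and an axpy) are tiny. This is a minimal serial CPU sketch of my own, not GPUSPH's actual implementation, which is CUDA-based and multi-GPU-aware:

```cpp
#include <cstddef>
#include <vector>

// Dot product: s = sum_i x[i] * y[i]
// (assumes x and y have the same length)
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        s += x[i] * y[i];
    return s;
}

// axpy: y = a*x + y, in place
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] += a * x[i];
}
```

Each of these maps directly onto a single GPU kernel (the dot product needing a reduction), which is exactly why hand-rolling them is often less work than bending a general-purpose BLAS API to fit.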

6/