https://youtu.be/scI8jB2e5UQ


Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
This paper presents a portable, GPU-accelerated implementation of a QR-based singular value computation algorithm in Julia. The singular value decomposition (SVD) is a fundamental numerical tool in …
Performance Portable Gradient Computations Using Source Transformation
Asynchronous-Many-Task Systems: Challenges and Opportunities – Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX
Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, necessitating precise physics resolution in localized areas across expansive domains. Tod…
Kokkidio: Fast, expressive, portable code, based on Kokkos and Eigen
Working on my first project using the performance portability layer Kokkos. My first impression is mostly good, with the single big exception being their View type. Too many bells and whistles included, in my opinion, for high performance computing.
Kokkos also supports C-style dynamic allocation, but the docs suggest that Views map more naturally onto different hardware. Can anyone confirm this? It seems surprising to me that Views, with all their extra overhead, would outperform some malloc equivalent.
Or do folks have any tips for a Kokkos newbie in general?
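For anyone weighing the same question, here is a rough sketch of the idea behind the docs' claim. It is not Kokkos code: `View2D`, `Layout`, and `fill_and_sum` are made-up names for illustration. It assumes a view is essentially a raw pointer plus extents plus a compile-time layout, so the same `v(i, j)` kernel can run over row-major or column-major storage. As far as I understand, Kokkos picks the default layout per memory space (LayoutRight on host, LayoutLeft on CUDA), which a plain malloc'd pointer with hand-written index math would not do for you.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D view. NOT the Kokkos API.
enum class Layout { Right, Left };  // Right = row-major, Left = column-major

template <Layout L>
struct View2D {
    double* data;        // non-owning pointer to n0 * n1 doubles
    std::size_t n0, n1;  // extents

    // Same call syntax for both layouts; only the offset formula differs,
    // and it is resolved at compile time from the template parameter.
    double& operator()(std::size_t i, std::size_t j) const {
        assert(i < n0 && j < n1);  // debug-only bounds check
        return data[L == Layout::Right ? i * n1 + j : j * n0 + i];
    }
};

// A kernel written once against (i, j), independent of the storage order.
template <Layout L>
double fill_and_sum(View2D<L> v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.n0; ++i) {
        for (std::size_t j = 0; j < v.n1; ++j) {
            v(i, j) = static_cast<double>(10 * i + j);
            s += v(i, j);
        }
    }
    return s;
}
```

In this toy version the per-access cost is one multiply and one add, the same arithmetic you would write by hand on a malloc'd buffer. Whether real Views stay that cheap in a given kernel is something to measure rather than assume.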