2.4. Writing Tile Kernels — CUDA Programming Guide

@mhoemmen

What's wrong with

import cuda.tile;

in C++? Just asking for a friend 😊
Thanks a lot for doing this! 🎉

@DanielaKEngert this is a reasonable request! i will ask about this!

note that you might see declarations of class templates in the header file, but the compiler knows about those things and implements them not necessarily by compiling C++ code

@mhoemmen

Excerpts from my talk this year (actual measurements):

@DanielaKEngert EXCELLENT (always glad to see experiments!!!)
@DanielaKEngert btw, NVCC doesn't support modules yet, but we're working on it!

@mhoemmen
To put the shown slides into perspective: the mentioned small application is using (pretty much) latest Boost, Qt 6, Apache Xerces, (unadorned) Asio, NLohmann.Json, and the hardened C++23 standard library *without* changing *any* of the original sources - i.e. everything still compiles in the original #include form, as checked out from the repositories. Hence the direct comparison, which would otherwise be impossible.

In other words: a full transition to modules would possibly be even faster to build.

People's hesitation towards big modules is the no. 1 performance killer that they tend to bitch about. And then there is that abomination called CMake, which is guaranteed to thwart such comparisons.

@DanielaKEngert i totally agree! this sort of evidence is most helpful :-)