It is really wild how fast one can implement stuff on the #GPU using just #openmp.
Talking about zero to hero in a couple of weeks. Some optimizations involve the #simd directive, and it is not entirely clear to me when to use them. #gcc also applies some rather opaque rules when mapping code to vector #ptx instructions. Still, the gain is worth the pain of not understanding these two (and a few other things).
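For readers who haven't tried it, here is a minimal sketch of the kind of loop involved. It is plain C with a standard OpenMP combined construct. The function name `saxpy` is just an illustration, not the author's code. The `simd` part asks the compiler to vectorize within each thread, but whether GCC actually emits vector PTX for it is decided by GCC's own heuristics. Built without offload support, or without `-fopenmp`, the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i], offloaded with OpenMP.
   "teams distribute" spreads iterations over teams, "parallel for" over
   threads within a team, and "simd" requests vectorization inside a thread.
   If no device is available, OpenMP falls back to running on the host. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL && n >= 0);

    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC this would typically be built with something like `gcc -O2 -fopenmp -foffload=nvptx-none`, assuming a GCC build configured with nvptx offloading.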

@openmp_arb

There are probably some type conversions taking place to map 64-bit coding styles on the host onto what are essentially 32-bit devices with 64-bit memory addressing. One must know about these to squeeze every cycle out of the #gpu, but if one aims for performance portability, such considerations are of secondary importance.
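One commonly cited instance of this is the loop counter. GPU integer units are natively 32-bit, so 64-bit index arithmetic (e.g. a `size_t` counter) may cost extra instructions, while addresses stay 64-bit regardless. The sketch below is a hypothetical helper, `sum_int_index`, that narrows the count to `int` after checking it fits. Whether this measurably helps depends on the compiler and device; it is an assumption to profile, not a rule.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: sum an array on the device using a 32-bit loop
   counter. Assumes n <= INT_MAX so the index fits in an int; the
   pointer v itself is still a 64-bit address. */
double sum_int_index(const double *v, size_t n)
{
    assert(v != NULL && n <= (size_t)INT_MAX);
    int m = (int)n;   /* explicit narrowing, guarded by the assert above */
    double s = 0.0;

    /* map(tofrom: s) keeps this valid for OpenMP 4.5, where scalars on a
       target construct are otherwise firstprivate by default. */
    #pragma omp target teams distribute parallel for reduction(+: s) \
        map(to: v[0:n]) map(tofrom: s)
    for (int i = 0; i < m; ++i) {
        s += v[i];
    }
    return s;
}
```

The portable default would keep `size_t` and let the compiler sort it out, which fits the point about performance portability coming first.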