CUDA kernels instead of OpenACC
@jahns Currently, the packing/unpacking kernels for GPU support are generated using OpenACC. Example for packing 8-bit (int8_t) data:
static void xt_ddt_pack_8(
  size_t count, ssize_t *restrict displs, void const *restrict src,
  void *restrict dst, enum xt_memtype memtype) {
  XtPragmaACC(
    parallel loop independent deviceptr(src, dst, displs)
      if (memtype != XT_MEMTYPE_HOST))
  for (size_t i = 0; i < count; ++i)
    ((int8_t*)dst)[i] = *(int8_t*)((unsigned char *)src + displs[i]);
}
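For comparison, a hand-written CUDA counterpart of this kernel could look roughly like the sketch below. The kernel name xt_ddt_pack_8_cuda, the 1-D launch layout, and the use of long long for the displacements (so the same source also compiles under NVRTC without POSIX headers) are assumptions for illustration, not existing yaxt code:

/* Illustrative CUDA counterpart of xt_ddt_pack_8 (names/types assumed). */
extern "C" __global__ void xt_ddt_pack_8_cuda(
    unsigned long long count, long long const *displs,
    void const *src, void *dst) {
  unsigned long long i =
    (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;
  if (i < count)
    ((signed char *)dst)[i] =
      *(signed char const *)((unsigned char const *)src + displs[i]);
}

The memtype check would move to the host side: the kernel is only launched when the buffers live in device memory, while host buffers keep the existing CPU path.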
Alternatively, we could write the kernels in CUDA code and compile them at runtime using NVRTC (see the sketch after the list below). This approach is somewhat more complex for us, but the advantages would be:
- no dependencies on OpenACC
- we could compile at runtime for the architecture that is actually being used
- configure would not have to determine any compile/link flags for the CUDA support (nor would the user have to provide them); specifying the CUDA root directory would be sufficient
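A minimal sketch of how such a kernel could be compiled at runtime with NVRTC and loaded through the driver API, assuming the kernel source above is kept as a string. All names are illustrative, most error handling is omitted, and the target architecture is queried from the device that is actually present (which covers the second point above):

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <nvrtc.h>

/* Kernel source as it would be handed to NVRTC; in yaxt this string
 * could be assembled per element size instead of being hard-coded. */
static const char *kernel_src =
  "extern \"C\" __global__ void xt_ddt_pack_8_cuda(\n"
  "    unsigned long long count, long long const *displs,\n"
  "    void const *src, void *dst) {\n"
  "  unsigned long long i =\n"
  "    (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;\n"
  "  if (i < count)\n"
  "    ((signed char *)dst)[i] =\n"
  "      *(signed char const *)((unsigned char const *)src + displs[i]);\n"
  "}\n";

/* Compile the kernel for the compute capability of device 0 and return
 * a ready-to-launch CUfunction (illustrative helper, not yaxt API). */
static CUfunction compile_pack_kernel(void) {
  /* query the architecture that is actually being used */
  int major, minor;
  CUdevice dev;
  cuInit(0);
  cuDeviceGet(&dev, 0);
  cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
  cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
  char arch_opt[64];
  snprintf(arch_opt, sizeof(arch_opt), "--gpu-architecture=compute_%d%d",
           major, minor);

  /* compile CUDA source to PTX with NVRTC */
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, kernel_src, "xt_ddt_pack_8.cu", 0, NULL, NULL);
  const char * const opts[] = {arch_opt};
  if (nvrtcCompileProgram(prog, 1, opts) != NVRTC_SUCCESS) {
    size_t log_size;
    nvrtcGetProgramLogSize(prog, &log_size);
    char *log = malloc(log_size);
    nvrtcGetProgramLog(prog, log);
    fprintf(stderr, "NVRTC: %s\n", log);
    exit(EXIT_FAILURE);
  }
  size_t ptx_size;
  nvrtcGetPTXSize(prog, &ptx_size);
  char *ptx = malloc(ptx_size);
  nvrtcGetPTX(prog, ptx);
  nvrtcDestroyProgram(&prog);

  /* load the PTX and fetch the kernel handle via the driver API
   * (in real code the module handle would be kept for cuModuleUnload) */
  CUcontext ctx;
  cuDevicePrimaryCtxRetain(&ctx, dev);
  cuCtxSetCurrent(ctx);
  CUmodule module;
  CUfunction kernel;
  cuModuleLoadData(&module, ptx);
  cuModuleGetFunction(&kernel, module, "xt_ddt_pack_8_cuda");
  free(ptx);
  return kernel;
}

Launching would then go through cuLaunchKernel, with count, displs, src, and dst (device pointers) packed into a kernel-parameter array, e.g. a 1-D grid of 128-thread blocks covering count elements.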