**dkrz-sw issues** (https://gitlab.dkrz.de/groups/dkrz-sw/-/issues, updated 2022-12-16)

---

**yaxt#4: MPICH hindexed workaround does not work with static linking** (Sergey Kosukhin, 2022-12-16)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/4

Neither a generic user application nor YAXT itself calls the `yaksi/yaksa` functions from the generated `mpich_workaround.c`. Therefore, the `mpich_workaround.o` from `libyaxt_c.a` is not picked up by the linker, and the MPICH library happily uses its own bugged versions of the functions. This is why, I guess, `test_ut_run` fails.
I see two and a half solutions to the problem:
1. Install `mpich_workaround.o` to `/prefix/lib` and tell the users that they are supposed to include this object file in their executables when linking to YAXT.
2. Introduce a dependency of a core object file (a file that is always needed whenever YAXT is used) on `mpich_workaround.o`. For example, we can extend `mpich_workaround.c` with an additional line `void xt_mpi_workaround_dummy() {}` and add one of the following to `xt_init.c`:
- introduce a dependency between the translation units without making redundant calls to the dummy function at runtime:
```c
#ifdef WHATEVER_NEW_MACRO
void xt_mpi_workaround_dummy();
void xt_mpi_workaround_trigger()
{
xt_mpi_workaround_dummy();
}
#endif
```
- introduce a dependency that should not get optimized out by an overly smart linker:
```c
#ifdef WHATEVER_NEW_MACRO
void xt_mpi_workaround_dummy();
#endif
...
void
xt_finalize(void)
{
...
#ifdef WHATEVER_NEW_MACRO
xt_mpi_workaround_dummy();
#endif
}
```

---

**yaxt#7: Libtool patches** (Sergey Kosukhin, 2022-06-29)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/7

1. There is [a new version of Libtool](http://savannah.gnu.org/forum/forum.php?forum_id=10139). However, it looks like the patches we have need to be applied to the new version too:
- [ ] check what patches in [contrib](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/tree/62afd60119383ef7b6aba23a8baf45c0063fcc5f/contrib) are needed and applicable to version `2.4.7` (and generate corresponding patch files if needed);
- [ ] update [scripts/reconfigure](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/scripts/reconfigure) to cover the case of `"$libtoolversion"` in `(2.4.7)`, which is what we have for MacPorts-provided `libtool` on macOS;
- [ ] update the monkey-patching in [m4/acx_use_libtool_configuration.m4](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/m4/acx_use_libtool_configuration.m4) to cover the new version of Libtool.
2. There are several things to fix if we ever decide to support NAG on macOS:
- solve the BSD sed compatibility problem for [contrib/06ltmain_nag_pthread-patch/ltmain_nag_pthread.patch](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/contrib/06ltmain_nag_pthread-patch/ltmain_nag_pthread.patch#L10) (see #6);
- allow for the configure-time compiler recognition behind the MPI compiler wrappers on macOS: [the current solution](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/m4/acx_use_libtool_configuration.m4#L102) covers `linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*` but not `darwin* | rhapsody*`;
- modify the value of `archive_cmds` in the generated `./libtool`:
```patch
-archive_cmds="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring \$single_module"
+archive_cmds="\$CC -Wl,-dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -Wl,-install_name,\`echo \$rpath/\$soname\` \$verstring \$single_module"
```
`` \`echo \$rpath/\$soname\` `` is there to drop the leading whitespace; there might be a better solution, though.

---

**yac#1: Question on compiling and linking with YAC** (Moritz Hanke, 2022-07-06; assignee: Sergey Kosukhin)
https://gitlab.dkrz.de/dkrz-sw/yac/-/issues/1

Hi,
I wrote two toy models that are being used to check the coupling with a hydrology model. These toy models are part of the repository of the other model. The toys are compiled as follows:
```
mpicc -o toy_land.exe -O0 -g ../code/src/toy_land.c `pkg-config --libs yac` `pkg-config --cflags yac`
```
The toys themselves use NetCDF. Since YAC also uses NetCDF, pkg-config delivers the respective arguments for compiling and linking with NetCDF. However, the arguments from pkg-config do not contain the rpath to the libraries, so in order to run the toys I explicitly have to set `LD_LIBRARY_PATH`...
I could also explicitly set the rpath when compiling the toys, but then this path would have to match the libraries from the pkg-config arguments.
What is the recommended method to handle this issue? Would it be possible for YAC to already provide the rpath?

---

**yaxt#8: yaxt GPU support** (Moritz Hanke, 2022-09-23; assignee: Thomas Jahns)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/8

Currently, we rely on MPI to provide the support for user data that is in GPU memory. yaxt just passes data pointers and the appropriate MPI datatypes to MPI. Internally, MPI determines the location of the data and applies the datatype accordingly. However, OpenMPI in particular is terribly bad when the data is in GPU memory and the MPI datatype is not just a trivial vector: it will use an individual `cuMemcpy` for the packing/unpacking of each contiguous section of memory. When comparing `mo_communication_orig` and `mo_communication_yaxt` (from ICON), this can make a difference of 100-1000x (`mo_communication_orig` does the packing/unpacking manually in the user code using `OpenACC` and only passes a pointer to one contiguous section of memory to MPI).
Since it does not seem like OpenMPI will fix this in the near future, there was the plan to overcome this issue directly in yaxt. That is why I am writing a new exchanger based on `xt_exchanger_irecv_isend` that will do the packing/unpacking directly, without the use of `MPI_Pack`/`MPI_Unpack`. Once this works for data in CPU memory, I am planning to add support for GPU memory by porting the packing/unpacking routines to GPU code using `OpenACC`.
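For orientation, here is a minimal sketch of the pack-then-send pattern described above (the pattern `mo_communication_orig` uses); the helper name, the element type, and the use of a CUDA-aware `MPI_Send` on a device buffer are assumptions for illustration, not yaxt API:

```c
#include <mpi.h>
#include <openacc.h>

/* Hypothetical helper: gather scattered doubles that live in GPU memory
 * into one contiguous device buffer, then hand MPI a single contiguous
 * send instead of a complex derived datatype. */
static void pack_and_send_gpu(const double *restrict src_dev,
                              const size_t *restrict displs_dev,
                              size_t n, int dest, MPI_Comm comm)
{
  double *buf_dev = acc_malloc(n * sizeof *buf_dev);
  /* manual packing on the device, analogous to the OpenACC user code */
#pragma acc parallel loop independent deviceptr(src_dev, displs_dev, buf_dev)
  for (size_t i = 0; i < n; ++i)
    buf_dev[i] = src_dev[displs_dev[i]];
  /* requires a CUDA-aware MPI; only one contiguous section crosses MPI */
  MPI_Send(buf_dev, (int)n, MPI_DOUBLE, dest, 0, comm);
  acc_free(buf_dev);
}
```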
@jahns @m300488 @k202076 How should yaxt handle this explicit support for GPUs?
- check for `OpenACC`-support in `configure`
- `--enable-gpu-support`
- `-DHAVE_GPU_SUPPORT`
- (de)activate GPU support via the new exchanger, or deactivate the exchanger completely if there is no `OpenACC` support? (A sketch of such a guard follows below.)
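A minimal sketch of how a configure-controlled guard could look, assuming the macro name `HAVE_GPU_SUPPORT` from the list above; the two constructor names are the ones discussed in this thread (see also the diff in yaxt#14 below), so this is one possible wiring, not a settled design:

```c
/* Sketch only: select the default exchanger constructor at build time,
 * depending on whether configure detected OpenACC/GPU support. */
#ifdef HAVE_GPU_SUPPORT
#define XT_DEFAULT_EXCHANGER xt_exchanger_irecv_isend_ddt_packed_new
#else
#define XT_DEFAULT_EXCHANGER xt_exchanger_mix_isend_irecv_new
#endif
```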
---

**yaxt#9: CUDA-Kernels instead of OpenACC** (Moritz Hanke, 2022-09-23; assignee: Moritz Hanke)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/9

@jahns
Currently packing/unpacking kernels for the GPU support are generated using OpenACC.
Example for packing 8-Byte-data:
```c
static void xt_ddt_pack_8(
  size_t count, ssize_t *restrict displs, void const *restrict src,
  void *restrict dst, enum xt_memtype memtype) {
  XtPragmaACC(
    parallel loop independent deviceptr(src, dst, displs)
    if (memtype != XT_MEMTYPE_HOST))
  for (size_t i = 0; i < count; ++i)
    ((int8_t*)dst)[i] = *(int8_t*)((unsigned char *)src + displs[i]);
}
```
Alternatively, we could write the kernels in CUDA code and compile them at runtime using [NVRTC](https://docs.nvidia.com/cuda/nvrtc/index.html#introduction). This approach is a little more complex for us, but the advantages would be (see the sketch after this list):
- no dependencies on OpenACC
- we could compile at runtime for the architecture that is actually being used
- configure would not have to determine any compiling/linking flags for the CUDA support (or have them provided by the user); the CUDA root directory would be sufficient
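A minimal sketch of that runtime-compilation flow, assuming the kernel source is kept as a string (the kernel body, the architecture option, and the function names around the sketch are illustrative; error checking is omitted):

```c
#include <stdlib.h>
#include <nvrtc.h>

/* illustrative CUDA source for an 8-byte packing kernel, kept as a string */
static const char *const pack8_src =
  "extern \"C\" __global__ void xt_ddt_pack_8(\n"
  "    unsigned long long count, const long long *displs,\n"
  "    const char *src, char *dst) {\n"
  "  unsigned long long i = blockIdx.x * blockDim.x + threadIdx.x;\n"
  "  if (i < count)\n"
  "    *(double *)(dst + 8 * i) = *(const double *)(src + displs[i]);\n"
  "}\n";

/* compile the kernel to PTX; the caller would load it with the CUDA driver
 * API (cuModuleLoadData()/cuModuleGetFunction()) and run cuLaunchKernel() */
static char *compile_pack8_to_ptx(void)
{
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, pack8_src, "xt_ddt_pack_8.cu", 0, NULL, NULL);
  /* the architecture could be chosen at runtime to match the actual GPU */
  const char *opts[] = { "--gpu-architecture=compute_80" };
  nvrtcCompileProgram(prog, 1, opts);
  size_t ptx_size;
  nvrtcGetPTXSize(prog, &ptx_size);
  char *ptx = malloc(ptx_size);
  nvrtcGetPTX(prog, ptx);
  nvrtcDestroyProgram(&prog);
  return ptx;
}
```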
---

**yaxt#14: MPI checks in configure** (Xingran Wang, 2023-07-11)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/14

Hi, the MPI checks in configure are not sufficient for the new exchanger.
I have tested the new exchanger (master branch) with OpenMPI as the ICON mo_communication backend on the Levante GPU partition. Two versions of OpenMPI were used: `openmpi-4.1.2-hzabdh` and `openmpi-4.1.4-3qb4sy`. Both passed the YAXT MPI checks during configuration; however, the new exchanger worked in ICON only with `openmpi-4.1.4-3qb4sy`. The issue when using `openmpi-4.1.2-hzabdh` was:
```
Bus error: nonexistent physical address
==== backtrace (tid: 125677) ====
15: 0 0x0000000000012c20 .annobin_sigaction.c() sigaction.c:0
15: 1 0x0000000000002c30 memcpy_uncached_load_sse41() /home/k/k202066/.spack/stage/spack-stage-gdrcopy-2.2-5dxzbgq35iriw3n2zewaxri6q2d65ffl/spack-src/src/memcpy_sse41.c:76
```
The log file when using `openmpi-4.1.2-hzabdh` is `/work/k20200/k202149/icon-base-libs-yaxt_new_exchanger/run/LOG.exp.qubicc_r02b07.yaxt_new_exchanger.run.3620955.o`.
Modification in the YAXT source code to use the new exchanger:
```patch
diff --git a/src/xt_config.c b/src/xt_config.c
index f352c3e9..b6fdfc3c 100644
--- a/src/xt_config.c
+++ b/src/xt_config.c
@@ -68,7 +68,7 @@
 #include "core/ppm_xfuncs.h"
 
 struct Xt_config_ xt_default_config = {
-  .exchanger_new = xt_exchanger_mix_isend_irecv_new,
+  .exchanger_new = xt_exchanger_irecv_isend_ddt_packed_new,
   .exchanger_team_share = NULL,
   .idxv_cnv_size = CHEAP_VECTOR_SIZE,
   .flags = 0,
```
build script for YAXT:
```
module --force purge
spack unload -a
module load nvhpc/22.5-gcc-11.2.0 git patch
# spack load openmpi@4.1.2%nvhpc
spack load openmpi@4.1.4%nvhpc/3qb4sy
SW_ROOT='/sw/spack-levante'
CUDA_ROOT="${SW_ROOT}/nvhpc-22.5-v4oky3/Linux_x86_64/22.5/cuda"
# MPI_ROOT="${SW_ROOT}/openmpi-4.1.2-hzabdh"
MPI_ROOT="${SW_ROOT}/openmpi-4.1.4-3qb4sy"
# nordc is needed if OpenACC is used in a shared library
# -lnvToolsExt for profiling
# build yaxt with CUDA, OpenACC directives on GPU
# libcuda path and -lcuda shouldn't be provided to ld(1), dlopen() is used at runtime for dynamical link.
${sourcedir}/configure \
CC="${MPI_ROOT}/bin/mpicc" FC="${MPI_ROOT}/bin/mpif90" \
CFLAGS="-O2 -g -I${CUDA_ROOT}/include -acc=gpu -gpu=cc80,nordc -Minfo" \
LDFLAGS="-lnvToolsExt -acc=gpu -gpu=cc80,nordc" \
MPI_LAUNCH="/usr/bin/srun -p gpu -A k20200 -N 1 " \
--with-idxtype=long --disable-static
make
make check
make install
```
And the ICON branch I used was [yaxt_new_exchanger-levante_phase2_gpu](https://gitlab.dkrz.de/icon/icon-dkrz/-/tree/yaxt_new_exchanger-levante_phase2_gpu).

---

**yaxt#15: Bug in "Intel(R) MPI Library 2019 Update 7 for Linux"** (Moritz Hanke, 2023-07-19; assignee: Moritz Hanke)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/15

Someone at BSC tried to run ICON in coupled mode using "Intel(R) MPI Library 2019 Update 7 for Linux".
It crashed with the following backtrace:
![image](/uploads/81318f909213796b13aed418dc76498f/image.png)
I suggested using OpenMPI and now it runs... Therefore, I assume that this version of Intel MPI contains a bug which is not covered by the configure-time checks of yaxt.
We should probably try to extend the checks.

---

**yaxt#16: Compatibility with newer autotools** (Thomas Jahns, 2023-09-12; assignee: Thomas Jahns)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/16

YAXT is currently tested and released with autoconf 2.69, automake 1.14.1, and libtool 2.4.6.
I've gone through the changes in recent autoconf and automake, and we seem fine in that regard by now, or a workaround for changed behaviour is at least known. In case anyone encounters an issue running scripts/reconfigure with newer tools, please report it here.
What's still missing is porting the libtool patches to 2.4.7, so that's mostly what this issue is meant to track.

---

**yaxt#17: Feature request: configure option for CUDA support** (Sergey Kosukhin, 2023-07-19)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/17
As far as I understand, the CUDA support gets enabled/disabled automatically based on the results of several configure-time checks. It would be nice to have a configure option to control that:
- option enabled => the feature is enabled and the configure script fails if the feature cannot be enabled;
- option disabled => the feature is disabled and the configure script does not run the extra configure-time checks;
- (optionally) option not specified => enable the feature if possible and disable otherwise.
I'd prefer that over playing with the `ac_cv_header_cuda_h` cache variable.
When devising a name/interface for the option, it might make sense to keep in mind that there might be a request to support the `AMD ROCm` platform in a similar way in the future.
**P.S.** Is the spelling `iff` [here](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/92a287383e07085e3537528ba1fbfe5c043a2182/configure.ac#L313) and in several other places intended or a copy/paste typo?

---

**yaxt#18: Feature request: skip rebuilding after re-configuration** (Sergey Kosukhin, 2023-07-14)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/18

As far as I remember, Autoconf implements logic such that if a config file (i.e. one generated with `AC_CONFIG_HEADER`) remains the same after re-configuration, it is not touched and, therefore, the re-compilation of all files that depend on it is skipped. Of course, the logic in YAXT is more sophisticated and there are many more dependencies to keep under control. However, it would be nice if YAXT kept as many files with unchanged timestamps as possible after re-configuration, especially because, in ICON, it triggers unnecessary rebuilding of all of its dependants.

---

**yaxt#19: Bad performance in xmap generation** (Moritz Hanke, 2024-03-13; assignee: Thomas Jahns)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/19
[Enrico](@k202136) noticed that the xmap generation in ICON (ICON<-->HAMOCC) took very long. After some investigation, he found out that changing the ordering of the indices in the idxlists significantly improved it.
This is a reproducer for this issue: [reproducer.f90](/uploads/a3d533bfcb1f0a81d3c64b7587b973d1/reproducer.f90)
```
$ mpif90 reproducer.f90 -I ../../../bin/yaxt/include/ -L ../../../bin/yaxt/lib/ -lyaxt -lyaxt_c -o reproducer
$ mpirun -n 4 ./reproducer
xmap (bad): 0.98948631399999998
xmap (opt): 1.2026120000000473E-003
```
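The attached reproducer is Fortran; for orientation, a rough C outline of the same experiment is sketched below, assuming the usual yaxt entry points (`xt_idxvec_new`, `xt_xmap_all2all_new`). The index orderings are made up for illustration; see `reproducer.f90` for the actual ones.

```c
#include <stdio.h>
#include <mpi.h>
#include <yaxt.h>

/* time xt_xmap_all2all_new() for idxlists that describe the same index
 * sets but enumerate them in different orders */
int main(void) {
  MPI_Init(NULL, NULL);
  xt_initialize(MPI_COMM_WORLD);
  enum { n = 1 << 16 };
  static Xt_int idx_asc[n], idx_scr[n];
  for (int i = 0; i < n; ++i) {
    idx_asc[i] = (Xt_int)i;                         /* ascending */
    idx_scr[i] = (Xt_int)((i % 2) ? i : n - 1 - i); /* scrambled */
  }
  Xt_idxlist src = xt_idxvec_new(idx_scr, n);
  Xt_idxlist dst = xt_idxvec_new(idx_asc, n);
  double t = MPI_Wtime();
  Xt_xmap xmap = xt_xmap_all2all_new(src, dst, MPI_COMM_WORLD);
  printf("xmap: %g s\n", MPI_Wtime() - t);
  xt_xmap_delete(xmap);
  xt_idxlist_delete(src);
  xt_idxlist_delete(dst);
  xt_finalize();
  MPI_Finalize();
  return 0;
}
```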
---

**yaxt#20: Memory leak?** (Moritz Hanke, 2023-09-28; assignee: Thomas Jahns)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/20

Measurements using release-v0.10.0 in YAC showed a memory leak in yaxt (with IntelMPI and OpenMPI). Valgrind Massif provided the following information:
```
n1: 2098490496 0x926B9F1: opal_datatype_create_desc (in /home/logiciels/local/openmpi/4.1.1_gcc112/lib/libopen-pal.so.40.30.1)
n4: 2098490496 0x66EF37A: ompi_datatype_create (in /home/logiciels/local/openmpi/4.1.1_gcc112/lib/libmpi.so.40.30.1)
n2: 1121608064 0x66EF614: ompi_datatype_create_indexed (in /home/logiciels/local/openmpi/4.1.1_gcc112/lib/libmpi.so.40.30.1)
n1: 1121608064 0x67207E5: PMPI_Type_indexed (in /home/logiciels/local/openmpi/4.1.1_gcc112/lib/libmpi.so.40.30.1)
n1: 1121608064 0x5092DBC: match_indexed (xt_mpi_stripe_parse.c:164)
n1: 1121608064 0x5092DBC: parse_stripe (xt_mpi_stripe_parse.c:321)
n1: 1121608064 0x509673C: generate_datatype (xt_redist_p2p.c:200)
n1: 1121608064 0x509673C: generate_msg_infos.part.0 (xt_redist_p2p.c:221)
n1: 1121608064 0x5096E9C: generate_msg_infos (xt_redist_p2p.c:210)
n4: 1121608064 0x5096E9C: xt_redist_p2p_off_custom_new
```
This would indicate that at least some of the intermediate MPI datatypes generated in the parsing are not freed. I checked that all the redists are correctly freed in YAC.
@jahns: Do you have an idea how this leak could occur? I saw that some of the `XT_MPI_STRP_PRS_MATCH_*` can create a datatype but still return `false`. Could this be the cause?
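If that hypothesis is right, the problematic shape would be something like the following; this is an illustration of the suspected pattern only (hypothetical helper, not the actual code from `xt_mpi_stripe_parse.c`):

```c
#include <stdbool.h>
#include <mpi.h>

/* A candidate datatype is created speculatively; the failure path returns
 * without freeing it, so the intermediate type leaks inside the MPI
 * library even though the caller never sees it. */
static bool try_match_indexed(int count, const int *blocklengths,
                              const int *displacements, MPI_Datatype old_type,
                              bool pattern_matches, MPI_Datatype *result)
{
  MPI_Datatype candidate;
  MPI_Type_indexed(count, blocklengths, displacements, old_type, &candidate);
  if (!pattern_matches)
    return false;          /* leak: missing MPI_Type_free(&candidate) */
  *result = candidate;
  return true;
}
```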
---

**yaxt#21: C++ header for yaxt** (Moritz Hanke, 2023-11-15; assignee: Thomas Jahns)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/21

Some yaxt interfaces contain C99 variable-length array arguments. This can be an issue for some C++ compilers.
Should we generate/provide C++ header files? Alternatively, one could make the current yaxt headers compatible with C++ using some macros (a sketch of that idea follows).
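A minimal sketch of the macro idea, assuming a hypothetical macro name (`XT_VLA`) and an invented prototype in the style of the affected interfaces; C++ has no VLA parameters, so the dimension is simply dropped there:

```c
/* hypothetical compatibility macro for a yaxt-style header */
#ifdef __cplusplus
#define XT_VLA(n)       /* C++: plain pointer parameter */
#else
#define XT_VLA(n) n     /* C99: keep the VLA dimension for documentation */
#endif

#ifdef __cplusplus
extern "C" {
#endif

/* invented example prototype, not an actual yaxt function */
void xt_example(int num_indices, const int indices[XT_VLA(num_indices)]);

#ifdef __cplusplus
}
#endif
```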
---

**yaxt#23: version check in configure.ac returns strange results** (Jan Streffing, 2024-02-15)
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/23

Hey everyone, first time I try to compile YAXT (outside of the NATESM workshop). On JUWELS I run:
```
autoconf --version
autoconf (GNU Autoconf) 2.69
libtool --version
libtool (GNU libtool) 2.4.6
autoconf -i
/usr/bin/m4:configure.ac:62: bad expression in eval (bad input): (0r36:LT_+1) != (2+0)
/usr/bin/m4:configure.ac:62: bad expression in eval (bad input): (0r36:PACKAGE_+1) != (0*6)
configure.ac:62: error: autoconf versions 2.68 and newer require using libtool 2.4.2 or newer
configure.ac:62: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
```
The check in line 62 of configure.ac is faulty; libtool is > 2.4.2.

---

**yac#2: YAC compilation with nvidia compilers 23.3 fails while compiling icon-nwp** (Praveen Kumar Pothapakula, 2024-02-09; assignee: Moritz Hanke)
https://gitlab.dkrz.de/dkrz-sw/yac/-/issues/2

Dear YAC developers,
I was trying to compile icon-nwp with YAC enabled, using the NVIDIA 23.3 CPU compilers (not on Levante but on a local Swiss machine). The compilation always fails in YAC with the following error (and more in the attached config.log file):
"/usr/bin/ld: /usr/lib64/crt1.o: in function `_start':
/home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:104: undefined reference to `main'
configure:3977: $? = 2 "
The question is: does YAC need a higher compiler version for compilation (at least nvhpc 23.9)? Or is it something more than this? Could I kindly have some tips on this? I would like to try running the ICON coupled seamless configuration.
Thanks a lot,
Praveen.
Attached is the YAC log file.
[config.log](/uploads/55977b68f1772e3e106aa7c38e3260a3/config.log)