dkrz-sw issues
https://gitlab.dkrz.de/groups/dkrz-sw/-/issues
2023-12-08T12:33:31Z

bug in xt_xmap_intersection.c?
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/22
2023-12-08T12:33:31Z, Moritz Hanke

@k202136 reported that when using the `perf_hack` branch, the last commit (setting `stripify` always to zero in xt_xmap_dist_dir.c) leads to an error in xt_xmap_intersection.c line 876.
I checked the code and could find no obvious issue. @jahns do you see anything?
@k202136: how can I reproduce this?

Assignee: Moritz Hanke

Alignment issue with xt_lib_state
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/11
2023-07-11T12:01:12Z, Moritz Hanke

[Luis](@m214089) reported an issue that [Jörg](@k202076) and I have also encountered before (using gcc):
```
/usr/bin/ld: Warning: alignment 8 of symbol `xt_lib_state' in
externals/yaxt/src/.libs/libyaxt.a(xt_core_f.o) is smaller than 16 in
externals/yaxt/src/.libs/libyaxt_c.a(xt_init.o)
```
In `xt_init.c` the respective variable is defined as:
```
int xt_lib_state __attribute__((aligned(16),common))
```
And in Fortran it is declared as:
```
INTEGER(c_int), PUBLIC, BIND(c, name='xt_lib_state') :: xt_lib_state
```
[@Thomas](@jahns) do you have an idea what causes this issue?

Assignee: Thomas Jahns

Test test_exported_symbols fails on MacOs
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/3
2023-03-09T12:44:57Z, Sergey Kosukhin

```
Failure in created libraries!
Unexpected symbols exported from library!
0000000000000000 T ___yaxt_MOD___copy_yaxt_Xt_modifier
0000000000000500 S ___yaxt_MOD___def_init_yaxt_Xt_modifier
00000000000004c0 D ___yaxt_MOD___vtab_yaxt_Xt_modifier
0000000000000430 T ___yaxt_MOD_xt_bounds_eq
00000000000003e0 T ___yaxt_MOD_xt_bounds_ne
0000000000000480 T ___yaxt_MOD_xt_idxempty_new
00000000000002b0 T ___yaxt_MOD_xt_idxmod_new_a1d
0000000000000160 T ___yaxt_MOD_xt_idxmod_new_a1d_a1d
0000000000000100 T ___yaxt_MOD_xt_idxmod_new_a1d_i4
00000000000000a0 T ___yaxt_MOD_xt_idxmod_new_a1d_i4_a1d
0000000000000040 T ___yaxt_MOD_xt_idxmod_new_a1d_i4_a2d
```
It looks like [this](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/tests/test_exported_symbols.in#L84-85) is missing `${acx_symprfx}`.

MPICH 4.x fails the tests
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/10
2023-03-08T23:11:32Z, Sergey Kosukhin

I was looking for a version/configuration of MPICH that would pass YAXT tests.
I have tested the following full matrix:
- MPICH versions: `main` (a.k.a. `4.1.x`), `4.0.x`, `4.0.2`, `3.4.x`, `3.4.3`;
- datatype engines: `yaksa`, `dataloop`
- YAXT versions: `master`, `0.9.3.1`
In all cases, I used the system installation of GCC 8.4.1 (i.e. the one from `/usr/bin`) on Levante.
The result was that all tests passed except for:
1. `test_redist_collection_parallel_run` fails in all configurations. It looks like a compiler bug to me: [this condition](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/release-0.9.3.1/tests/test_redist_collection_parallel_f.f90#L383) is true although all values in the arrays are equal. The test does not fail with DKRZ-provided GCC 11.2.0 and on my machine with Debian-provided GCC 10.2.1.
2. `test_ddt_run` (was added after the YAXT release `0.9.3.1`) fails with all `4.x` versions of MPICH and datatype engine `yaksa` (`dataloop` is fine). I haven't looked deep into this one. The only thing I can tell is that the failure looks like: `YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)`.
3. `test_exchanger_parallel_run` fails with all `4.x` versions of MPICH, regardless of the datatype engine. More specifically, the following command fails: `$MPI_LAUNCH -n 3 ./test_exchanger_parallel -m neigh_alltoall`. It looks like there is an assertion that fails [here](https://github.com/pmodels/mpich/blob/64a07944d56d9a6aebad2b8e1ce9ba78c63c151d/src/mpi/datatype/typerep/src/typerep_yaksa_pack.c#L286) (must be a different line in the case of `dataloop`). That happens because the variable `outcount` ([here](https://github.com/pmodels/mpich/blob/64a07944d56d9a6aebad2b8e1ce9ba78c63c151d/src/mpi/datatype/typerep/src/typerep_yaksa_pack.c#L240)) has a very strange value `8101236137501347664`, which comes from [this line](https://github.com/pmodels/mpich/blob/64a07944d56d9a6aebad2b8e1ce9ba78c63c151d/src/mpid/ch4/src/mpidig_pt2pt_callbacks.c#L415) as the value of `(rreq->dev.ch4.am).count`. And here I stopped digging.
I am attaching [test_yaxt_mpich.sh](/uploads/317ec5283ff44a15ee814ab4c4946150/test_yaxt_mpich.sh), which reproduces all of the above on Levante (testing with GCC 10+ requires configuring MPICH with two additional arguments: `FFLAGS=-fallow-argument-mismatch FCFLAGS=-fallow-argument-mismatch`).
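
The GCC 10+ remark can be turned into a small helper. This is only a sketch: the function name `mpich_extra_flags` and the idea of passing the compiler version as a string are assumptions made for illustration and are not taken from the attached script. Only the two flag assignments come from the text above.

```sh
#!/bin/sh
# Hypothetical helper (not part of test_yaxt_mpich.sh): print the extra
# configure arguments MPICH needs for a given GCC version string.
mpich_extra_flags() {
  # Keep only the major version, e.g. "11.2.0" -> "11".
  major=${1%%.*}
  if [ "$major" -ge 10 ]; then
    # From GCC 10 on, gfortran treats argument mismatches as errors
    # unless -fallow-argument-mismatch is given.
    echo 'FFLAGS=-fallow-argument-mismatch FCFLAGS=-fallow-argument-mismatch'
  fi
}

# Possible use: feed it the version of the Fortran compiler found in PATH.
# mpich_extra_flags "$(gfortran -dumpversion)"
```

Left unquoted, the output splits into two separate arguments, so something like `./configure $(mpich_extra_flags 11.2.0)` would pass both assignments, while for GCC 8.4.1 nothing is added.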
It would be nice if someone took a deeper look at the second problem, came up with a minimal reproducer of the third one and submitted bug reports for both to the MPICH developers. It might also make sense to introduce a workaround for the first issue for the buggy compiler (not the first one, not the last one, I guess).

Assignee: Thomas Jahns

Problems with MPICH 4.0.3
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/13
2023-01-09T07:22:45Z, Moritz Hanke

All MPICH 4.0.x versions contain a datatype-related bug (see [YAC issue](https://gitlab.dkrz.de/YAC/YAC-dev/-/issues/31)). This bug could also affect yaxt users, since we generate a lot of datatypes (see [MPICH bug description](https://github.com/pmodels/mpich/issues/6341)). Should we check for this bug in yaxt configure as well?

Assignee: Thomas Jahns

Further incompatibilities with BSD sed
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/6
2022-06-24T14:13:24Z, Sergey Kosukhin

1. BSD sed does not support the escaped alternation symbol, i.e. `\|`, but only the unescaped version, i.e. `|`, with the `-E` flag (some time between MacOS `10.15.5` and MacOS `12.3.1`, the alternative `-r` flag gained support as well). In particular, this affects:
- [tests/test_exported_symbols.in](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/tests/test_exported_symbols.in#L78);
- [m4/acx_use_libtool_configuration.m4](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/m4/acx_use_libtool_configuration.m4#L144) (I have to admit that 837c625d was the right way to solve the problem and 78563389, which was adopted as d820b818, was not).
2. BSD sed does not support the word boundary sequence `\b`. In particular, this affects:
- the same [m4/acx_use_libtool_configuration.m4](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/m4/acx_use_libtool_configuration.m4#L144);
- and [contrib/06ltmain_nag_pthread-patch/ltmain_nag_pthread.patch](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/62afd60119383ef7b6aba23a8baf45c0063fcc5f/contrib/06ltmain_nag_pthread-patch/ltmain_nag_pthread.patch#L10) (one of the things that breaks building with NAG on MacOS).
It looks like we could replace `\b` with the end-of-word sequence in the two cases above. The only problem is that the end-of-word sequence is `\>` for GNU sed and `[[:>:]]` for BSD sed.
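
One way around the differing end-of-word tokens might be to probe the local sed at run time. The following is a minimal sketch, not what YAXT does: the variable name `eow`, the function names and the probe strings are made up for illustration. It assumes that a sed which does not understand a token either fails or leaves the probe input unmatched. The last function shows the `-E` form of alternation from point 1.

```sh
#!/bin/sh
# Probe which end-of-word token the local sed understands.
# `eow` is a hypothetical variable name used only in this sketch.
if [ "$(printf 'ab\n' | sed -n 's/ab\>/X/p' 2>/dev/null)" = X ]; then
  eow='\>'          # GNU-style token
elif [ "$(printf 'ab\n' | sed -n 's/ab[[:>:]]/X/p' 2>/dev/null)" = X ]; then
  eow='[[:>:]]'     # BSD-style token
else
  eow=''            # no known token; the caller has to cope without one
fi

# Replace "pthread" on stdin where it ends a word, so "pthreads" is
# left alone (provided the probe found a usable token).
replace_pthread() {
  sed "s/pthread${eow}/X/g"
}

# Alternation without the escaped `\|`: use -E and a plain `|`.
# Strips the address and the symbol type from an nm-style line.
strip_symbol_type() {
  sed -E 's/^[0-9a-f]+ (T|D|S) //'
}
```

The probe runs once and costs two short sed invocations. In a configure-based build the same check could presumably be done at configure time and the result substituted into the scripts.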
I have not checked whether there are more lines in the code affected by these problems.

Configure-time downloading with curl does not work for the MPICH workaround
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/5
2022-06-24T14:10:27Z, Sergey Kosukhin

Using `curl` is a nice fallback solution, especially for MacOS, which does not have `wget` by default.
Unfortunately, it does not work as it is for the MPICH workaround because:
1. Downloading the commit-based URLs from GitHub with `curl` requires the `-L` flag. The tag-based URLs do not seem to be affected because the OpenMPI workaround gets downloaded even without the flag.
2. The `-O` flag must be provided before each URL on the command line. Otherwise, the contents of the files are simply printed to the standard output.

Workaround for test_xmap_all2all_fail_run
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/2
2022-06-24T13:51:40Z, Sergey Kosukhin

This is a request to extend the workaround for `test_xmap_all2all_fail_run`. The problem was originally described [here](https://gitlab.dkrz.de/icon/icon-cimd/-/merge_requests/36#note_72444) (the links in the original message are updated):
> Thomas (@jahns), can we get a piece of advice from you here? We have a problem with `test_xmap_all2all_fail_run`. As far as I understand, we have to enable the `MPI_Abort` workaround on the DWD NEC machine. Since we do the cross-compilation and do not seem to have something like `srun` (more on this [here](https://gitlab.dkrz.de/icon/icon-cimd/-/blob/e304ef400dcf598fd7c89e336268a38c93429c3d/config/buildbot/dwd_nec#L18-41)), we enable the workaround manually (see [9ae388bb](https://gitlab.dkrz.de/icon/icon-cimd/-/merge_requests/36/diffs?commit_id=9ae388bb8fb007f867097962d1e2ad8d22d94304)). The problem is that the output of `mpirun` goes directly to the log file (search [here](https://buildbot.dkrz.de/builders/DWD_nec_yac2/builds/1178/steps/exp/logs/check_externals_DWD) for line `MPI_Abort(0xdeadbeef, 3)`) leaving the variable [`diags`](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/b4ec3d75c6045eeaf01d81e6dcbc53fb8c1cc545/tests/test_xmap_all2all_fail_run.in#L10-14) empty. As far as I understand, there is [a potential solution](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/7c5148b7d20d07bd6fb99f1ab465060a835d5ade/tests/test_xmap_all2all_fail_run.in#L65-67) in the `master` branch of YAXT but it looks like the file `test_xmap_all2all_fail.result.txt` is [not created anymore](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/commit/b2ae926f424711a28449e21693ffcec081769c4b) (I also assume that [the Fortran test](https://gitlab.dkrz.de/dkrz-sw/yaxt/-/blob/master/tests/test_xmap_all2all_fail_f.f90) would have to create such file too). What do you think we should do with this issue?
I am sorry if this has already been addressed one way or another.

Question: YAXT communication overlapping calculation
https://gitlab.dkrz.de/dkrz-sw/yaxt/-/issues/1
2022-01-13T13:40:54Z, Florian Prill

@jahns @k202077
I have the following YAXT-related question - I'd be happy about some best practice advice ... In short, I'd like to avoid an additional copy of a data array, which seems to be required.
My program overlaps communication with computation.
- It operates on a given multi-dimensional floating-point array `arr(:,:,:)`.
- `arr` is MPI-decomposed (in some irregular fashion).
- A halo synchronization pattern for `arr` has been implemented with YAXT.
Now, the overall program is as follows:
```
1. do some calculation on halo indices in "arr"
2. YAXT: launch an asynchronous "xt_redist_a_exchange"
3. do the remaining calculations (for the non-halo points)
4. perform stencil evaluations on non-halo points
5. YAXT: receive asynchronous data exchange from step 2
6. do the remaining stencil evaluations using the received indices
```
The problem here is that I cannot fill individual entries of `arr` in step 3 while the async communication is under way, even if it can be assured that these entries are not destination points of the YAXT communication.
The only remedy I can think of is splitting `arr` into a halo-indices part `arr_ifc` and an interior part `arr_interior`. Furthermore, in order to avoid an index translation, I decided to allocate `arr_ifc` and `arr_interior` for all points, which is really awkward.
Can you think of any implementation which requires only a single data array for this task?

Does only compile with gcc but no other compiler
https://gitlab.dkrz.de/dkrz-sw/sct/-/issues/3
2020-03-29T18:45:55Z, Luis Kornblueh

Hi Hendryk,
I got the report back from CSCS/MeteoSwiss that sct does only compile with gcc, but not
icc, pgcc, and Cray cc.
It would be great to get this fixed.
Cheerio,
Luis

Assignee: Hendryk Bockelmann (bockelmann@dkrz.de)

PGI compiler problem: Illegal type conversion
https://gitlab.dkrz.de/dkrz-sw/sct/-/issues/4
2020-03-29T18:43:47Z, Hendryk Bockelmann (bockelmann@dkrz.de)

using pgi/19.9 with OpenMP gives:
PGC-S-0094-Illegal type conversion required (sct_reduce.c: 1196)
PGC/x86-64 Linux 19.9-0: compilation completed with severe errors

Assignee: Hendryk Bockelmann (bockelmann@dkrz.de)

mistral impi check
https://gitlab.dkrz.de/dkrz-sw/sct/-/issues/1
2019-03-07T07:59:29Z, Hendryk Bockelmann (bockelmann@dkrz.de)

libsct with impi and ifort on mistral failed make check

Assignee: Hendryk Bockelmann (bockelmann@dkrz.de)