Refactor backtrace
What happens when finish is called:
- call backtrace passed as callback
- raise SIGSEGV in traceback function
- signal handler aborts MPI-procs
What the code implies happens when finish is called:
- call backtrace passed as callback
- call model_abort passed as callback
- model_abort calls MPI_ABORT
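The intended order can be sketched in C as follows. This is only an illustration: the callback types, finish_sketch, and the demo_* functions are hypothetical names, not the actual fortran-support interface, and the abort callback here merely records that it ran instead of calling MPI_Abort.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback types; the real signatures may differ. */
typedef void (*backtrace_cb)(void);
typedef void (*abort_cb)(void);

enum { MAX_CALLS = 8 };
static int call_order[MAX_CALLS]; /* records the order in which callbacks ran */
static int n_calls = 0;

/* Stand-in for the backtrace callback: 1 marks "backtrace ran". */
static void demo_backtrace(void)
{
    assert(n_calls < MAX_CALLS);
    call_order[n_calls++] = 1;
    fputs("backtrace callback\n", stderr);
}

/* Stand-in for model_abort: 2 marks "abort ran".
 * In an MPI build this is where MPI_Abort would be called. */
static void demo_model_abort(void)
{
    assert(n_calls < MAX_CALLS);
    call_order[n_calls++] = 2;
    fputs("abort callback\n", stderr);
}

/* Intended path: print the message, dump the backtrace, then abort.
 * No signal is raised along the way. */
static void finish_sketch(const char *name, const char *msg,
                          backtrace_cb bt, abort_cb ab)
{
    fprintf(stderr, "FINISH %s: %s\n", name, msg);
    if (bt) bt();
    if (ab) ab();
}
```

Calling finish_sketch("demo", "stop", demo_backtrace, demo_model_abort) runs the backtrace callback first and the abort callback second, which is the order the bullet list describes.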
This MR moves the code path back to what should happen with the following changes:
- remove any raise(SIGSEGV) from the traceback function, except on NEC, which offers no better traceback option
- fallback option with a primitive backtrace if unwind.h and pthreads.h are not available
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x6000061e7c98]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x6000061b1d28]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600006168570]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600003cbedc0]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600002499fc0]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600000fe73b0]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600000fd0eb8]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600000fc4280]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600000387d18]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x60000014d1b0]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600000033cc0]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x6000092a4b08]
[util_backtrace]: /opt/nec/ve/lib/libc.so.6(__libc_start_main+0x3a8) [0x600c02040818]
[util_backtrace]: /hpc/sw/buildbot/home/data0/DWD_nec/build/bin/icon() [0x600000014108]
[util_backtrace]: Use addr2line for addresses to line number conversion.
- fallback for (__APPLE_): no backtrace for APPLE available
- fallback for (__NEC_): no backtrace for NEC available, raise SIGSEGV instead
- replace all compiler-specific traceback functions with a single one from https://github.com/aivarsk/stacktrace, together with its licence
- dumps traceback as follows:
Levante-gcc
227: #0 0x1d1b0f8 - stacktrace_print in /work/mh0287/buildbot/levante5/levante_gcc/build/externals/fortran-support/src/util_backtrace.c:177
227: #1 0x1cf9299 - __mo_exception_MOD_finish in /work/mh0287/buildbot/levante5/levante_gcc/build/externals/fortran-support/src/mo_exception.f90:238
227: #2 0x16591e5 - __mo_nwp_gscp_interface_MOD_nwp_microphysics in /work/mh0287/buildbot/levante5/levante_gcc/build/src/atm_phy_nwp/mo_nwp_gscp_interface.f90:652
227: #3 0x134df36 - __mo_nh_interface_nwp_MOD_nwp_nh_interface in /work/mh0287/buildbot/levante5/levante_gcc/build/src/atm_phy_nwp/mo_nh_interface_nwp.f90:803
227: #4 0xec705f - __mo_nh_stepping_MOD_integrate_nh in /work/mh0287/buildbot/levante5/levante_gcc/build/src/atm_dyn_iconam/mo_nh_stepping.f90:2115
227: #5 0xecd287 - allocate_nh_stepping in /work/mh0287/buildbot/levante5/levante_gcc/build/src/atm_dyn_iconam/mo_nh_stepping.f90:1187
227: #6 0x710410 - __mo_atmo_nonhydrostatic_MOD_atmo_nonhydrostatic in /work/mh0287/buildbot/levante5/levante_gcc/build/src/drivers/mo_atmo_nonhydrostatic.f90:245
227: #7 0x44cb8c - __mo_atmo_model_MOD_atmo_model in /work/mh0287/buildbot/levante5/levante_gcc/build/src/drivers/mo_atmo_model.f90:209
227: #8 0x40f0f1 - MAIN__ in icon.f90:0
Levante-intel
64: #0 0x2661c41 - mo_exception_mp_finish_ in /work/mh0287/buildbot/levante5/levante_intel/build/externals/fortran-support/src/mo_exception.f90:238
64: #1 0x1e060d1 - mo_nwp_gscp_interface_mp_nwp_microphysics_ in /work/mh0287/buildbot/levante5/levante_intel/build/src/atm_phy_nwp/mo_nwp_gscp_interface.f90:652
64: #2 0x19f1f03 - mo_nh_interface_nwp_mp_nwp_nh_interface_ in /work/mh0287/buildbot/levante5/levante_intel/build/src/atm_phy_nwp/mo_nh_interface_nwp.f90:789
64: #3 0x130a31f - mo_nh_stepping_mp_integrate_nh_ in /work/mh0287/buildbot/levante5/levante_intel/build/src/atm_dyn_iconam/mo_nh_stepping.f90:2091
64: #4 0x12f8fa1 - mo_nh_stepping_mp_perform_nh_timeloop_ in /work/mh0287/buildbot/levante5/levante_intel/build/src/atm_dyn_iconam/mo_nh_stepping.f90:1166
64: #5 0x12f3df9 - deallocate_nh_stepping in /work/mh0287/buildbot/levante5/levante_intel/build/src/atm_dyn_iconam/mo_nh_stepping.f90:3326
64: #6 0x8e44db - mo_atmo_nonhydrostatic_mp_atmo_nonhydrostatic_ in /work/mh0287/buildbot/levante5/levante_intel/build/src/drivers/mo_atmo_nonhydrostatic.f90:245
64: #7 0x494698 - mo_atmo_model_mp_atmo_model_ in /work/mh0287/buildbot/levante5/levante_intel/build/src/drivers/mo_atmo_model.f90:202
64: #8 0x41e959 - MAIN__ in /work/mh0287/buildbot/levante5/levante_intel/build/src/drivers/icon.f90:227
64: #9 0x41e362 - main in /work/mh0287/buildbot/levante5/levante_intel/build/bin/icon:0
64: #10 0x7fff7c791cf3 - ?? in /usr/lib64/libc-2.28.so:0
64: #11 0x41e26e - _start in /work/mh0287/buildbot/levante5/levante_intel/build/bin/icon:0
levante-nag
87: #0 0xa2055d8 - stacktrace_print in /work/mh0287/buildbot/levante1/levante_nag/build/externals/fortran-support/src/util_backtrace.c:177
87: #1 0xa1b728e - mo_util_backtrace_MP_ftn_util_backtrace in /work/mh0287/buildbot/levante1/levante_nag/build/externals/fortran-support/src/mo_util_backtrace.f90:25
87: #2 0xa147e35 - mo_exception_MP_finish in /work/mh0287/buildbot/levante1/levante_nag/build/externals/fortran-support/src/mo_exception.f90:238
87: #3 0x82be13c - mo_nwp_gscp_interface_MP_nwp_microphysics in /work/mh0287/buildbot/levante1/levante_nag/build/src/atm_phy_nwp/mo_nwp_gscp_interface.f90:652
87: #4 0x6d8e0ee - mo_nh_interface_nwp_MP_nwp_nh_interface in /work/mh0287/buildbot/levante1/levante_nag/build/src/atm_phy_nwp/mo_nh_interface_nwp.f90:789
87: #5 0x462e945 - mo_nh_stepping_MP_integrate_nh in /work/mh0287/buildbot/levante1/levante_nag/build/src/atm_dyn_iconam/mo_nh_stepping.f90:2091
87: #6 0x4654f3a - mo_nh_stepping_MP_perform_nh_timeloop in /work/mh0287/buildbot/levante1/levante_nag/build/src/atm_dyn_iconam/mo_nh_stepping.f90:1166
87: #7 0x4676a41 - mo_nh_stepping_MP_perform_nh_stepping in /work/mh0287/buildbot/levante1/levante_nag/build/src/atm_dyn_iconam/mo_nh_stepping.f90:703
87: #8 0x1af4401 - mo_atmo_nonhydrostatic_MP_atmo_nonhydrostatic in /work/mh0287/buildbot/levante1/levante_nag/build/src/drivers/mo_atmo_nonhydrostatic.f90:243
87: #9 0x61bde0 - mo_atmo_model_MP_atmo_model in /work/mh0287/buildbot/levante1/levante_nag/build/src/drivers/mo_atmo_model.f90:204
87: #10 0x426c01 - icon_ in /work/mh0287/buildbot/levante1/levante_nag/build/src/drivers/icon.f90:227
87: #11 0x424222 - main in /work/mh0287/buildbot/levante1/levante_nag/build/src/drivers/icon.f90:16
87: #12 0x7fff7c5dfcf3 - ?? in /usr/lib64/libc-2.28.so:0
levante_cpu_nvhpc
0: #0 0x1808e88 - mo_util_backtrace_ftn_util_backtrace_ in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/externals/fortran-support/src/mo_util_backtrace.f90:25
0: #1 0x17f3f49 - mo_exception_finish_ in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/externals/fortran-support/src/mo_exception.f90:238
0: #2 0xd4e411 - mo_nh_stepping_perform_nh_stepping_ in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/src/atm_dyn_iconam/mo_nh_stepping.f90:297
0: #3 0x720120 - mo_atmo_nonhydrostatic_atmo_nonhydrostatic_ in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/src/drivers/mo_atmo_nonhydrostatic.f90:238
0: #4 0x463ae7 - mo_atmo_model_atmo_model_ in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/src/drivers/mo_atmo_model.f90:209
0: #5 0x410483 - MAIN_ in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/src/drivers/icon.f90:265
0: #6 0x40fcf3 - main in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/bin/icon:0
0: #7 0x7fff7a83bcf3 - ?? in /usr/lib64/libc-2.28.so:0
0: #8 0x40fbee - _start in /work/mh0287/buildbot/levante4/levante_cpu_nvhpc/build/bin/icon:0
0: #9 (nil) - ?? in ??:
levante_gpu_nvhpc
#0 0x245ed87 - mo_exception_finish_ in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/externals/fortran-support/src/mo_exception.f90:238
0: #1 0x102c46f - mo_nh_stepping_perform_nh_stepping_ in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/src/atm_dyn_iconam/mo_nh_stepping.f90:297
0: #2 0x791089 - mo_atmo_nonhydrostatic_atmo_nonhydrostatic_ in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/src/drivers/mo_atmo_nonhydrostatic.f90:238
0: #3 0x475777 - mo_atmo_model_atmo_model_ in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/src/drivers/mo_atmo_model.f90:209
0: #4 0x412ea4 - MAIN_ in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/src/drivers/icon.f90:265
0: #5 0x412833 - main in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/bin/icon:0
0: #6 0x7fff79de5cf3 - ?? in /usr/lib64/libc-2.28.so:0
0: #7 0x41270e - _start in /work/mh0287/buildbot/levante5/levante_gpu_nvhpc/build/bin/icon:0
0: #8 (nil) - ?? in ??
daint_cpu_cce
#0 0x2dbccda - resize_column$mo_util_table_ in /scratch/snx3000/icontest/buildbot/icon-new/daint103/DAINT_CPU_cce/build/bin/icon:0
#1 0x2d8db7b - stack_op_plus_eval_2d$mo_expression_ in /scratch/snx3000/icontest/buildbot/icon-new/daint103/DAINT_CPU_cce/build/bin/icon:0
#2 0x19adab1 - perform_nh_timeloop$mo_nh_stepping_ in /scratch/snx3000/icontest/buildbot/icon-new/daint103/DAINT_CPU_cce/build/bin/icon:0
#3 0xc83361 - allocate_grf_state$mo_grf_intp_state_ in /scratch/snx3000/icontest/buildbot/icon-new/daint103/DAINT_CPU_cce/build/bin/icon:0
#4 0x482f88 - write_geometry_info$mo_grid_geometry_info_ in /scratch/snx3000/icontest/buildbot/icon-new/daint103/DAINT_CPU_cce/build/bin/icon:0
#5 0x419cf9 - p_recv_int_3d$mo_mpi_ in /scratch/snx3000/icontest/buildbot/icon-new/daint103/DAINT_CPU_cce/build/bin/icon:0
#6 0x15554cd5b3ea - ?? in /lib64/libc-2.26.so:0
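For comparison, a primitive fallback of the kind listed in the changes could look roughly like the sketch below. It assumes glibc's execinfo.h is available, since the "[util_backtrace]:" sample output resembles what backtrace_symbols produces. The function name primitive_backtrace and the frame limit are made up for this example, and the actual fallback in this MR may be implemented differently.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64 /* arbitrary limit chosen for this sketch */

/* Prints the current call stack to stderr, one frame per line.
 * Returns the number of frames printed (0 if symbol lookup failed). */
int primitive_backtrace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    assert(n >= 0 && n <= MAX_FRAMES);

    /* backtrace_symbols allocates one block that the caller must free. */
    char **symbols = backtrace_symbols(frames, n);
    if (symbols == NULL)
        return 0;

    for (int i = 0; i < n; ++i)
        fprintf(stderr, "[util_backtrace]: %s\n", symbols[i]);
    fprintf(stderr, "[util_backtrace]: Use addr2line for addresses "
                    "to line number conversion.\n");

    free(symbols);
    return n;
}
```

Unlike the stacktrace-based output above, this only yields binary names and raw addresses, so file and line information has to be recovered afterwards with addr2line.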
Question
Could we also use this traceback in util_signal.c for better feedback from the signal trap?
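As a starting point for that discussion, here is a minimal sketch of a signal handler that dumps a trace. It uses glibc's backtrace and backtrace_symbols_fd as stand-ins for the new traceback function. The names trace_handler, install_trace_handler, and trapped_signal are hypothetical and do not come from util_signal.c. Note that backtrace is not guaranteed to be async-signal-safe on its first call, so a real handler would need more care.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose sigaction under strict -std= modes */
#endif
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler so the caller can see which signal was trapped. */
static volatile sig_atomic_t trapped_signal = 0;

/* Dumps the stack to stderr without calling malloc, then records the signal.
 * A production handler would typically abort or re-raise afterwards. */
static void trace_handler(int signo)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    trapped_signal = signo;
}

/* Installs trace_handler for the given signal; returns 0 on success. */
int install_trace_handler(int signo)
{
    assert(signo > 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = trace_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

After install_trace_handler(SIGUSR1), a raise(SIGUSR1) prints the stack and returns to the caller, which makes the handler easy to try out before wiring it to SIGSEGV or similar signals.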
Edited by Jonas Jucker