  1. Mar 31, 2025
  2. Mar 27, 2025
  3. Mar 25, 2025
  4. Mar 17, 2025
  5. Mar 03, 2025
  6. Feb 24, 2025
  7. Feb 20, 2025
  8. Feb 17, 2025
  9. Feb 12, 2025
  10. Jan 29, 2025
  11. Jan 15, 2025
    • Generalize field declaration for the least squares reconstruction routines (!24) · 59496a48
      Daniel Reinert authored and Pradipta Samanta committed
      
      ## What is the new feature
      
      The field declaration of the LSQ reconstruction routines has been generalized slightly, in order to make the routines applicable to the ocean surface wave model **ICON-wave**. 
      
      ## How is it implemented
      
      Currently, the 'vertical' or '3rd' dimension of the local field `z_b` is implicitly identified with `p_patch%nlev`. This is fine for the atmosphere model, but does not work for ICON-WAVE, where the 3rd dimension is the total number of spectral wave energy bins. 
      This issue has been solved by replacing 
      ```
      REAL(wp) :: z_d(3, nproma, nlev)
      ``` 
      with
      ```
      REAL(wp) :: z_d(3, nproma, elev)
      ```
      As a side effect, `nlev` is no longer needed by the reconstruction routines. The argument lists have been adapted accordingly.
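      
      The pattern can be sketched as follows. This is a minimal illustration, not the library code: the routine name `recon_sketch`, the sizes `nproma = 4` and `nbins = 6`, and the loop body are assumptions made for the example. The local work array is an automatic array whose third extent follows the dummy argument `elev`, so the caller decides whether that dimension means vertical levels or spectral bins.
      
      ```fortran
      PROGRAM lsq_field_decl_sketch
        IMPLICIT NONE
        INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(12, 307)
        INTEGER, PARAMETER :: nproma = 4   ! hypothetical block length
        INTEGER, PARAMETER :: nbins = 6    ! hypothetical size of the 3rd dimension
        REAL(wp) :: field(nproma, nbins), res(nproma, nbins)
      
        field = 1.0_wp
        ! The caller passes the extent of the 3rd dimension explicitly
        CALL recon_sketch(field, res, 1, nbins)
        PRINT *, 'sum of result: ', SUM(res)
      
      CONTAINS
      
        SUBROUTINE recon_sketch(p_in, p_out, slev, elev)
          REAL(wp), INTENT(IN)  :: p_in(:, :)
          REAL(wp), INTENT(OUT) :: p_out(:, :)
          INTEGER,  INTENT(IN)  :: slev, elev
          ! Automatic array: the 3rd extent is elev, not p_patch%nlev
          REAL(wp) :: z_d(3, nproma, elev)
          INTEGER :: jc, jk
      
          DO jk = slev, elev
            DO jc = 1, nproma
              ! Placeholder work standing in for the actual reconstruction
              z_d(:, jc, jk) = p_in(jc, jk)
              p_out(jc, jk) = SUM(z_d(:, jc, jk))
            END DO
          END DO
        END SUBROUTINE recon_sketch
      
      END PROGRAM lsq_field_decl_sketch
      ```
      
      Because the extent is taken from `elev`, the sketch needs no patch data structure at all, which mirrors why `nlev` could be dropped from the argument lists.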
      
      Merged-by: Pradipta Samanta <samanta@dkrz.de>
      Changelog: default
  12. Jan 07, 2025
  13. Dec 20, 2024
    • Add NEC-specific compiler option to mo_lib_divrot (!20) · 3daa158f
      Daniel Reinert authored and Pradipta Samanta committed
      
      ## What is the bug
      
      The least squares polynomial reconstruction routines `recon_lsq_cell_XY` have been extracted from the main ICON code and moved to the library `libiconmath`. Testing on NEC@DWD revealed that some of these routines no longer vectorized after the move. To illustrate the issue, the compile listing of the loop body is given below for `recon_lsq_cell_c_lib`:
      ```
        1175: |+----->        DO jk = slev, elev
        1176: ||        
        1177: ||                !$ACC LOOP VECTOR PRIVATE(z_qt_times_d)
        1178: ||        !NEC$ ivdep
        1179: ||+---->          DO jc = i_startidx, i_endidx
        1180: |||       
        1181: |||                 ! calculate matrix vector product Q^T d (transposed of Q times LHS)
        1182: |||                 ! (intrinsic function matmul not applied, due to massive
        1183: |||                 ! performance penalty on the NEC. Instead the intrinsic dot product
        1184: |||                 ! function is applied
        1185: |||       !TODO:  these should be nine scalars, since they should reside in registers
        1186: |||V===>            z_qt_times_d(1) = DOT_PRODUCT(lsq_qtmat_c(jc, 1, 1:9, jb), z_d(1:9, jc, jk))
        1187: |||V===>            z_qt_times_d(2) = DOT_PRODUCT(lsq_qtmat_c(jc, 2, 1:9, jb), z_d(1:9, jc, jk))
        1188: |||V===>            z_qt_times_d(3) = DOT_PRODUCT(lsq_qtmat_c(jc, 3, 1:9, jb), z_d(1:9, jc, jk))
        1189: |||V===>            z_qt_times_d(4) = DOT_PRODUCT(lsq_qtmat_c(jc, 4, 1:9, jb), z_d(1:9, jc, jk))
      ```
      For the library variant of `recon_lsq_cell_c`, the compiler vectorizes the DOT_PRODUCT rather than the horizontal `jc` loop, which leads to a significant performance penalty. Testing revealed that the desired vectorization is restored when the compiler unrolls the dot product completely. This is achieved by raising the compile option `-floop-unroll-completely=m` from `m=8` to `m=10`. This makes sense, since the loop count of the dot product in the example is 9, which exceeds the old limit of 8 but not the new limit of 10.
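      
      To make the effect of complete unrolling concrete, the sketch below writes one 9-term dot product out by hand and compares it with the intrinsic. It is an illustration only: the array names `qtmat` and `d`, the block length, and the test values are assumptions, and the hand-unrolled form is not what the library source contains. Once the product is straight-line code, the only loop left for the vectorizer is the `jc` loop.
      
      ```fortran
      PROGRAM unroll_sketch
        IMPLICIT NONE
        INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(12, 307)
        INTEGER, PARAMETER :: nproma = 8   ! hypothetical block length
        REAL(wp) :: qtmat(nproma, 9)       ! stand-in for lsq_qtmat_c(jc, 1, 1:9, jb)
        REAL(wp) :: d(9, nproma)           ! stand-in for z_d(1:9, jc, jk)
        REAL(wp) :: r_dot(nproma), r_unrolled(nproma)
        INTEGER :: jc, i
      
        ! Small integer-valued test data, so both variants agree exactly
        DO jc = 1, nproma
          DO i = 1, 9
            qtmat(jc, i) = REAL(i, wp)
            d(i, jc) = REAL(jc, wp)
          END DO
        END DO
      
        ! Variant A: intrinsic dot product inside the jc loop, as in the listing
        DO jc = 1, nproma
          r_dot(jc) = DOT_PRODUCT(qtmat(jc, 1:9), d(1:9, jc))
        END DO
      
        ! Variant B: the same product written out term by term, which is
        ! what complete unrolling of the 9-iteration loop amounts to
        DO jc = 1, nproma
          r_unrolled(jc) = qtmat(jc, 1)*d(1, jc) + qtmat(jc, 2)*d(2, jc) &
            &            + qtmat(jc, 3)*d(3, jc) + qtmat(jc, 4)*d(4, jc) &
            &            + qtmat(jc, 5)*d(5, jc) + qtmat(jc, 6)*d(6, jc) &
            &            + qtmat(jc, 7)*d(7, jc) + qtmat(jc, 8)*d(8, jc) &
            &            + qtmat(jc, 9)*d(9, jc)
        END DO
      
        PRINT *, 'max abs difference: ', MAXVAL(ABS(r_dot - r_unrolled))
      END PROGRAM unroll_sketch
      ```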
      
      ## How do you fix it
      
      In order to minimize the possibility of unexpected side effects, the updated compiler option is applied only locally to the module `mo_lib_divrot`, by adding the directive
      ```
      !NEC$ options "-floop-unroll-completely=10"
      ```
      to the top of the module. The resulting compile listing with corrected vectorization is given below:
      ```
        1180: |+----->        DO jk = slev, elev
        1181: ||        
        1182: ||                !$ACC LOOP VECTOR PRIVATE(z_qt_times_d)
        1183: ||        !NEC$ ivdep
        1184: ||V---->          DO jc = i_startidx, i_endidx
        1185: |||       
        1186: |||                 ! calculate matrix vector product Q^T d (transposed of Q times LHS)
        1187: |||                 ! (intrinsic function matmul not applied, due to massive
        1188: |||                 ! performance penalty on the NEC. Instead the intrinsic dot product
        1189: |||                 ! function is applied
        1190: |||       !TODO:  these should be nine scalars, since they should reside in registers
        1191: |||*===>            z_qt_times_d(1) = DOT_PRODUCT(lsq_qtmat_c(jc, 1, 1:9, jb), z_d(1:9, jc, jk))
        1192: |||*===>            z_qt_times_d(2) = DOT_PRODUCT(lsq_qtmat_c(jc, 2, 1:9, jb), z_d(1:9, jc, jk))
        1193: |||*===>            z_qt_times_d(3) = DOT_PRODUCT(lsq_qtmat_c(jc, 3, 1:9, jb), z_d(1:9, jc, jk))
        1194: |||*===>            z_qt_times_d(4) = DOT_PRODUCT(lsq_qtmat_c(jc, 4, 1:9, jb), z_d(1:9, jc, jk))
        1195: |||*===>            z_qt_times_d(5) = DOT_PRODUCT(lsq_qtmat_c(jc, 5, 1:9, jb), z_d(1:9, jc, jk))
        1196: |||*===>            z_qt_times_d(6) = DOT_PRODUCT(lsq_qtmat_c(jc, 6, 1:9, jb), z_d(1:9, jc, jk))
      ```
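      
      A reduced sketch of how such a module-local directive might look in context is given below. The directive text is the one quoted above; everything else (the module name `mo_divrot_sketch`, the routine `qt_times_d`, the array shapes, and placing the directive on the line before the `MODULE` statement) is an assumption for illustration. Compilers other than the NEC one treat both `!NEC$` lines as ordinary comments, so the example builds anywhere.
      
      ```fortran
      !NEC$ options "-floop-unroll-completely=10"
      MODULE mo_divrot_sketch
        IMPLICIT NONE
        PRIVATE
        PUBLIC :: wp, qt_times_d
        INTEGER, PARAMETER :: wp = SELECTED_REAL_KIND(12, 307)
      
      CONTAINS
      
        SUBROUTINE qt_times_d(qtmat, d, res)
          REAL(wp), INTENT(IN)  :: qtmat(:, :, :)   ! (jc, 1:2, 1:9)
          REAL(wp), INTENT(IN)  :: d(:, :)          ! (1:9, jc)
          REAL(wp), INTENT(OUT) :: res(:, :)        ! (1:2, jc)
          INTEGER :: jc
      
          ! Horizontal loop that is meant to be vectorized; the 9-term
          ! dot products inside are the candidates for complete unrolling
          !NEC$ ivdep
          DO jc = 1, SIZE(d, 2)
            res(1, jc) = DOT_PRODUCT(qtmat(jc, 1, 1:9), d(1:9, jc))
            res(2, jc) = DOT_PRODUCT(qtmat(jc, 2, 1:9), d(1:9, jc))
          END DO
        END SUBROUTINE qt_times_d
      
      END MODULE mo_divrot_sketch
      
      PROGRAM divrot_sketch_demo
        USE mo_divrot_sketch, ONLY: wp, qt_times_d
        IMPLICIT NONE
        REAL(wp) :: qtmat(4, 2, 9), d(9, 4), res(2, 4)
      
        qtmat = 1.0_wp
        d = 2.0_wp
        CALL qt_times_d(qtmat, d, res)
        PRINT *, 'res(1,1) = ', res(1, 1)
      END PROGRAM divrot_sketch_demo
      ```
      
      Keeping the option in the source file rather than in the global build flags limits its effect to this one module, which is the stated goal of the fix.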
      
      Approved-by: Pradipta Samanta <samanta@dkrz.de>
      Merged-by: Pradipta Samanta <samanta@dkrz.de>
      Changelog: default
  14. Dec 09, 2024
  15. Nov 28, 2024
  16. Nov 26, 2024
  17. Nov 25, 2024
  18. Oct 17, 2024
  19. Sep 24, 2024
  20. Sep 23, 2024
  21. Aug 27, 2024
  22. Aug 23, 2024
  23. Aug 21, 2024
  24. Aug 16, 2024
  25. Aug 15, 2024
  26. Aug 13, 2024