Optimize tdma_solver_vec and make it optionally asynchronous

Dmitry Alexeev requested to merge nvidia-optimize-tdma into main

What is the bug

This is a small optimization of tdma_solver_vec that improves its performance and allows it to run completely asynchronously. It is a prerequisite for the JSBACH CUDA graph implementation: https://gitlab.dkrz.de/jsbach/jsbach/-/merge_requests/173
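
For readers unfamiliar with the routine, here is a minimal sketch of a batched tridiagonal (Thomas algorithm) solve in C++. It is not the code from this merge request and not the libiconmath interface: the function name tdma_batch_sketch, the argument order, and the interleaved memory layout are assumptions made for illustration only. The asynchronous execution mentioned above is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the libiconmath API): solves `nsys` independent
// tridiagonal systems of size `n` with the Thomas algorithm.
// Assumed layout: row k of system s is stored at index k * nsys + s, so the
// inner loop over s is contiguous and has no dependencies between systems.
// a: sub-diagonal (row 0 unused), b: diagonal,
// c: super-diagonal (last row unused), d: right-hand side.
std::vector<double> tdma_batch_sketch(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      const std::vector<double>& c,
                                      const std::vector<double>& d,
                                      std::size_t n, std::size_t nsys) {
  assert(n > 0 && nsys > 0);
  assert(a.size() == n * nsys && b.size() == n * nsys);
  assert(c.size() == n * nsys && d.size() == n * nsys);

  std::vector<double> cp(n * nsys);  // modified super-diagonal
  std::vector<double> x(n * nsys);   // modified rhs, then the solution

  // Forward sweep, first row of every system.
  for (std::size_t s = 0; s < nsys; ++s) {
    cp[s] = c[s] / b[s];
    x[s] = d[s] / b[s];
  }
  // Forward sweep, remaining rows: eliminate the sub-diagonal.
  for (std::size_t k = 1; k < n; ++k) {
    for (std::size_t s = 0; s < nsys; ++s) {
      const std::size_t i = k * nsys + s;
      const std::size_t im = i - nsys;  // same system, previous row
      const double denom = b[i] - a[i] * cp[im];
      cp[i] = c[i] / denom;
      x[i] = (d[i] - a[i] * x[im]) / denom;
    }
  }
  // Back substitution from row n-2 down to row 0.
  for (std::size_t k = n - 1; k-- > 0;) {
    for (std::size_t s = 0; s < nsys; ++s) {
      const std::size_t i = k * nsys + s;
      x[i] -= cp[i] * x[i + nsys];
    }
  }
  return x;
}
```

The sweep over rows is inherently sequential, while the loop over systems is independent. That independence is the part a vectorized or GPU version would typically parallelize. The sketch performs no pivoting, so it assumes well-conditioned (for example diagonally dominant) systems.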

How do you fix it

N/A

How urgent is the bugfix

  • I need it as soon as possible
  • I can wait for a couple of days
  • None of my current codes is directly affected

Mandatory steps before review

  • Gitlab CI passes (Hint: use make format for linting)
  • Bugfix is covered by additional unit tests
  • Mark the merge request as ready by removing Draft:

Mandatory steps before merge

  • Reviewed by a maintainer
  • Incorporate review suggestions
  • Remember to edit the commit message and select the proper changelog category (feature/bugfix/other)
  • Prior to merging, please remove any boilerplate from the MR description, retaining only the What is the bug and How do you fix it section to maintain

You are not supposed to merge this request by yourself, the maintainers of libiconmath take care of this action!

Edited by Pradipta Samanta
