Optimize tdma_solver_vec and make it optionally asynchronous
What is the bug
This is a small optimization of tdma_solver_vec that improves its performance and makes it possible to run it completely asynchronously. It is a prerequisite for the JSBACH CUDA graph implementation: https://gitlab.dkrz.de/jsbach/jsbach/-/merge_requests/173
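For readers unfamiliar with the routine, here is a minimal sketch of a batched tridiagonal solve (Thomas algorithm, TDMA), which is the kind of computation the name tdma_solver_vec suggests. It is not the libiconmath implementation: the function name tdma_solve_batch, the argument names, and the [level][column] layout are assumptions made for illustration, and the asynchronous execution this MR enables is not modelled here.

```python
def tdma_solve_batch(a, b, c, d):
    """Solve independent tridiagonal systems with the Thomas algorithm.

    Hypothetical illustration only, not the libiconmath API.
    All inputs are nested lists indexed [level][column]:
      a: sub-diagonal   (a[0] is never read)
      b: main diagonal
      c: super-diagonal (c[-1] does not affect the solution)
      d: right-hand side
    No pivoting is done, so each system is assumed to be
    well conditioned (e.g. diagonally dominant).
    """
    nlev = len(b)
    ncol = len(b[0])
    cp = [[0.0] * ncol for _ in range(nlev)]  # modified super-diagonal
    dp = [[0.0] * ncol for _ in range(nlev)]  # modified right-hand side
    x = [[0.0] * ncol for _ in range(nlev)]

    # Forward sweep: the level loop is sequential, the column loop is
    # independent and is the part a vectorized solver can parallelize.
    for j in range(ncol):
        cp[0][j] = c[0][j] / b[0][j]
        dp[0][j] = d[0][j] / b[0][j]
    for k in range(1, nlev):
        for j in range(ncol):
            m = 1.0 / (b[k][j] - a[k][j] * cp[k - 1][j])
            cp[k][j] = c[k][j] * m
            dp[k][j] = (d[k][j] - a[k][j] * dp[k - 1][j]) * m

    # Back substitution, from the last level up to the first.
    for j in range(ncol):
        x[nlev - 1][j] = dp[nlev - 1][j]
    for k in range(nlev - 2, -1, -1):
        for j in range(ncol):
            x[k][j] = dp[k][j] - cp[k][j] * x[k + 1][j]
    return x
```

In this sketch only the column loop is independent across iterations; the level loop carries a dependency in both sweeps, which is why such solvers are typically parallelized over columns rather than levels.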
How do you fix it
N/A
How urgent is the bugfix
- I need it as soon as possible
- I can wait for a couple of days
- None of my current codes is directly affected
Mandatory steps before review
- Gitlab CI passes (Hint: use make format for linting)
- Bugfix is covered by additional unit tests
- Mark the merge request as ready by removing Draft:
Mandatory steps before merge
- Reviewed by a maintainer
- Incorporate review suggestions
- Remember to edit the commit message and select the proper changelog category (feature/bugfix/other)
- Prior to merging, please remove any boilerplate from the MR description, retaining only the What is the bug and How do you fix it section to maintain
You are not supposed to merge this request by yourself, the maintainers of libiconmath take care of this action!
Edited by Pradipta Samanta