Update Ice Dwarf: STORM authored by Wilton Jaciel Loch
......@@ -47,12 +47,12 @@ Times collected for all the following regions:
The first and most important metric is the execution time, which is shown in the following plot. The stress_tensor, stress2rhs and evp_loop regions are executed inside the EVP main loop, which for the current experiment configuration runs for 120 iterations. Therefore, the total accumulated wall time of these regions is 120 times larger than the values shown. Their times are reported for a single iteration so that individual launches of the kernels/parallel regions can be compared directly. The ice_fem_fct kernel is executed 3 times, once for each of 3 different arrays; the time shown for it, however, is the aggregate of the 3 executions.
![mean_execution_times](uploads/60cfc88fe5ca1856c328e0e03d8bb148/mean_execution_times.png){width=600}
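
As a minimal sketch of the normalization described above, the accumulated wall time of a region inside the EVP main loop can be divided by the iteration count to obtain a single-launch time. The function name `per_launch_time` and the constant names are hypothetical and are not taken from the measurement scripts; only the region names and the 120 iterations come from the text.

```python
# Iteration count of the EVP main loop in the experiment configuration.
EVP_ITERATIONS = 120

# Regions that the text says are executed inside the EVP main loop.
EVP_LOOP_REGIONS = {"stress_tensor", "stress2rhs", "evp_loop"}


def per_launch_time(region, accumulated_seconds, iterations=EVP_ITERATIONS):
    """Return the time of a single launch of `region` (hypothetical helper).

    Regions inside the EVP loop are divided by the iteration count.
    All other regions, including ice_fem_fct (whose time stays aggregated
    over its 3 executions), are returned unchanged.
    """
    if region in EVP_LOOP_REGIONS:
        return accumulated_seconds / iterations
    return accumulated_seconds
```

Multiplying a single-launch value from the plot by 120 recovers the accumulated time of the corresponding EVP loop region.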
### Speedup
The following plot shows the speedup achieved by each kernel and by the overall combined compute time of all the kernels. The latter value is the sum of the times of all compute regions on the CPU divided by the sum of the times of all kernels on the GPU.
![speedups](uploads/c3065bdd87048aabfdd0b8cef11716a1/speedups.png){width=600}
The plot shows that, although evp_pre_loop and stress_tensor achieve the highest speedups, they raise the overall compute speedup only slightly, because both are relatively small kernels.
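
The following sketch illustrates how the overall speedup is formed from summed times and why a small kernel with a high speedup moves it very little. The function names and all timing values are made up for illustration and are not measurements from this experiment.

```python
def kernel_speedups(cpu_times, gpu_times):
    """Per-kernel speedup: CPU time divided by GPU time."""
    return {name: cpu_times[name] / gpu_times[name] for name in cpu_times}


def overall_speedup(cpu_times, gpu_times):
    """Overall speedup: sum of CPU times divided by sum of GPU times."""
    return sum(cpu_times.values()) / sum(gpu_times.values())


# Hypothetical timings in seconds (NOT measured values).
cpu = {"small_kernel": 1.0, "large_kernel": 9.0}
gpu = {"small_kernel": 0.05, "large_kernel": 3.0}

per_kernel = kernel_speedups(cpu, gpu)  # small kernel ~20x, large kernel 3x
overall = overall_speedup(cpu, gpu)     # 10.0 / 3.05, close to the large kernel's 3x
```

Because the overall value is a ratio of sums and not an average of the per-kernel speedups, it is dominated by the kernels that take the most time.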