
Wilton Jaciel Loch · b02615a8
--- a/Home/FESOM-GPU-porting/Tracer-Advection-Dwarf:-STORM.md
+++ b/Home/FESOM-GPU-porting/Tracer-Advection-Dwarf:-STORM.md
@@ -47,7 +47,7 @@ Times collected for the most expensive parallel regions:

 The first metric is the execution time of each parallel region, shown in the following plot. All times are in milliseconds.

-![mean_execution_times](uploads/6db53a7ed007bd2d0b1cd58c97317632/mean_execution_times.png)
+![mean_execution_times](uploads/6db53a7ed007bd2d0b1cd58c97317632/mean_execution_times.png){width=600}

 The version with the loops collapsed, when built for the CPU, reduces the total time in 3 of the 5 parallel regions compared with the baseline built for the CPU target. This contrasts with the data from the CORE2 tracer dwarf performance evaluation, where most of the kernels showed an increase in CPU time when moving from the baseline to the version with the loops collapsed. A similar pattern of improvement and degradation from collapsing the loops is observed in the “OpenMP vs OpenACC” performance evaluation (improvement for UPWH and UPWV, degradation for MFCT and QR4C) for both tools.

@@ -55,4 +55,4 @@ The version with the loops collapsed, when built for the CPU, presents in 3 of t

 The following plot shows the achieved speedups for each kernel/parallel region, the overall combined compute time of all the kernels and the total dwarf time (including MPI communication time). Compute time is defined as the sum of all kernel times.

-![speedups](uploads/cfe3da21055f386ef811167d06f0bc62/speedups.png)
\ No newline at end of file
+![speedups](uploads/cfe3da21055f386ef811167d06f0bc62/speedups.png){width=600}
\ No newline at end of file