---
## Lessons Learned
- Builds for the OpenACC multicore target perform similarly to the OpenMP multicore target for most kernels. The exception is synchronization: locks and atomic updates on the OpenMP multicore target perform worse than OpenACC atomics.
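- To obtain bit-identical results when comparing CPU and GPU runs, compiler optimizations should be set to the minimum level, since optimizations such as fused multiply-add contraction or reordered floating-point reductions change rounding.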
- Extend the OpenACC porting guide (possibly as a standalone document).
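
To make the synchronization comparison concrete, here is a minimal sketch in C of the same scatter-add written with an OpenMP atomic, an OpenMP lock and an OpenACC atomic. The scatter-add itself (edges adding a value into the node they map to) and all function names are hypothetical illustrations, not the kernels that were measured. The single global lock is also an assumption, and the code makes no performance claim. Without `-fopenmp` or an OpenACC compiler the pragmas are ignored and every variant runs serially with the same result.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical scatter-add: edge e adds val[e] to node node_of_edge[e].
   Several edges can share a node, so concurrent updates must be synchronized. */

/* OpenMP multicore: atomic update. */
void scatter_add_omp_atomic(int n_edges, const int *node_of_edge,
                            const double *val, double *node_sum) {
  #pragma omp parallel for
  for (int e = 0; e < n_edges; ++e) {
    #pragma omp atomic update
    node_sum[node_of_edge[e]] += val[e];
  }
}

/* OpenMP multicore: one lock guarding every update (assumed for illustration).
   The lock routines exist only when OpenMP is enabled; otherwise run serially. */
void scatter_add_omp_lock(int n_edges, const int *node_of_edge,
                          const double *val, double *node_sum) {
#ifdef _OPENMP
  omp_lock_t lock;
  omp_init_lock(&lock);
  #pragma omp parallel for
  for (int e = 0; e < n_edges; ++e) {
    omp_set_lock(&lock);
    node_sum[node_of_edge[e]] += val[e];
    omp_unset_lock(&lock);
  }
  omp_destroy_lock(&lock);
#else
  for (int e = 0; e < n_edges; ++e)
    node_sum[node_of_edge[e]] += val[e];
#endif
}

/* OpenACC (e.g. built for the multicore target): atomic update.
   On a shared-memory multicore target the data clauses have no copy cost. */
void scatter_add_acc_atomic(int n_edges, int n_nodes, const int *node_of_edge,
                            const double *val, double *node_sum) {
  #pragma acc parallel loop copyin(node_of_edge[0:n_edges], val[0:n_edges]) copy(node_sum[0:n_nodes])
  for (int e = 0; e < n_edges; ++e) {
    #pragma acc atomic update
    node_sum[node_of_edge[e]] += val[e];
  }
}
```

All three variants compute the same sums. Only the synchronization mechanism differs, and that mechanism is what the observation above refers to.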
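
A minimal sketch of why the order of floating-point operations alone can break bit-identity between CPU and GPU runs. The same three values are summed left to right, as a serial CPU loop would, and pairwise, in the spirit of a parallel tree reduction. The function names are hypothetical, and the pairwise grouping is only an assumption about how a parallel reduction might combine terms.

```c
/* Left-to-right sum, as a plain serial loop evaluates it. */
double sum_sequential(const double *x, int n) {
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}

/* Pairwise (tree) sum: split the range in half and add the two partial sums.
   This grouping stands in for a parallel reduction; real runtimes may differ. */
double sum_pairwise(const double *x, int n) {
  if (n <= 0) return 0.0;
  if (n == 1) return x[0];
  int h = n / 2;
  return sum_pairwise(x, h) + sum_pairwise(x + h, n - h);
}
```

With x = {0.1, 0.2, 0.3}, the sequential version evaluates (0.1 + 0.2) + 0.3 and the pairwise version evaluates 0.1 + (0.2 + 0.3). In IEEE 754 double precision the two results are not bit-identical, even though both are correct to within rounding.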
## Optimizations
### OCE_FCT
- Conceptual mismatch: AUX is declared as an auxiliary (scratch) array to save memory; however, the values written to it are used by subsequent calculations in other parts of the code.
- AUX is a pointer to edge_up_dn_grad, whose first dimension has size 4. Inside the FCT routine only the first two positions of the first dimension are used, so adjacent threads access non-contiguous memory locations.
- Merge the last two kernels into one: loop over the nodes and vertical levels and, for each node, perform the computation of the original second kernel on #edges/#nodes edges.
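
A minimal sketch of the strided access in the AUX note, written in C. It assumes that edge_up_dn_grad is laid out like a Fortran array declared (4, n_edges), with the other dimensions omitted, and that consecutive threads handle consecutive edges. Neither detail is stated above. Fortran is column-major, so the first index varies fastest and two neighbouring edges are 4 elements apart. The repacking function shows one possible remedy: it copies the two used components into a component-major buffer. This is an illustration only, not the change that was made. All names are hypothetical.

```c
#include <stddef.h>

/* Size of the first (fastest-varying) dimension of the hypothetical aux(4, n_edges). */
#define AUX_DIM1 4

/* Flat offset of aux(k, e) (0-based) in the original column-major (4, n_edges) layout. */
static size_t idx_aos(int k, int e) {
  return (size_t)k + (size_t)AUX_DIM1 * (size_t)e;
}

/* Flat offset of component k of edge e in a layout equivalent to a Fortran
   array declared (n_edges, 2): each component is a contiguous run of edges. */
static size_t idx_soa(int k, int e, int n_edges) {
  return (size_t)k * (size_t)n_edges + (size_t)e;
}

/* Distance in elements between the addresses read by threads e and e + 1. */
size_t thread_stride_aos(int k, int e) { return idx_aos(k, e + 1) - idx_aos(k, e); }
size_t thread_stride_soa(int k, int e, int n_edges) {
  return idx_soa(k, e + 1, n_edges) - idx_soa(k, e, n_edges);
}

/* Copy the two used components aux(0:1, :) into a component-major buffer
   of size 2 * n_edges, so that neighbouring edges become adjacent in memory. */
void pack_used_components(const double *aux_aos, int n_edges, double *aux_soa) {
  for (int e = 0; e < n_edges; ++e)
    for (int k = 0; k < 2; ++k)
      aux_soa[idx_soa(k, e, n_edges)] = aux_aos[idx_aos(k, e)];
}
```

In the original layout, threads reading component k step through memory 4 elements at a time, and half of every fetched group of 4 values is never used. In the packed layout the stride is 1.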
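
The kernel merge can be sketched in C with OpenACC directives. The sketch assumes two things that the note does not state. First, the second kernel's per-edge work does not read values that the first kernel computes for other nodes. Second, each node takes a contiguous block of ceil(#edges/#nodes) edges, which covers every edge even when #edges is not a multiple of #nodes. The arithmetic in both loops is a placeholder, and all names are hypothetical.

```c
/* Original structure: two separate parallel loops, i.e. two kernel launches.
   Arrays are stored as [entity][level] in row-major order. */
void two_kernels(int n_nodes, int n_edges, int n_lev,
                 const double *node_in, double *node_out,
                 const double *edge_in, double *edge_out) {
  #pragma acc parallel loop collapse(2) copyin(node_in[0:n_nodes*n_lev]) copyout(node_out[0:n_nodes*n_lev])
  for (int n = 0; n < n_nodes; ++n)
    for (int l = 0; l < n_lev; ++l)
      node_out[n * n_lev + l] = 2.0 * node_in[n * n_lev + l];   /* placeholder node work */

  #pragma acc parallel loop collapse(2) copyin(edge_in[0:n_edges*n_lev]) copyout(edge_out[0:n_edges*n_lev])
  for (int e = 0; e < n_edges; ++e)
    for (int l = 0; l < n_lev; ++l)
      edge_out[e * n_lev + l] = edge_in[e * n_lev + l] + 1.0;   /* placeholder edge work */
}

/* Merged structure: one loop over nodes and levels; each (node, level) pair also
   does the former second kernel's work for its block of edges at that level. */
void merged_kernel(int n_nodes, int n_edges, int n_lev,
                   const double *node_in, double *node_out,
                   const double *edge_in, double *edge_out) {
  const int epn = (n_edges + n_nodes - 1) / n_nodes;   /* ceil(#edges / #nodes) */
  #pragma acc parallel loop collapse(2) copyin(node_in[0:n_nodes*n_lev], edge_in[0:n_edges*n_lev]) copyout(node_out[0:n_nodes*n_lev], edge_out[0:n_edges*n_lev])
  for (int n = 0; n < n_nodes; ++n)
    for (int l = 0; l < n_lev; ++l) {
      node_out[n * n_lev + l] = 2.0 * node_in[n * n_lev + l];   /* former kernel 1 */
      const int e_end = (n + 1) * epn < n_edges ? (n + 1) * epn : n_edges;
      #pragma acc loop seq
      for (int e = n * epn; e < e_end; ++e)                      /* former kernel 2 */
        edge_out[e * n_lev + l] = edge_in[e * n_lev + l] + 1.0;
    }
}
```

Each (edge, level) element is written by exactly one (node, level) iteration, so the merged loop needs no synchronization and produces the same output as the two separate loops. It does so with a single kernel launch.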