From d1b74a97d6611995e6fdc86839c1b1b2a6859003 Mon Sep 17 00:00:00 2001
From: Dominik Zobel <zobel@dkrz.de>
Date: Wed, 26 Jun 2024 13:24:44 +0200
Subject: [PATCH] Update memory mountain slides and exercise

---
 lectures/memory-hierarchies/slides.qmd | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/lectures/memory-hierarchies/slides.qmd b/lectures/memory-hierarchies/slides.qmd
index d7caad4..ba9f850 100644
--- a/lectures/memory-hierarchies/slides.qmd
+++ b/lectures/memory-hierarchies/slides.qmd
@@ -446,11 +446,18 @@ Based on a typical Levante node (AMD EPYC 7763)
 
 ## Memory Mountain (1/2)
 
-Describe what is done
+:::{.smaller}
+
+Code for program contained in "Computer Systems":
+
+<https://csapp.cs.cmu.edu/3e/mountain.tar>
+
+:::
 
- - Influence of block size
- - Influence of stride
- - Measurements done on Levante compute node
+ - Process a representative amount of data
+ - Use `stride` between array elements to control spatial locality
+ - Use `size` of array to control temporal locality
+ - Also warm up the cache before the actual measurements
 
 
 
@@ -469,8 +476,10 @@ $\approx$ Factor 20 between best and worst access
 
 ## Hands-On {.handson}
 
- - Either example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
- - Or OpenMP reduction (hand implementation), reference/continuation from parallelism lecture?!
+1. Download and extract the C source code of the memory mountain program linked on the previous slides
+2. Compile the program and run it on your PC or a Levante compute node
+3. Which factor do you get between best and worst performance?
+4. (optional) Visualize your results
 
 
 
-- 
GitLab
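
Note on the new "Memory Mountain (1/2)" bullets: the measurement idea they describe (`stride` controls spatial locality, `size` controls temporal locality, and a warm-up pass precedes the timed run) can be sketched in C roughly as follows. This is a minimal illustration, not the code from `mountain.tar`: the function names `touched_elems`, `strided_sum` and `read_throughput_mbps` are hypothetical, and timing uses the standard C `clock()` function as a simple stand-in for a more precise timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Results are stored here so the compiler cannot discard the read loops. */
static volatile long sink;

/* Number of elements visited when stepping through `elems` elements
   with the given stride, i.e. ceil(elems / stride). */
size_t touched_elems(size_t elems, size_t stride)
{
    assert(stride > 0);
    return (elems + stride - 1) / stride;
}

/* Sum every `stride`-th element of the first `elems` elements.
   A larger stride uses fewer elements per cache line (less spatial locality). */
long strided_sum(const long *data, size_t elems, size_t stride)
{
    assert(stride > 0);
    long sum = 0;
    for (size_t i = 0; i < elems; i += stride)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: read throughput in MB/s (10^6 bytes per second)
   for one (size, stride) combination. `size_bytes` sets the working set
   and therefore which cache level can hold it (temporal locality).
   Note that clock() measures CPU time; 0.0 is returned if the run was
   too short for clock() to resolve. */
double read_throughput_mbps(const long *data, size_t size_bytes,
                            size_t stride, int reps)
{
    size_t elems = size_bytes / sizeof(long);

    /* Warm-up pass: bring the working set into the cache, not timed. */
    sink = strided_sum(data, elems, stride);

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += strided_sum(data, elems, stride);
    clock_t stop = clock();

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;

    double bytes_read = (double)reps
                      * (double)touched_elems(elems, stride)
                      * (double)sizeof(long);
    return bytes_read / seconds / 1e6;
}
```

A driver would call `read_throughput_mbps` for each combination of array size and stride and tabulate the results. The ratio of the largest to the smallest value then gives the factor asked for in step 3 of the hands-on exercise.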