From f4c3fa6b7b45c7bd0735af163553122cb6b77715 Mon Sep 17 00:00:00 2001
From: Dominik Zobel <zobel@dkrz.de>
Date: Thu, 20 Jun 2024 11:02:51 +0200
Subject: [PATCH] Divide memory model and processor techniques in two chapters (and restructure)

---
 lectures/memory-hierarchies/slides.qmd | 75 ++++++++++++--------------
 1 file changed, 35 insertions(+), 40 deletions(-)

diff --git a/lectures/memory-hierarchies/slides.qmd b/lectures/memory-hierarchies/slides.qmd
index ed95f0b..59e908a 100644
--- a/lectures/memory-hierarchies/slides.qmd
+++ b/lectures/memory-hierarchies/slides.qmd
@@ -167,33 +167,10 @@ Time in seconds using a 2 GB numpy array ($128 \times 128 \times 128 \times 128$
 
 
 
+# Memory Models
 
-# Theory
-
-## Techniques
-
- - Caching
- - Prefetching
- - Branch prediction
-
-
-
-## Unavailable data
-
-If data is not available in the current memory level
-
- - register spilling (register -> cache)
- - cache missing (cache -> main memory)
- - page fault (main memory -> disk)
-
-Miss description
-
-
-
-## External Memory Model
-
-balance block size to get from next level to latency it takes to get it
-
+ - Study effect of latencies, cache sizes, block sizes, ...
+ - Here just focus on latency
 
 ## Memory access time model (1/3)
 
@@ -239,30 +216,30 @@ T_{avg,s} &= H_1 T_1 + ((1-H_1)\cdot H_2)\cdot(T_1+T_2)\\
 
 
 
-## Overview/Exercise
+# Behind the scenes
 
- - Introduction from base of pyramid (file access)
- - example with access from disk and directly from memory
- - background on memory hierarchy with focus on today instead of history
+## Processor techniques
 
- - Working our way up the pyramid
- - reference values for Levante CPU
- - example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
- - openmp reduction (hand implementation), reference/continuation from parallelism lecture
+ - Caching
+ - Prefetching
+ - Branch prediction
 
 
 
-## Observations
+## Unavailable data
 
- - Gap between processor and memory speeds.
- Hierarchy needed because of discrepancy between speed of CPU and (main) memory
- (include image)
+If data is not available in the current memory level
+
+ - register spilling (register -> cache)
+ - cache missing (cache -> main memory)
+ - page fault (main memory -> disk)
+
+Miss description
 
- - exploit accessing data and code stored close to each other (temporal and spatial locality)
 
 
 
-# Visualizations
+# Visualization of memory hierarchies
 
 ## Memory Pyramid (upwards)
 
@@ -507,6 +484,12 @@ $\approx$ Factor 20 between best and worst access
 :::
 
 
+## Hands-On {.handson}
+
+ - Either example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
+ - Or OpenMP reduction (hand implementation), reference/continuation from parallelism lecture?!
+
+
 
 ## Different architectures
 
@@ -517,6 +500,18 @@ $\approx$ Factor 20 between best and worst access
 
 
 
+# Summary
+
+## Observations
+
+ - Gap between processor and memory speeds.
+ Hierarchy needed because of discrepancy between speed of CPU and (main) memory
+ (include image)
+
+ - exploit accessing data and code stored close to each other (temporal and spatial locality)
+
+
+
 # Resources {.leftalign}
 
  - "Computer Systems: A Programmer's Perspective" by _R. Bryant_ and _D. O'Hallaron_, Pearson
-- 
GitLab