From f4c3fa6b7b45c7bd0735af163553122cb6b77715 Mon Sep 17 00:00:00 2001
From: Dominik Zobel <zobel@dkrz.de>
Date: Thu, 20 Jun 2024 11:02:51 +0200
Subject: [PATCH] Divide memory model and processor techniques in two chapters
 (and restructure)

---
 lectures/memory-hierarchies/slides.qmd | 75 ++++++++++++--------------
 1 file changed, 35 insertions(+), 40 deletions(-)

diff --git a/lectures/memory-hierarchies/slides.qmd b/lectures/memory-hierarchies/slides.qmd
index ed95f0b..59e908a 100644
--- a/lectures/memory-hierarchies/slides.qmd
+++ b/lectures/memory-hierarchies/slides.qmd
@@ -167,33 +167,10 @@ Time in seconds using a 2 GB numpy array ($128 \times 128 \times 128 \times 128$
 
 
 
+# Memory Models
 
-# Theory
-
-## Techniques
-
- - Caching
- - Prefetching
- - Branch prediction
-
-
-
-## Unavailable data
-
-If data is not available in the current memory level
-
- - register spilling (register -> cache)
- - cache missing     (cache -> main memory)
- - page fault        (main memory -> disk)
-
-Miss description
-
-
-
-## External Memory Model
-
-balance block size to get from next level to latency it takes to get it
-
+ - Study effect of latencies, cache sizes, block sizes, ...
+ - Here we focus on latency only
 
 
 ## Memory access time model (1/3)
@@ -239,30 +216,30 @@ T_{avg,s} &= H_1 T_1 + ((1-H_1)\cdot H_2)\cdot(T_1+T_2)\\
 
 
 
-## Overview/Exercise
+# Behind the scenes
 
- - Introduction from base of pyramid (file access)
-    - example with access from disk and directly from memory
-    - background on memory hierarchy with focus on today instead of history
+## Processor techniques
 
- - Working our way up the pyramid
-    - reference values for Levante CPU
-    - example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
-    - openmp reduction (hand implementation), reference/continuation from parallelism lecture
+ - Caching
+ - Prefetching
+ - Branch prediction
 
 
 
-## Observations
+## Unavailable data
 
- - Gap between processor and memory speeds.
-   Hierarchy needed because of discrepancy between speed of CPU and (main) memory
-   (include image)
+If data is not available in the current memory level
+
+ - register spilling (register -> cache)
+ - cache miss        (cache -> main memory)
+ - page fault        (main memory -> disk)
+
+Miss description
 
- - exploit accessing data and code stored close to each other (temporal and spatial locality)
 
 
 
-# Visualizations
+# Visualization of memory hierarchies
 
 ## Memory Pyramid (upwards)
 
@@ -507,6 +484,12 @@ $\approx$ Factor 20 between best and worst access
 :::
 
 
+## Hands-On {.handson}
+
+ - Either example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
+ - Or OpenMP reduction (hand implementation), reference/continuation from parallelism lecture?!
+
+
 
 ## Different architectures
 
@@ -517,6 +500,18 @@ $\approx$ Factor 20 between best and worst access
 
 
 
+# Summary
+
+## Observations
+
+ - Gap between processor and memory speeds:
+   a hierarchy is needed to bridge the speed discrepancy between CPU and (main) memory
+   (include image)
+
+ - Exploit locality: reuse recently accessed data and code (temporal) and access items stored close to each other (spatial)
+
+
+
 # Resources {.leftalign}
 
  - "Computer Systems: A Programmer's Perspective" by _R. Bryant_ and _D. O'Hallaron_, Pearson
-- 
GitLab