Divide memory model and processor techniques into two chapters (and restructure)

f4c3fa6b · Dominik Zobel · 94c2ce8a · f4c3fa6b
Commit f4c3fa6b authored 10 months ago by Dominik Zobel
--- a/lectures/memory-hierarchies/slides.qmd
+++ b/lectures/memory-hierarchies/slides.qmd
@@ -167,33 +167,10 @@ Time in seconds using a 2 GB numpy array ($128 \times 128 \times 128 \times 128$



+# Memory Models

-# Theory
-
-## Techniques
-
- - Caching
- - Prefetching
- - Branch prediction
-
-
-
-## Unavailable data
-
-If data is not available in the current memory level
-
- - register spilling (register -> cache)
- - cache missing     (cache -> main memory)
- - page fault        (main memory -> disk)
-
-Miss description
-
-
-
-## External Memory Model
-
-balance block size to get from next level to latency it takes to get it
-
+ - Study the effect of latencies, cache sizes, block sizes, ...
+ - Here we focus on latency only


 ## Memory access time model (1/3)
@@ -239,30 +216,30 @@ T_{avg,s} &= H_1 T_1 + ((1-H_1)\cdot H_2)\cdot(T_1+T_2)\\



-## Overview/Exercise
+# Behind the scenes

- - Introduction from base of pyramid (file access)
-    - example with access from disk and directly from memory
-    - background on memory hierarchy with focus on today instead of history
+## Processor techniques

- - Working our way up the pyramid
-    - reference values for Levante CPU
-    - example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
-    - openmp reduction (hand implementation), reference/continuation from parallelism lecture
+ - Caching
+ - Prefetching
+ - Branch prediction



-## Observations
+## Unavailable data

- - Gap between processor and memory speeds.
-   Hierarchy needed because of discrepancy between speed of CPU and (main) memory
-   (include image)
+If data is not available in the current memory level:
+
+ - register spilling (register -> cache)
+ - cache miss        (cache -> main memory)
+ - page fault        (main memory -> disk)
+
+Miss description

- - exploit accessing data and code stored close to each other (temporal and spatial locality)



-# Visualizations
+# Visualization of memory hierarchies

 ## Memory Pyramid (upwards)

@@ -507,6 +484,12 @@ $\approx$ Factor 20 between best and worst access
 :::


+## Hands-On {.handson}
+
+ - Either an example with optimal and sub-optimal memory access, e.g. cache blocking (see `nproma`)
+ - Or an OpenMP reduction (hand implementation) as a continuation of the parallelism lecture?!
+
+

 ## Different architectures

@@ -517,6 +500,18 @@ $\approx$ Factor 20 between best and worst access



+# Summary
+
+## Observations
+
+ - Gap between processor and memory speeds:
+   a hierarchy is needed to bridge the discrepancy between the speed of the CPU and (main) memory
+   (include image)
+
+ - Exploit locality: reuse recently accessed data and code (temporal) and access data and code stored close to each other (spatial)
+
+
+
 # Resources {.leftalign}

 - "Computer Systems: A Programmer's Perspective" by _R. Bryant_ and _D. O'Hallaron_, Pearson
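
The "Memory access time model" slides touched by this commit expose only one line of their formula in the hunk header: $H_1 T_1 + ((1-H_1)\cdot H_2)\cdot(T_1+T_2)$. The following Python sketch shows how such a serial-lookup model could be evaluated. It assumes that each further level continues the same pattern (a request reaching level $i$ has paid the latencies of all faster levels) and that the caller passes a hit rate of 1.0 for the last level. The function name, its parameters, and the numbers in the usage example are hypothetical and not taken from the slides.

```python
def average_access_time(hit_rates, latencies):
    """Average access time of a serially searched memory hierarchy.

    hit_rates[i] is the local hit rate H_(i+1) of a level and
    latencies[i] its access time T_(i+1), ordered from fastest to slowest.
    Assumption: a miss at one level forwards the request to the next
    level, so the latencies of all levels visited so far add up.
    """
    if len(hit_rates) != len(latencies):
        raise ValueError("hit_rates and latencies must have the same length")

    average = 0.0
    reach_probability = 1.0   # probability that a request gets as far as this level
    accumulated_latency = 0.0  # T_1 + ... + T_i paid when the hit occurs at level i

    for hit_rate, latency in zip(hit_rates, latencies):
        accumulated_latency += latency
        average += reach_probability * hit_rate * accumulated_latency
        reach_probability *= 1.0 - hit_rate

    return average


# Hypothetical three-level example: two caches and main memory (always hits).
if __name__ == "__main__":
    print(average_access_time([0.9, 0.8, 1.0], [1.0, 10.0, 100.0]))
```

With two entries the loop reproduces exactly the two terms visible in the hunk header. Passing 1.0 as the last hit rate models a level that always holds the data.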