Commit f4c3fa6b authored by Dominik Zobel

Divide memory model and processor techniques in two chapters (and restructure)

parent 94c2ce8a
1 merge request: !74 Memory hierarchies lecture
@@ -167,33 +167,10 @@ Time in seconds using a 2 GB numpy array ($128 \times 128 \times 128 \times 128$
# Memory Models
# Theory
## Techniques
- Caching
- Prefetching
- Branch prediction
## Unavailable data
If data is not available in the current memory level
- register spilling (register -> cache)
- cache miss (cache -> main memory)
- page fault (main memory -> disk)
Miss description
## External Memory Model
Balance the block size fetched from the next level against the latency of fetching it
- Study effect of latencies, cache sizes, block sizes, ...
- Here just focus on latency
## Memory access time model (1/3)
@@ -239,30 +216,30 @@ T_{avg,s} &= H_1 T_1 + ((1-H_1)\cdot H_2)\cdot(T_1+T_2)\\
## Overview/Exercise
# Behind the scenes
- Introduction from base of pyramid (file access)
- example with access from disk and directly from memory
- background on memory hierarchy with focus on today instead of history
## Processor techniques
- Working our way up the pyramid
- reference values for Levante CPU
- example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
- openmp reduction (hand implementation), reference/continuation from parallelism lecture
- Caching
- Prefetching
- Branch prediction
## Observations
## Unavailable data
- Gap between processor and memory speeds.
Hierarchy needed because of discrepancy between speed of CPU and (main) memory
(include image)
If data is not available in the current memory level
- register spilling (register -> cache)
- cache miss (cache -> main memory)
- page fault (main memory -> disk)
Miss description
- Exploit locality: reuse recently accessed data and code (temporal) and access data and code stored close to each other (spatial)
# Visualizations
# Visualization of memory hierarchies
## Memory Pyramid (upwards)
@@ -507,6 +484,12 @@ $\approx$ Factor 20 between best and worst access
:::
## Hands-On {.handson}
- Either example with optimal and sub-optimal memory access, i.e. cache blocking (see `nproma`)
- Or OpenMP reduction (hand implementation), reference/continuation from parallelism lecture?!
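
A possible starting point for the first option, as a minimal sketch and not the exercise planned for the lecture: three traversals of the same 2D data that differ only in access order. The function names, the block size of 4, and the 10 x 10 test matrix are assumptions made for illustration; the sketch does not model `nproma`. Nested Python lists only mimic the index pattern of a contiguous array, so the code shows the loop structure of cache blocking rather than a measurable speed-up.

```python
def sum_row_major(matrix):
    """Sum in storage order: the inner loop walks along one row."""
    total = 0
    for row in matrix:
        for value in row:
            total += value
    return total


def sum_column_major(matrix):
    """Sum against storage order: the inner loop jumps from row to row."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = 0
    for j in range(n_cols):
        for i in range(n_rows):
            total += matrix[i][j]
    return total


def sum_blocked(matrix, block=4):
    """Column-wise sum restricted to `block` rows at a time (cache blocking)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = 0
    for i0 in range(0, n_rows, block):
        i1 = min(i0 + block, n_rows)  # the last block may be shorter
        for j in range(n_cols):
            for i in range(i0, i1):
                total += matrix[i][j]
    return total


# Hypothetical test data: a 10 x 10 matrix holding the values 0..99.
matrix = [[i * 10 + j for j in range(10)] for i in range(10)]
print(sum_row_major(matrix), sum_column_major(matrix), sum_blocked(matrix))
```

All three functions return the same sum; only the order in which elements are touched changes. With a contiguous array in a compiled language, the blocked variant keeps the working set of the column-wise loop small enough to stay in cache.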
## Different architectures
@@ -517,6 +500,18 @@ $\approx$ Factor 20 between best and worst access
# Summary
## Observations
- Gap between processor and memory speeds.
Hierarchy needed because of discrepancy between speed of CPU and (main) memory
(include image)
- Exploit locality: reuse recently accessed data and code (temporal) and access data and code stored close to each other (spatial)
# Resources {.leftalign}
- "Computer Systems: A Programmer's Perspective" by _R. Bryant_ and _D. O'Hallaron_, Pearson
......