From 8374e031d835d91e2e64700a1a6a8d585e4be5f5 Mon Sep 17 00:00:00 2001
From: jfe <git@jfengels.de>
Date: Fri, 3 May 2024 15:53:19 +0200
Subject: [PATCH] Add first draft of parallelism.

---
 _quarto.yml                     |   4 +-
 lectures/hardware/slides.qmd    |  54 +++++++++++
 lectures/parallelism/slides.qmd | 157 ++++++++++++++++++++++++++++++++
 3 files changed, 213 insertions(+), 2 deletions(-)
 create mode 100644 lectures/hardware/slides.qmd
 create mode 100644 lectures/parallelism/slides.qmd

diff --git a/_quarto.yml b/_quarto.yml
index fd031fa..2d41e32 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -36,8 +36,8 @@ website:
           - "lectures/user-experience/slides.qmd"
           # - "lectures/testing/slides.qmd"
           # - "lectures/git2/slides.qmd"
-          # - "lectures/parallelism/slides.qmd"
-          # - "lectures/hardware/slides.qmd"
+          - "lectures/parallelism/slides.qmd"
+          - "lectures/hardware/slides.qmd"
           # - "lectures/file-and-data-systems/slides.qmd"
           # - "lectures/memory-hierarchies/slides.qmd"
           # - "lectures/student-talks/slides.qmd"
diff --git a/lectures/hardware/slides.qmd b/lectures/hardware/slides.qmd
new file mode 100644
index 0000000..6d7db39
--- /dev/null
+++ b/lectures/hardware/slides.qmd
@@ -0,0 +1,54 @@
+---
+title: "Example lecture"
+author: "Tobias Kölling, Florian Ziemen, and the teams at MPI-M and DKRZ"
+---
+
+# Preface
+
+* This is an example lecture for the generic computing skills course
+
+## Idea
+*optimize output for **analysis***
+
+::: {.smaller}
+(not write throughput)
+:::
+
+
+## chunking & hierarchy
+
+:::: {.columns}
+
+::: {.column width="50%"}
+|   Grid   |  Cells  |
+|---------:|--------:|
+| 1° by 1° |   0.06M |
+|    10 km |    5.1M |
+|     5 km |     20M |
+|     1 km |    510M |
+|   200  m |  12750M |
+:::
+
+::: {.column width="50%"}
+| Screen      |  Pixels |
+|------------:|--------:|
+| VGA         |    0.3M |
+| Full HD     |    2.1M |
+| MacBook 13" |    4.1M |
+| 4K          |    8.8M |
+| 8K          |   35.4M |
+:::
+
+::::
+
+It's **impossible** to look at the entire globe in full resolution.
+
+
+## Load data at the resolution necessary for the analysis
+
+![[Górski et al., 2022: The HEALPix primer](https://healpix.jpl.nasa.gov/pdf/intro.pdf)](https://easy.gems.dkrz.de/_images/gorski_f1.jpg)
+
+## Highlight! {background-color=var(--dark-bg-color)}
+
+* This slide is either important or has a special purpose.
+* You can use it to ask the audience a question or to start a hands-on session.
diff --git a/lectures/parallelism/slides.qmd b/lectures/parallelism/slides.qmd
new file mode 100644
index 0000000..7a29e47
--- /dev/null
+++ b/lectures/parallelism/slides.qmd
@@ -0,0 +1,157 @@
+---
+title: "Parallelism"
+author: "CF, GM, JFE FIXME"
+---
+
+# Motivation
+
+* We have a serial code and want to make it faster
+* Plan of action:
+  * Cut problem into smaller pieces
+  * Use independent compute resources for each piece
+* Outlook for next week: Individual computing elements no longer get
+  much faster, but there are more of them
+* FIXME: What else?
+
+
+
+## This lecture
+
+* Is mostly about parallelism as a concept
+* Next week: Hardware using this concept
+
+[comment]: # (Thinking about it, I think we should not give a theoretical definition here,
+but first give the example and explain parallelism there. Eventually, with the
+task-parallelism we should probably give a real definition and different flavours.)
+
+# Our example problem
+
+:::: {.columns}
+
+::: {.column width="50%"}
+* 1D tsunami equation
+* Korteweg–De Vries equation
+* Discretization not numerically accurate
+* [Wikipedia](https://en.wikipedia.org/wiki/Korteweg%E2%80%93De_Vries_equation)
+* FIXME
+:::
+
+::: {.column width="50%"}
+FIXME show plot of a soliton
+:::
+
+::::
+
+# Our example problem
+FIXME
+show some central loop
+
+# Decomposing problem domains
+## Our problem domain
+FIXME
+
+## Other problem domains
+* ICON's domain decomposition
+* maybe something totally different?
+
+FIXME
+
+# Introducing OpenMP
+* A popular way to parallelize code
+* Pragma-based parallelization API
+  * You annotate your code with parallel regions and the compiler does the rest
+* OpenMP uses something called threads
+  * Wait until next week for a definition
+
+```c
+#pragma omp parallel for
+    for (int i = 0; i < N; ++i)
+        a[i] = 2 * i;
+```
+
+## Hands-on Session! {background-color=var(--dark-bg-color) .leftalign}
+
+1. Compile and run the example serially. Use `time ./serial.x` to time the execution.
+2. Compile and run the example using OpenMP. Use `OMP_NUM_THREADS=2 time ./omp.x` to time the execution.
+3. Now add each of the following clauses to the `parallel for` directive in turn
+   and find out how the OpenMP runtime decomposes the problem domain:
+   * `schedule(static,1)`
+   * `schedule(static,10)`
+   * `schedule(FIXMEsomethingelse)`
+   * `schedule(FIXMEsomethingelse)`
+
+# Reductions FIXME title should be more generic
+## What is happening here?
+```c
+    int a[] = {2, 4, 6}, N = 3, sum = 0;
+    for (int i = 0; i < N; ++i)
+        sum = sum + a[i];
+```
+## What is happening here?
+```c
+    int a[] = {2, 4, 6}, N = 3, sum = 0;
+#pragma omp parallel for
+    for (int i = 0; i < N; ++i)
+        sum = sum + a[i];
+```
+[comment]: # (Can something go wrong?)
+
+## Solution
+```c
+    int a[] = {2, 4, 6}, N = 3, sum = 0;
+#pragma omp parallel for reduction(+:sum)
+    for (int i = 0; i < N; ++i)
+        sum = sum + a[i];
+```
+
+# Doing stuff wrong
+## What is going wrong here?
+```c
+    temp = 0;
+#pragma omp parallel for
+    for (int i = 0; i < N; ++i) {
+        temp = 2 * a[i];
+        b[i] = temp + 4;
+    }
+```
+## Solution
+```c
+    temp = 0;
+#pragma omp parallel for private(temp)
+    for (int i = 0; i < N; ++i) {
+        temp = 2 * a[i];
+        b[i] = temp + 4;
+    }
+```
+This problem is called a "data race".
+
+## Other common errors
+* Race conditions
+  * The outcome of a program depends on the relative timing of multiple threads.
+* Deadlocks
+  * Two or more threads wait forever for resources that the others hold, so none can proceed.
+* Inconsistency
+  * FIXME
+
+# Finally: A definition of parallelism
+"Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously."
+Wikipedia
+FIXME: Citation correct?!
+
+## Types of parallelism
+* Data parallelism (what we've been discussing)
+* Task parallelism (example: atmosphere-ocean coupling)
+* Instruction-level parallelism (see next week)
+
+## Summary: Preconditions for parallel execution
+FIXME, if we want to do that
+
+# FIXME
+* Homework:
+    * Do something where you run into hardware constraints (e.g. NUMA, too many threads, ...)
+    * Give an example containing a race condition or a similar bug and have the students find it.
+* Add maybe:
+    * Are there theoretical concepts, like Amdahl's law, that we should explain? (I don't like Amdahl)
+    * Strong/weak scaling?
+
+
-- 
GitLab