Commit 8374e031 authored by Jan Frederik Engels's avatar Jan Frederik Engels :new_moon: Committed by Georgiana Mania
Add first draft of parallelism.

parent 2853724a
```diff
@@ -36,8 +36,8 @@ website:
       - "lectures/user-experience/slides.qmd"
     # - "lectures/testing/slides.qmd"
     # - "lectures/git2/slides.qmd"
-    # - "lectures/parallelism/slides.qmd"
-    # - "lectures/hardware/slides.qmd"
+      - "lectures/parallelism/slides.qmd"
+      - "lectures/hardware/slides.qmd"
     # - "lectures/file-and-data-systems/slides.qmd"
     # - "lectures/memory-hierarchies/slides.qmd"
     # - "lectures/student-talks/slides.qmd"
```
---
title: "Example lecture"
author: "Tobias Kölling, Florian Ziemen, and the teams at MPI-M and DKRZ"
---
# Preface
* This is an example lecture for the generic computing skills course
## Idea
*optimize output for **analysis***
::: {.smaller}
(not write throughput)
:::
## chunking & hierarchy
:::: {.columns}
::: {.column width="50%"}
| Grid | Cells |
|---------:|--------:|
| 1° by 1° | 0.06M |
| 10 km | 5.1M |
| 5 km | 20M |
| 1 km | 510M |
| 200 m | 12750M |
:::
::: {.column width="50%"}
| Screen | Pixels |
|------------:|--------:|
| VGA | 0.3M |
| Full HD | 2.1M |
| MacBook 13" | 4.1M |
| 4K | 8.8M |
| 8K | 35.4M |
:::
::::
It's **impossible** to look at the entire globe in full resolution.
## Load data at the resolution necessary for the analysis
![[Górski et al, 2022: The HEALPix primer](https://healpix.jpl.nasa.gov/pdf/intro.pdf)](https://easy.gems.dkrz.de/_images/gorski_f1.jpg)
## Highlight! {background-color=var(--dark-bg-color)}
* This slide is either important or has a special purpose.
* You can use it to ask the audience a question or to start a hands-on session.
---
title: "Parallelism"
author: "CF, GM, JFE FIXME"
---
# Motivation
* We have a serial code and want to make it faster
* Plan of action:
* Cut problem into smaller pieces
* Use independent compute resources for each piece
* Outlook for next week: the individual computing elements no longer get much
  faster, but there are more of them
* FIXME: What else?
## This lecture
* Is mostly about parallelism as a concept
* Next week: Hardware using this concept
[comment]: # (Thinking about it, I think we should not give a theoretical definition here,
but first give the example and explain parallelism there. Eventually, with the
task-parallelism we should probably give a real definition and different flavours.)
# Our example problem
:::: {.columns}
::: {.column width="50%"}
* 1d Tsunami equation
* Korteweg–De Vries equation
* Discretization not numerically accurate
* [Wikipedia](https://en.wikipedia.org/wiki/Korteweg%E2%80%93De_Vries_equation)
* FIXME
:::
::: {.column width="50%"}
FIXME show plot of a soliton
:::
::::
# Our example problem
FIXME
show some central loop
# Decomposing problem domains
## Our problem domain
FIXME
## Other problem domains
* ICON's domain decomposition
* maybe something totally different?
FIXME
# Introducing OpenMP
* A popular way to parallelize code
* Pragma-based parallelization API
* You annotate your code with parallel regions and the compiler does the rest
* OpenMP uses something called threads
* Wait until next week for a definition
```c
#pragma omp parallel for
for (int i = 0; i < N; ++i)
    a[i] = 2 * i;
```
## Hands-on Session! {background-color=var(--dark-bg-color) .leftalign}
1. Compile and run the example serially. Use `time ./serial.x` to time the execution.
2. Compile and run the example using OpenMP. Use `OMP_NUM_THREADS=2 time ./omp.x` to time the execution.
3. Now add
   * `schedule(static,1)`
   * `schedule(static,10)`
   * `schedule(FIXMEsomethingelse)`
   * `schedule(FIXMEsomethingelse)`

   and find out how the OpenMP runtime decomposes the problem domain.
# Reductions FIXME title should be more generic
## What is happening here?
```c
int a[] = {2, 4, 6};
int N = sizeof(a) / sizeof(a[0]);
int sum = 0;
for (int i = 0; i < N; ++i)
    sum = sum + a[i];
```
## What is happening here?
```c
int a[] = {2, 4, 6};
int N = sizeof(a) / sizeof(a[0]);
int sum = 0;
#pragma omp parallel for
for (int i = 0; i < N; ++i)
    sum = sum + a[i];
```
[comment]: # (Can something go wrong?)
## Solution
```c
int a[] = {2, 4, 6};
int N = sizeof(a) / sizeof(a[0]);
int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; ++i)
    sum = sum + a[i];
```
# Doing stuff wrong
## What is going wrong here?
```c
temp = 0;
#pragma omp parallel for
for (int i = 0; i < N; ++i) {
    temp = 2 * a[i];
    b[i] = temp + 4;
}
```
## Solution
```c
temp = 0;
#pragma omp parallel for private(temp)
for (int i = 0; i < N; ++i) {
    temp = 2 * a[i];
    b[i] = temp + 4;
}
```
This kind of bug is called a "data race".
## Other common errors
* Race conditions
  * The outcome of the program depends on the relative timing of multiple threads.
* Deadlocks
  * Multiple threads each wait for a resource held by another, so none can proceed.
* Inconsistency
* FIXME
# Finally: A definition of parallelism
> "Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously." (Wikipedia)
FIXME: Citation correct?!
## Types of parallelism
* Data parallelism (what we've been discussing)
* Task parallelism (Example: Atmosphere ocean coupling)
* Instruction level parallelism (see next week)
## Summary: Preconditions for parallel execution
FIXME, if we want to do that
# FIXME
* Homework:
  * Do something where you run into hardware constraints (e.g. NUMA, too many threads, ...)
  * Give an example containing a race condition and have the students find it.
* Maybe add:
  * Are there theoretical concepts like Amdahl's law that we should explain? (I don't like Amdahl)
  * Strong/weak scaling?