Commit e30cf802 authored by Georgiana Mania's avatar Georgiana Mania

add content for strong/weak scaling; parallelism cond; summary; docs

parent 19ded8ab
Merge request !72 (Draft: Compute devices lecture)
Pipeline #68565 passed
@@ -86,11 +86,66 @@ first ideas of strong/weak scaling.
# Scaling
Bake mini-pancake dessert
:::: {.columns}
::: {.column width="25%"}
![](static/pancakes_stack.png){width=100%}
:::
::: {.column width="25%" }
:::{.fragment}
![](static/one_pancake.png){width=100%}
:::
:::
::: {.column width="25%"}
:::{.fragment}
![](static/four_pans_cake.png){width=100%}
:::
:::
::: {.column width="25%"}
:::{.fragment}
![](static/four_pancakes.png){width=100%}
:::
:::
::::
:::{.info .tiny}
Images generated by Pradipta Samanta with DALL-E
:::
## Strong vs weak scaling
:::{.smaller}
Starting with batter for $N$ pancakes and 1 pan, we can scale by using $P$ pans in two ways:
:::
:::{.fragment}
|Parameter / Scaling type |Strong|Weak|
|-------|---|---|
|Resources <br> (e.g. pans) | $P$| $P$ |
|Total workload <br> (e.g. pancake count)| $N$ | $P \times N$ |
|Workload per worker <br> (e.g. pancakes per pan) | $N/P$ | $N$ |
|Total time | $T_1 \times N/P$ | $T_1 \times N$ |
:::
<!--
* What happens if one uses more threads, but keeps the problem size?
  * Strong scaling
* What happens if one uses more threads, but also increases the problem size by
  the same factor?
  * Weak scaling
-->
# Reductions FIXME title should be more generic
## What is happening here?
@@ -142,8 +197,8 @@ The problem is called "data race".
* The outcome of a program depends on the relative timing of multiple threads.
* Deadlocks
  * Multiple threads each wait for a resource held by another, so none can proceed.
* Starvation
  * A thread is blocked indefinitely while waiting for a resource.
# Finally: A definition of parallelism
"Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously."
@@ -155,8 +210,111 @@ FIXME: Citation correct?!
* Task parallelism (Example: Atmosphere ocean coupling)
* Instruction level parallelism (see next week)
## Precondition for parallel execution
*"Two consecutive instructions or code segments can be executed in parallel if they are **independent**."*
## Code and data dependence {.leftalign}
:::{.fragment .fade-in fragment-index=1}
* Data dependence - the instructions share the same data
:::
:::{.fragment .fade-in-then-semi-out fragment-index=2}
```c
a = b;
c = a + b; // flow dependence
```
:::
:::{.fragment .fade-in fragment-index=3}
* Control dependence - the order of execution is determined at runtime
:::
:::{.fragment .fade-in-then-semi-out fragment-index=4}
```c
for (int i = 1; i < n ; i++) {
a[i] = (a[i-1] > a[i]) ? a[i] + 1 : 1;
}
```
:::
:::{.fragment .fade-in fragment-index=5}
* Resource dependence - the instructions share the same resource
:::
:::{.fragment .fade-in-then-semi-out fragment-index=6}
```c
a = b;
b = 42; // write after read (anti-dependence): if reordered, a gets the new value
```
:::
## Bernstein's parallelism conditions {.alignleft}
For data dependence, use Bernstein's conditions: *"the read-write, write-read and write-write intersections of the two instructions' variable sets are all empty"*.
:::{.fragment}
```c
c = a + b; // S1
d = a - b; // S2
```
:::
:::{.smaller}
:::: {.columns}
::: {.column width="50%"}
:::{.fragment}
Read and write sets for S1 and S2:
$$
R_1 = \{a,b\} ; W_1 = \{c\} \\
R_2 = \{a,b\} ; W_2 = \{d\} \\
$$
:::
:::
::: {.column width="50%"}
:::{.fragment}
Bernstein's conditions:
$$
R_1 \cap W_2 = \emptyset \\
W_1 \cap R_2 = \emptyset \\
W_1 \cap W_2 = \emptyset
$$
:::
:::
:::
::::
:::{.fragment}
S1 and S2 can be executed in parallel!
:::
:::{.notes}
How about these two? replace a in S2 with c
```c
c = a + b; // S1
d = c - b; // S2
```
:::
## Best practices
::: {.incremental}
* Parallelisation should not change the results! (exceptions discussed next week)
* Limit the number of shared resources in parallel regions (less synchronisation)
* Limit the amount of communication between executors
* Use efficient domain decomposition to avoid load imbalance (especially when I/O is involved)
:::
# FIXME
* Homework: * Homework:
@@ -165,3 +323,7 @@ FIXME, if we want to do that
* Have them discuss the concepts from the lecture using the metaphor of a kitchen workflow? * Have them discuss the concepts from the lecture using the metaphor of a kitchen workflow?
# Documentation
* "Computer Architecture - A Quantitative Approach" - J. Hennessy and D. Patterson
* "Introduction to High Performance Computing for Scientists and Engineers" - G. Hager and G. Wellein
lectures/parallelism/static/four_pancakes.png (326 KiB)
lectures/parallelism/static/four_pans_cake.png (352 KiB)
lectures/parallelism/static/one_pancake.png (218 KiB)
lectures/parallelism/static/pancakes_stack.png (198 KiB)