diff --git a/lectures/parallelism/slides.qmd b/lectures/parallelism/slides.qmd
index 204e3565434e703c2dce178bcd58b95030a47aea..ce4f7891a134eac04c71f5dccf67fadf0efbf38a 100644
--- a/lectures/parallelism/slides.qmd
+++ b/lectures/parallelism/slides.qmd
@@ -86,11 +86,66 @@ first ideas of strong/weak scaling.
 
 # Scaling
 
+Baking a mini-pancake dessert
+
+:::: {.columns}
+
+::: {.column width="25%"}
+
+{width=100%}
+
+:::
+
+::: {.column width="25%"}
+
+:::{.fragment}
+{width=100%}
+:::
+:::
+
+::: {.column width="25%"}
+:::{.fragment}
+{width=100%}
+:::
+:::
+
+::: {.column width="25%"}
+:::{.fragment}
+{width=100%}
+:::
+:::
+
+::::
+
+:::{.info .tiny}
+Images generated by Pradipta Samanta with DALL-E
+:::
+
+
+
+## Strong vs weak scaling
+
+:::{.smaller}
+Starting with batter for $N$ pancakes and 1 pan, we can scale by using $P$ pans in two ways:
+:::
+
+:::{.fragment}
+|Parameter / Scaling type |Strong|Weak|
+|-------|---|---|
+|Resources <br> (e.g. pans) | $P$ | $P$ |
+|Total workload <br> (e.g. pancake count)| $N$ | $P \times N$ |
+|Workload per worker <br> (e.g. pancakes per pan) | $N/P$ | $N$ |
+|Total time <br> ($T_1$ = time for one pancake) | $T_1 \times N/P$ | $T_1 \times N$ |
+:::
+
+
+<!--
 * What happens if one uses more threads, but keep the problem size?
   * Strong scaling
 * What happens if one uses more threads, but also increases the problem size by the same factor?
   * Weak scaling
+
 -->
 
 # Reductions FIXME title should be more generic
 ## What is happening here?
@@ -142,8 +197,8 @@ The problem is called "data race".
   * The outcome of a program depends on the relative timing of multiple threads.
 * Deadlocks
   * Multiple threads wait for a resource that cannot be fulfilled.
-* Inconsistency
-  * FIXME
+* Starvation
+  * A thread is blocked indefinitely while waiting for a resource.
 
 # Finally: A definition of parallelism
 "Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously."
@@ -155,8 +210,111 @@ FIXME: Citation correct?!
 * Task parallelism (Example: Atmosphere ocean coupling)
 * Instruction level parallelism (see next week)
 
-## Summary: Preconditions for parallel execution
-FIXME, if we want to do that
+## Precondition for parallel execution
+
+*"Two consecutive instructions or code segments can be executed in parallel if they are **independent**."*
+
+## Code and data dependence {.leftalign}
+
+
+:::{.fragment .fade-in fragment-index=1}
+* Data dependence - the instructions share the same data
+:::
+
+:::{.fragment .fade-in-then-semi-out fragment-index=2}
+```c
+a = b;
+c = a + b; // flow dependence: c needs the new value of a
+```
+:::
+
+:::{.fragment .fade-in fragment-index=3}
+* Control dependence - the order of execution is only determined at runtime
+:::
+
+:::{.fragment .fade-in-then-semi-out fragment-index=4}
+```c
+for (int i = 1; i < n; i++) {
+  a[i] = (a[i-1] > a[i]) ? a[i] + 1 : 1;
+}
+```
+:::
+
+:::{.fragment .fade-in fragment-index=5}
+* Resource dependence - the instructions share the same resource
+:::
+
+:::{.fragment .fade-in-then-semi-out fragment-index=6}
+```c
+a = b;
+b = 42; // write after read: reordering would give a the new value of b
+```
+:::
+
+
+
+## Bernstein's parallelism conditions {.alignleft}
+To check two instructions for data dependence, use Bernstein's conditions: *"the intersections of the read-write, write-read, and write-write sets of the two instructions are all empty"*.
+
+:::{.fragment}
+```c
+c = a + b; // S1
+d = a - b; // S2
+```
+:::
+
+:::{.smaller}
+
+:::: {.columns}
+
+::: {.column width="50%"}
+
+:::{.fragment}
+
+Read and write sets for S1 and S2:
+$$
+R_1 = \{a,b\} ; W_1 = \{c\} \\
+R_2 = \{a,b\} ; W_2 = \{d\} \\
+$$
+
+:::
+:::
+
+::: {.column width="50%"}
+
+:::{.fragment}
+Bernstein's conditions:
+
+$$
+R_1 \cap W_2 = \emptyset \\
+W_1 \cap R_2 = \emptyset \\
+W_1 \cap W_2 = \emptyset
+$$
+:::
+
+:::
+:::
+::::
+
+:::{.fragment}
+S1 and S2 can be executed in parallel!
+:::
+
+:::{.notes}
+How about these two? Replace `a` in S2 with `c`:
+```c
+c = a + b; // S1
+d = c - b; // S2
+```
+:::
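+
+:::{.notes}
+A minimal sketch of how the independence of S1 and S2 could actually be exploited,
+e.g. with OpenMP sections (assumes a compiler with OpenMP support, such as
+`gcc -fopenmp`; the variable values are only illustrative):
+```c
+#include <stdio.h>
+
+int main(void) {
+    int a = 1, b = 2, c, d;
+
+    // S1 and S2 satisfy Bernstein's conditions (disjoint write sets,
+    // no read/write overlap), so the two sections may run concurrently.
+    // Without -fopenmp the pragmas are ignored and the code runs serially.
+    #pragma omp parallel sections
+    {
+        #pragma omp section
+        { c = a + b; }   // S1: reads a, b; writes c
+        #pragma omp section
+        { d = a - b; }   // S2: reads a, b; writes d
+    }
+
+    printf("c = %d, d = %d\n", c, d);
+    return 0;
+}
+```
+Each section writes a different variable, so no synchronisation is needed beyond the
+implicit barrier at the end of the construct.
+:::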
+
+## Best practices
+::: {.incremental}
+* Parallelisation should not change the results! (Exceptions will be discussed next week.)
+* Limit the number of shared resources in parallel regions (less synchronisation needed)
+* Limit the amount of communication between executors
+* Use efficient domain decomposition to avoid load imbalance (pay particular attention to I/O)
+:::
 
 # FIXME
 * Homework:
@@ -165,3 +323,7 @@ FIXME, if we want to do that
 
 * Have them discuss the concepts from the lecture using the metaphor of a kitchen workflow?
 
+# Documentation
+
+* "Computer Architecture: A Quantitative Approach" - J. Hennessy and D. Patterson
+* "Introduction to High Performance Computing for Scientists and Engineers" - G. Hager and G. Wellein
\ No newline at end of file
diff --git a/lectures/parallelism/static/four_pancakes.png b/lectures/parallelism/static/four_pancakes.png
new file mode 100644
index 0000000000000000000000000000000000000000..9bcfd13b38358306f7471a72f007a4688fa98d96
Binary files /dev/null and b/lectures/parallelism/static/four_pancakes.png differ
diff --git a/lectures/parallelism/static/four_pans_cake.png b/lectures/parallelism/static/four_pans_cake.png
new file mode 100644
index 0000000000000000000000000000000000000000..b8a9a0188954c340e2f4cc9180bfe72de2250dc1
Binary files /dev/null and b/lectures/parallelism/static/four_pans_cake.png differ
diff --git a/lectures/parallelism/static/one_pancake.png b/lectures/parallelism/static/one_pancake.png
new file mode 100644
index 0000000000000000000000000000000000000000..75ecd349cf7cca200d899cba531603abc2c990e2
Binary files /dev/null and b/lectures/parallelism/static/one_pancake.png differ
diff --git a/lectures/parallelism/static/pancakes_stack.png b/lectures/parallelism/static/pancakes_stack.png
new file mode 100644
index 0000000000000000000000000000000000000000..92c193dce965a3abcd22030610c59a6a337d8cba
Binary files /dev/null and b/lectures/parallelism/static/pancakes_stack.png differ