From 2cc6b5c62998b47a5b4b6fdde534132c04cae716 Mon Sep 17 00:00:00 2001
From: Florian Ziemen <ziemen@dkrz.de>
Date: Tue, 25 Jun 2024 11:21:52 +0200
Subject: [PATCH] cleanup

---
 lectures/file-and-data-systems/slides.qmd | 47 ++++++++++++-----------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/lectures/file-and-data-systems/slides.qmd b/lectures/file-and-data-systems/slides.qmd
index aea306e..f13ad41 100644
--- a/lectures/file-and-data-systems/slides.qmd
+++ b/lectures/file-and-data-systems/slides.qmd
@@ -5,11 +5,11 @@ author: "Florian Ziemen and Karl-Hermann Wieners"
 
 # Storing data
 
-* Recording of (more or less valuable) information in a medium
+* Recording of information in a medium
+  * Deoxyribonucleic acid (DNA)
+  * Hand writing
   * Magnetic tapes
   * Hard disks
-  * Hand writing
-  * Deoxyribonucleic acid (DNA)
 
 # Topics
 
@@ -46,8 +46,8 @@ Disk quotas for prj 30001639 (pid 30001639):
 ## Latency
 * How long does it take until we get the first bit of data?
 * Crucial when opening many small files (e.g. starting python)
-* Less crucial when reading one big file start-to end.
-* Largely determined by moving pieces in the storage medium.
+* Less crucial when reading one big file start-to-end.
+* Largely determined by moving parts in the storage medium.
 
 ## Continuous read / write
 
@@ -62,7 +62,7 @@ Disk quotas for prj 30001639 (pid 30001639):
 ## Caching
 * Keeping data *in memory* for frequent re-use.
 * Usually storage media like disks have small caches with better properties.
-* e.g. HDD of 16 TB with a 512 MB of RAM cache.
+* e.g. HDD of 16 TB with 512 MB of RAM cache.
 * Operating systems also cache reads.
 * Caching writes in RAM is trouble because of the risk of data loss due to power loss / system crash.
 
@@ -80,7 +80,7 @@ Disk quotas for prj 30001639 (pid 30001639):
 | Tape | minutes | 300 MB/s | minimal | ~ 5 |
 
 
-* All figures based on a quick google search in 06/2024
+* All figures based on a quick Google search in 06/2024.
 * RAM needs electricity to keep the data (*volatile memory*).
 * All but tape usually remain powered in an HPC.
 
@@ -93,27 +93,27 @@ Disk quotas for prj 30001639 (pid 30001639):
 
 ## Solid-state disk/flash drives
 * Non-volatile electronic medium.
-  - keeping state (almost) without energy supply.
+  - Keeps state (almost) without energy supply.
 * High speed, also under random access.
 
 ## Hard disk
-* Stacks of magnetic disks with read/write heads.
-* Spinning to make every point accessible by heads.
 * Basically a stack of modern record players.
+* Stack of magnetic disks with read/write heads.
+* Spinning to make every point accessible by heads.
 * Good for bigger files, not ideal for random access.
 
 ## Tape
-* Spools of magnetizable bands
-* Serialized access only
-* Backup / long-term storage
+* Spool of magnetizable bands.
+* Serialized access only.
+* Used for backup / long-term storage.
 
 ## Hands-on {.handson}
 
 ::: {.smaller}
 {{< embed timer.ipynb echo=true >}}
 :::
 
-Take this set of calls, improve the code, and measure the write speed for different file sizes on your `/scratch/`
+Take this set of calls and measure the write speed for different file sizes on your `/scratch/`.
 
 # Storage Architectures
 
@@ -316,9 +316,10 @@ Add a similar function to the previous one for reading, and read the data you ju
 * Data is presented as immutable "objects" ("BLOB")
 * Each object has a globally unique identifier (eg. UUID or hash)
 * Objects may be assigned names, grouped in "buckets"
-* Usually only supports creation ("put") and retrieval ("get")
+* Generally supports creation ("put") and retrieval ("get"), but can support much more (versioning, etc.).
 * Focus on data distribution and replication, fast read access
 
+
 ## Object storage -- metadata
 
 * Object metadata stored independent from data location
@@ -350,10 +351,9 @@ Protection against
 * downtimes due to hardware failure
 
 ## Backups
-
-* Keep old states of the file system available
-* Need at least as much space as the (compressed version of the) data being backuped
-* Often low-freq full backups and hi-freq incremental backups<br>
+* Keep old states of the file system available.
+* Need at least as much space as the (compressed version of the) data being backed up.
+* Often low-freq full backups and hi-freq incremental backups
 to balance space requirements and restoring time
 * Ideally at different locations
 * Automate them!
@@ -365,7 +365,7 @@ Combining multiple harddisks into bigger / more secure combinations - often at c
 * RAID 0 distributes the blocks across all disks - more space, but data loss if one fails.
 * RAID 1 mirrors one disk on an identical copy.
 * RAID 5 is similar to 0, but with one extra disk for (distributed) parity info
-* RAID 6 is similar to 5, but with two extro disks for parity info (levante uses 8+2 disks).
+* RAID 6 is similar to 5, but with two extra disks for parity info (levante uses 8+2 disks).
 
 ## Erasure coding
 
@@ -389,14 +389,15 @@ The file system becomes an independent system.
 * All nodes see the same set of files.
 * A set of central servers manages the file system.
 * All nodes accessing the lustre file system run local *clients*.
+* Many nodes can write into the same file at the same time (MPI-IO).
 * Optimized for high traffic volumes in large files.
 
 ## Metadata and storage servers
-* The index is spread over a group of Metadata servers (MDS, 8 for work on levante).
+* The index is spread over a group of Metadata servers (MDS, 8 for /work on levante).
 * The files are spread over another group (40 OSS / 160 OST on levante).
-* Every directory is tied to one MDS
+* Every directory is tied to one MDS.
 * A file is tied to one or more OSTs.
-* An OST contains many hard disks
+* An OST contains many hard disks.
 
 ## The work file system of levante in context

-- 
GitLab
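
For the hands-on exercises in the patched slides (measuring write speed on `/scratch/` and adding a matching read function), here is a minimal sketch of what such timing code could look like. It is not the `timer.ipynb` embedded in the slides; the target path, the file sizes, and the helper names `time_write` / `time_read` are assumptions to be adapted.

```python
# Minimal sketch (not the slides' timer.ipynb): time writes and reads of
# files of different sizes. Adjust `target` to your own /scratch/ directory.
import os
import time

def time_write(path, size_bytes, block=1024 * 1024):
    """Write `size_bytes` of zeros to `path` in blocks; return seconds elapsed."""
    buf = b"\0" * block
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < size_bytes:
            written += f.write(buf[: min(block, size_bytes - written)])
        f.flush()
        os.fsync(f.fileno())  # force the data out of the OS write cache
    return time.perf_counter() - start

def time_read(path, block=1024 * 1024):
    """Read `path` back in blocks; return seconds elapsed."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block):
            pass
    return time.perf_counter() - start

target = "/scratch/your_user/io_test.bin"  # assumption: point this at your scratch area
for size in (2**20, 2**25, 2**30):  # 1 MiB, 32 MiB, 1 GiB
    wt = time_write(target, size)
    rt = time_read(target)
    print(f"{size / 2**20:8.0f} MiB  "
          f"write {size / wt / 2**20:8.1f} MiB/s  "
          f"read {size / rt / 2**20:8.1f} MiB/s")
os.remove(target)
```

Because each file was just written, the read is usually served from the operating system's page cache (compare the caching slide); for realistic read numbers, use files larger than the node's RAM or re-run after the cache has been dropped.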