generic software skills / lecture materials

Commit 2cc6b5c6 ("cleanup")
Authored 10 months ago by Florian Ziemen
Parent: c90e19b0
No related branches found, no related tags found
1 merge request: !11 File and Data Systems
Pipeline #71521 passed 10 months ago (stages: test, build)
Showing 1 changed file: lectures/file-and-data-systems/slides.qmd (24 additions, 23 deletions)
@@ -5,11 +5,11 @@ author: "Florian Ziemen and Karl-Hermann Wieners"
# Storing data
- * Recording of (more or less valuable) information in a medium
+ * Recording of information in a medium
+ * Deoxyribonucleic acid (DNA)
+ * Hand writing
* Magnetic tapes
* Hard disks
- * Hand writing
- * Deoxyribonucleic acid (DNA)
# Topics
@@ -46,8 +46,8 @@ Disk quotas for prj 30001639 (pid 30001639):
## Latency
* How long does it take until we get the first bit of data?
* Crucial when opening many small files (e.g. starting python)
- * Less crucial when reading one big file start-to end.
- * Largely determined by moving pieces in the storage medium.
+ * Less crucial when reading one big file start-to-end.
+ * Largely determined by moving parts in the storage medium.
## Continuous read / write
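The latency bullets above are easy to probe empirically. Below is a minimal sketch (not part of this commit or of the embedded notebook), using only Python's standard library and assuming an ordinary POSIX file system: it reads the same total volume once as many small files and once as a single big file, so the per-file open/close overhead becomes visible.

```python
import os
import tempfile
import time

def write_files(directory, n_files, size):
    """Create n_files files of `size` bytes each in `directory`."""
    payload = os.urandom(size)
    for i in range(n_files):
        with open(os.path.join(directory, f"part_{i:05d}.bin"), "wb") as f:
            f.write(payload)

def read_all(directory):
    """Read every file in `directory` and return the total number of bytes read."""
    total = 0
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            total += len(f.read())
    return total

if __name__ == "__main__":
    total_size = 64 * 1024 * 1024  # 64 MiB in both scenarios

    with tempfile.TemporaryDirectory() as many, tempfile.TemporaryDirectory() as single:
        write_files(many, n_files=4096, size=total_size // 4096)  # many small files
        write_files(single, n_files=1, size=total_size)           # one big file

        for label, directory in (("4096 small files", many), ("1 big file", single)):
            start = time.perf_counter()
            read_all(directory)
            print(f"{label}: {time.perf_counter() - start:.3f} s")
```

Since the data was just written, it is probably still in the operating system's cache, so the comparison mostly isolates the per-file overhead rather than the latency of the medium itself.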
@@ -62,7 +62,7 @@ Disk quotas for prj 30001639 (pid 30001639):
## Caching
* Keeping data *in memory* for frequent re-use.
* Usually storage media like disks have small caches with better properties.
- * e.g. HDD of 16 TB with a 512 MB of RAM cache.
+ * e.g. HDD of 16 TB with 512 MB of RAM cache.
* Operating systems also cache reads.
* Caching writes in RAM is trouble because of the risk of data loss due to power loss / system crash.
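The effect of read caching mentioned above can be demonstrated by reading the same file twice: the second pass is normally served from the operating system's page cache. A minimal sketch (not from the slides); note that the first read is only "cold-ish", because the freshly written file may already sit in the cache, so a truly cold number needs dropped caches or a reboot.

```python
import os
import tempfile
import time

size = 256 * 1024 * 1024  # 256 MiB test file

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(size))
    path = f.name

try:
    for label in ("first read", "second read (cached)"):
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        elapsed = time.perf_counter() - start
        print(f"{label}: {size / elapsed / 1e6:.0f} MB/s")
finally:
    os.remove(path)
```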
@@ -80,7 +80,7 @@ Disk quotas for prj 30001639 (pid 30001639):
| Tape | minutes | 300 MB/s | minimal | ~ 5 |
- * All figures based on a quick google search in 06/2024
+ * All figures based on a quick google search in 06/2024.
* RAM needs electricity to keep the data (*volatile memory*).
* All but tape usually remain powered in an HPC.
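As a rough sanity check of the tape row above (latency of minutes, roughly 300 MB/s streaming), the transfer time for a few request sizes can be worked out directly from the bandwidth; a back-of-the-envelope sketch:

```python
bandwidth = 300e6  # bytes per second, the streaming rate quoted for tape above

for size_tb in (0.001, 0.1, 1, 10):
    seconds = size_tb * 1e12 / bandwidth
    print(f"{size_tb:>6} TB -> {seconds / 60:8.1f} min of pure streaming")
```

Streaming a full terabyte already takes close to an hour, so the minutes of mount and seek latency only dominate for small requests.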
@@ -93,27 +93,27 @@ Disk quotas for prj 30001639 (pid 30001639):
## Solid-state disk/flash drives
* Non-volatile electronic medium.
- - keeping state (almost) without energy supply.
+ - Keeps state (almost) without energy supply.
* High speed, also under random access.
## Hard disk
- * Stacks of magnetic disks with read/write heads.
- * Spinning to make every point accessible by heads.
* Basically a stack of modern record players.
+ * Stack of magnetic disks with read/write heads.
+ * Spinning to make every point accessible by heads.
* Good for bigger files, not ideal for random access.
## Tape
- * Spools of magnetizable bands
- * Serialized access only
- * Backup / long-term storage
+ * Spool of magnetizable bands.
+ * Serialized access only.
+ * Used for backup / long-term storage.
## Hands-on {.handson}
::: {.smaller}
{{< embed timer.ipynb echo=true >}}
:::
- Take this set of calls, improve the code, and measure the write speed for different file sizes on your `/scratch/`
+ Take this set of calls, and measure the write speed for different file sizes on your `/scratch/`
# Storage Architectures
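The hands-on above embeds timer.ipynb, which is not shown in this diff. Purely as an illustration of the kind of measurement it asks for, here is a minimal sketch with an invented file name and size list; run it from a directory on the file system to be tested (e.g. `/scratch/`):

```python
import os
import time

def write_speed(path, size, block=4 * 1024 * 1024):
    """Write `size` bytes to `path` in blocks and return the rate in MB/s."""
    payload = os.urandom(block)
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < size:
            chunk = min(block, size - written)
            f.write(payload[:chunk])
            written += chunk
        f.flush()
        os.fsync(f.fileno())  # make sure the data has really been handed to the storage
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size / elapsed / 1e6

if __name__ == "__main__":
    target = "speedtest.bin"  # illustrative name; place it on the file system you want to test
    for size in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
        print(f"{size / 1e6:>7.0f} MB: {write_speed(target, size):7.1f} MB/s")
```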
@@ -316,9 +316,10 @@ Add a similar function to the previous one for reading, and read the data you ju
* Data is presented as immutable "objects" ("BLOB")
* Each object has a globally unique identifier (eg. UUID or hash)
* Objects may be assigned names, grouped in "buckets"
- * Usually only supports creation ("put") and retrieval ("get")
+ * Generally supports creation ("put") and retrieval ("get"), can support much more (versioning, etc).
* Focus on data distribution and replication, fast read access
## Object storage -- metadata
* Object metadata stored independent from data location
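The put/get model in the bullets above can be mimicked with a toy, purely in-memory store. This is a sketch of the concept only, not any particular object-store API; the class and method names are invented for the example, and the id is a content hash standing in for a UUID:

```python
import hashlib

class ToyObjectStore:
    """In-memory illustration: immutable blobs addressed by a content hash."""

    def __init__(self):
        self._objects = {}  # object id -> bytes (the immutable data)
        self._buckets = {}  # (bucket, name) -> object id (metadata, kept separately)

    def put(self, bucket, name, data: bytes) -> str:
        object_id = hashlib.sha256(data).hexdigest()  # globally unique id derived from the content
        self._objects.setdefault(object_id, data)     # objects are never modified in place
        self._buckets[(bucket, name)] = object_id     # names and buckets are just metadata
        return object_id

    def get(self, bucket, name) -> bytes:
        return self._objects[self._buckets[(bucket, name)]]

store = ToyObjectStore()
oid = store.put("results", "run-001/temperature.nc", b"...binary data...")
print(oid[:16], store.get("results", "run-001/temperature.nc"))
```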
@@ -350,10 +351,9 @@ Protection against
* downtimes due to hardware failure
## Backups
- * Keep old states of the file system available
- * Need at least as much space as the (compressed version of the) data being backuped
- * Often low-freq full backups and hi-freq incremental backups<br>
+ * Keep old states of the file system available.
+ * Need at least as much space as the (compressed version of the) data being backuped.
+ * Often low-freq full backups and hi-freq incremental backups
to balance space requirements and restoring time
* Ideally at different locations
* Automate them!
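The full-versus-incremental balance described above boils down to selecting only the files changed since the last backup run. A minimal sketch of that selection step, based on modification times; real backup tools track state much more robustly (checksums, snapshots, catalogues):

```python
import os
import time

def files_changed_since(root, timestamp):
    """Yield paths under `root` whose modification time is newer than `timestamp`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > timestamp:
                yield path

# Full backup: copy everything.  Incremental backup: only what changed since the last run.
last_backup = time.time() - 24 * 3600  # assume the previous run was a day ago
for path in files_changed_since(".", last_backup):
    print("would back up:", path)
```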
@@ -365,7 +365,7 @@ Combining multiple harddisks into bigger / more secure combinations - often at c
* RAID 0 distributes the blocks across all disks - more space, but data loss if one fails.
* RAID 1 mirrors one disk on an identical copy.
* RAID 5 is similar to 0, but with one extra disk for (distributed) parity info
- * RAID 6 is similar to 5, but with two extro disks for parity info (levante uses 8+2 disks).
+ * RAID 6 is similar to 5, but with two extra disks for parity info (levante uses 8+2 disks).
## Erasure coding
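The parity in RAID 5 and 6 above is, in its simplest form, an XOR across the data blocks, and the erasure coding introduced here generalizes the same idea (real systems use more elaborate codes such as Reed-Solomon to survive several failures). A minimal sketch with made-up blocks, showing how one lost block is rebuilt from the survivors plus the parity:

```python
# Single-parity idea: the parity block is the XOR of all data blocks, so any one
# lost block can be recomputed from the remaining blocks and the parity.
data_blocks = [b"block-A!", b"block-B!", b"block-C!"]  # made-up, equally sized blocks

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

parity = xor_blocks(data_blocks)

lost = 1  # pretend the disk holding block 1 failed
survivors = [b for i, b in enumerate(data_blocks) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data_blocks[lost]
print("rebuilt:", rebuilt)
```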
@@ -389,14 +389,15 @@ The file system becomes an independent system.
* All nodes see the same set of files.
* A set of central servers manages the file system.
* All nodes accessing the lustre file system run local *clients*.
* Many nodes can write into the same file at the same time (MPI-IO).
* Optimized for high traffic volumes in large files.
## Metadata and storage servers
- * The index is spread over a group of Metadata servers (MDS, 8 for work on levante).
+ * The index is spread over a group of Metadata servers (MDS, 8 for /work on levante).
* The files are spread over another group (40 OSS / 160 OST on levante).
- * Every directory is tied to one MDS
+ * Every directory is tied to one MDS.
* A file is tied to one or more OSTs.
- * An OST contains many hard disks
+ * An OST contains many hard disks.
## The work file system of levante in context
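The MDS/OST split above can be pictured as follows: the MDS knows which OSTs hold a file, and the file itself is cut into fixed-size stripes that are distributed round-robin over those OSTs. A minimal sketch with invented stripe size, stripe count, and OST numbers; on a real Lustre system the layout is inspected or set with `lfs getstripe` / `lfs setstripe`:

```python
# Toy picture of Lustre striping: fixed-size stripes, placed round-robin on the
# OSTs assigned to this file.  All numbers below are invented for illustration.
stripe_size = 1 * 1024 * 1024           # 1 MiB stripes
file_size = 10 * 1024 * 1024 + 123      # an arbitrary file of roughly 10 MiB
osts = [17, 42, 98, 131]                # the OSTs chosen for this file (stripe count 4)

n_stripes = -(-file_size // stripe_size)  # ceiling division
for stripe in range(n_stripes):
    ost = osts[stripe % len(osts)]
    length = min(stripe_size, file_size - stripe * stripe_size)
    print(f"stripe {stripe:2d}: {length:8d} bytes on OST {ost}")
```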