Commit ebb93353 authored by Fabian Wachsmann
Add new file

parent 2ca06d26
Pipeline #92614 failed
# Cloudify Workshop
Building competence for the eerie.cloud data workflow.
Links:
- [Gitlab Repo](https://gitlab.dkrz.de/data-infrastructure-services/cloudify)
- [Technical overview paper](https://pad.gwdg.de/OZo5HMC4R6iljvZHlo-BzQ?view#)
- Access and usage:
  - [Slideshow](https://pad.gwdg.de/HD0fHG-STCGzCqBtZCx8iQ?view#)
  - [Notebook](https://easy.gems.dkrz.de/simulations/EERIE/eerie_data-access_online.html)
- Ingest: [Lake house approach paper](https://pad.gwdg.de/gzXeJB85QTC6LNNlhA1A5A?both#)
For the cloudify training on Levante, start a JupyterHub server on either a compute node (recommended) or an interactive node.
## [Concepts](https://docs.google.com/presentation/d/1OrPWOZXAs0rRRdfomPABo1-rSaHyH8amlfLOIvRd48A/edit?usp=sharing)
- Zarr - a (not only) cloud-optimized data format for ESM output
- Benefits of cloud storage and why we do not fully use it (yet)
- Xpublish - the cloud data emulator with server-side processing
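
To make the Zarr concept concrete, here is a minimal sketch of how a chunked array is addressed. It assumes Zarr v2 style chunk keys (chunk indices joined by `.`); the array shape and chunk sizes are made up for illustration and do not describe any eerie.cloud dataset.

```python
from math import ceil


def chunk_grid(shape, chunks):
    """Return the number of chunks along each dimension."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


def chunk_key(index, chunks, separator="."):
    """Return the key of the chunk holding the element at `index`.

    Assumes Zarr v2 style keys, i.e. chunk indices joined by `separator`.
    """
    return separator.join(str(i // c) for i, c in zip(index, chunks))


# Hypothetical daily field: (time, lat, lon)
shape = (365, 180, 360)
chunks = (30, 90, 180)

grid = chunk_grid(shape, chunks)
key = chunk_key((45, 100, 10), chunks)
print(grid)  # (13, 2, 2)
print(key)   # 1.1.0
```

Because every chunk is an independent object under its own key, a client only fetches the chunks that overlap its selection, which is what makes the format suitable for object stores and HTTP servers alike.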
## [Cloudify on Levante](https://gitlab.dkrz.de/data-infrastructure-services/cloudify/-/tree/main/workshop?ref_type=heads)
- How to start an app
- The various ways to access cloudified data through catalogs, xarray and cdo
- Use-cases and preparations for a data server on the PB scale
- Server-side processing for lossy compression, rechunking and on-the-fly post-processing
- Large aggregations: Zarr becomes the *catalog* with kerchunked input and the kerchunk API
- SPOA for ingestions
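
As a rough sketch of the xarray access path listed above, the snippet below builds the Zarr endpoint URL of a served dataset and opens it lazily. It assumes the default xpublish route layout `/datasets/<dataset_id>/zarr`; the host, port and dataset name are placeholders, and the actual workshop app may expose different routes.

```python
from urllib.parse import quote


def zarr_url(base_url, dataset_id):
    """Build the Zarr endpoint URL for one dataset.

    Assumption: the server uses the xpublish default route
    `/datasets/<dataset_id>/zarr`.
    """
    return f"{base_url.rstrip('/')}/datasets/{quote(dataset_id, safe='')}/zarr"


def open_cloudified(base_url, dataset_id):
    """Open a served dataset lazily with xarray.

    Requires xarray, zarr, dask and an HTTP-capable fsspec backend
    (for example aiohttp) to be installed.
    """
    import xarray as xr  # imported here so the URL helper works without xarray

    return xr.open_dataset(zarr_url(base_url, dataset_id), engine="zarr", chunks={})


# Placeholder host and dataset name, for illustration only
url = zarr_url("http://localhost:9000/", "my_dataset")
print(url)  # http://localhost:9000/datasets/my_dataset/zarr
```

With a running app, `open_cloudified("http://localhost:9000", "my_dataset")` would return a lazy `xarray.Dataset`; data chunks are only requested once values are computed or loaded.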
## [The eerie.cloud implementation](https://docs.google.com/presentation/d/1L7ehzS5O2n9O131MZdk47SIQflhuyMedJ4rwO5P3NvM/edit?usp=sharing)
### Data preparation for ingestion
- Requirements for a performant data server
- Kerchunking: we create virtual datasets by extracting the storage chunk locations from NetCDF and GRIB files, concatenating them, and storing the consolidated dataset in a lazy format based on Parquet tables.
- Catalogs: The virtual zarr datasets are collected in an intake catalog based on intake-xarray. This catalog is used for eerie.cloud ingestion.
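
The kerchunking idea can be illustrated with a toy reference set. This is not the kerchunk API, only a sketch of the data structure it produces: each Zarr chunk key points to a byte range `[path, offset, length]` in an original file, and concatenation amounts to renumbering the chunk indices. File names, offsets and lengths are invented, and the array metadata entries (`.zarray`, `.zattrs`) that a real reference set also carries are omitted.

```python
import json


def chunk_refs(path, var, offsets, lengths, start=0):
    """Map chunk keys '<var>/<i>' to [path, offset, length] byte ranges."""
    return {
        f"{var}/{start + i}": [path, offset, length]
        for i, (offset, length) in enumerate(zip(offsets, lengths))
    }


def concat_refs(per_file):
    """Concatenate per-file references along the first dimension.

    Chunks of later files are appended by shifting their chunk index.
    """
    refs = {}
    n_chunks = 0
    for path, var, offsets, lengths in per_file:
        refs.update(chunk_refs(path, var, offsets, lengths, start=n_chunks))
        n_chunks += len(offsets)
    return {"version": 1, "refs": refs}


# Hypothetical input: two files holding consecutive time chunks of "tas"
files = [
    ("file_a.nc", "tas", [1024, 5120], [4096, 4096]),
    ("file_b.nc", "tas", [1024], [4096]),
]
combined = concat_refs(files)
print(json.dumps(combined, indent=2))
```

No data is copied in this step: the combined mapping stays small, and readers fetch the referenced byte ranges from the original files on demand.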
### Server setup
- The OpenStack VM setup, Nginx and an xpublish plugin.
### STAC API
- The catalog infrastructure based on a mixture of static and dynamic STAC catalogs.
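
For orientation, a minimal STAC Item for one dataset might look like the sketch below. The field layout follows the STAC 1.0.0 Item specification; the item id, bounding box, date and asset URL are placeholders, and `application/vnd+zarr` is a commonly used but unregistered media type. The items actually served by eerie.cloud may carry more fields and extensions.

```python
import json


def make_item(item_id, bbox, datetime_iso, zarr_href):
    """Build a minimal STAC Item dictionary with a single Zarr asset."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_iso},
        "links": [],
        "assets": {
            "data": {
                "href": zarr_href,  # placeholder URL
                "type": "application/vnd+zarr",
                "roles": ["data"],
            }
        },
    }


item = make_item(
    "example-dataset",
    (-180.0, -90.0, 180.0, 90.0),
    "2020-01-01T00:00:00Z",
    "https://example.org/datasets/example-dataset/zarr",
)
print(json.dumps(item, indent=2))
```

A static catalog stores such items as JSON files, while a dynamic catalog generates them on request; in both cases clients follow the asset `href` to reach the data.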
## User guidance
Live show
**Navigation** through eerie.cloud with static STAC catalogs in the web browser using the STAC Browser.
**User guide** with the [easy gems notebook](https://easy.gems.dkrz.de/simulations/EERIE/eerie_data-access_online.html)
**Applications**: Jupyterlite, Gridlook and a WPS based on a simple xarray API
**Integrations** to be discussed:
- Freva
- Warmworld approach
## Past events
8.1. 13:00-15:00
hybrid: room #23 or https://eu02web.zoom-x.de/j/9290696892?pwd=WElNS0xIMGp3ZERIRTlYdjR0U3ZaUT09
For DKRZ DM.
\ No newline at end of file