-
Fabian Wachsmann authoredFabian Wachsmann authored
README 2.66 KiB
# Cloudify Workshop Building competence for the eerie.cloud data workflow. Links: - [Gitlab Repo](https://gitlab.dkrz.de/data-infrastructure-services/cloudify) - [Technical overview paper](https://pad.gwdg.de/OZo5HMC4R6iljvZHlo-BzQ?view#) - Access and usage: - [Slideshow](https://pad.gwdg.de/HD0fHG-STCGzCqBtZCx8iQ?view#) - [Notebook](https://easy.gems.dkrz.de/simulations/EERIE/eerie_data-access_online.html) - Ingest: [Lake house approach paper](https://pad.gwdg.de/gzXeJB85QTC6LNNlhA1A5A?both#) For the cloudify tarining on levante, start a jupyterhub server on either a compute (recommended) or an interactive node. ## [Concepts](https://docs.google.com/presentation/d/1OrPWOZXAs0rRRdfomPABo1-rSaHyH8amlfLOIvRd48A/edit?usp=sharing) - Zarr - a (not only) cloud-optimized data format for ESM output - Benefits of cloud storages and why we not fully use it (yet) - Xpublish - the cloud data emulator with server-side processing ## [Cloudify on Levante](https://gitlab.dkrz.de/data-infrastructure-services/cloudify/-/tree/main/workshop?ref_type=heads) - How to start an app - The various ways to access cloudified data through catalogs, xarray and cdo - Use-cases and preparations for a data server on the PB scale - Server-side processing for lossy compression, rechunking and on-the-fly post-processing - Large aggregations: Zarr becomes the *catalog* with kerchunked input and the kerchunk API - SPOA for ingestions ## [The eerie.cloud implementation](https://docs.google.com/presentation/d/1L7ehzS5O2n9O131MZdk47SIQflhuyMedJ4rwO5P3NvM/edit?usp=sharing) ### Data preparation for ingestion - Requirements for a performant data server - Kerchunking: we create virtual datasets by extracting the storage chunks of netcdf and grib files, concat them and store the consolidated dataset in a lazy format based on parquet tables. - Catalogs: The virtual zarr datasets are collected in an intake catalog based on intake-xarray. This catalog is used for eerie.cloud ingestion. ### Server setup - The openstack VM setting, Nginx and a xpublish plugin. ### STAC API - The catalog infrastructure based on a mixture of static and dynamic STAC catalogs. ## User guidance Live show **Navigation** through eerie.cloud with static stac catalogs in the web-browser using the stac-browser. **User guide** with the [easy gems notebook](https://easy.gems.dkrz.de/simulations/EERIE/eerie_data-access_online.html) **Applications**: Jupyterlite, Gridlook and a WPS based on a simple xarray API **Integrations** to be discussed: - Freva - Warmworld approach ## Past events 8.1. 13:00-15:00 hybrid: room #23 or https://eu02web.zoom-x.de/j/9290696892?pwd=WElNS0xIMGp3ZERIRTlYdjR0U3ZaUT09 For DKRZ DM.