Add new file

Commit ebb93353 authored 3 months ago by Fabian Wachsmann
--- a/workshop/README
+++ b/workshop/README
+# Cloudify Workshop
+
+Building competence for the eerie.cloud data workflow.
+
+Links:
+- [Gitlab Repo](https://gitlab.dkrz.de/data-infrastructure-services/cloudify)
+- [Technical overview paper](https://pad.gwdg.de/OZo5HMC4R6iljvZHlo-BzQ?view#)
+- Access and usage:
+    - [Slideshow](https://pad.gwdg.de/HD0fHG-STCGzCqBtZCx8iQ?view#)
+    - [Notebook](https://easy.gems.dkrz.de/simulations/EERIE/eerie_data-access_online.html)
+- Ingest: [Lake house approach paper](https://pad.gwdg.de/gzXeJB85QTC6LNNlhA1A5A?both#)
+
+For the cloudify training on Levante, start a JupyterHub server on either a compute node (recommended) or an interactive node.
+
+## [Concepts](https://docs.google.com/presentation/d/1OrPWOZXAs0rRRdfomPABo1-rSaHyH8amlfLOIvRd48A/edit?usp=sharing)
+
+- Zarr - a (not only) cloud-optimized data format for ESM output
+- Benefits of cloud storage and why we do not fully use it (yet)
+- Xpublish - the cloud data emulator with server-side processing
+
+## [Cloudify on Levante](https://gitlab.dkrz.de/data-infrastructure-services/cloudify/-/tree/main/workshop?ref_type=heads)
+
+- How to start an app
+- The various ways to access cloudified data through catalogs, xarray and cdo
+- Use-cases and preparations for a data server on the PB scale
+    - Server-side processing for lossy compression, rechunking and on-the-fly post-processing
+    - Large aggregations: Zarr becomes the *catalog* with kerchunked input and the kerchunk API
+    - SPOA for ingestions
+
+## [The eerie.cloud implementation](https://docs.google.com/presentation/d/1L7ehzS5O2n9O131MZdk47SIQflhuyMedJ4rwO5P3NvM/edit?usp=sharing)
+
+### Data preparation for ingestion
+
+- Requirements for a performant data server
+- Kerchunking: we create virtual datasets by extracting the storage chunks of NetCDF and GRIB files, concatenating them and storing the consolidated dataset in a lazy format based on Parquet tables.
+- Catalogs: the virtual Zarr datasets are collected in an intake catalog based on intake-xarray. This catalog is used for eerie.cloud ingestion.
+
+### Server setup
+
+- The OpenStack VM setup, Nginx and an xpublish plugin.
+
+### STAC API
+
+- The catalog infrastructure based on a mixture of static and dynamic STAC catalogs.
+
+## User guidance
+
+Live show
+
+**Navigation** through eerie.cloud with static STAC catalogs in the web browser using the stac-browser.
+
+**User guide** with the [easy gems notebook](https://easy.gems.dkrz.de/simulations/EERIE/eerie_data-access_online.html)
+
+**Applications**: Jupyterlite, Gridlook and a WPS based on a simple xarray API
+
+**Integrations** to be discussed:
+
+- Freva
+- Warmworld approach
+
+## Past events
+
+8.1. 13:00-15:00
+hybrid: room #23 or https://eu02web.zoom-x.de/j/9290696892?pwd=WElNS0xIMGp3ZERIRTlYdjR0U3ZaUT09 
+
+For DKRZ DM.
\ No newline at end of file
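
Note on the kerchunking step listed under "Data preparation for ingestion": the idea of extracting chunk locations, concatenating them and flattening them into table rows can be sketched in plain Python. This is a minimal illustration only, not the eerie.cloud implementation and not the kerchunk API. The file names, the variable name `tas`, the byte offsets and the helpers `file_refs` and `concat_refs` are all hypothetical, and real files would need the kerchunk library to locate the chunks.

```python
import json


def file_refs(path, var, n_chunks, chunk_bytes, header_bytes=0):
    """Hypothetical helper: describe each chunk of `var` in one file
    as [path, byte offset, byte length], assuming equally sized,
    uncompressed chunks stored back to back after a header."""
    return {
        f"{var}/{i}": [path, header_bytes + i * chunk_bytes, chunk_bytes]
        for i in range(n_chunks)
    }


def concat_refs(per_file_refs, var, chunk_len):
    """Hypothetical helper: concatenate files along the first axis by
    renumbering chunk keys; no data is read or copied."""
    combined = {}
    n_total = 0
    for refs in per_file_refs:
        for i in range(len(refs)):
            combined[f"{var}/{n_total + i}"] = refs[f"{var}/{i}"]
        n_total += len(refs)
    # Zarr v2 array metadata describing the virtual, concatenated array.
    combined[f"{var}/.zarray"] = json.dumps({
        "zarr_format": 2,
        "shape": [n_total * chunk_len],
        "chunks": [chunk_len],
        "dtype": "<f4",
        "compressor": None,
        "filters": None,
        "fill_value": None,
        "order": "C",
    })
    return combined


# Two assumed input files with 2 and 3 chunks of 100 float32 values each.
refs = concat_refs(
    [
        file_refs("a.nc", "tas", n_chunks=2, chunk_bytes=400, header_bytes=1024),
        file_refs("b.nc", "tas", n_chunks=3, chunk_bytes=400, header_bytes=1024),
    ],
    var="tas",
    chunk_len=100,
)

# Flatten the chunk references into rows (key, path, offset, size),
# the kind of table a Parquet-based reference store could hold.
rows = sorted(
    (key, *ref) for key, ref in refs.items() if isinstance(ref, list)
)
```

The consolidated `refs` mapping is what makes the dataset "virtual": a reader resolves a chunk key such as `tas/3` to a byte range in one of the original files, so the source data stay where they are.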