%% Cell type:markdown id:13ba06b9-5f2e-4de3-90da-511557166bfe tags:
# Cloudify
This notebook series guides you through the *cloudify* service: serving Xarray datasets as Zarr datasets with xpublish, with server-side processing enabled via Dask. It introduces the basic concepts with some examples. It was designed to work on DKRZ's HPC.
%% Cell type:markdown id:a56b764b-23a7-4d86-84b2-bf419d989cb2 tags:
## 1. Start an app
In the following, you will learn how to start and control the cloudify service.
**Why else run cloudify on the DKRZ HPC, which is only accessible internally?**
If you *cloudify* a virtual dataset prepared as a highly aggregated, analysis-ready dataset, clients can subset this *one* large aggregated dataset instead of searching the file system.
%% Cell type:markdown id:0b17b450-716d-49b3-bbfa-aec943e47120 tags:
1. Install a kernel for JupyterHub
```bash
source activate /work/bm0021/conda-envs/cloudify
python -m ipykernel install --user --name cloudify_env
```
- Choose the kernel
%% Cell type:markdown id:8cfb6129-aea7-4d87-a016-e04cee5bf084 tags:
2. To allow secure *https* access, we need an SSL certificate. For testing purposes and for Levante, we can use a self-signed one. Additionally, some applications currently only allow access through https. We can create the certificate like this:
%% Cell type:code id:0a748c3e-2a25-40ea-aefc-ae40bc13f664 tags:
``` python
cn=!echo $HOSTNAME
cn=cn[0]
cn
```
%% Output
'l40356.lvt.dkrz.de'
%% Cell type:code id:d5e47e26-93ac-465f-90a4-8d84762b1f80 tags:
``` python
#!openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes -subj "/C=XX/ST=Hamburg/L=Hamburg/O=Test/OU=Test/CN=localhost"
!openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes -subj "/C=XX/ST=Hamburg/L=Hamburg/O=Test/OU=Test/CN="{cn}
```
%% Output
(openssl key generation progress output omitted)
%% Cell type:markdown id:190bde7f-6f36-4c87-a9f2-a82ee840302e tags:
3. We write a cloudify script for data serving and start to host an example dataset in a background process. We need to consider some settings:
**Port**
The resulting service listens on a specific *port*. In case we share a node, we can only use ports that are not allocated already. To enable all of us to run our own app, we agree to use a port `90XX`, where XX are the last two digits of our account (as sketched below).
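A hedged sketch of deriving that port (assuming the `$USER` environment variable holds your DKRZ account name, e.g. `k204210`):
```python
import os

# Assumption: $USER holds the DKRZ account name, e.g. "k204210".
account = os.environ.get("USER", "k204210")
port = 9000 + int(account[-2:])  # "k204210" -> 9010
print(port)
```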
**Dask Cluster**
Dask is necessary for lazy access to the data. Additionally, a dask cluster can help us with server-side processing like uniform encoding. When started, the imported predefined dask cluster will use the following resources:
```python
n_workers=2,
threads_per_worker=8,
memory_limit="16GB"
```
which should be sufficient for at least two clients in parallel. We store the cluster address in an environment variable so that xpublish can find it. We furthermore have to align the two event loops of dask and xpublish's asyncio with `nest_asyncio.apply()`. Event loops can be seen as *while* loops for a permanently running main worker.
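As a minimal sketch of what this handover amounts to in plain dask (assuming the predefined `get_dask_cluster` helper wraps a `dask.distributed.LocalCluster` with the resources above):
```python
import os
from dask.distributed import LocalCluster

# Assumption: get_dask_cluster roughly wraps a LocalCluster like this.
cluster = LocalCluster(n_workers=2, threads_per_worker=8, memory_limit="16GB")
# Publish the scheduler address so that xpublish can find the cluster.
os.environ["ZARR_ADDRESS"] = cluster.scheduler_address
```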
**Plug-ins**
Xpublish finds pre-installed plugins like the intake plugin by itself. Custom plugins need to be registered manually.
Further settings will be discussed later.
%% Cell type:code id:e11d309f-c893-401a-ba5f-9f3f0046e039 tags:
``` python
xpublish_example_script="xpublish_example.py"
```
%% Cell type:code id:571a82ea-d7bc-42e3-8169-ae22ef999065 tags:
``` python
%%writefile {xpublish_example_script}
port=9010
ssl_keyfile="/work/bm0021/k204210/cloudify/workshop/key.pem"
ssl_certfile="/work/bm0021/k204210/cloudify/workshop/cert.pem"

from cloudify.plugins.stacer import *
from cloudify.plugins.geoanimation import *
from cloudify.utils.daskhelper import *
import xarray as xr
import xpublish as xp
import asyncio
import nest_asyncio
import sys
import os

nest_asyncio.apply()

# Optional rechunking of lon/lat, controlled via environment variables
chunks = {}
for coord in ["lon", "lat"]:
    chunk_size = os.environ.get(f"XPUBLISH_{coord.upper()}_CHUNK_SIZE", None)
    if chunk_size:
        chunks[coord] = int(chunk_size)

l_lossy = os.environ.get("L_LOSSY", False)

def lossy_compress(partds):
    # Lossy compression: round to 12 significant bits
    import numcodecs
    rounding = numcodecs.BitRound(keepbits=12)
    return rounding.decode(rounding.encode(partds))

if __name__ == "__main__":  # This avoids infinite subprocess creation
    import dask
    zarrcluster = asyncio.get_event_loop().run_until_complete(get_dask_cluster())
    os.environ["ZARR_ADDRESS"] = zarrcluster.scheduler._address

    dsname = sys.argv[1]
    glob_inp = sys.argv[2:]

    dsdict = {}
    ds = xr.open_mfdataset(
        glob_inp,
        compat="override",
        coords="minimal",
        chunks=chunks,
    )
    if "height" in ds:
        del ds["height"]
    # Load time variables into memory and serve them uncompressed as float64
    for dv in ds.variables:
        if "time" in dv:
            ds[dv] = ds[dv].load()
            ds[dv].encoding["dtype"] = "float64"
            ds[dv].encoding["compressor"] = None
    ds = ds.set_coords([a for a in ds.data_vars if "bnds" in a])
    if l_lossy:
        ds = xr.apply_ufunc(
            lossy_compress,
            ds,
            dask="parallelized",
            keep_attrs="drop_conflicts"
        )
    dsdict[dsname] = ds

    collection = xp.Rest(dsdict)
    collection.register_plugin(Stac())
    collection.register_plugin(PlotPlugin())
    collection.serve(
        host="0.0.0.0",
        port=port,
        ssl_keyfile=ssl_keyfile,
        ssl_certfile=ssl_certfile
    )
```
%% Output
Overwriting xpublish_example.py
%% Cell type:markdown id:ca64c11f-0846-4ddd-9e60-4b22dba8b32c tags:
You can run this app, e.g., with:
```
dsname="example"
glob_inp="/work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp370/r1i1p1f1/Amon/tas/gn/v20190710/*.nc"
```
by applying:
%% Cell type:code id:5da13d6b-05f1-4b3b-aecd-1ac3bb635526 tags:
``` python
%%bash --bg
#Cannot use variables from the Python script here, so everything is hard-coded
source activate /work/bm0021/conda-envs/cloudify
python xpublish_example.py \
example \
/work/ik1017/CMIP6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp370/r1i1p1f1/Amon/tas/gn/v20190710/*.nc
```
%% Cell type:markdown id:634d1952-43a9-40a7-b7c3-9bbff5f07081 tags:
### Stop a running app
Let us run only **one** app at a time. Otherwise, we would occupy multiple ports and dask clusters, which would not end well.
You can check for the main *cloudify* processes by finding the dask workers. In a second step, you can *kill* them by ID.
%% Cell type:code id:9a43c4ce-be08-4493-8dd5-a3789f8c0647 tags:
``` python
!ps -ef | grep cloudify
```
%% Output
k204210 52510 4121939 0 10:12 ? 00:00:00 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-c6562e74-6d50-4cf3-92d0-27e4f4cae6f8.json
k204210 54077 52595 0 10:13 ? 00:00:00 /work/bm0021/conda-envs/cloudify/bin/python -c from multiprocessing.resource_tracker import main;main(27)
k204210 54079 52595 2 10:13 ? 00:00:06 /work/bm0021/conda-envs/cloudify/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=28, pipe_handle=38) --multiprocessing-fork
k204210 54082 52595 2 10:13 ? 00:00:06 /work/bm0021/conda-envs/cloudify/bin/python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=28, pipe_handle=38) --multiprocessing-fork
k204210 55387 52510 0 10:17 pts/18 00:00:00 /bin/bash -c ps -ef | grep cloudify
k204210 55389 55387 0 10:17 pts/18 00:00:00 grep cloudify
k204210 4125442 4121939 0 08:57 ? 00:00:06 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-e22a5303-3b28-41cb-b092-ed92a8ff6221.json
k204210 4125444 4121939 0 08:57 ? 00:00:01 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-f2ebf347-166b-4e0f-89ee-e7e663b08a4c.json
k204210 4125452 4121939 0 08:57 ? 00:00:01 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-33489dce-534b-42ed-bbe7-6d125e3f6167.json
k204210 4125453 4121939 0 08:57 ? 00:00:01 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-16ae079b-b9fb-4460-bc7c-808797637e88.json
k204210 1885374 1878744 0 09:31 ? 00:00:00 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-449f1aed-0c01-4339-8e8e-3391add9c830.json
k204210 1885397 1878744 0 09:31 ? 00:00:00 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-59fedbd8-7d7f-424a-8226-3980eadf7fc6.json
k204210 1886037 1878744 1 09:35 ? 00:00:21 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-1a921a3d-65a6-4b07-9c29-d17f990ab11b.json
k204210 1894380 1878744 17 09:58 ? 00:00:00 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-1d363456-8c8d-40f5-91ea-1e931e364b72.json
k204210 1894407 1894380 0 09:58 pts/3 00:00:00 /bin/bash -c ps -ef | grep cloudify
k204210 1894411 1894410 0 09:58 ? 00:00:00 /sw/spack-levante/jupyterhub/jupyterhub/bin/python /sw/spack-levante/jupyterhub/jupyterhub/bin/conda shell.posix activate /work/bm0021/conda-envs/cloudify
k204210 1894413 1894407 0 09:58 pts/3 00:00:00 grep cloudify
%% Cell type:markdown id:5b505b0a-2b48-4eb3-8c91-fb6b7fcdc54b tags:
**Important note:**
If you plan to continue with another notebook, do not stop the app now.
%% Cell type:code id:af33c134-f4ba-42f7-9687-7bb9948d5dfe tags:
``` python
!kill 52595
!kill 1882536
```
%% Output
/bin/bash: line 0: kill: (1882536) - No such process
%% Cell type:code id:feacaed0-df8d-4e52-af8c-acd094cac6f4 tags:
``` python
```
%% Cell type:markdown id:1bc2dd78-97d7-4b39-bf35-03dcacc767f9 tags:
 
## 4. Kerchunk input data and the kerchunk API
 
Within this series, we cannot explain how kerchunking works. For now, it is only important to understand that it provides the zarr benefits of both small memory requirements for opening and consolidated metadata for virtual aggregation.
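Purely as an illustration (not cloudify code), a kerchunk reference set has roughly this shape: zarr store keys map either to inline metadata or to byte ranges inside the original files, so a client can open the aggregate without touching the data itself:
```python
# Illustrative sketch of a kerchunk "version 1" reference set.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',                  # inline zarr metadata
        "tas/0.0.0": ["/path/to/file.nc", 12345, 67890],  # [file, offset, length]
    },
}
```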
 
We now design the script such that it
- opens *kerchunk* references instead of files
- enables access through the kerchunk API
 
With the kerchunk API, we do not necessarily need a dask cluster anymore (but without a dask cluster, the dask API will not work).
 
%% Cell type:markdown id:505d07ff-db38-4938-8b65-c18ca0391599 tags:
 
**Differences to the first example**:
 
- we open the data through the so-called *lazy reference* mapper with
```python
fsspec.get_mapper(
    source,
    lazy=True,
)
```
which we pass to xarray afterwards. This only works for kerchunked input data.
- we add a *dict* of fsspec mappers to the kerchunk plugin by setting `kp.mapper_dict`. Xpublish will use these mappers to serve the referenced byte ranges through the kerchunk API.
 
%% Cell type:code id:571a82ea-d7bc-42e3-8169-ae22ef999065 tags:
 
``` python
%%writefile xpublish_references.py

port=9010
ssl_keyfile="/work/bm0021/k204210/cloudify/workshop/key.pem"
ssl_certfile="/work/bm0021/k204210/cloudify/workshop/cert.pem"

from cloudify.plugins.stacer import *
from cloudify.utils.daskhelper import *
from cloudify.plugins.kerchunk import *
import fsspec
import xarray as xr
import xpublish as xp
import asyncio
import nest_asyncio
import sys
import os

nest_asyncio.apply()

if __name__ == "__main__":  # This avoids infinite subprocess creation
    # No dask cluster required for the kerchunk API:
    #import dask
    #zarrcluster = asyncio.get_event_loop().run_until_complete(get_dask_cluster())
    #os.environ["ZARR_ADDRESS"]=zarrcluster.scheduler._address

    dsname = sys.argv[1]
    glob_inp = sys.argv[2]

    dsdict = {}
    source = "reference::/" + glob_inp
    # Lazy reference mapper: references are only loaded on demand
    fsmap = fsspec.get_mapper(
        source,
        remote_protocol="file",
        lazy=True,
        cache_size=0
    )
    ds = xr.open_dataset(
        fsmap,
        engine="zarr",
        chunks="auto",
        consolidated=False
    )
    kp = KerchunkPass()
    kp.mapper_dict = {source: fsmap}
    # Drop file-based encoding and remember the reference source
    ds = ds.drop_encoding()
    ds.encoding["source"] = source
    dsdict[dsname] = ds

    collection = xp.Rest(dsdict)
    collection.register_plugin(Stac())
    collection.register_plugin(kp)
    collection.serve(
        host="0.0.0.0",
        port=port,
        ssl_keyfile=ssl_keyfile,
        ssl_certfile=ssl_certfile
    )
```
 
%% Output
 
Overwriting xpublish_references.py
 
%% Cell type:markdown id:ca64c11f-0846-4ddd-9e60-4b22dba8b32c tags:
 
We run this app with ERA5 data:
 
```
dsname="era5"
glob_inp="/work/bm1344/DKRZ/kerchunks_single/testera/E5_sf_an_1D.parquet"
```
 
by applying:
 
%% Cell type:code id:5da13d6b-05f1-4b3b-aecd-1ac3bb635526 tags:
 
``` python
%%bash --bg
source activate /work/bm0021/conda-envs/cloudify
python xpublish_references.py era5 /work/bm1344/DKRZ/kerchunks_single/testera/E5_sf_an_1D.parquet
```
 
%% Cell type:markdown id:a7ab712d-eff9-4a8e-8f66-fd603e2ab658 tags:
 
If something goes wrong, you can check for *cloudify* processes, which you can then *kill* by ID.
 
%% Cell type:code id:9a43c4ce-be08-4493-8dd5-a3789f8c0647 tags:
 
``` python
!ps -ef | grep k204210
```
 
%% Output
 
k204210 52510 4121939 0 10:12 ? 00:00:01 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-c6562e74-6d50-4cf3-92d0-27e4f4cae6f8.json
k204210 56270 4121939 0 10:18 ? 00:00:07 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-e22a5303-3b28-41cb-b092-ed92a8ff6221.json
k204210 67839 4121939 0 10:42 ? 00:00:13 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-33489dce-534b-42ed-bbe7-6d125e3f6167.json
k204210 198403 4121939 3 11:34 ? 00:00:00 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-16ae079b-b9fb-4460-bc7c-808797637e88.json
k204210 198403 4121939 13 11:34 ? 00:00:12 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-16ae079b-b9fb-4460-bc7c-808797637e88.json
k204210 198463 198403 0 11:35 ? 00:00:00 bash
k204210 198487 198463 52 11:35 ? 00:00:05 python xpublish_references.py era5 /work/bm1344/DKRZ/kerchunks_single/testera/E5_sf_an_1D.parquet
k204210 199212 198403 11 11:35 pts/2 00:00:00 /bin/bash -c ps -ef | grep k204210
k204210 199216 199212 0 11:35 pts/2 00:00:00 ps -ef
k204210 199217 199212 0 11:35 pts/2 00:00:00 grep k204210
k204210 198487 198463 9 11:35 ? 00:00:06 python xpublish_references.py era5 /work/bm1344/DKRZ/kerchunks_single/testera/E5_sf_an_1D.parquet
k204210 200138 198403 0 11:36 pts/2 00:00:00 /bin/bash -c ps -ef | grep k204210
k204210 200139 200138 0 11:36 pts/2 00:00:00 ps -ef
k204210 200140 200138 0 11:36 pts/2 00:00:00 grep k204210
k204210 4121487 4121482 0 08:57 ? 00:00:00 /bin/bash /var/spool/slurmd/job14671095/slurm_script
k204210 4121939 4121487 8 08:57 ? 00:13:38 /sw/spack-levante/jupyterhub/jupyterhub/bin/python /sw/spack-levante/jupyterhub/jupyterhub/bin/batchspawner-singleuser jupyterhub-singleuser --SingleUserNotebookApp.default_url=/lab/tree//home/k/k204210 --ServerApp.root_dir=/ --KernelSpecManager.ensure_native_kernel=False --ServerApp.disable_user_config=True --ServerApp.ContentsManager.allow_hidden=True
k204210 4121939 4121487 8 08:57 ? 00:13:45 /sw/spack-levante/jupyterhub/jupyterhub/bin/python /sw/spack-levante/jupyterhub/jupyterhub/bin/batchspawner-singleuser jupyterhub-singleuser --SingleUserNotebookApp.default_url=/lab/tree//home/k/k204210 --ServerApp.root_dir=/ --KernelSpecManager.ensure_native_kernel=False --ServerApp.disable_user_config=True --ServerApp.ContentsManager.allow_hidden=True
k204210 4124097 4121939 0 08:57 pts/13 00:00:00 /bin/bash -l
k204210 4124109 4121939 0 08:57 pts/14 00:00:00 /bin/bash -l
k204210 4124210 4121939 0 08:57 pts/15 00:00:00 /bin/bash -l
k204210 4125115 4121939 0 08:57 ? 00:00:01 /work/bm1344/conda-envs/virtualizarr/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-991a5929-0b29-4b79-afb2-9ad9962e513f.json
k204210 4125118 4121939 0 08:57 ? 00:00:01 /work/bm1344/conda-envs/virtualizarr/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-0bbcf52e-1759-4e8e-95b7-9f0914202162.json
k204210 4125121 4121939 0 08:57 ? 00:00:00 /bin/bash /sw/spack-levante/jupyterhub/jupyter_kernels/scripts/python3_unstable.sh /home/k/k204210/.local/share/jupyter/runtime/kernel-1f990edb-3439-4064-be93-5851e8a396ef.json
k204210 4125128 4121939 0 08:57 ? 00:00:00 /bin/bash /sw/spack-levante/jupyterhub/jupyter_kernels/scripts/python3_unstable.sh /home/k/k204210/.local/share/jupyter/runtime/kernel-44cd728a-deef-4255-9461-07327e613d97.json
k204210 4125152 4121939 0 08:57 ? 00:00:01 /work/bm0021/conda-envs/xesmf/bin/python -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-c65fa87a-5d17-4467-82e9-5d4c3a2a04a7.json
k204210 4125443 4121939 0 08:57 ? 00:00:01 /work/bm1344/conda-envs/virtualizarr/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-c6b4a066-3a6c-4547-87d4-0757879168fb.json
k204210 4125444 4121939 0 08:57 ? 00:00:01 /work/bm0021/conda-envs/cloudify/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-f2ebf347-166b-4e0f-89ee-e7e663b08a4c.json
k204210 4125445 4121939 0 08:57 ? 00:00:01 /work/bm1344/conda-envs/py_312/bin/python -Xfrozen_modules=off -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-44b9e81e-0dc4-459c-99b4-1cd8ffa3c663.json
k204210 4125998 4125121 0 08:57 ? 00:00:01 python -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-1f990edb-3439-4064-be93-5851e8a396ef.json
k204210 4125999 4125128 0 08:57 ? 00:00:01 python -m ipykernel_launcher -f /home/k/k204210/.local/share/jupyter/runtime/kernel-44cd728a-deef-4255-9461-07327e613d97.json
 
%% Cell type:code id:af33c134-f4ba-42f7-9687-7bb9948d5dfe tags:
 
``` python
!kill 1539362
!kill 198487
```
 
%% Cell type:markdown id:8e797fa6-8621-46c7-9dce-5b79adf714e3 tags:
 
**Data access via the kerchunk API**
 
You can build the host URL from the hostname of the Levante node you work on and the port that you used for the app:
 
%% Cell type:code id:bd1abfca-5f10-4dfc-a71f-90f0257d10d1 tags:
 
``` python
port=9010
hostname=!echo $HOSTNAME
hosturl="https://"+hostname[0]+":"+str(port)
print(hosturl)
```
 
%% Output
 
https://l40038.lvt.dkrz.de:9010
 
%% Cell type:markdown id:84951ac0-8e70-4917-892a-01b236f7c0ba tags:
 
We have to tell the Python programs not to verify SSL certificates for our purposes:
 
%% Cell type:code id:e3befb5e-99ec-4bda-aa47-303641a66320 tags:
 
``` python
storage_options=dict(verify_ssl=False)
```
 
%% Cell type:markdown id:093ceaf2-1bda-40c7-9df5-7cbe8cf9dc66 tags:
 
**Xarray**
 
%% Cell type:markdown id:878d421a-c4a4-4999-9f7e-e101d6b58082 tags:
 
Our ERA5 dataset is available via both the *zarr* API **and** the *kerchunk* API.
The URLs are named similarly:
 
%% Cell type:code id:73527b45-950c-4546-9a7e-5f1877cff132 tags:
 
``` python
dsname="era5"
zarr_url='/'.join([hosturl,"datasets",dsname,"zarr"])
kerchunk_url='/'.join([hosturl,"datasets",dsname,"kerchunk"])
print(kerchunk_url)
```
 
%% Output
 
https://l40038.lvt.dkrz.de:9010/datasets/era5/kerchunk
 
%% Cell type:code id:482f8b5c-0186-494f-91b0-e5a03c370004 tags:
 
``` python
import xarray as xr
ds=xr.open_zarr(
kerchunk_url,
consolidated=True,
storage_options=storage_options
)
```
 
%% Cell type:code id:3b7405e3-1957-46ef-b564-942850fec433 tags:
 
``` python
ds
```
 
%% Output
 
<xarray.Dataset> Size: 5TB
Dimensions: (time: 30955, cell: 542080)
Coordinates:
lat (cell) float64 4MB dask.array<chunksize=(542080,), meta=np.ndarray>
lon (cell) float64 4MB dask.array<chunksize=(542080,), meta=np.ndarray>
* time (time) datetime64[ns] 248kB 1940-01-01T11:30:00 ... 2024-09-30T1...
Dimensions without coordinates: cell
Data variables: (12/40)
100u (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
100v (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
10u (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
10v (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
2d (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
2t (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
... ...
swvl4 (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tcc (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tco3 (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tcw (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tcwv (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tsn (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
Attributes: (12/22)
project: ECMWF Re-Analysis
project_id: ERA
institution_id: ECMWF-DKRZ
institution: Data from European Centre for Medium-Range Weather ...
source_id: IFS
source: ECMWF Integrated Forecast System (IFS) CY41R2
... ...
format: kerchunk
product: reanalysis
responsible_persons: Angelika Heil, Fabian Wachsmann
title: The DKRZ ERA5 data pool. Generated using Copernicus...
license: The ERA5 data are published with the Copernicus Pro...
references: Hersbach, H., Bell, B., Berrisford, P., Hirahara, S...
 
%% Cell type:code id:e46eeffa-f2b7-4763-9d50-3866f16b2495 tags:
 
``` python
ds.isel(time=-1).load()
```
 
%% Output
 
<xarray.Dataset> Size: 182MB
Dimensions: (cell: 542080)
Coordinates:
lat (cell) float64 4MB 89.78 89.78 89.78 89.78 ... -89.78 -89.78 -89.78
lon (cell) float64 4MB nan 20.0 40.0 60.0 ... 280.0 300.0 320.0 340.0
time datetime64[ns] 8B 2024-09-30T11:30:00
Dimensions without coordinates: cell
Data variables: (12/40)
100u (cell) float64 4MB 5.838 2.636 -0.8782 ... -8.296 -6.063 -2.922
100v (cell) float64 4MB -8.215 -9.827 -10.22 ... -6.401 -8.995 -10.81
10u (cell) float64 4MB 4.36 2.381 0.09148 ... -5.739 -4.711 -3.002
10v (cell) float64 4MB -4.953 -6.245 -6.762 ... -3.089 -4.959 -6.483
2d (cell) float64 4MB 259.5 259.7 259.9 260.1 ... 219.4 219.4 219.4
2t (cell) float64 4MB 261.9 262.1 262.3 262.5 ... 223.3 223.4 223.5
... ...
swvl4 (cell) float64 4MB 0.0 0.0 0.0 0.0 ... 0.1602 0.1602 0.1602 0.1602
tcc (cell) float64 4MB 0.9721 0.9683 0.9667 ... 0.9813 0.9838 0.9871
tco3 (cell) float64 4MB 0.006883 0.006877 0.006873 ... 0.002953 0.002952
tcw (cell) float64 4MB 5.733 5.772 5.815 5.877 ... 0.4126 0.4126 0.4126
tcwv (cell) float64 4MB 5.655 5.692 5.733 5.793 ... 0.3989 0.4008 0.4008
tsn (cell) float64 4MB 261.7 261.9 262.1 262.4 ... 221.2 221.3 221.3
Attributes: (12/22)
project: ECMWF Re-Analysis
project_id: ERA
institution_id: ECMWF-DKRZ
institution: Data from European Centre for Medium-Range Weather ...
source_id: IFS
source: ECMWF Integrated Forecast System (IFS) CY41R2
... ...
format: kerchunk
product: reanalysis
responsible_persons: Angelika Heil, Fabian Wachsmann
title: The DKRZ ERA5 data pool. Generated using Copernicus...
license: The ERA5 data are published with the Copernicus Pro...
references: Hersbach, H., Bell, B., Berrisford, P., Hirahara, S...
 
%% Cell type:markdown id:2887ced5-7341-49e0-b0f0-511d68f55c74 tags:
 
**Intake**
 
The default **method** for intake datasets is *kerchunk*, i.e. the datasets are loaded through the kerchunk API by default.
 
%% Cell type:code id:5fb665bb-fdd1-47be-81ce-51d71506533c tags:
 
``` python
intake_url='/'.join([hosturl,"intake.yaml"])
print(intake_url)
```
 
%% Output
 
https://l40038.lvt.dkrz.de:9010/intake.yaml
 
%% Cell type:code id:54d805e3-6e73-46bd-838e-b8d61ecec2d6 tags:
 
``` python
import intake
cat=intake.open_catalog(
intake_url,
storage_options=storage_options
)
list(cat)
```
 
%% Output
 
['era5']
 
%% Cell type:code id:62eb5ed4-9635-4e0d-822d-38ca5aad386f tags:
 
``` python
cat[dsname](storage_options=storage_options).to_dask()
```
 
%% Output
 
/work/bm0021/conda-envs/cloudify/lib/python3.11/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
'dims': dict(self._ds.dims),
 
<xarray.Dataset> Size: 5TB
Dimensions: (time: 30955, cell: 542080)
Coordinates:
lat (cell) float64 4MB dask.array<chunksize=(542080,), meta=np.ndarray>
lon (cell) float64 4MB dask.array<chunksize=(542080,), meta=np.ndarray>
* time (time) datetime64[ns] 248kB 1940-01-01T11:30:00 ... 2024-09-30T1...
Dimensions without coordinates: cell
Data variables: (12/40)
100u (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
100v (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
10u (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
10v (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
2d (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
2t (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
... ...
swvl4 (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tcc (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tco3 (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tcw (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tcwv (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
tsn (time, cell) float64 134GB dask.array<chunksize=(1, 542080), meta=np.ndarray>
Attributes: (12/22)
project: ECMWF Re-Analysis
project_id: ERA
institution_id: ECMWF-DKRZ
institution: Data from European Centre for Medium-Range Weather ...
source_id: IFS
source: ECMWF Integrated Forecast System (IFS) CY41R2
... ...
format: kerchunk
product: reanalysis
responsible_persons: Angelika Heil, Fabian Wachsmann
title: The DKRZ ERA5 data pool. Generated using Copernicus...
license: The ERA5 data are published with the Copernicus Pro...
references: Hersbach, H., Bell, B., Berrisford, P., Hirahara, S...
 
%% Cell type:code id:9355a45a-2f0f-4923-967c-85afacdfcab7 tags:
 
``` python
stac_url=zarr_url.replace('/zarr','/stac')
```
 
%% Cell type:code id:7d4148b1-5e49-4b0b-8450-2928e823471d tags:
 
``` python
import pystac
import fsspec
import json
pystac.item.Item.from_dict(
json.load(fsspec.open(stac_url,**storage_options).open())
)
```
 
%% Output
 
<Item id=era5>