Commit 2c4a1ae1 authored by Fabian Wachsmann's avatar Fabian Wachsmann

Merge branch 'setup-for-ci' into 'master'

Setup for ci

See merge request !74
parents 8995329d c3a4e45e
Pipeline #19358 passed with stages
in 14 minutes and 25 seconds
......@@ -21,7 +21,8 @@ build:
#- ls /pool/data
- cd docs
- chmod 755 ./leave_required_nbooks.sh && ./leave_required_nbooks.sh
- ln -s /mnt/lustre/work /work
- ln -fs /mnt/lustre/work /work && chmod 755 /work
- ls /work/ik1017/CMIP6/data/CMIP6/AerChemMIP/BCC/BCC-ESM1/hist-piAer/r1i1p1f1/AERmon/c2h6/gn/v20200511/c2h6_AERmon_BCC-ESM1_hist-piAer_r1i1p1f1_gn_185001-201412.nc
- make html
after_script:
#- conda deactivate
......
# User guide
[![Gitlab-repo](https://img.shields.io/badge/gitlab-repo-green)](https://gitlab.dkrz.de/data-infrastructure-services/tutorials-and-use-cases)
[![Pages](https://img.shields.io/badge/gitlab-pages-blue)](https://data-infrastructure-services.gitlab-pages.dkrz.de/tutorials-and-use-cases/index.html)
[![Pages](https://img.shields.io/badge/gitlab-pages-blue)](https://tutorials.dkrz.de)
## Using Jupyter Notebooks for Model Data Analysis
......@@ -26,21 +26,13 @@ To run the notebooks, you only need a browser (like Firefox, Chrome, Safari,...)
1. Open the [DKRZ Jupyterhub](https://jupyterhub.dkrz.de) in your browser.
2. Login with your DKRZ account (if you do not have one account yet, see the links above).
3. Pick a profile (``Preset -> Start from Preset Profile``). You need a **prepost** node (they have internet access, more info [here](https://www.dkrz.de/up/systems/mistral/running-jobs/partitions-and-limits)). Choose profile ``5GB memory, prepost``.
> NOTE: Every time you run the notebook you will use some of that RAM, so we recommend clicking ``Kernel -> Shutdown kernel`` often so that the memory is released. If you want to run several notebooks at the same time, or one notebook several times, and you cannot shut down the kernel each time, please choose a job profile with more memory.
4. Press "start" and your Jupyter server will start (this is also known as spawning).
5. Open a terminal in Jupyter (``New -> Terminal``, on the right side)
6. A terminal window opens on the node where your Jupyter is running.
7. Clone the notebooks from the DKRZ GitLab:
```console
$ git clone https://gitlab.dkrz.de/data-infrastructure-services/tutorials-and-use-cases.git
```
8. Go back to your Jupyter and open a notebook from the notebooks folder:
```
tutorials-and-use-cases/notebooks/
```
9. Make sure you use the Jupyter ``Python 3 unstable`` kernel (``Kernel -> Change Kernel``).
3. Select a *preset* spawner option.
4. Choose a *job profile* that matches your processing requirements. We recommend using at least 10GB of memory. Find information about the partitions [here](https://docs.dkrz.de/doc/levante/running-jobs/partitions-and-limits.html) or check the tooltip shown on *mouse hover*. Specify an account (the luv account your user belongs to, e.g. bk1088).
5. Press "start" and your Jupyter server will start (this is also known as spawning). The server will run for the specified time, during which you can always come back to it (i.e. reopen the web URL) and continue to work.
6. In the upper bar, click on ``Git -> Clone a Repository``
7. In the dialog window, type in ``https://gitlab.dkrz.de/data-infrastructure-services/tutorials-and-use-cases.git``. When this is successful, a new folder with the cloned repository appears in the file browser.
8. In the file browser, change into the directory ``tutorials-and-use-cases/notebooks`` and open a notebook from this folder.
9. Make sure you use a recent ``Python 3`` kernel (``Kernel -> Change Kernel``).
### Advanced
......@@ -61,10 +53,6 @@ $ bash make_kernel.sh
* notebooks/demo/tutorial_*
1. We prepared a tutorial on how to use [Intake](https://intake.readthedocs.io/en/latest/) in the DKRZ data pool; a minimal sketch of the basic pattern follows this list. [![NBViewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg)](https://nbviewer.jupyter.org/urls/gitlab.dkrz.de/data-infrastructure-services/tutorials-and-use-cases/-/raw/master/notebooks/demo/tutorial_intake.ipynb)
2. ESMValTool
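For orientation, the basic pattern the Intake tutorial builds on is sketched below. The catalog short URL and the ``dkrz_cmip6_disk`` collection name are taken from the notebooks in this repository; the search facets are illustrative assumptions.
```python
import intake

# Open DKRZ's master intake catalog (short URL used in the notebooks of this repository)
parent_col = intake.open_catalog(["https://dkrz.de/s/intake"])
# Select the CMIP6 collection that points to files on disk
col = parent_col["dkrz_cmip6_disk"]

# Illustrative search; the available facets and values depend on the catalog
cat = col.search(experiment_id="historical", variable_id="tas", table_id="Amon")
cat.df.head()
```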
### Use-cases
* notebooks/demo/use-case_*
......@@ -72,14 +60,10 @@ $ bash make_kernel.sh
## Further Infos
* Find more in the DKRZ Jupyterhub [documentation](https://jupyterhub.gitlab-pages.dkrz.de/jupyterhub-docs/index.html).
* *prepost* nodes at DKRZ have internet access [info](https://www.dkrz.de/up/systems/mistral/running-jobs/partitions-and-limits).
* ``Python 3 unstable`` kernel: This kernel already contains all the common geoscience packages that we need for our notebooks.
* Find more in the DKRZ Jupyterhub [documentation](https://docs.dkrz.de/doc/software%26services/jupyterhub/index.html).
* See in this [video](https://youtu.be/f0wZX9i0uWQ) the main features of the DKRZ Jupyterhub and how to use it.
* Advanced users developing their own notebooks can find there how to create their own environments that are visible as kernels in the Jupyterhub.
Besides the information on the Jupyterhub, these DKRZ [docs](https://www.dkrz.de/up/systems/mistral/programming/jupyter-notebook) describe how to run Jupyter notebooks directly on the DKRZ servers, that is, outside of the Jupyterhub (this requires that you install the geoscience packages you need yourself).
## Exercises
In this hands-on session we will find, analyze, and visualize data from the DKRZ data pool. The goal is to create two maps: one showing the number of tropical nights for 2014 (the most recent year of the historical dataset) and another showing a chosen year in the past. The hands-on will be split into two exercises:
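As a rough orientation for the tropical-nights map, here is a minimal sketch of the computation. The dataset facets, the variable name ``tasmin``, and the 20 °C threshold are illustrative assumptions; the hands-on notebooks define the exact workflow.
```python
import intake
import xarray as xr

# Find a daily minimum-temperature dataset in the DKRZ CMIP6 data pool (illustrative facets)
col = intake.open_catalog(["https://dkrz.de/s/intake"])["dkrz_cmip6_disk"]
cat = col.search(experiment_id="historical", table_id="day", variable_id="tasmin",
                 source_id="MPI-ESM1-2-HR", member_id="r1i1p1f1")

# Open one matching file and count tropical nights (daily minimum > 20 degC) in 2014
ds = xr.open_dataset(cat.df["uri"].values[0])  # assumes the first matching file covers 2014
tropical_nights_2014 = (ds["tasmin"].sel(time="2014") > 273.15 + 20).sum(dim="time")
tropical_nights_2014.plot()
```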
......
......@@ -54,6 +54,7 @@ nb_execution_allow_errors = True
nbsphinx_kernel_name = 'python3'
#nbsphinx_timeout = -1
nb_execution_timeout = -1
execution_excludepatterns = ['*era5*.ipynb']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
......
......@@ -16,7 +16,7 @@ Welcome to Tutorials and Use Cases's!
This `Gitlab repository <https://gitlab.dkrz.de/data-infrastructure-services/tutorials-and-use-cases/>`_ collects and prepares Jupyter notebooks with coding examples on how to use state-of-the-art *processing tools* on *big data* collections. The Jupyter notebooks highlight the optimal usage of *High-Performance Computing resources* and address data analysts and researchers who are beginning to work with the resources of the German Climate Computing Center `DKRZ <https://www.dkrz.de/>`_.
The Jupyter notebooks are meant to run in the `Jupyterhub portal <https://jupyterhub.dkrz.de/>`_. See in this `video <https://youtu.be/f0wZX9i0uWQ>`_ the main features of the DKRZ Jupyterhub/lab and how to use it. Clone this repository into your home directory at the DKRZ supercomputers Levante and Mistral. The contents will be visible from the Jupyterhub portal. When you open a notebook in the Jupyterhub, make sure you choose the Python 3 unstable kernel on the Kernel tab (upper tool bar in the Jupyterhub). This kernel contains most of the common geoscience packages in current versions.
The Jupyter notebooks are meant to run in the `Jupyterhub portal <https://jupyterhub.dkrz.de/>`_. See in this `video <https://youtu.be/f0wZX9i0uWQ>`_ the main features of the DKRZ Jupyterhub/lab and how to use it. Clone this repository into your home directory at the DKRZ supercomputers Levante and Mistral. The contents will be visible from the Jupyterhub portal. When you open a notebook in the Jupyterhub, make sure you choose a *recent* Python 3 kernel on the Kernel tab (upper tool bar in the Jupyterhub). Such a kernel contains most of the common geoscience packages in current versions.
Direct and fast access to DKRZ's data pools is a main benefit of the `server-side <https://en.wikipedia.org/wiki/Server-side>`_ data-near computing demonstrated here. Note that running the notebooks on your local computer will generally require substantial memory and processing resources.
......
Intake
===================================================
This is a training series on the cataloging package *intake*.
.. toctree::
:maxdepth: 1
tutorial_intake-1-introduction.ipynb
tutorial_intake-1-2-dkrz-catalogs.ipynb
tutorial_intake-1-3-dkrz-catalogs-era5.ipynb
tutorial_intake-2-subset-catalogs.ipynb
tutorial_intake-3-merge-catalogs.ipynb
tutorial_intake-4-preprocessing-derived-vars.ipynb
tutorial_intake-5-create-esm-collection.ipynb
.. Tutorials and Use Cases documentation master file, created by
sphinx-quickstart on Tue Jan 19 15:58:41 2021.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Data Quality Checker
===================================================
This series covers different data and metadata quality checkers which are used to ensure high data quality when publishing to data repositories like ESGF.
.. toctree::
:maxdepth: 1
tutorial_data-quality_atmodat-checker.ipynb
../../notebooks/./demo/tutorial_esmvaltool.ipynb
\ No newline at end of file
../../notebooks/demo/tutorial_intake-1-3-dkrz-catalogs-era5.ipynb
\ No newline at end of file
......@@ -6,15 +6,12 @@
Tutorials
===================================================
This section contains a series of notebooks for different Python packages which are supported at DKRZ and which we recommend you use.
Each training series aims to make you fully capable of applying the Python package for enhanced climate data analysis.
.. toctree::
:maxdepth: 1
tutorial_esmvaltool.ipynb
tutorial_intake-1-introduction.ipynb
tutorial_intake-1-2-dkrz-catalogs.ipynb
tutorial_intake-2-subset-catalogs.ipynb
tutorial_intake-3-merge-catalogs.ipynb
tutorial_intake-4-preprocessing-derived-vars.ipynb
tutorial_intake-5-create-esm-collection.ipynb
tutorial_data-quality_atmodat-checker.ipynb
intake
quality
tutorial_cloud_tzis.ipynb
......@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
......@@ -11,11 +11,11 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"nbs = !find ./ -mindepth 2 -name \"*.ipynb\" ! -name \"*checkpoint.ipynb\" -type f "
"nbs = !find ./ -mindepth 2 -name \"*.ipynb\" ! -name \"*checkpoint.ipynb\" ! -name \"*era5*\" -type f "
]
},
{
......@@ -84,9 +84,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3 (based on the module python3/2022.01)",
"language": "python",
"name": "python3"
"name": "python3_2022_01"
},
"language_info": {
"codemirror_mode": {
......@@ -98,7 +98,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
"version": "3.9.9"
}
},
"nbformat": 4,
......
%% Cell type:code id: tags:
``` python
import json
```
%% Cell type:code id: tags:
``` python
nbs = !find ./ -mindepth 2 -name "*.ipynb" ! -name "*checkpoint.ipynb" -type f
nbs = !find ./ -mindepth 2 -name "*.ipynb" ! -name "*checkpoint.ipynb" ! -name "*era5*" -type f
```
%% Cell type:code id: tags:
``` python
for nb in nbs:
    #!jupyter nbconvert --clear-output --inplace {nb}
    !jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace {nb}
    with open(nb, 'r') as jsonFile:
        nbdata=json.load(jsonFile)
    nbdata["metadata"]["kernelspec"]["name"]="python3"
    nbdata["metadata"]["kernelspec"]["display_name"]="python3"
    with open(nb, 'w') as jsonFile:
        json.dump(nbdata, jsonFile, indent=1, ensure_ascii=False)
```
%%%% Output: stream
[NbConvertApp] Converting notebook ./hands-on_excercises/1_hands-on_find_data_intake.ipynb to notebook
[NbConvertApp] Writing 4392 bytes to ./hands-on_excercises/1_hands-on_find_data_intake.ipynb
[NbConvertApp] Converting notebook ./hands-on_excercises/2_hands-on_tropical_nights_intake_xarray_cmip6.ipynb to notebook
[NbConvertApp] Writing 4843 bytes to ./hands-on_excercises/2_hands-on_tropical_nights_intake_xarray_cmip6.ipynb
[NbConvertApp] Converting notebook ./hands-on_solutions/2_hands-on_tropical_nights_intake_xarray_cmip6_solution.ipynb to notebook
[NbConvertApp] Writing 7892 bytes to ./hands-on_solutions/2_hands-on_tropical_nights_intake_xarray_cmip6_solution.ipynb
[NbConvertApp] Converting notebook ./hands-on_solutions/1_hands-on_find_data_intake_solution.ipynb to notebook
[NbConvertApp] Writing 4999 bytes to ./hands-on_solutions/1_hands-on_find_data_intake_solution.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_simple-vis_xarray-matplotlib_cmip6.ipynb to notebook
[NbConvertApp] Writing 8696 bytes to ./demo/use-case_simple-vis_xarray-matplotlib_cmip6.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_calculate-frost-days_intake-xarray_cmip6.ipynb to notebook
[NbConvertApp] Writing 12892 bytes to ./demo/use-case_calculate-frost-days_intake-xarray_cmip6.ipynb
[NbConvertApp] Converting notebook ./demo/dkrz-data-pool-cmip6.ipynb to notebook
[NbConvertApp] Writing 16043 bytes to ./demo/dkrz-data-pool-cmip6.ipynb
[NbConvertApp] Converting notebook ./demo/dkrz-intake-catalog.ipynb to notebook
[NbConvertApp] Writing 12605 bytes to ./demo/dkrz-intake-catalog.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_convert-nc-to-tiff_rioxarray-xesmf_cmip.ipynb to notebook
[NbConvertApp] Writing 13772 bytes to ./demo/use-case_convert-nc-to-tiff_rioxarray-xesmf_cmip.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_advanced_summer_days_intake_xarray_cmip6.ipynb to notebook
[NbConvertApp] Writing 15325 bytes to ./demo/use-case_advanced_summer_days_intake_xarray_cmip6.ipynb
[NbConvertApp] Converting notebook ./demo/tutorial_esmvaltool.ipynb to notebook
[NbConvertApp] Writing 5332 bytes to ./demo/tutorial_esmvaltool.ipynb
[NbConvertApp] Converting notebook ./demo/dkrz-jupyterhub-notebook.ipynb to notebook
[NbConvertApp] Writing 5453 bytes to ./demo/dkrz-jupyterhub-notebook.ipynb
[NbConvertApp] Converting notebook ./demo/tutorial_intake.ipynb to notebook
[NbConvertApp] Writing 18803 bytes to ./demo/tutorial_intake.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_multimodel-comparison_xarray-cdo_cmip6.ipynb to notebook
[NbConvertApp] Writing 17702 bytes to ./demo/use-case_multimodel-comparison_xarray-cdo_cmip6.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_plot-unstructured_psyplot_cmip6.ipynb to notebook
[NbConvertApp] Writing 9342 bytes to ./demo/use-case_plot-unstructured_psyplot_cmip6.ipynb
[NbConvertApp] Converting notebook ./demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb to notebook
[NbConvertApp] Writing 14045 bytes to ./demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb
%% Cell type:code id: tags:
``` python
```
......
......@@ -132,8 +132,8 @@
"source": [
"import intake\n",
"# Path to master catalog on the DKRZ server\n",
"col_url = \"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"\n",
"parent_col=intake.open_catalog(col_url)\n",
"col_url = \"https://dkrz.de/s/intake\"\n",
"parent_col=intake.open_catalog([col_url])\n",
"list(parent_col)\n",
"\n",
"# Open the catalog with the intake package and name it \"col\" as short for \"collection\"\n",
......@@ -149,7 +149,7 @@
"source": [
"# We just use the first file from the CMIP6 catalog and copy it to the local disk because we make some experiments from it\n",
"download_file=col.df[\"uri\"].values[0]\n",
"!cp {download_file} ./"
"#!cp {download_file} ./"
]
},
{
......
%% Cell type:markdown id:77da5fe5-80a7-4952-86e4-f39b3f06ddef tags:
## ATMODAT Standard Compliance Checker
This notebook introduces you to the [atmodat checker](https://github.com/AtMoDat/atmodat_data_checker) which contains checks to ensure compliance with the ATMODAT Standard.
> Its core functionality is based on the [IOOS compliance checker](https://github.com/ioos/compliance-checker). The ATMODAT Standard Compliance Checker library makes use of [cc-yaml](https://github.com/cedadev/cc-yaml), which provides a plugin for the IOOS compliance checker that generates check suites from YAML descriptions. Furthermore, the Compliance Check Library is used as the basis to define generic, reusable compliance checks.
In addition, compliance with the **CF Conventions 1.4 or higher** is verified with the [CF checker](https://github.com/cedadev/cf-checker).
%% Cell type:markdown id:edb35c53-dc33-4f1f-a4af-5a8ea69e5dfe tags:
In this notebook, you will learn
- [how to use an environment on the DKRZ HPC systems Mistral or Levante](#Preparation)
- [how to run checks with the atmodat data checker](#Application)
- [to understand the results of the checker and further analyse it with pandas](#Results)
- [how you could proceed to cure the data with xarray if it does not pass the QC](#Curation)
%% Cell type:markdown id:3abf2250-4b78-4043-82fe-189875d692f2 tags:
### Preparation
On DKRZ's high-performance computer, we provide a `conda` environment which is useful for working with data in DKRZ's CMIP Data Pool.
**Option 1: Activate the checker libraries for working with a command-line shell**
If you like to work with shell commands, you can simply activate the environment. Prior to this, you may have to load a module with a recent Python interpreter:
```bash
module load python3/unstable
#The following line activates the quality-assurance environment with the checker libraries so that you can execute them with shell commands:
source activate /work/bm0021/conda-envs/quality-assurance
```
%% Cell type:markdown id:dff94c1c-8aa1-42aa-9486-f6d5a6df1884 tags:
**Option 2: Create a kernel with checker libraries to work with jupyter notebooks**
With `ipykernel` you can install a *kernel* which can be used within a jupyter server like [jupyterhub](https://jupyterhub.dkrz.de). `ipykernel` creates the kernel based on the activated environment.
```bash
module load python3/unstable
#The following line activates the quality-assurance environment with the checker libraries so that you can execute them with shell commands:
source activate /work/bm0021/conda-envs/quality-assurance
python -m ipykernel install --user --name qualitychecker --display-name="qualitychecker"
```
If you run this command from within a Jupyter server, you have to restart the Jupyter server afterwards to be able to select the new *quality checker* kernel.
%% Cell type:markdown id:95f9ba22-f84c-42e4-9952-ff6ef4f7b86d tags:
**Expert mode**: Running the Jupyter server from a different environment than the one in which atmodat is installed
Make sure that you:
1. Install the `cfunits` package into the Jupyter environment via `conda install cfunits -c conda-forge -p $jupyterenv` and restart the kernel.
1. Add the atmodat environment to the `PATH` environment variable inside the notebook. Otherwise, the notebook's shell does not find the application `run_checks`. You can modify environment variables with the `os` package and its attribute `os.environ`. The environment of the kernel can be found with `sys` and `sys.executable`. The following block sets the environment variable `PATH` correctly:
%% Cell type:code id:955fcaff-3b3f-4e5e-8c56-59ed90a4bca2 tags:
``` python
import sys
import os
os.environ["PATH"]=os.environ["PATH"]+":"+os.path.sep.join(sys.executable.split('/')[:-1])
```
%% Cell type:code id:72c0158e-1fbb-420b-8976-329579e397b9 tags:
``` python
#As long as the installation bug exists, we have to get the AtMoDat CVs manually:
if "AtMoDat_CVs" not in [dirpath.split(os.path.sep)[-1]
                         for (dirpath, dirs, files) in os.walk(os.path.sep.join(sys.executable.split('/')[:-2]))]:
    !git clone https://github.com/AtMoDat/AtMoDat_CVs.git {os.path.sep.join(sys.executable.split('/')[:-2])}/lib/python3.9/site-packages/atmodat_checklib/AtMoDat_CVs
```
%% Cell type:markdown id:3d0c7dc2-4e14-4738-92c5-b8c107916656 tags:
### Data to be checked
In this tutorial, we will check a small subset of CMIP6 data which we obtain via `intake`:
%% Cell type:code id:75e90932-4e2f-478c-b7b5-d82b9fd347c9 tags:
``` python
import intake
# Path to master catalog on the DKRZ server
col_url = "https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"
parent_col=intake.open_catalog(col_url)
col_url = "https://dkrz.de/s/intake"
parent_col=intake.open_catalog([col_url])
list(parent_col)
# Open the catalog with the intake package and name it "col" as short for "collection"
col=parent_col["dkrz_cmip6_disk"]
```
%% Cell type:code id:d30edc41-2561-43b1-879f-5e5d58784e4e tags:
``` python
# We just use the first file from the CMIP6 catalog and copy it to the local disk because we make some experiments from it
download_file=col.df["uri"].values[0]
!cp {download_file} ./
#!cp {download_file} ./
```
%% Cell type:code id:47e26721-4281-4acd-9205-2eb77b2ac05a tags:
``` python
exp_file=download_file.split('/')[-1]
exp_file
```
%% Cell type:markdown id:f1476f21-6f58-4430-9602-f18d8fa79460 tags:
### Application
The command `run_checks` can be executed from any directory from within the atmodat conda environment.
The atmodat checker contains two modules:
- one that checks the global attributes for compliance with the ATMODAT standard
- another that performs a standard CF check (building upon the cfchecks library).
%% Cell type:markdown id:365507aa-33a6-42df-9b35-7ead7da006b6 tags:
Show usage instructions of `run_checks`
%% Cell type:code id:76dabfbf-839b-4dca-844c-514cf82f0b66 tags:
``` python
!run_checks -h
```
%% Cell type:markdown id:2c04701c-bc27-4460-b80e-d32daf4a7376 tags:
The results of the performed checks are provided in the checker_output directory. By default, `run_checks` assumes write permissions in the path where the atmodat checker is installed. If this is not the case, you must specify an output directory in which you have write permissions with the option `-op output_path`.
In the following block, we set the *output path* to the current working directory, which we get via the bash command `pwd`. We apply `run_checks` to the `exp_file` which we downloaded in the section before.
%% Cell type:code id:c3ef1468-6ce9-4869-a173-2374eca5bc2c tags:
``` python
cwd=!pwd
cwd=cwd[0]
!run_checks -f {exp_file} -op {cwd} -s
```
%% Cell type:markdown id:13e20408-b6fa-4d39-be02-41db2109c980 tags:
Now we have a directory `atmodat_checker_output` in the `op` path. For each run of `run_checks`, a new directory named by its timestamp is created inside `op`. Additionally, a directory *latest* always contains the output of the most recent run.
%% Cell type:code id:601f3486-91e2-4ff5-9f8e-324f10f799b5 tags:
``` python
!ls {os.path.sep.join([cwd, "atmodat_checker_output"])}
```
%% Cell type:markdown id:fa5ef2a4-a1da-4fa0-873f-902884ea4db6 tags:
As we ran `run_checks` with the option `-s`, one output is the *short_summary.txt* file which we `cat` in the following:
%% Cell type:code id:9f6c38fd-199b-413e-9821-6535235be83c tags:
``` python
output_dir_string=os.path.sep.join(["atmodat_checker_output","latest"])
output_path=os.path.sep.join([cwd, output_dir_string])
!cat {os.path.sep.join([output_path, "short_summary.txt"])}
```
%% Cell type:markdown id:99d2ba16-52c2-4cb6-b82b-226e75463aab tags:
### Results
The short summary contains information about versions, the timestamp of execution, the ratio of passed checks on attributes, and the errors written by the CF checker.
- The cfchecks routine only issues a warning/information message if variable metadata are completely missing.
- Zero errors in the cfchecks routine does not necessarily mean that a data file is CF compliant!
We can also have a look at the detailed output, including the exact error messages, in the *long_summary_* files, which are subdivided by severity level.
%% Cell type:code id:9600c713-1203-430b-a4a6-bf70ec441221 tags:
``` python
!cat {os.path.sep.join([output_path,"long_summary_recommended.csv"])}
```
%% Cell type:code id:b9fa72d6-6e5f-433a-81f0-40e4cd5a94cd tags:
``` python
!cat {os.path.sep.join([output_path,"long_summary_mandatory.csv"])}
```
%% Cell type:markdown id:b94a7c75-abc6-4792-aa5f-65467c6522de tags:
We can open the *.csv* files with `pandas` to further analyse the output.
%% Cell type:code id:f02ea2c4-7238-4afd-aef0-565aa5a5787f tags:
``` python
import pandas as pd
recommend_df=pd.read_csv(os.path.sep.join([output_path,"long_summary_recommended.csv"]))
recommend_df
```
%% Cell type:markdown id:6453b4ca-288e-4c49-8c93-da4524ef5792 tags:
There may be **missing** global attributes which are recommended by the *atmodat standard*. We can find them with pandas:
%% Cell type:code id:f0a7e6db-f79a-448f-8046-bb4bf3bcef9d tags:
``` python
missing_recommend_atts=list(
recommend_df.loc[recommend_df["Error Message"]=="global attribute is not present"]["Global Attribute"]
)
missing_recommend_atts
```
%% Cell type:markdown id:06283c25-c5b6-450f-bfe9-d65e8fe26623 tags:
### Curation
Let's take first steps to *cure* the file by adding the missing attributes with `xarray`. We can open the file as an *xarray dataset* with:
%% Cell type:code id:b294cd89-d55c-421f-82e2-4cf42ece7d62 tags:
``` python
import xarray as xr
exp_file_ds=xr.open_dataset(exp_file)
exp_file_ds
```
%% Cell type:markdown id:f02bc09f-94dc-4e0f-b12f-9798549e90e8 tags:
We can **handle and add attributes** via the `dict`-type attribute `.attrs`. Applied to the dataset, it shows all *global attributes* of the file:
%% Cell type:code id:fc0ffe80-4288-4ac3-a599-3239f37f461d tags:
``` python
exp_file_ds.attrs
```
%% Cell type:markdown id:6f61190e-49bc-40da-8b33-30f3debd1895 tags:
We add all missing attributes and set a dummy value for them:
%% Cell type:code id:3fd18adf-fe43-4d47-b565-d082b80b970d tags:
``` python
for att in missing_recommend_atts:
    exp_file_ds.attrs[att]="Dummy"
```
%% Cell type:markdown id:56e26094-0ad6-42a9-afaf-5c482ee8ca87 tags:
We save the modified dataset with the `to_netcdf` function:
%% Cell type:code id:8050d724-da0d-417a-992e-24bb5aae0c82 tags:
``` python
exp_file_ds.to_netcdf(exp_file+".modified.nc")
```
%% Cell type:markdown id:5794c6ce-fff2-4c6e-8c08-aaf5dd342f8d tags:
Now, let's run `run_checks` again.
We can also only provide a directory instead of a file as an argument with the option `-p`. The checker will find all `.nc` files inside that directory.
%% Cell type:code id:6c3698f7-62a4-4297-bfbf-d6447a0f006a tags:
``` python
!run_checks -p {cwd} -op {cwd} -s
```
%% Cell type:markdown id:c72647ee-7497-42df-ae68-f6a2d4ea87ad tags:
Using the *latest* directory, here is the new summary:
%% Cell type:code id:51d2eff6-2a31-47b7-a706-f2555e03b9c3 tags:
``` python
!cat {os.path.sep.join([output_path,"short_summary.txt"])}
```
%% Cell type:markdown id:1c9205ec-4f5f-4173-bb0d-1896785a9d04 tags:
You can see that the checks no longer fail for the modified file when you subtract the earlier failures from the sum of newly passed checks.
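As a minimal cross-check (a sketch, assuming only the modified file sits in the checked directory and that the report format is unchanged), we can re-read *long_summary_recommended.csv* from the *latest* run and confirm that the attributes we added no longer show up as missing:
```python
# Re-read the recommended-attributes report of the latest run (the file may be empty
# or absent if no recommended attributes are reported as missing anymore)
new_recommend_df = pd.read_csv(os.path.sep.join([output_path, "long_summary_recommended.csv"]))

# Intersect the previously missing attributes with the ones still reported as missing
still_missing = set(missing_recommend_atts) & set(
    new_recommend_df.loc[
        new_recommend_df["Error Message"] == "global attribute is not present"
    ]["Global Attribute"]
)
still_missing  # expected to be empty for the modified file
```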
......
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img width=200 src=\"https://www.smhi.se/polopoly_fs/1.135796.1527766089!/image/LoggaEUCP.png_gen/derivatives/Original_366px/image/LoggaEUCP.png\"> <img width=200 src=\"https://zenodo.org/api/files/00000000-0000-0000-0000-000000000000/is-enes3/logo.png\"> <img width=200 src=\"https://www.dtls.nl/wp-content/uploads/2015/03/NleSc.png\"> <img width=200 src=\"https://www.dkrz.de/++theme++dkrz.theme/images/logo.png\"> <img width=200 src=\"https://jupyter.org/assets/hublogo.svg\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The new Python API for ESMValTool\n",
"\n",
"ESMValTool is a library of climate analysis workflows (\"recipes\"), as well as a tool to execute them. With the new Python API, this library is now also easily accessible in Jupyter environment. It allows you to easily run existing recipes as well as developing new ones. A very useful feature is that you can directly access all the output (data, images, etc) and further process them in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import the tool\n",
"import esmvalcore.experimental as esmvaltool"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Inspect/modify settings\n",
"\n",
"If you want, you can look at the configuration. Notice that there are some default data paths set to where CMIP data is stored on Mistral. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"esmvaltool.CFG"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The default settings should be okay for most use cases. However, should you wish to modify some settings, this is quite straightforward:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"esmvaltool.CFG['max_parallel_tasks'] = 1\n",
"esmvaltool.CFG"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### See all the available ESMValTool recipes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_recipes = esmvaltool.get_all_recipes()\n",
"all_recipes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, we'll just run the example recipe. We can search through the available recipes and select it"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"examples = all_recipes.find(\"example\")\n",
"examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"recipe_python = examples[7]\n",
"recipe_python"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running the example recipe\n",
"\n",
"Now that we've selected our recipe, we can just run it and inspect the output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output = recipe_python.run()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output['timeseries/script1'][0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output['map/script1'][2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"output['map/script1'][1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the output is a dataset, you can load it with `xarray` or `iris`. In this way, you can immediately continue to work with the (pre-)processed data in your notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xrds = output['map/script1'][1].load_xarray()\n",
"xrds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cube = output['map/script1'][1].load_iris()[0]\n",
"print(cube)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xrds.tas.plot(figsize=(15, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Final remarks\n",
"\n",
"For more information on all available recipes, visit the es"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "python3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
\ No newline at end of file
%% Cell type:markdown id: tags:
<img width=200 src="https://www.smhi.se/polopoly_fs/1.135796.1527766089!/image/LoggaEUCP.png_gen/derivatives/Original_366px/image/LoggaEUCP.png"> <img width=200 src="https://zenodo.org/api/files/00000000-0000-0000-0000-000000000000/is-enes3/logo.png"> <img width=200 src="https://www.dtls.nl/wp-content/uploads/2015/03/NleSc.png"> <img width=200 src="https://www.dkrz.de/++theme++dkrz.theme/images/logo.png"> <img width=200 src="https://jupyter.org/assets/hublogo.svg">
%% Cell type:markdown id: tags:
# The new Python API for ESMValTool
ESMValTool is a library of climate analysis workflows ("recipes"), as well as a tool to execute them. With the new Python API, this library is now also easily accessible in a Jupyter environment. It allows you to easily run existing recipes as well as develop new ones. A very useful feature is that you can directly access all the output (data, images, etc.) and further process it in the notebook.
%% Cell type:code id: tags:
``` python
# Import the tool
import esmvalcore.experimental as esmvaltool
```
%% Cell type:markdown id: tags:
### Inspect/modify settings
If you want, you can look at the configuration. Notice that there are some default data paths set to where CMIP data is stored on Mistral.
%% Cell type:code id: tags:
``` python
esmvaltool.CFG
```
%% Cell type:markdown id: tags:
The default settings should be okay for most use cases. However, should you wish to modify some settings, this is quite straightforward:
%% Cell type:code id: tags:
``` python
esmvaltool.CFG['max_parallel_tasks'] = 1
esmvaltool.CFG
```
%% Cell type:markdown id: tags:
### See all the available ESMValTool recipes
%% Cell type:code id: tags:
``` python
all_recipes = esmvaltool.get_all_recipes()
all_recipes
```
%% Cell type:markdown id: tags:
Here, we'll just run the example recipe. We can search through the available recipes and select it
%% Cell type:code id: tags:
``` python
examples = all_recipes.find("example")
examples
```
%% Cell type:code id: tags:
``` python
recipe_python = examples[7]
recipe_python
```
%% Cell type:markdown id: tags:
### Running the example recipe
Now that we've selected our recipe, we can just run it and inspect the output
%% Cell type:code id: tags:
``` python
output = recipe_python.run()
```
%% Cell type:code id: tags:
``` python
output
```
%% Cell type:code id: tags:
``` python
output['timeseries/script1'][0]
```
%% Cell type:code id: tags:
``` python
output['map/script1'][2]
```
%% Cell type:code id: tags:
``` python
output['map/script1'][1]
```
%% Cell type:markdown id: tags:
If the output is a dataset, you can load it with `xarray` or `iris`. In this way, you can immediately continue to work with the (pre-)processed data in your notebook.
%% Cell type:code id: tags:
``` python
xrds = output['map/script1'][1].load_xarray()