Commit 708ecaa6 authored by Fabian Wachsmann

Merge branch 'setup-for-ci' into 'master'

Setup for ci

See merge request !70
parents e63cd493 389af878
Pipeline #18273 canceled with stages in 71 minutes and 38 seconds
@@ -65,6 +65,8 @@ pages:
needs:
- build
script:
- chmod 600 $DEPLOY_KEY
- "rsync -rlgD --delete -e \"ssh -i $DEPLOY_KEY -o UserKnownHostsFile=$DEPLOY_HOST_KEYS\" docs/build/html/ $DEPLOY_USER@$DEPLOY_HOST:/var/www/$DEPLOY_USER"
- if ! test -d public ; then mkdir public ; fi
- mv docs/build/html/* public/
only:
......
../../notebooks/demo/tutorial_intake-1-2-dkrz-catalogs.ipynb
\ No newline at end of file
@@ -11,6 +11,7 @@ Tutorials
tutorial_esmvaltool.ipynb
tutorial_intake-1-introduction.ipynb
tutorial_intake-1-2-dkrz-catalogs.ipynb
tutorial_intake-2-subset-catalogs.ipynb
tutorial_intake-3-merge-catalogs.ipynb
tutorial_intake-4-preprocessing-derived-vars.ipynb
......
%% Cell type:markdown id:77da5fe5-80a7-4952-86e4-f39b3f06ddef tags:
## ATMODAT Standard Compliance Checker
This notebook introduces you to the [atmodat checker](https://github.com/AtMoDat/atmodat_data_checker) which contains checks to ensure compliance with the ATMODAT Standard.
> Its core functionality is based on the [IOOS compliance checker](https://github.com/ioos/compliance-checker). The ATMODAT Standard Compliance Checker library makes use of [cc-yaml](https://github.com/cedadev/cc-yaml), which provides a plugin for the IOOS compliance checker that generates check suites from YAML descriptions. Furthermore, the Compliance Check Library is used as the basis to define generic, reusable compliance checks.
In addition, compliance with the **CF Conventions 1.4 or higher** is verified with the [CF checker](https://github.com/cedadev/cf-checker).
%% Cell type:markdown id:edb35c53-dc33-4f1f-a4af-5a8ea69e5dfe tags:
In this notebook, you will learn
- [how to use an environment on the DKRZ HPC systems Mistral or Levante](#Preparation)
- [how to run checks with the atmodat data checker](#Application)
- [how to understand the results of the checker and analyse them further with pandas](#Results)
- [how you could proceed to cure the data with xarray if it does not pass the QC](#Curation)
%% Cell type:markdown id:3abf2250-4b78-4043-82fe-189875d692f2 tags:
### Preparation
On DKRZ's high-performance computer (HPC), we provide a `conda` environment which is useful for working with data in DKRZ's CMIP Data Pool.
**Option 1: Activate the checker libraries for working with a command-line shell**
If you prefer to work with shell commands, you can simply activate the environment. Prior to this, you may have to load a module that provides a recent Python interpreter:
```bash
module load python3/unstable
#The following line activates the quality-assurance environment with the checker libraries so that you can execute them with shell commands:
source activate /work/bm0021/conda-envs/quality-assurance
```
%% Cell type:markdown id:dff94c1c-8aa1-42aa-9486-f6d5a6df1884 tags:
**Option 2: Create a kernel with checker libraries to work with jupyter notebooks**
With `ipykernel` you can install a *kernel* which can be used within a jupyter server like [jupyterhub](https://jupyterhub.dkrz.de). `ipykernel` creates the kernel based on the activated environment.
```bash
module load python3/unstable
#The following line activates the quality-assurance environment with the checker libraries so that you can execute them with shell commands:
source activate /work/bm0021/conda-envs/quality-assurance
python -m ipykernel install --user --name qualitychecker --display-name="qualitychecker"
```
If you run this command from within a jupyter server, you have to restart the jupyter server afterwards to be able to select the new *qualitychecker* kernel.
%% Cell type:markdown id:95f9ba22-f84c-42e4-9952-ff6ef4f7b86d tags:
**Expert mode**: Running the jupyter server from a different environment than the one in which atmodat is installed
Make sure that you:
1. Install the `cfunits` package into the jupyter environment via `conda install cfunits -c conda-forge -p $jupyterenv` and restart the kernel.
1. Add the atmodat environment to the `PATH` environment variable inside the notebook. Otherwise, the notebook's shell cannot find the application `run_checks`. You can modify environment variables with the `os` package and its mapping `os.environ`. The kernel's environment can be found via `sys` and `sys.executable`. The following block sets the environment variable `PATH` correctly:
%% Cell type:code id:955fcaff-3b3f-4e5e-8c56-59ed90a4bca2 tags:
``` python
import sys
import os

# Append the directory of the kernel's Python executable to PATH so that
# the notebook's shell finds `run_checks`:
os.environ["PATH"] = os.environ["PATH"] + ":" + os.path.sep.join(sys.executable.split('/')[:-1])
```
%% Cell type:code id:72c0158e-1fbb-420b-8976-329579e397b9 tags:
``` python
#As long as there is the installation bug, we have to manually fetch the AtMoDat CVs:
env_root = os.path.sep.join(sys.executable.split('/')[:-2])
if "AtMoDat_CVs" not in [os.path.basename(dirpath)
                         for dirpath, dirs, files in os.walk(env_root)]:
    !git clone https://github.com/AtMoDat/AtMoDat_CVs.git {env_root}/lib/python3.9/site-packages/atmodat_checklib/AtMoDat_CVs
```
%% Cell type:markdown id:3d0c7dc2-4e14-4738-92c5-b8c107916656 tags:
### Data to be checked
In this tutorial, we will check a small subset of CMIP6 data which we obtain via `intake`:
%% Cell type:code id:75e90932-4e2f-478c-b7b5-d82b9fd347c9 tags:
``` python
import intake
# Path to master catalog on the DKRZ server
col_url = "https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/intake-esm/dkrz_data-pool_cloudcatalog.yaml"
col_url = "https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"
parent_col=intake.open_catalog(col_url)
list(parent_col)
# Open the catalog with the intake package and name it "col" as short for "collection"
col=parent_col["dkrz_cmip6_disk_netcdf_fromcloud"]
col=parent_col["dkrz_cmip6_disk"]
```
%% Cell type:code id:d30edc41-2561-43b1-879f-5e5d58784e4e tags:
``` python
# We just use the first file from the CMIP6 catalog and copy it to the local disk
# because we will run some experiments on it
download_file=col.df["uri"].values[0]
!cp {download_file} ./
```
%% Cell type:code id:47e26721-4281-4acd-9205-2eb77b2ac05a tags:
``` python
exp_file=download_file.split('/')[-1]
exp_file
```
%% Cell type:markdown id:f1476f21-6f58-4430-9602-f18d8fa79460 tags:
### Application
The command `run_checks` can be executed from any directory from within the atmodat conda environment.
The atmodat checker contains two modules:
- one that checks the global attributes for compliance with the ATMODAT standard
- another that performs a standard CF check (building upon the cfchecks library).
%% Cell type:markdown id:365507aa-33a6-42df-9b35-7ead7da006b6 tags:
Show usage instructions of `run_checks`
%% Cell type:code id:76dabfbf-839b-4dca-844c-514cf82f0b66 tags:
``` python
!run_checks -h
```
%% Cell type:markdown id:2c04701c-bc27-4460-b80e-d32daf4a7376 tags:
The results of the performed checks are provided in the checker output directory. By default, `run_checks` assumes write permissions in the path where the atmodat checker is installed. If this is not the case, you must specify an output directory in which you have write permissions via `-op output_path`.
In the following block, we set the *output path* to the current working directory, which we get via the bash command `pwd`. We apply `run_checks` to the `exp_file` which we downloaded in the previous chapter.
%% Cell type:code id:c3ef1468-6ce9-4869-a173-2374eca5bc2c tags:
``` python
cwd=!pwd
cwd=cwd[0]
!run_checks -f {exp_file} -op {cwd} -s
```
%% Cell type:markdown id:13e20408-b6fa-4d39-be02-41db2109c980 tags:
Now, we have a directory `atmodat_checker_output` in the `op`. For each run of `run_checks`, a new subdirectory named after the timestamp is created inside of `op`. Additionally, a directory *latest* always contains the output of the most recent run.
%% Cell type:code id:601f3486-91e2-4ff5-9f8e-324f10f799b5 tags:
``` python
!ls {os.path.sep.join([cwd, "atmodat_checker_output"])}
```
%% Cell type:markdown id:fa5ef2a4-a1da-4fa0-873f-902884ea4db6 tags:
As we ran `run_checks` with the option `-s`, one output is the *short_summary.txt* file, which we `cat` in the following:
%% Cell type:code id:9f6c38fd-199b-413e-9821-6535235be83c tags:
``` python
output_dir_string=os.path.sep.join(["atmodat_checker_output","latest"])
output_path=os.path.sep.join([cwd, output_dir_string])
!cat {os.path.sep.join([output_path, "short_summary.txt"])}
```
%% Cell type:markdown id:99d2ba16-52c2-4cb6-b82b-226e75463aab tags:
### Results
The short summary contains information about the versions used, the timestamp of execution, the ratio of passed checks on attributes, and the errors reported by the CF checker.
- The cfchecks routine only issues a warning/information message if variable metadata are completely missing.
- Zero errors in the cfchecks routine do not necessarily mean that a data file is CF compliant!
We can also have a look into the detailed output, including the exact error messages, in the *long_summary_* files, which are subdivided by severity level.
%% Cell type:code id:9600c713-1203-430b-a4a6-bf70ec441221 tags:
``` python
!cat {os.path.sep.join([output_path,"long_summary_recommended.csv"])}
```
%% Cell type:code id:b9fa72d6-6e5f-433a-81f0-40e4cd5a94cd tags:
``` python
!cat {os.path.sep.join([output_path,"long_summary_mandatory.csv"])}
```
%% Cell type:markdown id:b94a7c75-abc6-4792-aa5f-65467c6522de tags:
We can open the *.csv* files with `pandas` to analyse the output further.
%% Cell type:code id:f02ea2c4-7238-4afd-aef0-565aa5a5787f tags:
``` python
import pandas as pd
recommend_df=pd.read_csv(os.path.sep.join([output_path,"long_summary_recommended.csv"]))
recommend_df
```
%% Cell type:markdown id:6453b4ca-288e-4c49-8c93-da4524ef5792 tags:
There may be **missing** global attributes which are recommended by the *ATMODAT standard*. We can find them with pandas:
%% Cell type:code id:f0a7e6db-f79a-448f-8046-bb4bf3bcef9d tags:
``` python
missing_recommend_atts=list(
recommend_df.loc[recommend_df["Error Message"]=="global attribute is not present"]["Global Attribute"]
)
missing_recommend_atts
```
%% Cell type:markdown id:06283c25-c5b6-450f-bfe9-d65e8fe26623 tags:
### Curation
Let's take first steps to *cure* the file by adding a missing attribute with `xarray`. We can open the file as an *xarray dataset* with:
%% Cell type:code id:b294cd89-d55c-421f-82e2-4cf42ece7d62 tags:
``` python
import xarray as xr
exp_file_ds=xr.open_dataset(exp_file)
exp_file_ds
```
%% Cell type:markdown id:f02bc09f-94dc-4e0f-b12f-9798549e90e8 tags:
We can **handle and add attributes** via the `dict`-type attribute `.attrs`. Applied to the dataset, it shows all *global attributes* of the file:
%% Cell type:code id:fc0ffe80-4288-4ac3-a599-3239f37f461d tags:
``` python
exp_file_ds.attrs
```
%% Cell type:markdown id:6f61190e-49bc-40da-8b33-30f3debd1895 tags:
We add all missing attributes and set a dummy value for them:
%% Cell type:code id:3fd18adf-fe43-4d47-b565-d082b80b970d tags:
``` python
for att in missing_recommend_atts:
exp_file_ds.attrs[att]="Dummy"
```
%% Cell type:markdown id:56e26094-0ad6-42a9-afaf-5c482ee8ca87 tags:
We save the modified dataset with the `to_netcdf` function:
%% Cell type:code id:8050d724-da0d-417a-992e-24bb5aae0c82 tags:
``` python
exp_file_ds.to_netcdf(exp_file+".modified.nc")
```
%% Cell type:markdown id:5794c6ce-fff2-4c6e-8c08-aaf5dd342f8d tags:
Now, let's run `run_checks` again.
With the option `-p`, we can also provide a directory instead of a file as an argument. The checker will find all `.nc` files inside that directory.
%% Cell type:code id:6c3698f7-62a4-4297-bfbf-d6447a0f006a tags:
``` python
!run_checks -p {cwd} -op {cwd} -s
```
%% Cell type:markdown id:c72647ee-7497-42df-ae68-f6a2d4ea87ad tags:
Using the *latest* directory, here is the new summary:
%% Cell type:code id:51d2eff6-2a31-47b7-a706-f2555e03b9c3 tags:
``` python
!cat {os.path.sep.join([output_path,"short_summary.txt"])}
```
%% Cell type:markdown id:1c9205ec-4f5f-4173-bb0d-1896785a9d04 tags:
You can see that the checks do not fail for the modified file when you subtract the earlier failures from the new total of passed checks, as sketched below.
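%% Cell type:markdown id: tags:
To verify this programmatically, we can re-read the latest *long_summary_recommended.csv* and check whether the attributes we added are still reported. This is a minimal sketch, assuming the CSV layout shown earlier; note that the latest run checked all `.nc` files in the directory, so errors of the unmodified copy still appear.
%% Cell type:code id: tags:
``` python
import os
import pandas as pd

# Re-read the latest long_summary_recommended.csv ("latest" now points at
# the run over both files):
new_recommend_df = pd.read_csv(
    os.path.sep.join([output_path, "long_summary_recommended.csv"])
)
still_missing = set(
    new_recommend_df.loc[
        new_recommend_df["Error Message"] == "global attribute is not present",
        "Global Attribute",
    ]
)
# Attributes that are no longer reported missing for any file:
print(set(missing_recommend_atts) - still_missing)
```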
......
%% Cell type:markdown id: tags:
# Intake I part 2 - DKRZ catalog scheme, strategy and services
%% Cell type:markdown id: tags:
```{admonition} Overview
:class: dropdown
![Level](https://img.shields.io/badge/Level-Introductory-green.svg)
🎯 **objectives**: Learn which `intake-esm` ESM-collections DKRZ offers
⌛ **time_estimation**: "15min"
☑️ **requirements**: None
© **contributors**: k204210
⚖ **license**:
```
%% Cell type:markdown id: tags:
```{admonition} Agenda
:class: tip
In this part, you learn
1. [DKRZ intake-esm catalog schema](#examples)
1. [DKRZ intake-esm catalogs for project data](#examples)
1. [Catalog dependencies on different stores](#stores)
1. [Workflow at Levante for collecting and merging catalogs into main catalog](#workflow)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="examples"></a>
## DKRZ intake-esm catalog strategy and schema
DKRZ catalogs aim at using one common scheme for their attributes so that **combining** catalogs and working with multiple catalogs at the same time is easy. In collaboration with NextGEMS scientists, we agreed on some attribute names that DKRZ intake-esm catalogs should be equipped with. The resulting scheme is named the *cataloonies scheme*.
```{note}
The cataloonies scheme is not a standard for anything but it is evolving and will be adapted to use cases. It is mainly influenced by ICON output and the CMIP standard. If you have suggestions, please contact us.
```
- As a result, you will find **redundant** attributes in project catalogs which have the same meaning, e.g.:
    - source_id, model_id, model
    - member_id, ensemble_member, simulation_id
- Which of these attributes are loaded into the python workflow can be configured (see intake-1 and the sketch below).
- You will find only **one version** of each *atomic dataset* in each catalog: the most recent one available in the store. An atomic dataset is identified by a unique combination of values for all catalog attributes, with one exception: it covers the entire time span of the simulation rather than a single time range.
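%% Cell type:markdown id: tags:
Since `intake-esm` exposes the catalog table as a pandas DataFrame, you can work with whichever of the redundant attribute columns you prefer. A minimal sketch, assuming the column names used in the CMIP6 catalog:
%% Cell type:code id: tags:
``` python
import intake

# Open the main catalog and the CMIP6 catalog as in part 1:
parent_col = intake.open_catalog(
    "https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"
)
col = parent_col["dkrz_cmip6_disk"]
# col.df is a plain pandas DataFrame, so a column subset is easy to select
# (assuming these columns exist in the catalog):
print(col.df[["source_id", "member_id", "uri"]].head())
```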
%% Cell type:markdown id: tags:
<a class="anchor" id="examples"></a>
### The cataloonies scheme
ESM script developers, project scientists and data managers together defined some attribute names that DKRZ intake-esm catalogs should be equipped with. One benefit, owing to the composition of this working group, is that these attribute names can be used throughout the research data life cycle: at the earliest stage for raw model output, but also at the latest stage for data that is standardized and published.
The integration of intake-esm catalog generation into ESM run scripts is planned. That will enable the use of intake, and with it the easy use and configuration of the Python software stack for data processing, from the beginning of the data's life.
Already existing catalogs will be provided with the newly defined attributes. For some, the values will fall back to *None*, as there is no easy way to retrieve them without looking into the asset, which is technically not feasible given the amount of published project data. For CMIP6-like projects, we can take missing information from the *cmor-mip-tables*, which represent the data standard of the project.
%% Cell type:code id: tags:hide-input
``` python
import pandas as pd
cataloonies_raw=[["#","Attribute name and column name","Examples","Description","Comments"],
[1,"project","DYAMON-WINTER","The project in which the data was produced",],
[2,"institution_id","MPIM-DWD-DKRZ","The institution that runs the simulations",],
[3,"source_id","ICON-SAP-5km","The Earth System Model which produced the simulations",],
[4,"experiment_id","DW-CPL / DW-ATM","The short term for the experiment that was run",],
[5,"simulation_id","dpp1234","The simulation/member/realization of the ensemble.",],
[6,"realm","atm / oce","The submodel of the ESM which produces the output.",],
[7,"frequency","PT1h or 1hr – Style","The frequency of the output","ICON uses ISO format"],
[8,"time_reduction","mean / inst / timmax /…","The method used for sampling and averaging along time. The same as the time part of cell_methods.",],
[9,"grid_label","gn","A clear description for the grid for distingusihing between native and regridded grids.",],
[10,"grid_id","","A specific identifier of the grid.","we might need more than one (e.g. horizontal + vertical)"],
[11,"variable_id","tas","The CMIP short term of the variable.",],
[12,"level_type","pressure_level, atmosphere_level","The vertical axis type used for the variable.",],
[13,"time_min",1800,"The minimal time value covered by the asset.",],
[14,"time_max",1900,"The maximal time value covered by the asset.",],
[15,"format","netcdf/zarr/…","The format of the asset.",],
[16,"uri","url,path-to-file","The uri used to open and load the asset.",],
[17,"(time_range)","start-end","Combination of time_min and time_max.",]]
pd.DataFrame(cataloonies_raw[1:],columns=cataloonies_raw[0])[cataloonies_raw[0][1:-1]]
```
%% Cell type:markdown id: tags:
<a class="anchor" id="examples"></a>
## DKRZ intake-esm catalogs for community project data
%% Cell type:markdown id: tags:
### Jobs we do for you
- We **make all catalogs available**
- under `/pool/data/Catalogs/` for logged-in HPC users
- in the [cloud](https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/tree/master/esm-collections)
- We **create and update** the content of projects' catalogs regularly by running scripts which are executed automatically at regular intervals (so-called _cronjobs_). We set the update frequency so that the project's data is refreshed sufficiently quickly.
- The updated catalog __replaces__ the outdated one.
- The updated catalog is __uploaded__ to the DKRZ swift cloud
- We plan to provide a catalog that tracks data which is __removed__ by the update.
%% Cell type:markdown id: tags:
### The data bases of project catalogs
**Creation of the `.csv.gz` table:**
1. A file list is created based on a `find` shell command on the project directory in the data pool.
1. For the column values, filenames and paths are parsed according to the project's `path_template` and `filename_template` (a simplified sketch follows below). These templates are constructed from the attribute values requested and required by the project.
    - Filenames that cannot be parsed are sorted out.
1. If more than one version is found for a dataset, only the most recent one is kept.
1. Depending on the project, additional columns can be created from the project's specifications.
    - E.g., for CMIP6, we added an `OpenDAP` column which allows users to access data from everywhere via `http`.
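%% Cell type:markdown id: tags:
The parsing step can be sketched as follows. The `path_template` below is hypothetical and only resembles the CMIP6 directory standard; the real templates are project-specific.
%% Cell type:code id: tags:
``` python
import os
import pandas as pd

# Hypothetical CMIP6-like path template; real templates are project-specific.
path_template = ["mip_era", "activity_id", "institution_id", "source_id",
                 "experiment_id", "member_id", "table_id", "variable_id",
                 "grid_label", "version"]

def parse_path(path, root="/pool/data/CMIP6/data"):
    """Split a path below the project root into catalog columns.
    Paths that do not match the template are sorted out (return None)."""
    parts = os.path.relpath(os.path.dirname(path), root).split(os.path.sep)
    if len(parts) != len(path_template):
        return None
    record = dict(zip(path_template, parts))
    record["uri"] = path
    return record

files = ["/pool/data/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/historical/"
         "r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_historical_r1i1p1f1_gn_185001-185412.nc"]
records = [r for r in map(parse_path, files) if r is not None]
df = pd.DataFrame(records)
# If more than one version exists for a dataset, keep only the most recent:
dataset_cols = [c for c in path_template if c != "version"]
df = df.sort_values("version").groupby(dataset_cols, as_index=False).last()
```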
%% Cell type:markdown id: tags:
<a class="anchor" id="examples"></a>
By 2022, we offer you project data for
```{tabbed} CMIP6-like projects
- 💾 **projects** :
- [CMIP6](https://c6de.dkrz.de/)
- [PalMod2](https://www.palmod.de/)
- **Data location**:
- is hosted within the [cmip-data-pool](cmip-data-pool.dkrz.de), accessible for all DKRZ users under `/pool/data/PROJECT`
- **Attributes**:
- the catalog's attributes are explained [here](https://goo.gl/v1drZl)
- `source_id` and `experiment_id` are the most important attributes. A unique `source_id` refers to one and the same model, even if different institutions run it. An `experiment_id` can be found in only *one activity*.
- Values of `member_id` are rather arbitrary. Some members might never be published, others have been retracted. There is no guarantee that *r1i1p1f1* is available.
- **Variable definition**:
- a **unique variable** is a combination of `table_id` and `variable_id` (see the sketch after these tabs). A variable can look different from one table to another, i.e. it can have different dimensions at monthly frequency than at daily frequency.
```
```{tabbed} CMIP5-like projects
- 💾 **projects** :
- CMIP5:
- [CORDEX](https://is-enes-data.github.io/cordex_archive_specifications.pdf)
- **Data location**:
- CMIP5:
- Only a small subset is still available on the pool's common and shared disk resource due to a lack of disk storage capacity, but most of the data has been archived and can be accessed via `jblob`.
- CORDEX:
- On disk, CORDEX data are disseminated across different storage projects
- **Attributes**:
- In comparison to CMIP6-like projects, these projects build on the older CMIP5 standard. Therefore, some attributes have different names.
- Regional model data includes additional attributes in comparison to CMIP5: `CORDEX_domain`, `driving_model_id` and `rcm_version_id`.
- **Variable definition**:
- A **unique variable** is a combination of `mip_table` and `variable`. A variable can look different from one table to another, i.e. it can have different dimensions at monthly frequency than at daily frequency.
```
```{tabbed} ESM-raw-output-near projects
- 💾 **projects** :
- [DYAMOND](https://easy.gems.dkrz.de/DYAMOND/index.html)
- [NextGEMS](https://easy.gems.dkrz.de/DYAMOND/NextGEMS/index.html)
- [ERA5](https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html):
- **Data location**:
- For disk projects, mostly linked under `/pool/data`
- **Attributes**:
- We try to use the cataloonies schema for the catalogs of all other projects. For reanalysis products, it cannot be entirely fulfilled because the data is available in GRIB format.
```
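%% Cell type:markdown id: tags:
The variable definition can be illustrated with the CMIP6 catalog `col` opened in the sketch further above: `table_id` and `variable_id` together identify a unique variable.
%% Cell type:code id: tags:
``` python
# Same variable_id in two different tables; the assets can differ in
# dimensions and frequency (`col` is the CMIP6 catalog from the sketch above):
tas_amon = col.search(table_id="Amon", variable_id="tas")
tas_day = col.search(table_id="day", variable_id="tas")
print(len(tas_amon.df), "monthly assets;", len(tas_day.df), "daily assets")
```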
%% Cell type:markdown id: tags:
<a class="anchor" id="examples"></a>
## Catalog dependencies on different stores
DKRZ's catalog naming convention distinguishes between the different storage formats as long as data access to stores like the archive is either not possible or very different from disk access. The specialties of the storages are explained in the following.
%% Cell type:markdown id: tags:
```{tabbed} Disk
- **Way of data access**:
- `uri` contains *paths* on Levante's lustre filesystem
- **User requirements for data access**:
- users must be **logged in** to Levante to *access* the data (exception: the opendap_url column, see [introduction]())
- **Provider requirements**:
- paths of a valid catalog must be *readable for everyone*
```
```{tabbed} Cloud
- **Way of data access**:
- `uri` contains *links* to datasets in DKRZ's swift cloud storage which can be opened with `xarray` and therefore with `intake`
    - If the asset's `format` is *zarr* (specified in the `format` column), use `zarr_kwargs` in the `to_dataset_dict()` function (see the sketch after these tabs)
- **User requirements for data access**:
- None. Users can access the data from anywhere, given a sufficient internet connection
- **Provider requirements**:
- links in a valid catalog must point at datasets in **open containers**
```
```{tabbed} Archive
- **Way of data access**:
- `uri` is empty, i.e. no direct data access via intake is possible. If the catalog contains a `jblob_file` column, users can nevertheless download the data via *jblob* on Levante (see next point).
- **User requirements for data access**:
- users must be **logged in** to Levante to *access* the data. After loading the module on Levante via `module load jblob`, an example command for CMIP5 is `jblob --cmip5-file DSET`, where DSET is a value of `jblob_file`, e.g. `cmip5.output1.BCC.bcc-csm1-1.abrupt4xCO2.fx.atmos.fx.r0i0p0.v1.areacella.areacella_fx_bcc-csm1-1_abrupt4xCO2_r0i0p0.nc`
- **Provider requirements**:
- None
```
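%% Cell type:markdown id: tags:
For the cloud store, opening a *zarr* asset could look like the following sketch; the catalog entry name is hypothetical, and `zarr_kwargs` is passed to `to_dataset_dict()` as mentioned in the cloud tab.
%% Cell type:code id: tags:
``` python
import intake

parent_col = intake.open_catalog(
    "https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"
)
cloud_col = parent_col["dkrz_cmip6_cloud"]  # hypothetical entry name
datasets = cloud_col.search(variable_id="tas").to_dataset_dict(
    zarr_kwargs={"consolidated": True}  # for assets whose format is zarr
)
```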
%% Cell type:markdown id: tags:
<a class="anchor" id="examples"></a>
## Preparing project catalogs for DKRZ's main catalog
1. Use attributes of existing catalogs and/or the templates in `/pool/data/Catalogs/Templates`; include at least `uri`, `format` and `project`.
1. Set permissions to *readable for everyone* for
- the data referenced in the catalog under `uri`
- the catalog itself
1. Use the naming convention for dkrz catalogs ( `dkrz_PROJECT_STORE` ) for your catalog
1. Link the catalog via `ln -s PATH-TO-YOUR-CATALOG /pool/data/Catalogs/Candidates/YOUR-CATALOG`
%% Cell type:markdown id: tags:
Your catalog will then be caught by a cronjob which (see the sketch below)
1. tests your catalog
    - against the catalog naming convention
    - by opening, searching and loading it
    - if it is a disk catalog: are all `uri` values *readable*?
1. merges or creates your catalog
    - if a catalog for the specified project already exists in `/pool/data/Catalogs/`, the two are merged if possible. Entries of your catalog are merged if they are not duplicates.
    - otherwise, your catalog is written to `/work/ik1017/Catalogs` and a link is set in `/pool/data/Catalogs/`
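%% Cell type:markdown id: tags:
A minimal sketch of what such a test could look like; the candidate path and the column names are assumptions.
%% Cell type:code id: tags:
``` python
import os
import intake

cat_path = "/pool/data/Catalogs/Candidates/dkrz_myproject_disk.json"  # hypothetical
col = intake.open_esm_datastore(cat_path)                    # 1. open
subset = col.search(project=col.df["project"].unique()[0])   # 2. search
datasets = subset.to_dataset_dict()                          # 3. load
# For a disk catalog, all uri values must be readable:
unreadable = [u for u in col.df["uri"] if not os.access(u, os.R_OK)]
assert not unreadable, f"{len(unreadable)} assets are not readable"
```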
%% Cell type:markdown id: tags:
```{seealso}
This tutorial is part of a series on `intake`:
* [Part 1: Introduction](https://data-infrastructure-services.gitlab-pages.dkrz.de/tutorials-and-use-cases/tutorial_intake-1-introduction.html)
* [Part 2: Modifying and subsetting catalogs](https://data-infrastructure-services.gitlab-pages.dkrz.de/tutorials-and-use-cases/tutorial_intake-2-subset-catalogs.html)
* [Part 3: Merging catalogs](https://data-infrastructure-services.gitlab-pages.dkrz.de/tutorials-and-use-cases/tutorial_intake-3-merge-catalogs.html)
* [Part 4: Use preprocessing and create derived variables](https://data-infrastructure-services.gitlab-pages.dkrz.de/tutorials-and-use-cases/tutorial_intake-4-preprocessing-derived-variables.html)
* [Part 5: How to create an intake catalog](https://data-infrastructure-services.gitlab-pages.dkrz.de/tutorials-and-use-cases/tutorial_intake-5-create-esm-collection.html)
- You can also do another [CMIP6 tutorial](https://intake-esm.readthedocs.io/en/latest/user-guide/cmip6-tutorial.html) from the official intake-esm documentation.
```
%% Cell type:markdown id: tags:
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
# Intake I - find, browse and access `intake-esm` collections
%% Cell type:markdown id: tags:
```{admonition} Overview
:class: dropdown
![Level](https://img.shields.io/badge/Level-Introductory-green.svg)
🎯 **objectives**: Learn how to use `intake` to find, browse and access `intake-esm` ESM-collections
⌛ **time_estimation**: "30min"
☑️ **requirements**: `intake_esm.__version__ >= 2021.8.17`
© **contributors**: k204210
⚖ **license**:
```
%% Cell type:markdown id: tags: