1. [browsing through the ERA5 collection](#browse)
1. [how to load ERA5 data with intake-esm](#access)
```
%% Cell type:markdown id: tags:
<a class="anchor" id="intro"></a>
## ERA5, its features and use cases
ERA ('ECMWF Re-Analysis') refers to a series of climate reanalysis datasets produced at the [European Centre for Medium-Range Weather Forecasts](http://www.ecmwf.int). Climate reanalyses combine observations with models to generate consistent time series of multiple climate variables. [ERA5 (ERA fifth generation)](https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5) is the latest climate reanalysis, produced by the Copernicus Climate Change Service (C3S) at ECMWF. It replaces ERA-Interim and other [predecessor ERA datasets](https://confluence.ecmwf.int/display/CKB/The+family+of+ERA5+datasets?src=contextnavpagetreemode) such as ERA-40, ERA-15 and ERA-20C.
Contracted by the [German Meteorological Service](https://www.dwd.de/DE/Home/home_node.html), the World Data Centre for Climate (WDCC) at DKRZ is the German distributor of a [selection of these data](https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html).
> ERA5 is a global comprehensive reanalysis, from 1979 to near real time. The period 1959 to 1979 was only recently released and is currently being transferred to DKRZ.
%% Cell type:markdown id: tags:
### Features
- Spatial resolution is about **31 km** globally
- Depending on the parameter, the data are stored on a **reduced Gaussian grid (N320)**<br> or as **spectral coefficients** (with a triangular truncation of **T639**)
- Provided on **137 model levels** or **37 pressure levels**
- Temporal coverage from **1979 up to today** (1959-1979 newly released)
- Temporal resolution: hourly, daily or monthly
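The quoted ~31 km can be checked from the grid definition: a Gaussian grid N320 has 2 × 320 = 640 latitude rows from pole to pole. A back-of-the-envelope sketch, assuming a spherical Earth with a mean radius of 6371 km and treating the (nearly uniform) Gaussian latitudes as equally spaced:

```python
import math

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0


def gaussian_row_spacing_km(n: int) -> float:
    """Approximate north-south spacing of an N<n> Gaussian grid.

    An N<n> Gaussian grid has 2*n latitude rows between the poles, so the
    pole-to-pole meridian (pi * R) is split into roughly 2*n intervals.
    """
    meridian_km = math.pi * EARTH_RADIUS_KM
    return meridian_km / (2 * n)


spacing = gaussian_row_spacing_km(320)
print(f"N320 row spacing: ~{spacing:.1f} km")
```

The same estimate applies east-west at the Equator, where the reduced grid has its largest number of points per row.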
### Use cases
ERA5 data have a broad range of applications, some of which are:
- forcing of (regional) climate models,
- evaluation of climate models with reanalysis,
- comparison of weather observations to data of other scientific fields.
%% Cell type:markdown id: tags:
### Further information
- [General ERA5 data documentation](https://confluence.ecmwf.int/display/CKB/ERA5:+data+documentation)
- [List of parameters/codes/definitions from the parameter database by code/table numbers](https://apps.ecmwf.int/codes/grib/param-db)
- [List of parameters/codes/definitions from the parameter database by parameter type, including explanations](https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation#ERA5:datadocumentation-Parameterlistings)
- [Conversion table for accumulated variables (total precipitation/fluxes)](https://confluence.ecmwf.int/pages/viewpage.action?pageId=197702790)
- [ERA5 data in DKRZ's /pool/data](https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html)
For questions, please email data [at] dkrz [dot] de or visit the [DKRZ Webpage](https://www.dkrz.de/up/de-services/de-data-management/de-projects_cooperations/de-era/de-era).
%% Cell type:markdown id: tags:
<a class="anchor" id="find"></a>
## Find and open the collection
First of all, we import the required package:
%% Cell type:code id: tags:
``` python
import intake
```
%% Cell type:markdown id: tags:
We use intake to open the main catalog, which includes all project catalogs and sub-catalogs.
`intake` **opens** catalogs for data sources given in `yaml` format. These contain information about the plugins and sources required for accessing and loading the data. The command is `open_catalog`.
Use `print` and `list` to find out what the catalog contains:
%% Cell type:code id: tags:
``` python
list(dkrz_catalog)
```
%% Output
['dkrz_cmip5_archive',
'dkrz_cmip5_disk',
'dkrz_cmip6_cloud',
'dkrz_cmip6_disk',
'dkrz_cordex_disk',
'dkrz_dyamond-winter_disk',
'dkrz_era5_disk',
'dkrz_nextgems_disk',
'dkrz_palmod2_disk']
%% Cell type:markdown id: tags:
We now focus on the ERA5 collection:
%% Cell type:code id: tags:
``` python
col=dkrz_catalog.dkrz_era5_disk
```
%% Output
/sw/spack-levante/mambaforge-4.11.0-0-Linux-x86_64-sobz6z/lib/python3.9/site-packages/intake_esm/utils.py:96: DtypeWarning: Columns (13,14) have mixed types. Specify dtype option on import or set low_memory=False.
The variable `col` now contains the intake collection that links to DKRZ's /pool/data ERA5 database.
%% Cell type:code id: tags:
``` python
col.description
```
%% Output
"This is an ESM collection for ERA5 data accessible on the DKRZ's disk storage system in /work/bk1099/data/"
%% Cell type:markdown id: tags:
Now, we print the variable `col` to see information on the data assets' properties and associated metadata (e.g. which institution the data come from).
%% Cell type:code id: tags:
``` python
col
```
%% Output
%% Cell type:markdown id: tags:
The ERA5 catalog consists of 16 datasets from about 550k assets/files.
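In intake-esm, a dataset is a group of assets (files) that share the same values in a set of "groupby" columns, and its key joins those values with dots. The sketch below mimics that grouping in plain Python on hypothetical records. The grouping columns are an assumption inferred from the key '128.0.instant.pressure_level.hourly' used later in this notebook, not read from the catalog itself.

```python
from collections import defaultdict

# Hypothetical asset records: one row per file, file names made up for illustration.
assets = [
    {"uri": "file_a.grb", "table_id": 128.0, "stepType": "instant",
     "level_type": "pressure_level", "frequency": "hourly"},
    {"uri": "file_b.grb", "table_id": 128.0, "stepType": "instant",
     "level_type": "pressure_level", "frequency": "hourly"},
    {"uri": "file_c.grb", "table_id": 128.0, "stepType": "accum",
     "level_type": "surface", "frequency": "hourly"},
]

# Assumption: the columns used to group files into datasets,
# inferred from the key format '128.0.instant.pressure_level.hourly'.
GROUPBY_ATTRS = ["table_id", "stepType", "level_type", "frequency"]


def group_assets(records, attrs):
    """Map a dot-joined key to the list of file paths sharing those attribute values."""
    groups = defaultdict(list)
    for rec in records:
        key = ".".join(str(rec[a]) for a in attrs)
        groups[key].append(rec["uri"])
    return dict(groups)


datasets = group_assets(assets, GROUPBY_ATTRS)
for key, uris in datasets.items():
    print(key, len(uris))
```

Under this reading, ~550k files collapse into only 16 datasets because many files share the same grouping values and are later concatenated, e.g. along time.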
%% Cell type:markdown id: tags:
<a class="anchor" id="browse"></a>
## ERA5 collection's facets
The **ERA5 Catalog** lets you browse the database using **10 search facets**, which can be grouped into 4 categories:
*Basic* data information:
- `era_id`: Today, only E5 is available.
- `dataType`: Two data types are available: **An**alysis data are *pure* analyses and only contain intensive data (like temperature). **F**ore**c**ast data contain extensive data (like precipitation), which are accumulated quantities.
- `uri`: Corresponds to the path on DKRZ's HPC file system.
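Accumulated forecast quantities must be divided by their accumulation period before they can be compared with instantaneous fields. A minimal sketch of that conversion, assuming the accumulation period is known (here one hour); the numbers are hypothetical, and the conversion table for accumulated variables listed under *Further information* remains the reference for each parameter.

```python
def accumulation_to_rate(accumulated: float, period_s: float) -> float:
    """Convert a quantity accumulated over period_s seconds into a mean rate per second."""
    if period_s <= 0:
        raise ValueError("accumulation period must be positive")
    return accumulated / period_s


# Hypothetical values for one hourly forecast step.
# A surface energy flux accumulated over 1 h in J m-2 -> mean flux in W m-2.
flux_w_m2 = accumulation_to_rate(1.8e6, 3600.0)

# Total precipitation accumulated over 1 h in metres of water -> mm per hour.
precip_mm_per_h = 0.002 * 1000.0  # metres -> millimetres; the period is already 1 h

print(flux_w_m2, precip_mm_per_h)
```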
%% Cell type:markdown id: tags:
Information on the *type of vertical level*:
- `level_type`: Three types are available: **model_level**, **pressure_level** or **surface**.
*Temporal* information. The ERA5 database starts in January 1979 (the years back to 1959 are currently being added).
- `stepType`: Is the variable accumulated, instantaneous or averaged?
- `frequency`: What is the temporal resolution of the data? The database contains hourly, daily and monthly data.
- `validation_date`: The date when the analysis is valid.
- `initialization_date`: The date when the forecast started.
*Variable* identifier (redundant) and attributes:
- `code`: Corresponds to the GRIB code of the variable in the file.
- `table_id`: Specifies the GRIB code table to which the GRIB code belongs.
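Together, `code` and `table_id` identify a parameter. ECMWF's parameter database also uses a combined paramId; a sketch of how the two could be combined, assuming the usual convention that codes from table 128 are used as-is and codes from other tables are offset by table × 1000. Check the parameter database linked under *Further information* before relying on this.

```python
def ecmwf_param_id(code, table_id) -> int:
    """Combine a GRIB code and its table number into an ECMWF paramId.

    Assumed convention: table 128 codes keep their value as paramId;
    for other tables, paramId = table * 1000 + code.
    Floats such as 128.0 (as stored in the catalog) are accepted.
    """
    code, table = int(code), int(table_id)
    if table == 128:
        return code
    return table * 1000 + code


print(ecmwf_param_id(130, 128))    # temperature, table 128
print(ecmwf_param_id(29, 228))     # a code from table 228
print(ecmwf_param_id(130.0, 128.0))
```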
%% Cell type:markdown id: tags:
If you require more information on the variables, the catalog can be loaded with more columns. Additional ERA5 attributes can be found in the main catalog.
We can **search** through the intake collection by using its `search` function. E.g., we can search for ERA5 data on *pressure_level* in *hourly* frequency by:
%% Cell type:code id: tags:
``` python
query = dict(level_type="pressure_level",
             frequency="hourly")
cat = col.search(**query)
```
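Conceptually, `search` filters the rows of the catalog table: every queried column must match (AND across columns), and a list of values matches any of them (OR within a column). A plain-Python sketch of that filtering on hypothetical rows; intake-esm's own implementation works on a pandas table and also supports patterns, so treat this as an illustration only.

```python
# Hypothetical rows standing in for the catalog table; only a few columns are shown.
records = [
    {"level_type": "pressure_level", "frequency": "hourly", "long_name": "Temperature"},
    {"level_type": "pressure_level", "frequency": "monthly", "long_name": "Temperature"},
    {"level_type": "surface", "frequency": "hourly", "long_name": "2 metre temperature"},
]


def search(rows, **query):
    """Keep rows matching every queried column; a list value matches any of its elements."""
    def matches(row, col, value):
        allowed = value if isinstance(value, list) else [value]
        return row.get(col) in allowed

    return [row for row in rows
            if all(matches(row, col, value) for col, value in query.items())]


query = dict(level_type="pressure_level", frequency="hourly")
subset = search(records, **query)
print(len(subset))  # 1
```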
%% Cell type:markdown id: tags:
The variable `cat` is a new *sub*-catalog, i.e. a subset of the original catalog.<br>To see the variables contained in this sub-catalog, we print the unique variable *long names*:
%% Cell type:code id: tags:
``` python
cat.unique("long_name")
```
%% Output
{'long_name': {'count': 16,
'values': ['Vorticity (relative)',
'V component of wind',
'Ozone mass mixing ratio',
'U component of wind',
'Relative humidity',
'Specific humidity',
'Fraction of cloud cover',
'Specific rain water content',
'Specific cloud ice water content',
'Potential vorticity',
'Specific cloud liquid water content',
'Temperature',
'Divergence',
'Vertical velocity',
'Geopotential',
'Specific snow water content']}}
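The output above is a dictionary holding, per column, the number of distinct values and the values themselves. A sketch that produces the same shape from hypothetical rows; values are listed in first-seen order here, and the library's ordering may differ.

```python
def unique(rows, column):
    """Return {column: {'count': n, 'values': [...]}} with distinct values in first-seen order."""
    values = list(dict.fromkeys(row[column] for row in rows))
    return {column: {"count": len(values), "values": values}}


# Hypothetical rows: duplicates collapse to one entry each.
rows = [
    {"long_name": "Temperature"},
    {"long_name": "Geopotential"},
    {"long_name": "Temperature"},
]
result = unique(rows, "long_name")
print(result)
# {'long_name': {'count': 2, 'values': ['Temperature', 'Geopotential']}}
```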
%% Cell type:markdown id: tags:
We can select a specific variable by another `search`, e.g. for *Temperature*.<br>We can also subset the temporal coverage that we are interested in. intake allows using **wildcards** in the search.<br>In the sub-catalog of hourly pressure level data, we can search, e.g., for temperature data that are valid in January 1980.
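To illustrate how a wildcard narrows a date column, the sketch below uses Python's shell-style `fnmatch` as a stand-in; intake-esm's own pattern rules may differ, and the date strings and their ISO format are assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical validation dates of a few assets (ISO format assumed).
dates = ["1979-12-31", "1980-01-01", "1980-01-15", "1980-02-01"]

pattern = "1980-01*"  # '*' matches any remaining characters
january_1980 = [d for d in dates if fnmatchcase(d, pattern)]
print(january_1980)  # ['1980-01-01', '1980-01-15']
```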
We can open the *entire* selection at once with `to_dataset_dict`. The result will be a `dict`ionary of `xarray` datasets.
For this, we have to specify a configuration for `xarray` via the `cdf_kwargs` argument:
```python
cdf_kwargs = {
    "engine": "cfgrib",     # xarray backend for reading GRIB files
    "chunks": {"time": 1},  # one dask chunk per time step, loaded lazily
}
```
While the *engine* tells `xarray` which *backend* to use to open the files (*here: cfgrib, since the ERA5 data are stored in GRIB format*), we specify `chunks` so that `dask` is used for array handling. This approach **saves memory**: it returns lazy dask arrays, which are only computed and loaded when needed.<br>Opening the files may take a while. Warnings printed by the underlying `cfgrib` library can be ignored.
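With `chunks={"time": 1}` each time step becomes one dask chunk, so in principle only the chunks a computation touches are read. A rough size estimate shows why this matters; the grid size (542,080 points for the N320 reduced Gaussian grid), the 37 pressure levels and float32 storage are assumptions for illustration, not values read from the files.

```python
# Assumptions (not read from the files): N320 reduced Gaussian grid with
# 542,080 points, 37 pressure levels, float32 values.
GRID_POINTS = 542_080
LEVELS = 37
BYTES_PER_VALUE = 4  # float32

chunk_bytes = GRID_POINTS * LEVELS * BYTES_PER_VALUE  # one hourly time step
hours_in_january = 31 * 24
month_bytes = chunk_bytes * hours_in_january          # all of January, hourly

print(f"one chunk: {chunk_bytes / 1e6:.0f} MB")
print(f"January (hourly): {month_bytes / 1e9:.1f} GB")
```

Loading one chunk at a time keeps memory use near the size of a single chunk instead of the full month.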
<br>The dictionary *temp_hourly_pl_xr_dict* has exactly one entry because *all files* of the sub-catalog temp_hourly_pl have been merged along the time axis. The default configurations that control operations on the sub-catalog can be parsed as follows:
<br>Now, let's get our dataset and have a look. We extract the last (and only) entry from `temp_hourly_pl_xr_dict` using the `popitem` method. `popitem` returns a tuple of size 2: the first element (index 0) is the key '128.0.instant.pressure_level.hourly', the second element (index 1) is the dataset:
t (values) float32 236.8 236.7 236.6 ... 237.5 237.4 237.4
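`popitem` removes and returns the most recently inserted key/value pair, so for a one-entry dictionary it returns that entry and leaves the dictionary empty. A sketch with a plain dictionary; the value is a placeholder string, not an xarray Dataset.

```python
# Stand-in for the dictionary returned by to_dataset_dict: one key, one value.
xr_dict = {"128.0.instant.pressure_level.hourly": "<xarray.Dataset placeholder>"}

item = xr_dict.popitem()  # removes and returns the last inserted (key, value) pair
key, dataset = item       # index 0 is the key, index 1 is the value
print(key)
print(len(xr_dict))       # the dictionary is now empty
```

If the dictionary should stay intact, `next(iter(xr_dict.values()))` reads the value without removing it.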
%% Cell type:markdown id: tags:
Plotting the data with the `plot` function shows the meridional gradient of 500 hPa temperature (in K) in January 1980. The x-axis is a proxy for latitude (North->South direction). The figure shows that mid-tropospheric temperature (500 hPa) increases strongly from the poles towards the Equator.
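The 1-D `values` dimension works as a latitude proxy because points on a reduced Gaussian grid are stored row by row from north to south, with a varying number of points per row. A toy sketch of how a flat index maps back to its latitude row; the row sizes are made up and far smaller than N320.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical, tiny "reduced" grid: points per latitude row, north to south
# (fewer points near the poles, more near the Equator).
points_per_row = [4, 8, 12, 12, 8, 4]
row_ends = list(accumulate(points_per_row))  # cumulative counts: [4, 12, 24, 36, 44, 48]


def row_of(flat_index: int) -> int:
    """Return the latitude row (0 = northernmost) holding a flat 'values' index."""
    if not 0 <= flat_index < row_ends[-1]:
        raise IndexError("flat index outside the grid")
    return bisect_right(row_ends, flat_index)


print([row_of(i) for i in (0, 3, 4, 20, 47)])  # [0, 0, 1, 2, 5]
```

Because the row index grows monotonically with the flat index, plotting against `values` orders the data from the North Pole to the South Pole, although rows near the Equator occupy more of the x-axis.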