ERA5 Tables
A tool to convert ECMWF ERA5/ERA5Land tables into CMOR-compatible JSONs, using CMIP6 and Obs4MIPs as reference vocabularies.
Author: Etor E. Lucio-Eceiza (lucio-eceiza@dkrz.de)
Institution: Deutsches Klimarechenzentrum (DKRZ)
This repository generates CMOR-compatible JSON tables for ERA5 and ERA5Land, based on the CSV configuration from getera5 and on the suggestion made by Angelika Heil (heil@dkrz.de), who is in charge of the ERA5/ERA5Land data streams at /pool/data.
It prioritizes Obs4MIPs tables as a reference for variable definitions, then CMIP6, and finally falls back to the manually defined information in the CSV file when necessary.
🌍 Features
- 🧠 Smart variable matching from Obs4MIPs and CMIP6 tables
- 🧱 Flexible outputs: JSON, CSV, and Excel
- 🧼 Post-hoc harmonization of variable metadata (e.g., comments)
- 🧾 Clean, configurable CLI interface
📁 Repository Structure
- src/era5_tables/converter.py – CSV → CMOR JSON logic
- src/era5_tables/utils.py – CSV ↔ Excel conversions
- Tables/source_tables/ – Contains git submodules for obs4MIPs-cmor-tables and cmip6-cmor-tables
⚙️ Installation
git clone --recursive https://gitlab.dkrz.de/bm1159/cosodax/era5-tables era5-tables
cd era5-tables
make install
This will create a conda environment named era5-tables and install the tool locally.
To update submodules later:
git submodule update --init --recursive
🚀 CLI Tool
Once installed, you can use the CLI from anywhere:
era5-tables --help
🧾 Command line options
usage: era5-tables [OPTIONS]
Convert CMOR table data between CSV, Excel, and JSON formats.
Uses the CMOR JSON schema, based on ['obs4MIPs-cmor-tables', 'cmip6-cmor-tables'],
located in ./era5-tables/Tables/source_tables. (as submodules)
Options:
--csv-to-json Convert CSV to CMOR JSON files
--csv-to-excel Convert CSV to Excel file
--excel-to-csv Convert Excel file to CSV
--csv <path> Path to input CSV file (default: ./Tables/original_tables/ct_ecmwf.rc)
--excel <path> Path to Excel file (default: ./Tables/original_tables/ct_ecmwf.xlsx)
--json <path> Output directory (default: ./Tables/era5-cmor-tables/Tables)
--var <vars...> Variable(s) to include (e.g., tas tasmax tasmin)
--freq <freqs...> Frequency(ies) to include (e.g., 1hr day mon fx)
--ltype <levels...> Level type(s) to include (e.g., sfc ml pl)
--clean Clean the output directory before writing
-v, --verbose Print progress per variable
-h, --help Show this help message and exit
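As an illustration of how the options combine, the sketch below assembles a command line in Python and prints it. It uses only flags listed in the help text above; the helper name build_command and the chosen selection (tas and tasmax, daily, surface) are hypothetical, and it assumes the --var, --freq and --ltype filters apply to --csv-to-json. The command is printed, not executed.

```python
import shlex


def build_command(variables, freqs, ltypes, clean=False, verbose=False):
    """Assemble an era5-tables call using only flags shown in the help text."""
    cmd = ["era5-tables", "--csv-to-json"]
    if variables:
        cmd += ["--var", *variables]
    if freqs:
        cmd += ["--freq", *freqs]
    if ltypes:
        cmd += ["--ltype", *ltypes]
    if clean:
        cmd.append("--clean")
    if verbose:
        cmd.append("--verbose")
    return cmd


# Hypothetical selection: two temperature variables, daily, surface level.
cmd = build_command(["tas", "tasmax"], ["day"], ["sfc"], clean=True, verbose=True)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell where the era5-tables environment is active.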
🧠 What Gets Written?
For each variable/frequency/level combination:
- A new entry is written into one of the JSON CMOR tables, e.g., ERA5_day_sfc.json, ERA5Land_mon_sfc.json.
- Each entry includes the following keys:
  standard_name, long_name, units, dimensions
  cell_methods, cell_measures, comment
  out_name, positive, type
- Metadata: source_table, table, mapping, orig_name, etc.
These are sourced from: 1 (Obs4MIPs) → 2 (CMIP6) → 3 (CSV config, fallback)
An extra harmonization step ensures consistency across frequencies, especially for the comment field.
Additionally, metadata for extra variables that are not originally produced by ECMWF but are of interest to the community at DKRZ can be included, provided they are defined in src/config.py.
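The source priority described above (Obs4MIPs, then CMIP6, then the CSV fallback) can be pictured with a minimal sketch. This is not the code in converter.py: the function resolve_variable, the plain-dictionary tables, and the "CMIP6" and "CSV" labels are assumptions for illustration (only the "obs4MIPs" mapping value appears in the example output), and the real matching is likely more involved than a lookup by name.

```python
def resolve_variable(name, obs4mips, cmip6, csv_row):
    """Return (entry, mapping) from the first source that defines `name`.

    Sources are tried in priority order: Obs4MIPs, then CMIP6. If neither
    table knows the variable, the manually defined CSV row is used.
    """
    for mapping, table in (("obs4MIPs", obs4mips), ("CMIP6", cmip6)):
        entry = table.get(name)
        if entry is not None:
            return dict(entry), mapping
    return dict(csv_row), "CSV"


# Toy reference tables (hypothetical content, keyed by variable name).
obs4mips = {"tas": {"standard_name": "air_temperature", "units": "K"}}
cmip6 = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
}
csv_row = {"standard_name": "", "units": "m"}

for var in ("tas", "pr", "not_in_any_table"):
    entry, mapping = resolve_variable(var, obs4mips, cmip6, csv_row)
    print(var, "->", mapping)
```

Recording which source supplied the entry mirrors the mapping and source_table keys that the tool writes next to each variable.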
📝 Example Output File
A file like ERA5_day_sfc.json will look like:
{
  "Header": {
    "Conventions": "CF-1.7 ODS-2.1",
    "table_id": "Table ERA5_day_sfc",
    ...
  },
  "variable_entry": {
    "tas": {
      "standard_name": "air_temperature",
      "long_name": "Near-Surface Air Temperature",
      "units": "K",
      ...
    },
    ...
  }
}
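Since the tables are plain JSON with a Header block and a variable_entry block, they can be inspected with the standard library alone. The sketch below writes a tiny stand-in table to a temporary directory and reads it back; the helper load_units and the sample content are hypothetical, and only the two top-level keys are taken from the layout shown above.

```python
import json
import tempfile
from pathlib import Path

# Minimal stand-in for a generated table (real files carry many more keys).
sample = {
    "Header": {"table_id": "Table ERA5_day_sfc"},
    "variable_entry": {
        "tas": {
            "standard_name": "air_temperature",
            "long_name": "Near-Surface Air Temperature",
            "units": "K",
        }
    },
}


def load_units(table_path):
    """Map every variable in a table file to its units string."""
    with open(table_path, encoding="utf-8") as fh:
        table = json.load(fh)
    return {
        name: entry.get("units", "")
        for name, entry in table["variable_entry"].items()
    }


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ERA5_day_sfc.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    units = load_units(path)

print(units)
```

Pointing load_units at a file in the output directory given by --json should work the same way, assuming the generated file is valid JSON.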
Example output for tas at hourly resolution from ERA5Land_1hr_sfc.json:
"tas": {
  "cell_measures": "area: areacella",
  "cell_methods": "area: mean time: point",
  "comment": "near-surface (usually, 2 meter) air temperature",
  "dimensions": "longitude latitude time height2m",
  "frequency": "1hr",
  "long_name": "Near-Surface Air Temperature",
  "modeling_realm": "atmos",
  "ok_max_mean_abs": "",
  "ok_min_mean_abs": "",
  "out_name": "tas",
  "positive": "",
  "standard_name": "air_temperature",
  "type": "real",
  "units": "K",
  "valid_max": "",
  "valid_min": "",
  ---
  "grib_paramID": 128.0,
  "grib_code": 167.0,
  "orig_short_name": "2t",
  "orig_name": "2 metre temperature",
  "orig_units": "K",
  "grib_description": "This parameter is the temperature of air at 2m above the surface of land, sea or in-land waters. 2m temperature is calculated by interpolating between the lowest model level and the Earth's surface, taking account of the atmospheric conditions.[ See further information ](https://www.ecmwf.int/sites/default/files/elibrary/2016/17117-part-iv- physical-processes.pdf#subsection.3.10.3). This parameter has units of kelvin (K). Temperature measured in kelvin can be converted to degrees Celsius (\u00ac\u221eC) by subtracting 273.15. ",
  "orig_grid": "redGG-N320",
  "level_type": "sfc_an",
  "conversion": "1",
  "source_table": "obs4MIPs_A1hr.json",
  "table": "A1hr",
  "mapping": "obs4MIPs"
},
🔍 Observations:
- The structure of the JSON files does not exactly follow that of, e.g., Obs4MIPs, as these tables are meant to be used by cmor-era5 to post-process and pseudo-cmorize GRIB data at the DKRZ /pool/data. This means that a variable can, for example, appear at surface or model level with different dimensions. Nevertheless, these tables can be used as a starting point to harmonise with the proper CV.
- Similarly, each variable contains many more key-value pairs than those from Obs4MIPs; e.g., the ones under --- in the aforementioned example are meant to be added to the produced netCDF files as global attributes.
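To make the distinction between the two groups of keys concrete, here is a minimal sketch that separates a variable entry into the CMOR-style keys and the extra ECMWF keys. The set CMOR_KEYS simply lists the keys that appear above the --- line in the tas example; the function split_entry is hypothetical and says nothing about how cmor-era5 actually handles these attributes.

```python
# Keys that appear above the "---" marker in the tas example.
CMOR_KEYS = {
    "cell_measures", "cell_methods", "comment", "dimensions", "frequency",
    "long_name", "modeling_realm", "ok_max_mean_abs", "ok_min_mean_abs",
    "out_name", "positive", "standard_name", "type", "units",
    "valid_max", "valid_min",
}


def split_entry(entry):
    """Split a variable entry into (CMOR keys, extra metadata keys)."""
    cmor = {k: v for k, v in entry.items() if k in CMOR_KEYS}
    extra = {k: v for k, v in entry.items() if k not in CMOR_KEYS}
    return cmor, extra


# Abbreviated version of the tas entry shown earlier.
entry = {
    "standard_name": "air_temperature",
    "units": "K",
    "out_name": "tas",
    "grib_code": 167.0,
    "orig_short_name": "2t",
    "mapping": "obs4MIPs",
}

cmor, extra = split_entry(entry)
print(sorted(cmor))
print(sorted(extra))
```

In this reading, the extra dictionary holds the candidates for attributes in the produced netCDF files.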
🛠️ TODOs
- Harmonize the parameters of certain variables that differ across time frequencies.
- Sort out the standard naming of certain variables (columns CFNAME and CFNAME_ANGELIKA) in the CSV file, e.g. flux variables.
- Sort out the CV of ERA5 based on Obs4MIPs, e.g. required global attributes:
{
  "required_global_attributes": [
    "Conventions",          # ---> from Header at each table, tbd via CF checker
    "activity_id",          # ---> from metadata ERA5_CV, need to correct
    "contact",              # ---> ??
    "creation_date",        # ---> of the metadata files?
    "data_specs_version",   # ---> ??
    "frequency",            # ---> from metadata ERA5_CV
    "grid",                 # ---> from metadata ERA5_CV, need to correct
    "grid_label",           # ---> from metadata ERA5_CV, need to correct
    "institution",          # ---> from metadata ERA5_CV
    "institution_id",       # ---> from metadata ERA5_CV, need to correct
    "license",              # ---> from metadata ERA5_CV, need to correct
    "nominal_resolution",   # ---> from metadata ERA5_CV, need to correct
    "product",              # ---> from metadata ERA5_CV, need to correct
    "realm",                # ---> from metadata ERA5_CV, need to correct
    "source_id",            # ---> from metadata ERA5_CV, need to correct
    "table_id",             # ---> from metadata ERA5_CV, need to correct
    "tracking_id",          # ---> from metadata ERA5_CV, need to correct
    "variable_id",          # ---> from metadata ERA5_CV, need to correct
    "variant_label"         # ---> from metadata ERA5_CV, need to correct
  ]
}
- Variable processing and CF checkers (in more advanced stages, maybe).
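While the CV is being sorted out, a small check along the following lines could report which required global attributes are still missing from a set of metadata. This is only a sketch: the attribute names are copied from the list above, while the function missing_attributes and the choice to treat empty strings as missing are assumptions, not part of the tool.

```python
REQUIRED_GLOBAL_ATTRIBUTES = [
    "Conventions", "activity_id", "contact", "creation_date",
    "data_specs_version", "frequency", "grid", "grid_label", "institution",
    "institution_id", "license", "nominal_resolution", "product", "realm",
    "source_id", "table_id", "tracking_id", "variable_id", "variant_label",
]


def missing_attributes(attrs):
    """Return the required attributes that are absent or empty in `attrs`."""
    return [name for name in REQUIRED_GLOBAL_ATTRIBUTES if not attrs.get(name)]


# Hypothetical, partially filled metadata; "institution" is present but empty.
attrs = {
    "Conventions": "CF-1.7 ODS-2.1",
    "frequency": "1hr",
    "institution": "",
}

missing = missing_attributes(attrs)
print(len(missing), "attributes still missing")
print(missing)
```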
📄 License
This project is licensed under the MIT License.