# ERA5 Tables

> A tool to convert ECMWF ERA5/ERA5Land tables into CMOR-compatible JSONs, using CMIP6 and Obs4MIPs as reference vocabularies.
>
> **Author**: Etor E. Lucio-Eceiza (lucio-eceiza@dkrz.de)
>
> **Institution**: Deutsches Klimarechenzentrum (DKRZ)

---

This repository generates CMOR-compatible JSON tables for **ERA5** and **ERA5Land** based on the CSV configuration from [getera5](https://gitlab.dkrz.de/bm1159/cosodax/getera5/-/blob/era5.1/src/ct_ecmwf.rc?ref_type=heads) and the suggestion made by Angelika Heil (heil@dkrz.de), in charge of the [ERA5/ERA5Land data streams](https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html#the-era5-climate-reanalyses) at /pool/data.

It prioritizes **Obs4MIPs** tables as a reference for variable definitions, then **CMIP6**, and finally falls back to the manually defined information in the CSV file when necessary.

---

## Features

- **Smart variable matching** from Obs4MIPs and CMIP6 tables
- **Flexible outputs**: JSON, CSV, and Excel
- **Post-hoc harmonization** of variable metadata (e.g., comments)
- **Clean, configurable CLI interface**

---

## Repository Structure

- `src/era5_tables/converter.py`: CSV to CMOR JSON logic
- `src/era5_tables/utils.py`: CSV and Excel conversions
- `Tables/source_tables/`: Contains git submodules for `obs4MIPs-cmor-tables` and `cmip6-cmor-tables`

---

## Installation

```bash
git clone --recursive https://gitlab.dkrz.de/bm1159/cosodax/era5-tables era5-tables
cd era5-tables
make install
```

This will create a conda environment `era5-tables` and install the tool locally.

To update submodules later:

```bash
git submodule update --init --recursive
```

## CLI Tool

Once installed, you can use the CLI from anywhere:

```bash
era5-tables --help
```

### Command line options

```bash
usage: era5-tables [OPTIONS]

Convert CMOR table data between CSV, Excel, and JSON formats.
Uses the CMOR JSON schema, based on ['obs4MIPs-cmor-tables', 'cmip6-cmor-tables'], located in ./era5-tables/Tables/source_tables. (as submodules)

Options:
  --csv-to-json         Convert CSV to CMOR JSON files
  --csv-to-excel        Convert CSV to Excel file
  --excel-to-csv        Convert Excel file to CSV
  --csv <path>          Path to input CSV file (default: ./Tables/original_tables/ct_ecmwf.rc)
  --excel <path>        Path to Excel file (default: ./Tables/original_tables/ct_ecmwf.xlsx)
  --json <path>         Output directory (default: ./Tables/era5-cmor-tables/Tables)
  --var <vars...>       Variable(s) to include (e.g., tas tasmax tasmin)
  --freq <freqs...>     Frequency(ies) to include (e.g., 1hr day mon fx)
  --ltype <levels...>   Level type(s) to include (e.g., sfc ml pl)
  --clean               Clean the output directory before writing
  -v, --verbose         Print progress per variable
  -h, --help            Show this help message and exit
```

## What Gets Written?

For each variable/frequency/level combination:

- A new entry is written into one of the JSON CMOR tables, e.g., `ERA5_day_sfc.json`, `ERA5Land_mon_sfc.json`.
- Each entry includes the following keys:
  - `standard_name, long_name, units, dimensions`
  - `cell_methods, cell_measures, comment`
  - `out_name, positive, type`
  - Metadata: `source_table, table, mapping, orig_name`, etc.

These are sourced from:

1 (Obs4MIPs) → 2 (CMIP6) → 3 (CSV config, fallback)

An extra harmonization step ensures consistency across frequencies, especially for the `comment` field.

Additionally, metadata for extra variables, not originally produced by ECMWF but of interest to the community at DKRZ, can be added, provided they are defined in [`src/config.py`](https://gitlab.dkrz.de/bm1159/cosodax/era5-tables/-/blob/master/src/config.py?ref_type=heads#L100).

## Example Output File

A file like `ERA5_day_sfc.json` will look like:

```json
{
  "Header": {
    "Conventions": "CF-1.7 ODS-2.1",
    "table_id": "Table ERA5_day_sfc",
    ...
  },
  "variable_entry": {
    "tas": {
      "standard_name": "air_temperature",
      "long_name": "Near-Surface Air Temperature",
      "units": "K",
      ...
    },
    ...
  }
}
```

Example output for `tas` at hourly resolution from [`ERA5Land_1hr_sfc.json`](https://gitlab.dkrz.de/bm1159/cosodax/era5-tables/-/blob/master/Tables/era5-cmor-tables/Tables/ERA5Land_1hr_sfc.json?ref_type=heads#L527):

```json
"tas": {
    "cell_measures": "area: areacella",
    "cell_methods": "area: mean time: point",
    "comment": "near-surface (usually, 2 meter) air temperature",
    "dimensions": "longitude latitude time height2m",
    "frequency": "1hr",
    "long_name": "Near-Surface Air Temperature",
    "modeling_realm": "atmos",
    "ok_max_mean_abs": "",
    "ok_min_mean_abs": "",
    "out_name": "tas",
    "positive": "",
    "standard_name": "air_temperature",
    "type": "real",
    "units": "K",
    "valid_max": "",
    "valid_min": "",
    ---
    "grib_paramID": 128.0,
    "grib_code": 167.0,
    "orig_short_name": "2t",
    "orig_name": "2 metre temperature",
    "orig_units": "K",
    "grib_description": "This parameter is the temperature of air at 2m above the surface of land, sea or in-land waters. 2m temperature is calculated by interpolating between the lowest model level and the Earth's surface, taking account of the atmospheric conditions.[ See further information ](https://www.ecmwf.int/sites/default/files/elibrary/2016/17117-part-iv-physical-processes.pdf#subsection.3.10.3). This parameter has units of kelvin (K). Temperature measured in kelvin can be converted to degrees Celsius (\u00ac\u221eC) by subtracting 273.15. ",
    "orig_grid": "redGG-N320",
    "level_type": "sfc_an",
    "conversion": "1",
    "source_table": "obs4MIPs_A1hr.json",
    "table": "A1hr",
    "mapping": "obs4MIPs"
},
```

## Observations

- The structure of the JSON files does not exactly follow that of, e.g., Obs4MIPs, as these tables are meant to be used by [`cmor-era5`](https://gitlab.dkrz.de/bm1159/cosodax/cmor-era5) to post-process and pseudo-CMORize GRIB data at the DKRZ `/pool/data`. This means that, e.g.,
  a variable can appear at surface or model level with, e.g., different dimensions. Nevertheless, these tables can be used as a starting point to harmonise with the proper CV.
- Similarly, each variable contains many more key-value pairs than those from Obs4MIPs; e.g., the ones under `---` in the aforementioned example are meant to be added to the produced NetCDF files as global attributes.

### TODOs

- Harmonize the parameters of certain variables that differ across their time frequencies.
- Sort out the standard naming of certain variables (columns `CFNAME` and `CFNAME_ANGELIKA`) in the CSV file, e.g. [flux variables](https://swift.dkrz.de/v1/dkrz_d0859f7c-0da8-41c7-824d-58b3765f5ec4/general_share/fluxes_comparison.html).
- Sort out the CV of ERA5 based on Obs4MIPs, e.g. required global attributes:

  ```json
  {
    "required_global_attributes":[
      "Conventions",         # ---> from Header at each table, tbd via CF checker
      "activity_id",         # ---> from metadata ERA5_CV, need to correct
      "contact",             # ---> ??
      "creation_date",       # ---> of the metadata files?
      "data_specs_version",  # ---> ??
      "frequency",           # ---> from metadata ERA5_CV
      "grid",                # ---> from metadata ERA5_CV, need to correct
      "grid_label",          # ---> from metadata ERA5_CV, need to correct
      "institution",         # ---> from metadata ERA5_CV
      "institution_id",      # ---> from metadata ERA5_CV, need to correct
      "license",             # ---> from metadata ERA5_CV, need to correct
      "nominal_resolution",  # ---> from metadata ERA5_CV, need to correct
      "product",             # ---> from metadata ERA5_CV, need to correct
      "realm",               # ---> from metadata ERA5_CV, need to correct
      "source_id",           # ---> from metadata ERA5_CV, need to correct
      "table_id",            # ---> from metadata ERA5_CV, need to correct
      "tracking_id",         # ---> from metadata ERA5_CV, need to correct
      "variable_id",         # ---> from metadata ERA5_CV, need to correct
      "variant_label"        # ---> from metadata ERA5_CV, need to correct
    ]
  }
  ```

- Variable processing and CF checkers (in more advanced stages, maybe).
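
As a starting point for the required-global-attributes TODO above, a completeness check could look like the following minimal sketch. It is not part of `era5-tables`: the function name `missing_global_attributes`, the shortened CV excerpt, and the example attribute values are all assumptions made for illustration, and the real ERA5 CV layout may differ.

```python
import json

# Hypothetical, shortened CV excerpt. The attribute names come from the TODO
# list above; the surrounding file layout is an assumption.
CV_JSON = """
{
  "required_global_attributes": [
    "Conventions", "activity_id", "frequency",
    "institution", "table_id", "variable_id"
  ]
}
"""


def missing_global_attributes(cv, attributes):
    """Return required attribute names that are absent or empty in `attributes`."""
    required = cv.get("required_global_attributes", [])
    return [name for name in required if attributes.get(name) in (None, "")]


cv = json.loads(CV_JSON)

# Hypothetical global attributes of one output file (placeholder values).
file_attrs = {
    "Conventions": "CF-1.7 ODS-2.1",
    "frequency": "1hr",
    "table_id": "ERA5Land_1hr_sfc",
    "variable_id": "tas",
    "institution": "",  # present but empty, so it counts as missing
}

missing = missing_global_attributes(cv, file_attrs)
print(missing)
```

Treating empty strings as missing is a design choice of this sketch; it mirrors the many empty values (`""`) that appear in the example table entries.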
## License

This project is licensed under the MIT License.