Intake-esm support from DKRZ Data Management
This repo contains material for preparing, scheduling and archiving intake-esm catalogs.
Cronjobs that generate the intake functionality include:
- updating catalogs for the CMIP data pool of DKRZ (builder/dkrz_PROJECT_STORE.py)
- testing catalogs (test/check_load_catalog_PROJECT.py)
- hosting and archiving catalogs at /pool/data/catalogs and in the cloud (archive-catalog.sh)
- creating statistics for catalogs, including KPIs such as the number of files and datasets
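As a rough illustration of the statistics step, the sketch below counts files and datasets from catalog rows. It is not the repository's actual statistics script: the function name catalog_kpis, the column names, and the assumption that one catalog row corresponds to one file and that a dataset is a unique combination of selected columns are all hypothetical.

```python
def catalog_kpis(rows, dataset_keys):
    """Return simple KPIs for a catalog given as a list of row dicts.

    Assumption: each row describes one file, and a dataset is identified
    by the unique combination of the columns listed in dataset_keys.
    """
    datasets = {tuple(row[key] for key in dataset_keys) for row in rows}
    return {"files": len(rows), "datasets": len(datasets)}


# Hypothetical rows; real catalogs have more columns and many more entries.
rows = [
    {"variable_id": "tas", "experiment_id": "historical", "path": "tas_1850.nc"},
    {"variable_id": "tas", "experiment_id": "historical", "path": "tas_1900.nc"},
    {"variable_id": "pr", "experiment_id": "historical", "path": "pr_1850.nc"},
]
kpis = catalog_kpis(rows, ["variable_id", "experiment_id"])
print(kpis)
```

Here two files share the same variable and experiment, so they count as one dataset.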
One main catalog collects all catalogs in /pool/data/catalogs and serves as the entry point for DKRZ's intake users.
environment.yml
Use this file with conda env create -f environment.yml to create a software environment that allows you to use the notebooks within this repository.
esm-collections/
All esm-collections available at DKRZ are saved in this folder. They are .json files which can be opened with intake.open_esm_datastore().
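To give an idea of what such a .json file contains, the following sketch writes a minimal descriptor with the Python standard library. The identifier, description, column names, and file names are hypothetical and not taken from a real DKRZ catalog; the field names follow the ESM collection specification as far as I understand it.

```python
import json
import tempfile
from pathlib import Path


def write_collection(json_path, catalog_file):
    """Write a minimal esm-collection descriptor that points at a CSV database."""
    descriptor = {
        "esmcat_version": "0.1.0",
        "id": "example-collection",  # hypothetical identifier
        "description": "Illustrative collection, not a real DKRZ catalog",
        "catalog_file": str(catalog_file),  # the .csv.gz database
        "attributes": [
            {"column_name": "variable_id"},  # hypothetical search columns
            {"column_name": "experiment_id"},
        ],
        # Column that holds the file location, and the file format.
        "assets": {"column_name": "path", "format": "netcdf"},
    }
    Path(json_path).write_text(json.dumps(descriptor, indent=2))
    return descriptor


with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "example.json"
    written = write_collection(json_path, "example.csv.gz")
    loaded = json.loads(json_path.read_text())
    # With intake-esm installed and the referenced database in place,
    # the file would be opened like this:
    #   import intake
    #   col = intake.open_esm_datastore(str(json_path))
```

The descriptor only holds metadata; the actual list of files lives in the database named by catalog_file.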
builder/
This folder contains scripts for generating the catalog databases (.csv.gz).
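A builder of this kind can be sketched as follows. This is not the code in builder/: the column names, the assumed directory layout (.../<experiment_id>/<variable_id>/<file>.nc), and the helper names parse_path, build_database, and read_database are hypothetical and only show how a gzip-compressed CSV database might be produced.

```python
import csv
import gzip
import tempfile
from pathlib import Path

COLUMNS = ["variable_id", "experiment_id", "path"]  # hypothetical columns


def parse_path(path):
    """Derive catalog attributes from a path with the assumed layout
    .../<experiment_id>/<variable_id>/<file>.nc"""
    parts = Path(path).parts
    return {
        "variable_id": parts[-2],
        "experiment_id": parts[-3],
        "path": str(path),
    }


def build_database(paths, out_file):
    """Write one CSV row per file into a gzip-compressed database."""
    rows = [parse_path(p) for p in paths]
    with gzip.open(out_file, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def read_database(db_file):
    """Read the database back as a list of dicts."""
    with gzip.open(db_file, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


example_paths = [
    "/data/historical/tas/tas_a.nc",  # made-up paths for illustration
    "/data/historical/pr/pr_a.nc",
]
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "example.csv.gz"
    n_rows = build_database(example_paths, db)
    rows_back = read_database(db)
```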
tests/
These scripts test the newly generated catalogs. If the tests are successful, the old catalogs are archived in an archive/ directory to document the update process.
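The test-then-archive flow might look roughly like the sketch below. The function promote_catalog, the non_empty check, and the file names are hypothetical, and replacing the live catalog with the new one after archiving is an assumption about the workflow rather than something this repository documents.

```python
import shutil
import tempfile
from pathlib import Path


def promote_catalog(new_file, live_file, archive_dir, check):
    """If check(new_file) passes, move the old live file into archive_dir
    and put new_file in its place. Returns True when promotion happened."""
    if not check(new_file):
        return False  # keep the old catalog untouched
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    live_file = Path(live_file)
    if live_file.exists():
        shutil.move(str(live_file), str(archive_dir / live_file.name))
    shutil.move(str(new_file), str(live_file))
    return True


def non_empty(path):
    """Stand-in for a real load test: the file exists and is not empty."""
    return Path(path).stat().st_size > 0


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    live = root / "catalog.csv.gz"
    new = root / "catalog_new.csv.gz"
    live.write_text("old")  # placeholder content, not real gzip data
    new.write_text("new")
    promoted = promote_catalog(new, live, root / "archive", non_empty)
    archived_text = (root / "archive" / "catalog.csv.gz").read_text()
    live_text = live.read_text()
```

A real check would load the catalog rather than only look at the file size.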
If you cannot find files in recent catalogs, check whether they have been retracted by searching for them in the ESGF browser interface with 'show all versions' enabled.
If you want to report anything, please create an issue within this repo.