diff --git a/notebooks/demo/tutorial_intake-1-introduction.ipynb b/notebooks/demo/tutorial_intake-1-introduction.ipynb index 53ae65c12fd633deb0efb1aa35869e2551ebc03c..4ea3584f91f47b881efe7eb26d34d5f4a9f3f056 100644 --- a/notebooks/demo/tutorial_intake-1-introduction.ipynb +++ b/notebooks/demo/tutorial_intake-1-introduction.ipynb @@ -425,7 +425,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The data base is loaded into an underlying `panda`s dataframe which we can access with `col.df`. `col.df.head()` displays the first rows of the table:" + "The database is loaded into an underlying `pandas` dataframe which we can access with `esm_col.df`. `esm_col.df.head()` displays the first rows of the table:" ] }, { @@ -639,12 +639,14 @@ "source": [ "### How to load more columns\n", "\n", - "If you work remotely away from the data, you can use the **opendap_url**'s to access the subset of interest for all files published at DKRZ. The opendap_url is an *additional* column that can also be loaded.\n", + "Intake allows loading only a subset of the columns in the **intake-esm** catalog. Since the memory usage of **intake-esm** is high, the default columns are only a subset of all possible columns. Sometimes, other columns are of interest:\n", + "\n", + "If you work remotely away from the data, you can use the **opendap_url**'s to access the subset of interest for all files published at DKRZ. The *opendap_url* is an *additional* column that can also be loaded.\n", "\n", "We can define 3 different column name types for the usage of intake catalogs:\n", "\n", "1. **Default** attributes which are loaded from the main catalog and which can be seen via `_entries[CATNAME]._open_args`.\n", - "2. **Overall** attributes or **template** attributes which should be defined for **ALL** catalogs at DKRZ (exceptions excluded). 
At DKRZ, we use the newly defined **Cataloonie** scheme template which can be found via `dkrz_catalog.metadata[\"parameters\"][\"cataloonie_columns\"]`\n", + "2. **Overall** attributes or **template** attributes which should be defined for **ALL** catalogs at DKRZ (exceptions excluded). At DKRZ, we use the newly defined **Cataloonie** scheme template which can be found via `dkrz_catalog.metadata[\"parameters\"][\"cataloonie_columns\"]`. With these template attributes, there may be redundancy in the columns. They exist to simplify merging catalogs across projects.\n", "3. **Additional** attributes which are not necessary to identify a single asset but helpful for users. You can find these via\n", "\n", "`dkrz_catalog.metadata[\"parameters\"][\"additional_PROJECT_columns\"]`\n", @@ -670,13 +672,6 @@ "```" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There is a lot of redundancy in the columns. That is because they exist to be conform to other kind of standards. This will simplify merging catalogs across projects." - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -711,6 +706,14 @@ "esm_col=dkrz_catalog.dkrz_cmip6_disk(csv_kwargs=dict(usecols=cols))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- The customization of catalog columns allows the highest flexibility for intake users.\n", + "- In theory, we could add many more columns with additional information because not all have to be loaded from the database."
+ ] + }, { "cell_type": "markdown", "metadata": {}, @@ -750,7 +753,7 @@ "query = dict(\n", " variable_id=\"tas\",\n", " table_id=\"Amon\",\n", - " source_id=\"MPI-ESM1-2-HR\",\n", + " source_id=\"MPI-ESM1-2-LR\",\n", " experiment_id=\"historical\")\n", "cat = esm_col.search(**query)\n", "cat" @@ -846,13 +849,31 @@ "- The `time_range` column was used to **concat** data along the `time` dimension\n", "- The `member_id` column was used to generate a new dimension\n", "\n", - "The underlying `dask` package will only load the data into memory if needed." + "The underlying `dask` package will only load the data into memory if needed. Note that attributes which disagree from file to file, e.g. *tracking_id*, are excluded from the dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "How **intake-esm** should open and aggregate the assets is configured in the *aggregation_control* part of the description:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(esm_col.esmcol_data[\"aggregation_control\"][\"aggregations\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Columns can be defined for appending or creating new dimensions. The *options* are keyword arguments for xarray.\n", + "\n", "They **keys** of the dictionary are made with column values defined in the *aggregation_control* of the **intake-esm** catalog. These will determine the **key_template**. The corresponding commands are:" ] }, @@ -906,7 +927,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Pangeo's data store\n", + "### Troubleshooting\n", + "\n", + "The variables are collected in **one** dataset. This requires that **the dimensions and coordinates must be the same over all files**. Otherwise, xarray cannot merge these together.\n", + "\n", + "For CMIP6, most of the variables collected in one **table_id** should be on the same dimensions and coordinates. 
Unfortunately, there are exceptions:\n", + "\n", + "- A few variables are requested for *time slices* only.\n", + "- Sometimes models use different dimension names from file to file.\n", + "\n", + "Using the [preprocessing](https://tutorials.dkrz.de/tutorial_intake-4-preprocessing-derived-vars.html#use-preprocessing-when-opening-assets-and-creating-datasets) keyword argument can help to rename dimensions before merging.\n", + "\n", + "For Intake providers: the more information on the dimensions and coordinates provided already in the catalog, the better the aggregation control." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pangeo's data store\n", "\n", "Let's have a look into Pangeo's ESM Collection as well. This is accessible via cloud from everywhere - you only need internet to load data. We use the same `query` as in the example before." ] diff --git a/notebooks/demo/tutorial_intake-4-preprocessing-derived-vars.ipynb b/notebooks/demo/tutorial_intake-4-preprocessing-derived-vars.ipynb index 08292941504ee98a336a0cd3117bd531385659e8..2eaaf6c61a0ecccdb99565b9c8d5523f5c6d56eb 100644 --- a/notebooks/demo/tutorial_intake-4-preprocessing-derived-vars.ipynb +++ b/notebooks/demo/tutorial_intake-4-preprocessing-derived-vars.ipynb @@ -63,16 +63,6 @@ "esm_dkrz=dkrz_cdp.dkrz_cmip6_disk" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#levante uri to mistral uri:\n", - "esm_dkrz.df[\"uri\"]=esm_dkrz.df[\"uri\"].str.replace(\"lustre/\",\"lustre02/\")" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/notebooks/demo/use-case_advanced_summer_days_intake_xarray_cmip6.ipynb b/notebooks/demo/use-case_advanced_summer_days_intake_xarray_cmip6.ipynb index c0afbad436cdf45bd5fdc7041a7e0d4580aed12f..7b2301e4f19ef5fde8b5d741da8d5290f7c741e8 100644 --- a/notebooks/demo/use-case_advanced_summer_days_intake_xarray_cmip6.ipynb +++ 
b/notebooks/demo/use-case_advanced_summer_days_intake_xarray_cmip6.ipynb @@ -170,8 +170,10 @@ "outputs": [], "source": [ "# Path to master catalog on the DKRZ server\n", - "col_url = \"https://dkrz.de/s/intake\"\n", - "parent_col=intake.open_catalog([col_url])\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(parent_col)\n", "\n", "# Open the catalog with the intake package and name it \"col\" as short for \"collection\"\n", diff --git a/notebooks/demo/use-case_calculate-frost-days_intake-xarray_cmip6.ipynb b/notebooks/demo/use-case_calculate-frost-days_intake-xarray_cmip6.ipynb index a8c961d948e68ae3c60121e149dd0e1120102d45..57f302250794563af3faf92aea0a8b69d605f931 100644 --- a/notebooks/demo/use-case_calculate-frost-days_intake-xarray_cmip6.ipynb +++ b/notebooks/demo/use-case_calculate-frost-days_intake-xarray_cmip6.ipynb @@ -93,8 +93,10 @@ "outputs": [], "source": [ "# Path to master catalog on the DKRZ server\n", - "col_url = \"https://dkrz.de/s/intake\"\n", - "parent_col=intake.open_catalog([col_url])\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(parent_col)\n", "\n", "# Open the catalog with the intake package and name it \"col\" as short for \"collection\"\n", diff --git a/notebooks/demo/use-case_climate-extremes-indices_cdo.ipynb b/notebooks/demo/use-case_climate-extremes-indices_cdo.ipynb index fe45e2330d92f85dac7f3086baae27f4f8e9fb8c..1b50e5f7a98cc3a2307e3ea8bf8e64fd82fa8cc8 100755 --- 
a/notebooks/demo/use-case_climate-extremes-indices_cdo.ipynb +++ b/notebooks/demo/use-case_climate-extremes-indices_cdo.ipynb @@ -91,12 +91,14 @@ "source": [ "import intake\n", "# Path to master catalog on the DKRZ server\n", - "col_url = \"https://dkrz.de/s/intake\"\n", - "dkrz_catalog=intake.open_catalog([col_url])\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "dkrz_catalog=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(dkrz_catalog)\n", "\n", "# Open the catalog with the intake package and name it \"col\" as short for \"collection\"\n", - "cols=dkrz_catalog.metadata[\"parameters\"][\"cmip6_columns\"][\"default\"]+[\"opendap_url\"]\n", + "cols=dkrz_catalog._entries[\"dkrz_cmip6_disk\"]._open_args[\"csv_kwargs\"][\"usecols\"]+[\"opendap_url\"]\n", "col=dkrz_catalog.dkrz_cmip6_disk(csv_kwargs=dict(usecols=cols))" ] }, diff --git a/notebooks/demo/use-case_convert-nc-to-tiff_rioxarray-xesmf_cmip.ipynb b/notebooks/demo/use-case_convert-nc-to-tiff_rioxarray-xesmf_cmip.ipynb index 191c7896b3f49e1be64f7399df66265f6e9e55da..956cf98bc4f00c05126d87852535eba50951493b 100644 --- a/notebooks/demo/use-case_convert-nc-to-tiff_rioxarray-xesmf_cmip.ipynb +++ b/notebooks/demo/use-case_convert-nc-to-tiff_rioxarray-xesmf_cmip.ipynb @@ -54,7 +54,11 @@ "metadata": {}, "outputs": [], "source": [ - "dkrz_catalog = intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "# Path to master catalog on the DKRZ server\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "dkrz_catalog=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "# Print DKRZ open catalogues\n", 
"list(dkrz_catalog)" ] diff --git a/notebooks/demo/use-case_ensemble-analysis_intake-xarray_cmip6.ipynb b/notebooks/demo/use-case_ensemble-analysis_intake-xarray_cmip6.ipynb index 259793fcacf0575011eaa8a5d13b9f89759e5079..c0d5abd52e99ba2fb2a46406c2edce43c8345b3a 100644 --- a/notebooks/demo/use-case_ensemble-analysis_intake-xarray_cmip6.ipynb +++ b/notebooks/demo/use-case_ensemble-analysis_intake-xarray_cmip6.ipynb @@ -119,8 +119,11 @@ "metadata": {}, "outputs": [], "source": [ - "col_url = \"https://dkrz.de/s/intake\"\n", - "parent_col=intake.open_catalog([col_url])\n", + "# Path to master catalog on the DKRZ server\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(parent_col)" ] }, diff --git a/notebooks/demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb b/notebooks/demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb index 833c9134fbe97c2c893427783235e20211bb26dd..08b55cac6261aa0630f5e92b891f67a7b389d77a 100644 --- a/notebooks/demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb +++ b/notebooks/demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb @@ -96,8 +96,11 @@ "metadata": {}, "outputs": [], "source": [ - "col_url = \"https://dkrz.de/s/intake\"\n", - "parent_col=intake.open_catalog([col_url])\n", + "# Path to master catalog on the DKRZ server\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(parent_col)" ] }, diff --git 
a/notebooks/demo/use-case_plot-unstructured_psyplot_cmip6.ipynb b/notebooks/demo/use-case_plot-unstructured_psyplot_cmip6.ipynb index e508cd7840d9b510a197c31c847dc63d1299f217..b83a0ccfcb63fd6a69c70c48dd69001e967f6359 100644 --- a/notebooks/demo/use-case_plot-unstructured_psyplot_cmip6.ipynb +++ b/notebooks/demo/use-case_plot-unstructured_psyplot_cmip6.ipynb @@ -56,8 +56,11 @@ "metadata": {}, "outputs": [], "source": [ - "col_url = \"https://dkrz.de/s/intake\"\n", - "parent_col=intake.open_catalog([col_url])\n", + "# Path to master catalog on the DKRZ server\n", + "#dkrz_catalog=intake.open_catalog([\"https://dkrz.de/s/intake\"])\n", + "#\n", + "#only for the web page we need to take the original link:\n", + "parent_col=intake.open_catalog([\"https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml\"])\n", "list(parent_col)" ] },