Fabian Wachsmann
--- a/notebooks/demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb

+++ b/notebooks/demo/use-case_global-yearly-mean-anomaly_xarray-hvplot_cmip6.ipynb

 %% Cell type:markdown id: tags:

 # Create an interactive plot of the Global Yearly Mean Anomaly of a CMIP6 variable

 We will show how to combine, analyse and quickly plot data of the Coupled Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/). We choose one variable from multiple experiments and compare the results of different models. In particular, we analyse the historical experiment in combination with one of the Shared Socioeconomic Pathway (SSP) experiments.

 This Jupyter notebook is meant to run in the [Jupyterhub](https://jupyterhub.dkrz.de/hub/login?next=%2Fhub%2Fhome) server of the German Climate Computing Center [DKRZ](https://www.dkrz.de/). The DKRZ hosts the CMIP data pool, which includes 4 petabytes of CMIP6 data. Please choose the Python 3 unstable kernel from the Kernel tab above; it contains all the common geoscience packages. More information on how to run Jupyter notebooks at DKRZ is available [here](https://www.dkrz.de/up/systems/mistral/programming/jupyter-notebook).

 Running this Jupyter notebook on your own premises, which is also known as [client-side](https://en.wikipedia.org/wiki/Client-side) computing, requires that you install the necessary packages and download the data.

 %% Cell type:markdown id: tags:

 ### Learning Objectives

 - How to access a dataset from the DKRZ CMIP data pool with `intake-esm`
 - How to calculate global field means and yearly means with `xarray` and `numpy`
 - How to visualize the results with `hvplot`

 %% Cell type:code id: tags:

 ``` python
 import intake
 import pandas as pd
 import hvplot.pandas
 import numpy as np
 ```

 %% Cell type:markdown id: tags:

 First, we need to set the `variable_id` that we would like to plot. Here is a selection of the most frequently analysed variables:

 - `tas` is *Near-surface Air Temperature*
 - `pr` is *Precipitation*
 - `psl` is *Sea Level Pressure*
 - `tasmax` is *Near-surface Maximum Air Temperature*
 - `tasmin` is *Near-surface Minimum Air Temperature*
 - `clt` is *Total Cloud Cover Percentage*

 %% Cell type:markdown id: tags:

 Choose the variable:

 %% Cell type:code id: tags:

 ``` python
 # Choose one of
 # pr, psl, tas, tasmax, tasmin, clt
 variable_id = "tas"
 # restore variables saved earlier with %store (this may override variable_id)
 %store -r
 ```

 %% Cell type:code id: tags:

 ``` python
 # get formatting done automatically according to the `black` style
 #%load_ext lab_black
 ```

 %% Cell type:markdown id: tags:

 The `intake-esm` software reads *Catalogs* which we use to **find, access and load** the data we are interested in. Daily updated CMIP6 catalogs are provided in DKRZ's cloud [swift](https://swiftbrowser.dkrz.de/public/dkrz_a44962e3ba914c309a7421573a6949a6/intake-esm/).

 Similar to the shopping catalog at your favorite online bookstore, the intake catalog contains information (e.g. model, variables, and time range) about each dataset that you can access before loading the data. Thanks to the catalog, you can find out where the "book" is by using just a few keywords, and you do not need to hold it in your hands to know how many pages it has.

 Next, we specify the catalog descriptor for the intake package. The descriptor is created by the DKRZ developers who manage the catalog, so you do not need to worry much about its contents. Knowing where it is and loading it is enough:

 %% Cell type:code id: tags:

 ``` python
 col_url = "https://swift.dkrz.de/v1/dkrz_a44962e3ba914c309a7421573a6949a6/intake-esm/dkrz_data-pool_cloudcatalog.yaml"
 parent_col = intake.open_catalog(col_url)
 # show the names of the catalogs contained in the parent catalog
 list(parent_col)
 ```

 %% Cell type:code id: tags:

 ``` python
 # select the CMIP6 catalog of netCDF files in the DKRZ data pool
 col = parent_col["dkrz_cmip6_disk_netcdf_fromcloud"]
 ```

 %% Cell type:markdown id: tags:

 ### Browsing through the catalog

 %% Cell type:markdown id: tags:

 We define a query and specify values for the search facets in order to search the catalog. The possible **search facets** are the columns of the catalog table, each identified by its column name.
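To illustrate conceptually what a facet search does, the sketch below filters a small, made-up catalog table with plain `pandas`: a row is kept only if, for every facet, its column value equals the given value or is one of the listed values. The table contents and the helper `search_catalog` are hypothetical, and this is not how `intake-esm` implements `search`.

```python
import pandas as pd

# A tiny, made-up stand-in for a catalog table: one row per file.
catalog = pd.DataFrame(
    {
        "source_id": ["MPI-ESM1-2-HR", "MPI-ESM1-2-HR", "AWI-CM-1-1-MR", "AWI-CM-1-1-MR"],
        "experiment_id": ["historical", "ssp585", "historical", "ssp245"],
        "table_id": ["Amon", "Amon", "Amon", "Omon"],
        "variable_id": ["tas", "tas", "tas", "tos"],
    }
)


def search_catalog(df, **facets):
    """Hypothetical helper: keep the rows whose columns match all given facets.

    A facet value may be a single string or a list of allowed strings.
    """
    mask = pd.Series(True, index=df.index)
    for column, wanted in facets.items():
        allowed = wanted if isinstance(wanted, list) else [wanted]
        mask &= df[column].isin(allowed)
    return df[mask]


subset = search_catalog(
    catalog,
    variable_id="tas",
    table_id="Amon",
    experiment_id=["historical", "ssp585"],
    source_id=["MPI-ESM1-2-HR", "AWI-CM-1-1-MR"],
)
print(subset)  # the three Amon/tas rows; the ssp245 ocean row is dropped
```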

 In this example, we compare the MPI-ESM1-2-HR model of the Max Planck Institute and the AWI-CM-1-1-MR model of the Alfred Wegener Institute across the experiments listed below. CMIP6 comprises many experiments with many simulation members, of which we use only a few. You can find more information in the [CMIP6 Model and Experiment Documentation](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#5-model-and-experiment-documentation).

 We will concatenate the historical experiment with two different Shared Socioeconomic Pathway (SSP) scenarios. SSPs are scenarios of projected global socioeconomic change; the query below only uses `ssp585` to keep the run time short.
 - *historical*

    This experiment uses the best estimates for anthropogenic and natural forcing to simulate the historical period 1850-2014.
 - *ssp245*

    The 45 corresponds to the approximate radiative forcing level reached by 2100, in this case 4.5 W/m² or ~650 ppm CO2 equivalent.
 - *ssp585*

    The 85 corresponds to the approximate radiative forcing level reached by 2100, in this case 8.5 W/m².
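As a rough plausibility check of the ~650 ppm figure for ssp245, the sketch below inverts the commonly used simplified CO2 forcing expression ΔF = 5.35 ln(C/C0) W/m². Both that expression and the pre-industrial reference of 280 ppm are assumptions added here, not part of the scenario definitions quoted above; "CO2 equivalent" means the CO2 concentration that alone would produce the given forcing.

```python
import math

# Assumptions (not from the notebook): simplified CO2 forcing formula
# dF = 5.35 * ln(C / C0) in W/m2, with a pre-industrial reference C0 = 280 ppm.
ALPHA_W_PER_M2 = 5.35
C0_PPM = 280.0


def forcing_from_co2(c_ppm: float) -> float:
    """Radiative forcing (W/m2) of a CO2 concentration relative to C0."""
    return ALPHA_W_PER_M2 * math.log(c_ppm / C0_PPM)


def co2_equivalent_ppm(forcing_w_per_m2: float) -> float:
    """CO2 concentration (ppm) that alone would produce the given forcing."""
    return C0_PPM * math.exp(forcing_w_per_m2 / ALPHA_W_PER_M2)


for scenario, target in [("ssp245", 4.5), ("ssp585", 8.5)]:
    print(f"{scenario}: {target} W/m2 ~ {co2_equivalent_ppm(target):.0f} ppm CO2-eq")
```

Under these assumptions, 4.5 W/m² corresponds to about 649 ppm, which is consistent with the ~650 ppm quoted for ssp245.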

 %% Cell type:code id: tags:

 ``` python
 query = dict(
    variable_id=variable_id,
    table_id="Amon",
   experiment_id=["historical", "ssp585"], # "ssp245" is excluded because including it would make the notebook take about 15 min to run
    source_id=["MPI-ESM1-2-HR", "AWI-CM-1-1-MR"],
 )
 cat = col.search(**query)
 ```

 %% Cell type:markdown id: tags:

 Let's have a look at the new catalog subset `cat`. We use its underlying `pandas` DataFrame `df` to display the catalog as a table. Each row refers to one file.

 %% Cell type:code id: tags:

 ``` python
 cat.df
 ```

 %% Cell type:markdown id: tags:

 ### Loading the data

 We can open the data with a single line of code. The catalog's `to_dataset_dict` method aggregates and combines the data from the files into `xarray` datasets, following the specifications in the intake descriptor file. Because chunks are passed via `cdf_kwargs`, the data are opened lazily as `dask` arrays and are only read into memory when a computation needs them. The result is a `dict`-type object whose keys identify the finest granularity at which datasets cannot be combined or aggregated any further, and whose values are the datasets.
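The keys are dot-separated facet values. Judging from the key this notebook looks up, `ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Amon.gn`, they appear to follow the order activity_id, institution_id, source_id, experiment_id, table_id, grid_label. This order is an assumption read off that key, not taken from the catalog descriptor. The sketch below splits a key into named facets and rebuilds the short `institution.source.experiment` label that the notebook later derives with `".".join(key.split(".")[1:4])`. The helper names are hypothetical.

```python
# Assumed facet order, inferred from the example key used in this notebook.
KEY_FACETS = ["activity_id", "institution_id", "source_id",
              "experiment_id", "table_id", "grid_label"]


def parse_key(key: str) -> dict:
    """Map each dot-separated part of a dataset key to its (assumed) facet name."""
    parts = key.split(".")
    if len(parts) != len(KEY_FACETS):
        raise ValueError(f"unexpected key format: {key!r}")
    return dict(zip(KEY_FACETS, parts))


def short_label(key: str) -> str:
    """Same result as '.'.join(key.split('.')[1:4]): institution.source.experiment."""
    facets = parse_key(key)
    return ".".join(facets[name] for name in ("institution_id", "source_id", "experiment_id"))


key = "ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Amon.gn"
print(parse_key(key)["source_id"])  # MPI-ESM1-2-HR
print(short_label(key))             # DWD.MPI-ESM1-2-HR.ssp585
```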

 %% Cell type:code id: tags:

 ``` python
-xr_dset_dict = cat.to_dataset_dict(cdf_kwargs={"chunks":{"time":6}})
+xr_dset_dict = cat.to_dataset_dict(cdf_kwargs={"chunks":{"time":1}})
 print(xr_dset_dict.keys())
 ```

 %% Cell type:code id: tags:

 ``` python
 xr_dset_dict['ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Amon.gn']
 ```

 %% Cell type:markdown id: tags:

 ### Global yearly mean calculation

 We define a function that calculates the global mean by weighting each grid box with the cosine of its latitude, which is proportional to the grid-box area on a regular latitude-longitude grid. Afterwards, we group by year and calculate the yearly mean. All of this is done with `xarray`.
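The same steps can be sketched with plain NumPy on a synthetic regular latitude-longitude grid, which makes the weighting explicit. Each latitude row gets the weight cos(lat), the weighted field is summed over latitude and longitude and divided by the sum of the weights, and monthly values are then averaged per year. The grid, the synthetic data and the helper names are made up for illustration. Unlike `xarray`'s `weighted(...).mean(...)`, this sketch does not handle missing values.

```python
import numpy as np


def global_mean(field, lat_deg):
    """Cos(lat)-weighted mean over the last two axes (lat, lon) of a regular grid.

    field has shape (..., nlat, nlon); lat_deg has shape (nlat,).
    """
    weights = np.cos(np.deg2rad(lat_deg))                       # (nlat,)
    w2d = np.broadcast_to(weights[:, None], field.shape[-2:])   # (nlat, nlon)
    return (field * w2d).sum(axis=(-2, -1)) / w2d.sum()


def yearly_mean(monthly_values):
    """Average blocks of 12 months (assumes whole years starting in January)."""
    return monthly_values.reshape(-1, 12).mean(axis=1)


# Synthetic 2.5 degree grid with cell-centre coordinates.
lat = np.arange(-88.75, 90, 2.5)   # 72 latitudes
lon = np.arange(1.25, 360, 2.5)    # 144 longitudes

# Two years of monthly data; every grid box holds the month index 0..23.
months = 24
field = np.ones((months, lat.size, lon.size)) * np.arange(months)[:, None, None]
print(yearly_mean(global_mean(field, lat)))  # approximately [5.5, 17.5]

# Why weighting matters: a field that is 1 poleward of 60 degrees, 0 elsewhere.
polar = np.where(np.abs(lat)[:, None] > 60, 1.0, 0.0) * np.ones((1, lon.size))
print(polar.mean())             # unweighted: one third of the grid boxes
print(global_mean(polar, lat))  # weighted: only about 0.13 of the area
```

The polar example shows that an unweighted mean would overstate high latitudes, because grid boxes shrink towards the poles.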

 %% Cell type:code id: tags:

 ``` python
 def global_yearly_mean(dset):
   # Weights proportional to the grid-box area on a regular lat-lon grid
   weights = np.cos(np.deg2rad(dset.lat))
   # Weighted view of the chosen variable
   variable_array = dset.get(variable_id)
   variable_weights = variable_array.weighted(weights)
   # Area-weighted global mean for every time step
   variable_globalmean = variable_weights.mean(("lon", "lat"))
   # Yearly mean of the global mean
   variable_gmym = variable_globalmean.groupby("time.year").mean("time")
   # Return the values as a plain NumPy array
   return variable_gmym.values

 %% Cell type:markdown id: tags:

 ### Historical reference period

 We use 1851-1880 as our reference period and calculate all anomalies relative to it. The code below first computes the global yearly mean of that period. Note that it takes the reference from the first historical dataset in the dictionary and applies this single reference value to all datasets.
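Written out, the anomaly computed in this notebook is a global yearly mean minus one scalar reference value. That value is the mean of the 1851-1880 global yearly means over all $M$ members of the reference (first historical) dataset, assuming all 30 reference years are present:

$$
A(y) = \overline{X}_{1}(y) - \frac{1}{30\,M} \sum_{m=1}^{M} \sum_{y'=1851}^{1880} \overline{X}^{\mathrm{ref}}_{m}(y')
$$

Here $\overline{X}_{1}(y)$ is the global yearly mean in year $y$ of the first member of the dataset being processed (the `[0, :]` selection in the loop further down), and $\overline{X}^{\mathrm{ref}}_{m}(y')$ is the global yearly mean of member $m$ of the reference dataset in year $y'$.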

 %% Cell type:code id: tags:

 ``` python
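 # use the first historical dataset in the dictionary as the reference
 historical = [key for key in xr_dset_dict.keys() if "historical" in key][0]
 dshist = xr_dset_dict[historical]
 # select the years 1851-1880 (range excludes its end value 1881)
 dshist_ref = dshist.sel(time=dshist.time.dt.year.isin(range(1851, 1881)))
 # global yearly means of all members; .mean() averages over members and years
 var_ref = global_yearly_mean(dshist_ref)
 var_refmean = var_ref.mean()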
 ```

 %% Cell type:markdown id: tags:

 ### Get Meta Data

 In order to label the plot correctly, we retrieve the attributes `long_name` and `units` from the chosen variable.

 %% Cell type:code id: tags:

 ``` python
 lname = dshist.get(variable_id).attrs["long_name"]
 units = dshist.get(variable_id).attrs["units"]
 label = "Delta " + lname + " [" + units + "]"
 ```

 %% Cell type:markdown id: tags:

 ### Calculate Anomaly

 1. We store the result, the anomaly values, in a `pandas` DataFrame `var_global_yearly_mean_anomaly`. We use a DataFrame because importing `hvplot.pandas` gives it the `hvplot` plotting method that we want to use. We start by creating this DataFrame with one column per dataset that we got from intake.

 %% Cell type:code id: tags:

 ``` python
 lxr = list(xr_dset_dict.keys())
 columns = [".".join(elem.split(".")[1:4]) for elem in lxr]
 print(columns)
 var_global_yearly_mean_anomaly = pd.DataFrame(index=range(1850, 2101), columns=columns, dtype=float)
 ```

 %% Cell type:markdown id: tags:

 2. For each dataset in our dictionary, we calculate the anomaly by subtracting the mean of the reference period from the global yearly mean. Only the first ensemble member (`[0, :]`) of each dataset is used.
 3. We add the results to the DataFrame. Only the years covered by a dataset are filled; all other entries remain NaN.
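The filling step relies on `pandas` label-based assignment: `.loc[years, column] = values` writes values only into the rows whose index labels are listed, and all other rows keep their initial NaN. Below is a minimal sketch with made-up numbers and hypothetical column names, mirroring a historical run (1850-2014) and a scenario run (2015-2100).

```python
import numpy as np
import pandas as pd

# Year-indexed frame with one column per (hypothetical) dataset label, all NaN.
frame = pd.DataFrame(index=range(1850, 2101), columns=["A.hist", "A.ssp585"], dtype=float)

reference_mean = 10.0                  # made-up reference-period mean
hist_years = list(range(1850, 2015))   # 1850-2014
ssp_years = list(range(2015, 2101))    # 2015-2100

# Made-up global yearly means; the anomaly is the value minus the reference mean.
hist_values = np.full(len(hist_years), 10.5) - reference_mean
ssp_values = np.linspace(11.0, 14.0, len(ssp_years)) - reference_mean

frame.loc[hist_years, "A.hist"] = hist_values
frame.loc[ssp_years, "A.ssp585"] = ssp_values

# "A.hist" is NaN from 2015 on, "A.ssp585" is NaN before 2015.
print(frame.loc[[2014, 2015], :])
```

Creating the frame with `dtype=float` keeps the columns numeric, so the NaN gaps are ordinary missing values for plotting.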

 %% Cell type:code id: tags:

 ``` python
 for key in xr_dset_dict.keys():
   # short label institution.source.experiment, as used for the columns
   column = ".".join(key.split(".")[1:4])
   print(column)
   # anomaly of the first ensemble member relative to the reference mean
   datatoappend = global_yearly_mean(xr_dset_dict[key])[0, :] - var_refmean
   # years covered by this dataset
   years = list(xr_dset_dict[key].get(variable_id).groupby("time.year").groups.keys())
   var_global_yearly_mean_anomaly.loc[years, column] = datatoappend

 %% Cell type:markdown id: tags:

 ### Plotting the multimodel comparison of the global annual mean anomaly

 %% Cell type:code id: tags:

 ``` python
 plot = var_global_yearly_mean_anomaly.hvplot.line(
    xlabel="Year",
    ylabel=label,
    value_label=label,
    legend="top_left",
   title="Global yearly mean anomaly relative to 1851-1880",
    grid=True,
    height=600,
    width=820,
 )
 ```

 %% Cell type:code id: tags:

 ``` python
 hvplot.save(plot, f"globalmean-yearlymean-{variable_id}.html")
 ```

 %% Cell type:code id: tags:

 ``` python
 plot
 ```

 %% Cell type:markdown id: tags:

 ### Used data

 - https://doi.org/10.22033/ESGF/CMIP6.6594
 - https://doi.org/10.22033/ESGF/CMIP6.2450
 - https://doi.org/10.22033/ESGF/CMIP6.1869
 - https://doi.org/10.22033/ESGF/CMIP6.2686
 - https://doi.org/10.22033/ESGF/CMIP6.2800
 - https://doi.org/10.22033/ESGF/CMIP6.2817

 %% Cell type:markdown id: tags:

 We acknowledge the CMIP community for providing the climate model data, retained and globally distributed in the framework of the ESGF. The CMIP data of this study were replicated and made available by the DKRZ.

 %% Cell type:code id: tags:

 ``` python
 ```