Commit 1a548503 authored by Marco Kulüke's avatar Marco Kulüke

Merge branch 'dev_maria' into 'master'

add cmip6 multimodel comparison notebook (is-enes3 demo)

See merge request mipdata/tutorials-and-use-cases!22
parents d4f98abc 26c8cd45
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run a multimodel comparison in a supercomputer holding the data pool\n",
"\n",
"### The goal\n",
"We will plot the global mean annual mean near-surface air temperature for some climate model results as an example of a data-near server-side analysis. \n",
"\n",
"### The data\n",
"We will use the simulations of the Couple Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/). We will concatenate historical data (that is, model results from 1850 to 2014 using the best estimates for anthropogenic and natural forcing) with the model results for the Shared Socioeconomic Pathway SSP2-4.5 based scenario (which corresponds to the growth in radiative forcing reached by 2100, in this case, 4.5 W/m2 or ~650 ppm CO2 equivalent). CMIP6 comprises several kind of experiments with various simulation members, and we will use some of them. You can find more information in the [CMIP6 Model and Experiment Documentation](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#5-model-and-experiment-documentation).\n",
"\n",
"### The software\n",
"We will use Python 3 [Pandas](https://pandas.pydata.org/), the popular data analysis package focused on labelled tabular data, and [Xarray](http://xarray.pydata.org/), the Pandas generalization for n-dimensional arrays particularly tailored to working with NetCDF files, to process the data, together with the Climate Data Operators [cdo](https://code.mpimet.mpg.de/projects/cdo) package to concatenate the SSP2-4.5 data with their correspondent historical data. \n",
"\n",
"### Where this analysis will happen or what data-near server-side means\n",
"This Jupyter notebook is meant to run in supercomputer [Mistral](https://www.dkrz.de/systems/hpc/hlre-3-mistral?set_language=en&cl=en) at the German Climate Computing Center (DKRZ) within the DKRZ [Jupyterhub](<https://jupyterhub.dkrz.de>) server. See in this [video](https://youtu.be/f0wZX9i0uWQ) the main features of the DKRZ Jupterhub/lab and how to use it. The DKRZ software developers made that all the required packages to run this Jupyter notebook are available under the Python 3 unstable kernel (see the top right corner of this page, you can also choose the kernel in the Kernel tab above), it contains all the common packages for geoscience applications. The DKRZ data managers made that the data pool is directly accessible (no need to download or transfer data, that is what \"data-near\" means) because DKRZ is also a node of the Earth System Grid Federation [ESGF](https://esgf.llnl.gov/) hosting more than 4 petabytes of CMIP6 data. More info about the data pool [here](https://www.dkrz.de/up/services/data-management/cmip-data-pool). Direct access to the data pool is one of the main benefits of the server-side data-near computing we demonstrate in this notebook.\n",
"\n",
"The fist step is to [open an account in Mistral](https://www.dkrz.de/up/my-dkrz/getting-started/account/DKRZ-user-account). After that, you need to join a DKRZ project. For the group for IPCC related data analysis activities in the IPCC DDC Virtual Workspace and the IS-ENES3 related data analysis activities, as the [Analysis Platforms](https://portal.enes.org/data/data-metadata-service/analysis-platforms), the assigned project is bk1088. It means that the processing time (measured in CPU hours) and the storage (measured in gigabytes of memory) you use are allocated in that project.\n",
"\n",
"We follow the original idea and code developed by Dr. Sebastian Milinski from the Max Planck Institute for Meteorology (MPI-M), a chapter scientist for Working Group 1 (chapter 4) of the IPCC AR6. Thanks to the DKRZ software developer Ralf Mueller for porting the notebook to the DKRZ Jupyterhub and the DKRZ data scientist Dr. Maria Moreno for writing the tutorial. Contact her for questions and comments at moreno@dkrz.de."
]
},
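{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small taste of what \"data-near\" access means in practice, any CMIP6 NetCDF file under /work/ik1017/CMIP6/data/CMIP6 can be opened in place with Xarray, with no download step. The file path in the following sketch is hypothetical (the exact version and time range in the pool may differ) and only illustrates the pattern."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: open a pool file in place with Xarray (hypothetical path;\n",
"# adjust version/time range to a file that exists in the pool)\n",
"import os\n",
"import xarray as xr\n",
"\n",
"example = ('/work/ik1017/CMIP6/data/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/'\n",
"           'r1i1p1f1/Amon/tas/gn/v20190710/'\n",
"           'tas_Amon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_185001-186912.nc')\n",
"if os.path.exists(example):\n",
"    print(xr.open_dataset(example)['tas'])  # dims: (time, lat, lon)\n",
"else:\n",
"    print('Adjust the example path to an existing pool file.')"
]
},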
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import the required python and cdo packages\n",
"\n",
"All we need is available in the Python 3.5.2 module, choose this kernel (right upper corner)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Python packages \n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from scipy.io import netcdf\n",
"import pandas as pd \n",
"import xarray as xr \n",
"import os\n",
"\n",
"# Import the python binder for the Climate Data Operators\n",
"import cdo\n",
"cdo = cdo.Cdo() \n",
"\n",
"# to render the figures in this notebook\n",
"%matplotlib inline "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data paths \n",
"\n",
"All available CMIP6 data at the DKRZ data pool are in: /work/ik1017/CMIP6/data/CMIP6\n",
"\n",
"Check the CMIP6 global availability [here](https://pcmdi.llnl.gov/CMIP6/ArchiveStatistics/esgf_data_holdings/). Use the web accessible search index (one can use the \"guest\" option) [here](http://cmip-esmvaltool.dkrz.de). If you are missing data at the DKRZ data pool, we can make a replica on demand, write to data-pool@dkrz.de. \n",
"\n",
"In order to accelerate the data load, we computed global mean time series for some CMIP6 model results in these folders:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"datdir_hist='/work/bk1088/cmip6_selection/CMIP6_GSAT/CMIP6/historical/all/'\n",
"datdir_ssp245='/work/bk1088/cmip6_selection/CMIP6_GSAT/CMIP6/ssp245/all/'"
]
},
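{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, here is a minimal sketch of how such a global mean series can be produced from a raw pool file, assuming CDO's area-weighted fldmean operator was used (this is an illustration, not the exact script behind the folders above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch (assumption): derive a global mean 'tas' series from a raw CMIP6 file\n",
"def global_mean_series(raw_file):\n",
"    \"\"\"Return a 1D global mean 'tas' time series for a raw pool file.\"\"\"\n",
"    gsat_file = cdo.fldmean(input=raw_file)  # area-weighted mean over lat/lon\n",
"    return xr.open_dataset(gsat_file)['tas'].squeeze()"
]
},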
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Identify the historical and scenario match\n",
"Note that some models don't have SSP2-4.5 simulations (yet). And for some models, the historical results might not yet be available on the data pool on Mistral. We will only plot time series for models where both the historical and SSP2-4.5 scenario are available."
]
},
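{
"cell_type": "markdown",
"metadata": {},
"source": [
"The matching below relies on the underscore-separated fields of the file names in these folders. The file name in the next cell is hypothetical (the real names may differ slightly), but it shows why the code takes the model name from field 2 and the version from field 7:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical file name following the assumed pattern of these folders\n",
"example_name = 'tas_Amon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_185001-201412_v20190710.nc'\n",
"fields = example_name.split('_')\n",
"print(fields[2])  # model name, e.g. 'MPI-ESM1-2-LR'\n",
"print(fields[7])  # version field, e.g. 'v20190710.nc'"
]
},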
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# for historical-------------------------------------------------------------\n",
"files_hist = os.listdir(datdir_hist)\n",
"files_hist.sort()\n",
"\n",
"# filter for r1\n",
"subs = '_r1i1'\n",
"\n",
"# using list comprehension to get string with substring \n",
"r1_files_hist = [model for model in files_hist if subs in model] \n",
"# print(r1_files_hist)\n",
"\n",
"# extract model names\n",
"models = [filename.split('_')[2] for filename in r1_files_hist]\n",
"# remove duplicate names\n",
"unique_models = list(dict.fromkeys(models))\n",
"\n",
"# identify highest version numer\n",
"modelversions_latest_hist = {}\n",
"for model in unique_models:\n",
" model_files_hist = [filename for filename in r1_files_hist if model in filename]\n",
"# print(model_files_hist)\n",
" versions = [filename.split('_')[7] for filename in model_files_hist]\n",
" versions.sort()\n",
" modelversions_latest_hist[model]=versions[-1]\n",
" \n",
"#print(modelversions_latest_hist)\n",
"\n",
"# for ssp245------------------------------------------------------------------\n",
"files_ssp245 = os.listdir(datdir_ssp245)\n",
"files_ssp245.sort()\n",
"\n",
"# filter for r1\n",
"subs = '_r1i1'\n",
"\n",
"# to get string with substring \n",
"r1_files_ssp245 = [model for model in files_ssp245 if subs in model] \n",
"\n",
"# extract model names\n",
"models = [filename.split('_')[2] for filename in r1_files_ssp245]\n",
"# remove duplicate names\n",
"unique_models = list(dict.fromkeys(models))\n",
"\n",
"# identify highest version numer\n",
"modelversions_latest_ssp245 = {}\n",
"for model in unique_models:\n",
" model_files_ssp245 = [filename for filename in r1_files_ssp245 if model in filename]\n",
" versions = [filename.split('_')[7] for filename in model_files_ssp245]\n",
" versions.sort()\n",
" modelversions_latest_ssp245[model]=versions[-1]\n",
"\n",
"#print(modelversions_latest_ssp245)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# generate list of models that have historical AND ssp245 runs\n",
"model_list = []\n",
"for key in modelversions_latest_hist:\n",
" if key in modelversions_latest_ssp245:\n",
" model_list.append(key)\n",
"\n",
"print(model_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load the data "
]
},
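{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the CDO calls below: CDO chains operators from right to left, so '-yearmean -mergetime <files>' first merges all files along the time axis and then computes annual means. The addc('0') call (add the constant 0) is a no-op that merely serves as an entry point for the chain; the Python binding runs the pipeline and returns the path of a temporary NetCDF file, which is then opened with Xarray. A minimal equivalent sketch without the no-op trick:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the same pipeline with yearmean as the entry operator\n",
"def yearly_mean_series(ifile_pattern):\n",
"    \"\"\"Merge all files matching the pattern and return the annual mean 'tas'.\"\"\"\n",
"    tmp = cdo.yearmean(input='-mergetime ' + ifile_pattern)\n",
"    return xr.open_dataset(tmp)['tas'].squeeze()"
]
},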
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Historical\n",
"\n",
"# initialise array:\n",
"operator = '-yearmean -mergetime'\n",
"ds_hist = []\n",
"loop = 0\n",
"for model in list(model_list):\n",
" # find all files for latest version\n",
" model_files = [filename for filename in r1_files_hist if model in filename]\n",
" latest_model_files = [filename for filename in model_files if modelversions_latest_hist[model] in filename]\n",
" ifile = datdir_hist + '*' + model + '*' + modelversions_latest_hist[model] + '*'\n",
" model_timeseries = cdo.addc('0',input=operator+' '+ifile)\n",
" model_timeseries = xr.open_dataset(model_timeseries)['tas'].squeeze()\n",
" # add modelname as attribute\n",
" model_timeseries.attrs['model'] = model\n",
" # some models don't have a readable time axis (replacing with time series derived from first year and length of time axis)\n",
" firstyear = str(model_timeseries['time'][0].values)[0:4]\n",
" time_length = model_timeseries.size\n",
" model_timeseries.coords['time'] = pd.date_range(firstyear + '-01-01', periods=time_length, freq='A-JUL')\n",
" # some models have historical simulations beyond 2014\n",
" model_timeseries = model_timeseries.loc['1850-01-01':'2014-12-31']\n",
" # remove height coords if they exist. Causes problems with xr.concat\n",
" if 'height' in model_timeseries.coords:\n",
" del model_timeseries['height']\n",
"\n",
"\n",
" ds_hist.append(model_timeseries)"
]
},
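{
"cell_type": "markdown",
"metadata": {},
"source": [
"An optional sanity check: each entry of ds_hist should now be a one-dimensional annual mean series covering 1850-2014 and carry its model name as an attribute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: inspect the first few loaded historical series\n",
"for ts in ds_hist[:3]:\n",
"    print(ts.attrs['model'], ts.sizes['time'], 'years:',\n",
"          str(ts['time'][0].values)[:4], '-', str(ts['time'][-1].values)[:4])"
]
},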
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Scenario ssp245\n",
"\n",
"# initialise array:\n",
"operator = '-selyear,2015/2099 -yearmean -mergetime' # some models continue their scenario runs after 2099. Here, we only use output until 2099, which is available for all models.\n",
"ds_ssp245 = []\n",
"for model in list(model_list):\n",
" # find all files for latest version\n",
" model_files = [filename for filename in r1_files_ssp245 if model in filename]\n",
" latest_model_files = [filename for filename in model_files if modelversions_latest_ssp245[model] in filename]\n",
" ifile = datdir_ssp245 + '*' + model + '*' + modelversions_latest_ssp245[model] + '*'\n",
" model_timeseries = cdo.addc('0',input=operator+' '+ifile)\n",
" model_timeseries = xr.open_dataset(model_timeseries)['tas'].squeeze()\n",
" # add modelname as attribute\n",
" model_timeseries.attrs['model'] = model\n",
" # some models don't have a readable time axis (replacing with time series derived from first year and length of time axis)\n",
" model_timeseries.coords['time'] = pd.date_range('2015-01-01', periods=85, freq='A-JUL')\n",
" # remove height coords if they exist. Causes problems with xr.concat\n",
" if 'height' in model_timeseries.coords:\n",
" del model_timeseries['height']\n",
"\n",
" ds_ssp245.append(model_timeseries)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create new dataset with concatenated time series\n",
"# last year of historical (2014) is added as a first value to the scenario timeseries to avoid a white gap \n",
"# between 2014 and 2015 in the plot.\n",
"ds_merged_ssp245 = []\n",
"\n",
"for i in list(range(0,len(ds_hist))):\n",
" tas_array = np.concatenate((np.array([ds_hist[i].values[-1]]),ds_ssp245[i].values))\n",
" time_array = np.concatenate((np.array([ds_hist[i]['time'].values[-1]]),ds_ssp245[i]['time'].values))\n",
" timeseries = xr.DataArray(data=tas_array,coords=[time_array],dims='time')\n",
" if ds_hist[i].attrs['model'] is ds_ssp245[i].attrs['model']:\n",
" timeseries.attrs['model'] = ds_hist[i].attrs['model']\n",
" else:\n",
" print('model names do not match')\n",
" \n",
" ds_merged_ssp245.append(timeseries)"
]
},
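{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the same bridge-and-concatenate step can also be written with xr.concat; a minimal sketch (the helper name is ours, not part of the original code):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: prepend the last historical year to the scenario series with xr.concat\n",
"def bridge(hist, ssp):\n",
"    \"\"\"Concatenate the last historical time step with a scenario series.\"\"\"\n",
"    merged = xr.concat([hist.isel(time=[-1]), ssp], dim='time')\n",
"    merged.attrs['model'] = hist.attrs['model']\n",
"    return merged"
]
},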
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plot the multimodel comparison of the global mean annual mean near-surface air temperature SSP2-4.5 scenario"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plots path\n",
"plotdir = './plots/CMIP6_overview/'\n",
"if not os.path.exists(plotdir):\n",
" os.makedirs(plotdir)\n",
" \n",
"# IPCC colors\n",
"color_245=[155/255,135/255,12/255]\n",
"color_historical = [160/255,160/255,160/255]\n",
"\n",
"# plotting options (choose reference period to compute anomalies)\n",
"refperiod_start = 1995\n",
"refperiod_end = 2014"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"fig, axes = plt.subplots(1,1, figsize=(15,8))\n",
"plt.rcParams.update({'font.size': 15})\n",
"\n",
"# Compare the annual mean of the MPI-ESM1-2-LR and the IPSL-CM6A-LR model results, for example\n",
"i=0\n",
"for array in ds_merged_ssp245:\n",
" offset_hist=ds_hist[i].loc[str(refperiod_start)+'-01-01':str(refperiod_end)+'-12-31'].mean(dim='time')\n",
" if array.attrs['model'] == 'MPI-ESM1-2-LR':\n",
" axes.plot(ds_hist[i]['time'],ds_hist[i]-offset_hist, color='cyan',label='',zorder=10)\n",
" axes.plot(array['time'],array-offset_hist, color='cyan',label=array.attrs['model'],zorder=10) \n",
" i+=1\n",
" elif array.attrs['model'] == 'IPSL-CM6A-LR':\n",
" axes.plot(ds_hist[i]['time'],ds_hist[i]-offset_hist, color='blue',label='',zorder=10) \n",
" axes.plot(array['time'],array-offset_hist, color='blue',label=array.attrs['model'],zorder=10) \n",
" i+=1\n",
" else:\n",
" axes.plot(ds_hist[i]['time'],ds_hist[i]-offset_hist, color=color_historical,label='') \n",
" axes.plot(array['time'],array-offset_hist, color=color_245,label='') \n",
" i+=1\n",
"\n",
"# zeroline\n",
"x_pos1=np.datetime64('1850-01-01')\n",
"x_pos2=np.datetime64('2100-12-31')\n",
"axes.plot([x_pos1,x_pos2],[0,0], color='black',linewidth=1)\n",
"\n",
"legend = axes.legend(loc='upper left', shadow=True, fontsize='large',ncol=1,frameon=False)\n",
"axes.set_xlim([pd.Timestamp('1850-01-01'),pd.Timestamp('2100-01-01')])\n",
"axes.set_ylim([-2,5])\n",
"\n",
"plt.tick_params(\n",
" axis='x', # changes apply to the x-axis\n",
" which='both', # both major and minor ticks are affected\n",
" bottom=True, # ticks along the bottom edge are off\n",
" top=False, # ticks along the top edge are off\n",
" labelbottom=True) # labels along the bottom edge are off\n",
"\n",
"plt.tick_params(\n",
" axis='y', # changes apply to the x-axis\n",
" which='both', # both major and minor ticks are affected\n",
" left=True, # ticks along the bottom edge are off\n",
" right=False, # ticks along the top edge are off\n",
" labelleft=True) # labels along the bottom edge are off\n",
"\n",
"# introduce second y-axis (makes it easier to read off warming in 2100)\n",
"ax2 = axes.twinx()\n",
"mn, mx = axes.get_ylim()\n",
"ax2.set_ylim(mn, mx)\n",
"ax2.set_ylabel('')\n",
"\n",
"axes.set_ylabel('anomaly relative to '+str(refperiod_start)+'-'+str(refperiod_end)+' (K)')\n",
"axes.set_title('Global mean annual mean near-surface air temperature CMIP6 historical + SSP2-4.5')\n",
"plt.savefig(plotdir+'SSP2-4.5.pdf', bbox_extra_artists=(legend,), bbox_inches='tight', dpi=300)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See above that a .pdf with the figure is created in a folder in your Mistral home called \"/plots/CMPI6_overview\". You can download the plot to your local computer by writing \"scp b123456@mistral:/home/dkrz/b123456/plots/CMIP6_overview/SSP2-4.5.pdf . \" in your local folder, where the last \".\" means that the plot must be downloaded in the folder you are and \"b123456\" must be replaced by your actual user name."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 unstable (using the module python3/unstable)",
"language": "python",
"name": "python3_unstable"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}