Commit 27983fa0 authored by Marco Kulüke's avatar Marco Kulüke

Merge branch 'dev_maria' into 'master'

add the .copy() method and some rewording

See merge request mipdata/tutorials-and-use-cases!2
parents d518c3cf 31bb82b9
......@@ -4,12 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Calculate a climate index in a server hosting all the climate model data: \n",
"## run faster and without data transfer\n",
"# Calculate a climate index in a server hosting all the climate model data \n",
"\n",
"We will show here how to count the annual summer days for a particular location of your choice using the results of a climate model, in particular, the historical and shared socioeconomic pathway (ssp) experiments of the Coupled Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/).\n",
"We will show here how to count the annual summer days for a particular geolocation of your choice using the results of a climate model, in particular, the historical and shared socioeconomic pathway (ssp) experiments of the Coupled Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/).\n",
"\n",
"This Jupyter notebook runs in the Jupyterhub server of the German Climate Computing Center [DKRZ](https://www.dkrz.de/) which is an [ESGF](https://esgf.llnl.gov/) repository that hosts and maintains more than 3 Petabytes of CMIP6 data. Please, choose the ... kernel on the right uper corner of this notebook.\n",
"This Jupyter notebook runs in the Jupyterhub server of the German Climate Computing Center [DKRZ](https://www.dkrz.de/) which is an [ESGF](https://esgf.llnl.gov/) repository that hosts and maintains 4 Petabytes of CMIP6 data. Please, choose the ... kernel on the right uper corner of this notebook.\n",
"\n",
"Thanks to the data and computer scientists Marco Kulüke, Fabian Wachsmann, Regina Kwee-Hinzmann, Caroline Arnold, Felix Stiehler, Maria Moreno, and Stephan Kindermann at DKRZ for their contribution to this notebook."
]
......@@ -19,8 +18,8 @@
"metadata": {},
"source": [
"In this Use Case you will learn the following:\n",
"- How to access a data set from the DKRZ CMIP6 model data archive\n",
"- How to count the annual number of summer days for a particular location using this model data set\n",
"- How to access a dataset from the DKRZ CMIP6 model data archive\n",
"- How to count the annual number of summer days for a particular geolocation using this model dataset\n",
"- How to visualize the results\n",
"\n",
"\\\n",
......@@ -44,20 +43,20 @@
"outputs": [],
"source": [
"import intake # a general interface for loading data from an existing catalog\n",
"import folium # visualization tool\n",
"#import folium # visualization tool\n",
"import xarray as xr # handling labelled multi-dimensional arrays\n",
"from ipywidgets import widgets # to use widgets in the Jupyer Notebook\n",
"from geopy.geocoders import Nominatim # Python client for several popular geocoding web services\n",
"#from geopy.geocoders import Nominatim # Python client for several popular geocoding web services\n",
"import numpy as np # fundamental package for scientific computing\n",
"import pandas as pd # data analysis and manipulation tool\n",
"import hvplot.pandas # visualization tool"
"#import hvplot.pandas # visualization tool"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Which data set do we need? -> Choose Shared Socioeconomic Pathway, Place, and Year\n",
"## 1. Which dataset do we need? -> Choose Shared Socioeconomic Pathway, Place, and Year\n",
"\n",
"<a id='selection'></a>"
]
......@@ -135,7 +134,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We have defined the place and time. Now, we can search for the climate model data set."
"We have defined the place and time. Now, we can search for the climate model dataset."
]
},
{
......@@ -143,7 +142,7 @@
"metadata": {},
"source": [
"## 2. Intake Catalog\n",
"Similar to the shopping catalog at your favorite online bookstore, the intake catalog contains information (e.g. model, variables, and time range) about each data set (the title, author, and number of pages of the book, for instance) that you can access before loading the data (so thanks to the catalog, you do not need to open the book to know the number of pages of the book, for instance).\n",
"Similar to the shopping catalog at your favorite online bookstore, the intake catalog contains information (e.g. model, variables, and time range) about each dataset (the title, author, and number of pages of the book, for instance) that you can access before loading the data (so thanks to the catalog, you do not need to open the book to know the number of pages of the book, for instance).\n",
"\n",
"### 2.1 Load the Intake Catalog\n",
"We load the catalog descriptor with the intake package. The catalog is updated daily."
......@@ -173,7 +172,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This catalog contains all data sets of the CMIP6 archive at DKRZ. In the next step we narrow the results down by chosing a model and variable."
"This catalog contains all datasets of the CMIP6 archive at DKRZ. In the next step we narrow the results down by chosing a model and variable."
]
},
{
......@@ -192,18 +191,22 @@
"metadata": {},
"outputs": [],
"source": [
"climate_model = \"MPI-ESM1-2-LR\"\n",
"# store the name of the model we chose in a variable named \"climate_model\"\n",
"climate_model = \"MPI-ESM1-2-LR\" # here we choose Max-Plack Institute's Earth Sytem Model in high resolution\n",
"\n",
"# this is how we tell intake what data we want\n",
"query = dict(\n",
" source_id=climate_model, # here we choose Max-Plack Institute's Earth Sytem Model in high resolution\n",
" variable_id=\"tasmax\", # temperature at surface, maximum\n",
" table_id=\"day\", # daily maximum\n",
" experiment_id=experiment_box.label, # historical Simulation, 1850-2014\n",
" member_id=\"r10i1p1f1\", # \"r\" realization, \"i\" initialization, \"p\" physics, \"f\" forcing\n",
" source_id = climate_model, # the model \n",
" variable_id = \"tasmax\", # temperature at surface, maximum\n",
" table_id = \"day\", # daily maximum\n",
" experiment_id = experiment_box.label, # what we selected in the drop down menu,for instance, historical 850-2014\n",
" member_id = \"r10i1p1f1\", # \"r\" realization, \"i\" initialization, \"p\" physics, \"f\" forcing\n",
")\n",
"\n",
"# intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ\n",
"cat = col.search(**query)\n",
"\n",
"# Show query\n",
"# show query results\n",
"cat.df"
]
},
......@@ -211,16 +214,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see our query results. This is like the list of results you get when you search for keywords with a search engine. In the next section we will find the data set which contains our selected year."
"The result of the query are like the list of results you get when you search for articles in the internet by writing keywords in your search engine (duck duck go, ecosia, google,...). Thanks to intake, we did not need to know the path of each dataset, just selecting some keywords (the model name, the variable,...) was enough to obtain the results. If advance users are still interested in the location of the data inside the DKRZ archive, intake also provides the path and the OpenDAP URL (see the last columns above). Now we will find which file in the dataset contains our selected year so in the next section we can just load that specific file."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Load the model data\n",
"\n",
"### 3.1 Find Data Set Which Contains the Year You Selected in Drop Down Menu Above"
"### 2.3 Find the Dataset That Contains the Year You Selected in Drop Down Menu Above"
]
},
{
......@@ -229,16 +230,25 @@
"metadata": {},
"outputs": [],
"source": [
"# copying dataframe \n",
"# TO DO: check if .copy() is better\n",
"# https://stackoverflow.com/questions/47972633/in-pandas-does-iloc-method-give-a-copy-or-view/47972710#47972710\n",
"# https://stackoverflow.com/questions/48173980/pandas-knowing-when-an-operation-affects-the-original-dataframe\n",
"\n",
"query_result_df = cat.df.copy() \n",
"# copying the cat.df dataframe to a new dataframe, thus further modifications do not affect the original cat.df \n",
"query_result_df = cat.df.copy() # new dataframe to play with\n",
"\n",
"# each dataset contains many files, extract the initial and final year of each file \n",
"query_result_df[\"start_year\"] = query_result_df[\"time_range\"].str[0:4].astype(int) # add column with start year\n",
"query_result_df[\"end_year\"] = query_result_df[\"time_range\"].str[9:13].astype(int) # add column with end year\n",
"\n",
"# delete the time range column\n",
"query_result_df.drop(columns=[\"time_range\"], inplace = True) # if \"inplace\" is False, .drop() creates a new df\n",
"\n",
"query_result_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a column labelling the year selection as True or False\n",
"# TO DO: is there a non boolean way that is better? smth like query_result_df[query_result_df['star_year'] == year_box_value]?\n",
"query_result_df[\"selection\"] = (year_box.value >= query_result_df[\"start_year\"]) & (\n",
......@@ -260,7 +270,7 @@
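The selection mask begun in the cell above is cut off by the diff; a plausible completion, assuming (as the comment says) that a file is selected when the chosen year falls between its `start_year` and `end_year`, can be sketched with hypothetical year ranges and a fixed stand-in for `year_box.value`:

```python
import pandas as pd

# Hypothetical per-file year ranges, as built in the previous cell
files = pd.DataFrame({"start_year": [1850, 1870, 1890],
                      "end_year":   [1869, 1889, 1909]})
year = 1875  # stand-in for year_box.value

# True exactly for the file(s) whose range contains the chosen year
files["selection"] = (year >= files["start_year"]) & (year <= files["end_year"])
selected = files[files["selection"]]
```

With non-overlapping ranges, exactly one file is flagged, which is what lets the next section load just that single file.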
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 Loading the Model Data"
"## 3. Load the model data"
]
},
{
......@@ -438,9 +448,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "test_env",
"display_name": "Python 3 unstable (using the module python3/unstable)",
"language": "python",
"name": "test_env"
"name": "python3_unstable"
},
"language_info": {
"codemirror_mode": {
......@@ -452,7 +462,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.7.8"
}
},
"nbformat": 4,
......