Commit 30dab3b6 authored by Marco Kulüke

add further comments

parent b50ebc1a
%% Cell type:markdown id: tags:
# Calculate a climate index in a server hosting all the climate model data:
## run faster and without data transfer
We will show here how to count the annual summer days for a particular location of your choice using the results of a climate model, in particular, the historical and shared socioeconomic pathway (ssp) experiments of the Coupled Model Intercomparison Project [CMIP6](https://pcmdi.llnl.gov/CMIP6/).
This Jupyter notebook runs in the Jupyterhub server of the German Climate Computing Center [DKRZ](https://www.dkrz.de/), which is an [ESGF](https://esgf.llnl.gov/) repository that hosts and maintains more than 3 Petabytes of CMIP6 data. Please choose the ... kernel in the upper right corner of this notebook.
Thanks to the data and computer scientists Marco Kulüke, Fabian Wachsmann, Regina Kwee-Hinzmann, Caroline Arnold, Felix Stiehler, Maria Moreno, and Stephan Kindermann at DKRZ for their contribution to this notebook.
%% Cell type:markdown id: tags:
In this Use Case you will learn the following:
- How to access a data set from the DKRZ CMIP6 model data archive
- How to count the annual number of summer days for a particular location using this model data set
- How to visualize the results
\
You will use:
- [Intake](https://github.com/intake/intake) for finding the data in the DKRZ catalog
- [Xarray](http://xarray.pydata.org/en/stable/) for loading and processing the data in the DKRZ Jupyterhub server
- [hvPlot](https://hvplot.holoviz.org/index.html) for visualizing the data in the Jupyter notebook and saving the plots on your local computer
%% Cell type:markdown id: tags:
## 0. Load Packages
%% Cell type:code id: tags:
``` python
import intake # a general interface for loading data from an existing catalog
import folium # visualization tool
import xarray as xr # handling labelled multi-dimensional arrays
from ipywidgets import widgets # to use widgets in the Jupyter Notebook
from geopy.geocoders import Nominatim # Python client for several popular geocoding web services
import numpy as np # fundamental package for scientific computing
import pandas as pd # data analysis and manipulation tool
import hvplot.pandas # visualization tool
```
%% Cell type:markdown id: tags:
## 1. Which data set do we need? -> Choose Shared Socioeconomic Pathway, Place, and Year
<a id='selection'></a>
%% Cell type:code id: tags:
``` python
# Produce Widgets
experiments = {'historical':range(1850, 2015), 'ssp585':range(2015, 2101), 'ssp126':range(2015, 2101), 'ssp245':range(2015, 2101), 'ssp119':range(2015, 2101), 'ssp434':range(2015, 2101), 'ssp460':range(2015, 2101)}
experiment_box = widgets.Dropdown(options=experiments, description="Select experiment: ", disabled=False,)
display(experiment_box)
```
%% Cell type:code id: tags:
``` python
place_box = widgets.Text(description="Enter place:")
display(place_box)
x = experiment_box.value
year_box = widgets.Dropdown(options=x, description="Select year: ", disabled=False,)
display(year_box)
```
%% Cell type:markdown id: tags:
### 1.1 Find Coordinates of chosen Place
If the place name is ambiguous, the most likely coordinates will be chosen,
\
e.g. "Hamburg" results in "Hamburg, 20095, Deutschland", (53.55 North, 10.00 East)
%% Cell type:code id: tags:
``` python
geolocator = Nominatim(user_agent="any_agent")
location = geolocator.geocode(place_box.value)
print(location.address)
print((location.latitude, location.longitude))
```
%% Cell type:markdown id: tags:
### 1.2 Show Place on a Map
%% Cell type:code id: tags:
``` python
m = folium.Map(location=[location.latitude, location.longitude])
tooltip = location.latitude, location.longitude
folium.Marker([location.latitude, location.longitude], tooltip=tooltip).add_to(m)
display(m)
```
%% Cell type:markdown id: tags:
We have defined the place and time. Now, we can search for the climate model data set.
%% Cell type:markdown id: tags:
## 2. Intake Catalog
Like the shopping catalog of your favorite online bookstore, the intake catalog contains information about each data set (e.g. model, variables, and time range), just as a bookstore catalog lists the title, author, and number of pages of each book. You can access this information before loading the data: thanks to the catalog, you do not need to open the book to find out how many pages it has.
### 2.1 Load the Intake Catalog
We load the catalog descriptor with the intake package. The catalog is updated daily.
%% Cell type:code id: tags:
``` python
# Path to catalog descriptor on the DKRZ server
col_url = "/work/ik1017/Catalogs/mistral-cmip6.json"
# Open the catalog with the intake package and name it "col" as short for collection
col = intake.open_esm_datastore(col_url)
```
%% Cell type:markdown id: tags:
Let's see what is inside the intake catalog. The underlying database is given as a pandas dataframe which we can access with "col.df". Then, "col.df.head()" shows us the first rows of the table of the catalog.
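To illustrate what such a table looks like, here is a minimal sketch with a small hand-made pandas dataframe standing in for "col.df". The rows and paths are invented for illustration; only the column names mirror those used in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for col.df: invented rows, not real catalog entries
toy_df = pd.DataFrame(
    {
        "source_id": ["MPI-ESM1-2-LR", "MPI-ESM1-2-LR", "MPI-ESM1-2-LR"],
        "experiment_id": ["historical", "historical", "ssp585"],
        "variable_id": ["tasmax", "tas", "tasmax"],
        "table_id": ["day", "day", "day"],
        "time_range": ["18500101-18691231", "18500101-18691231", "20150101-20341231"],
        "path": ["/hypothetical/a.nc", "/hypothetical/b.nc", "/hypothetical/c.nc"],
    }
)

# head() returns the first rows (five by default) for a quick look
print(toy_df.head(2))

# unique() lists the distinct values of a column, e.g. the available variables
print(toy_df["variable_id"].unique())
```

On the real catalog the same two calls would be applied to "col.df" instead of the toy table.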
%% Cell type:markdown id: tags:
This catalog contains all data sets of the CMIP6 archive at DKRZ. In the next step we narrow the results down by choosing a model and variable.
%% Cell type:markdown id: tags:
### 2.2 Browse the Intake Catalog
In this example we choose the Max-Planck Earth System Model in Low Resolution Mode ("MPI-ESM1-2-LR") and the maximum temperature near surface ("tasmax") as variable.
\
CMIP6 comprises several kinds of experiments. Each experiment has various simulation members. More information can be found via the [CMIP6 Model and Experiment Documentation](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#5-model-and-experiment-documentation).
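Each simulation member is identified by a label such as "r10i1p1f1", where "r" stands for realization, "i" for initialization, "p" for physics, and "f" for forcing. As a small sketch, the hypothetical helper below (not part of any library) splits such a label into its four indices. It assumes the plain form of the label without any prefix.

```python
import re


def parse_member_id(member_id):
    """Split a label like 'r10i1p1f1' into its four integer indices."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)f(\d+)", member_id)
    if match is None:
        raise ValueError(f"unexpected member label: {member_id!r}")
    keys = ("realization", "initialization", "physics", "forcing")
    return dict(zip(keys, map(int, match.groups())))


print(parse_member_id("r10i1p1f1"))
# {'realization': 10, 'initialization': 1, 'physics': 1, 'forcing': 1}
```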
%% Cell type:code id: tags:
``` python
climate_model = "MPI-ESM1-2-LR"
query = dict(
    source_id=climate_model, # here we choose Max-Planck Institute's Earth System Model in low resolution
    variable_id="tasmax", # temperature at surface, maximum
    table_id="day", # daily maximum
    experiment_id=experiment_box.label, # experiment selected in the drop down menu above
    member_id="r10i1p1f1", # "r" realization, "i" initialization, "p" physics, "f" forcing
)
cat = col.search(**query)
# Show query
cat.df
```
%% Cell type:markdown id: tags:
Here we see our query results. This is like the list of results you get when you search for keywords with a search engine. In the next section we will find the data set which contains our selected year.
%% Cell type:markdown id: tags:
## 3. Load the model data
### 3.1 Find Data Set Which Contains the Year You Selected in Drop Down Menu Above
%% Cell type:code id: tags:
``` python
# Work on a copy so that the dataframe of the catalog stays unchanged
# https://stackoverflow.com/questions/47972633/in-pandas-does-iloc-method-give-a-copy-or-view/47972710#47972710
# https://stackoverflow.com/questions/48173980/pandas-knowing-when-an-operation-affects-the-original-dataframe
query_result_df = cat.df.copy()
query_result_df["start_year"] = query_result_df["time_range"].str[0:4].astype(int) # add column with start year
query_result_df["end_year"] = query_result_df["time_range"].str[9:13].astype(int) # add column with end year
# create a column labelling the year selection as True or False
# TO DO: is there a non boolean way that is better? smth like query_result_df[query_result_df['start_year'] == year_box.value]?
query_result_df["selection"] = (year_box.value >= query_result_df["start_year"]) & (
    year_box.value <= query_result_df["end_year"]
)
# select the rows with True in the column "selection" and take the index of the first one
selected_path_index = query_result_df.loc[query_result_df["selection"], "path"].index[0]
selected_path = query_result_df.loc[selected_path_index, "path"]
# show path for selected year
selected_path
```
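%% Cell type:markdown id: tags:
The year lookup above can also be written without helper columns. The following is only a sketch under stated assumptions: the table is invented and stands in for "cat.df", the paths are hypothetical, and "path_for_year" is a made-up helper name. It assumes that "time_range" has the form "YYYYMMDD-YYYYMMDD".

```python
import pandas as pd

# Invented file list standing in for cat.df (paths are hypothetical)
files = pd.DataFrame(
    {
        "time_range": ["18500101-18691231", "18700101-18891231", "18900101-19091231"],
        "path": ["/hypothetical/f1.nc", "/hypothetical/f2.nc", "/hypothetical/f3.nc"],
    }
)


def path_for_year(df, year):
    """Return the path of the first file whose time range covers `year`."""
    # Split "YYYYMMDD-YYYYMMDD" and keep the leading four digits of each part
    bounds = df["time_range"].str.split("-", expand=True)
    start_year = bounds[0].str[:4].astype(int)
    end_year = bounds[1].str[:4].astype(int)
    covers = (start_year <= year) & (year <= end_year)
    if not covers.any():
        raise ValueError(f"no file covers the year {year}")
    return df.loc[covers, "path"].iloc[0]


print(path_for_year(files, 1875))
# /hypothetical/f2.nc
```

Raising an error when no file matches gives a clearer message than the IndexError that an empty selection would otherwise produce.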
%% Cell type:markdown id: tags:
### 3.2 Loading the Model Data
%% Cell type:code id: tags:
``` python
# Load Data with the open_dataset() xarray method
ds_tasmax = xr.open_dataset(selected_path)
# Open variable "tasmax" over the whole time range
tasmax_xr = ds_tasmax["tasmax"]
# Define start and end time string
time_start = str(year_box.value) + "-01-01T12:00:00.000000000"
time_end = str(year_box.value) + "-12-31T12:00:00.000000000"
# Slice selected year
tasmax_year_xr = tasmax_xr.loc[time_start:time_end, :, :]
```
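%% Cell type:markdown id: tags:
Building the start and end strings by hand works, but datetime indexes also accept a bare year. The sketch below shows this with a synthetic pandas series whose values are arbitrary. Xarray offers a comparable shortcut through ".sel(time=...)" with a year string on a datetime time axis; whether it applies unchanged to a given file depends on the calendar of that file.

```python
import numpy as np
import pandas as pd

# Synthetic daily series over two years (values are arbitrary)
days = pd.date_range("2000-01-01", "2001-12-31", freq="D")
series = pd.Series(np.arange(len(days), dtype=float), index=days)

# Partial string indexing: a bare year selects every timestamp in that year
year = 2001
one_year = series.loc[str(year)]

print(one_year.index[0].date(), one_year.index[-1].date(), len(one_year))
# 2001-01-01 2001-12-31 365
```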
%% Cell type:code id: tags:
``` python
# Let's have a look at the xarray data array
tasmax_year_xr
```
%% Cell type:markdown id: tags:
We see not only the numbers, but also information about them, such as long name, units, and the data history. This information is called metadata.
%% Cell type:markdown id: tags:
## 4. Compare Model Grid Cell with chosen Location
%% Cell type:code id: tags:
``` python
# Find nearest model coordinate by finding the index of the nearest grid point
abslat = np.abs(tasmax_year_xr["lat"] - location.latitude)
abslon = np.abs(tasmax_year_xr["lon"] - location.longitude)
# c is broadcast to two dimensions (lon, lat), so the first index refers to lon and the second to lat
c = np.maximum(abslon, abslat)
([xloc], [yloc]) = np.where(c == np.min(c)) # xloc and yloc are the indices of the nearest model grid point
```
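%% Cell type:markdown id: tags:
The geocoder returns longitudes between -180 and 180, whereas many model grids store them from 0 to 360. If that is the case for the data set at hand (an assumption worth checking on the "lon" coordinate), places west of Greenwich need their longitude shifted before the search. A minimal NumPy sketch on an invented regular grid, with "nearest_index" as a hypothetical helper name:

```python
import numpy as np


def nearest_index(grid_lat, grid_lon, lat, lon):
    """Return (lat index, lon index) of the grid point nearest to (lat, lon).

    Assumes a rectilinear grid whose longitudes run from 0 to 360.
    """
    lon = lon % 360  # map e.g. -74.0 to 286.0
    yloc = int(np.argmin(np.abs(grid_lat - lat)))
    # Distance along the circle, so that 359.5 counts as close to 0.0
    dlon = np.abs(grid_lon - lon)
    dlon = np.minimum(dlon, 360 - dlon)
    xloc = int(np.argmin(dlon))
    return yloc, xloc


# Invented 2-degree grid, not the grid of any particular model
grid_lat = np.arange(-89.0, 90.0, 2.0)
grid_lon = np.arange(0.0, 360.0, 2.0)

yloc, xloc = nearest_index(grid_lat, grid_lon, 40.7, -74.0)  # roughly New York
print(grid_lat[yloc], grid_lon[xloc])
# 41.0 286.0
```

Searching latitude and longitude separately is sufficient here because the grid is assumed to be rectilinear, i.e. "lat" and "lon" are independent one-dimensional coordinates.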
%% Cell type:code id: tags:
``` python
# Draw map again
m = folium.Map(location=[location.latitude, location.longitude], zoom_start=8)
tooltip = location.latitude, location.longitude
folium.Marker(
    [location.latitude, location.longitude],
    tooltip=tooltip,
    popup="Location selected by You",
).add_to(m)
# Mark the center of the nearest model grid cell (coordinates converted to plain floats)
tooltip = float(tasmax_year_xr["lat"][yloc].values), float(tasmax_year_xr["lon"][xloc].values)
folium.Marker(
    [float(tasmax_year_xr["lat"][yloc]), float(tasmax_year_xr["lon"][xloc])],
    tooltip=tooltip,
    popup="Model Grid Cell Center",
).add_to(m)
# Define coordinates of model grid cell (just for visualization)
rect_lat1_model = float(tasmax_year_xr["lat"][yloc - 1] + tasmax_year_xr["lat"][yloc]) / 2
rect_lon1_model = float(tasmax_year_xr["lon"][xloc - 1] + tasmax_year_xr["lon"][xloc]) / 2
rect_lat2_model = float(tasmax_year_xr["lat"][yloc + 1] + tasmax_year_xr["lat"][yloc]) / 2
rect_lon2_model = float(tasmax_year_xr["lon"][xloc + 1] + tasmax_year_xr["lon"][xloc]) / 2
# Draw model grid cell
folium.Rectangle(
    bounds=[[rect_lat1_model, rect_lon1_model], [rect_lat2_model, rect_lon2_model]],
    color="#ff7800",
    fill=True,
    fill_color="#ffff00",
    fill_opacity=0.2,
).add_to(m)
m
```
%% Cell type:markdown id: tags:
Climate models have a finite resolution. Hence, models do not provide the data of a particular point, but the mean over a model grid cell. Keep this in mind when comparing model data with observed data (e.g. weather stations).
\
\
Now, we will visualize the daily maximum temperature time series of the model grid cell.
%% Cell type:markdown id: tags:
## 5. Draw Temperature Time Series and Count Summer days
%% Cell type:markdown id: tags:
The definition of a summer day varies from region to region. According to the [German Weather Service](https://www.dwd.de/EN/ourservices/germanclimateatlas/explanations/elements/_functions/faqkarussel/sommertage.html), "a summer day is a day on which the maximum air temperature is at least 25.0°C". Depending on the place you selected, you might want to apply a different threshold.
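The index itself is a simple count. The sketch below applies the "at least 25.0°C" rule to a few invented daily maxima given in Kelvin, the unit from which this notebook converts "tasmax". The function name "count_summer_days" is hypothetical, and the threshold is a parameter so that other regional definitions can be tried.

```python
import numpy as np


def count_summer_days(tasmax_kelvin, threshold_celsius=25.0):
    """Count days whose maximum temperature is at least the threshold."""
    tasmax_celsius = np.asarray(tasmax_kelvin) - 273.15
    return int(np.count_nonzero(tasmax_celsius >= threshold_celsius))


# Invented daily maxima in Kelvin
sample = [290.0, 298.5, 299.0, 301.5, 285.2]
print(count_summer_days(sample))
# 3
```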
%% Cell type:code id: tags:
``` python
tasmax_year_place_xr = tasmax_year_xr[:, yloc, xloc] - 273.15 # Convert Kelvin to °C
# Create Pandas DataFrame with the model data and the constant threshold
tasmax_year_place_df = pd.DataFrame(
    {'Model Temperature': tasmax_year_place_xr.values, 'Summer Day Threshold': 25.0},
    index=tasmax_year_place_xr['time'].values,
)
# Plot data and define title and legend
tasmax_year_place_df.hvplot.line(y=['Model Temperature', 'Summer Day Threshold'],
    value_label='Temperature in °C', legend='bottom', title='Daily maximum Temperature near Surface for ' +place_box.value, height=500, width=620)
```
%% Cell type:markdown id: tags:
As we can see, the maximum daily temperature is highly variable over the year. As we are using the mean temperature in a model grid cell, the number of summer days might be different from what you would expect at a single location.
%% Cell type:code id: tags:
``` python
# Summer days index calculation
no_summer_days_model = tasmax_year_place_xr[tasmax_year_place_xr >= 25].size # count the days with at least 25 °C
# Print results in a sentence
print("According to the German Weather Service definition, in the " +experiment_box.label +" experiment the " +climate_model +" model shows " +str(no_summer_days_model) +" summer days for " +str(place_box.value) + " in " + str(year_box.value) +".")
```
%% Cell type:markdown id: tags:
[Try another location and year](#selection)