Commit 1aab0641 authored by Fabian Wachsmann's avatar Fabian Wachsmann
Browse files

updated cdo use case

parent 66ed71e9
Pipeline #19405 passed with stage
in 3 minutes and 13 seconds
......@@ -98,7 +98,7 @@
"list(dkrz_catalog)\n",
"\n",
"# Open the catalog with the intake package and name it \"col\" as short for \"collection\"\n",
"cols=dkrz_catalog.metadata[\"parameters\"][\"cmip6_columns\"][\"default\"]+[\"opendap_url\"]\n",
"cols=dkrz_catalog._entries[\"dkrz_cmip6_disk\"]._open_args[\"csv_kwargs\"][\"usecols\"]+[\"opendap_url\"]\n",
"col=dkrz_catalog.dkrz_cmip6_disk(csv_kwargs=dict(usecols=cols))"
]
},
......
%% Cell type:markdown id: tags:
# Climate Extremes Indices with CDOs according to the ETCCDI standard
%% Cell type:markdown id: tags:
A **climate index** is a calculated measure for the state and/or variations of the climate system. In the field of meteorology, many definitions for different types of climate indices exist. For example, the German Weather Service defines a **Klimakenntag**: If a climatological parameter exceeds a specific threshold at one day, the day is considered as a specific klimakenntag.
The expert team ETCCDI has defined a core set of descriptive indices of extremes (Climate Extremes Indices, CEI) in order to
> "gain a **uniform perspective on observed changes in weather and climate extremes**.
These indices have become a standard in the climate science community. They describe particular characteristics of extremes including *frequency, amplitude and persistence*.
%% Cell type:markdown id: tags:
### Learning Objectives
In this notebook, you will learn to
- calculate 4 kinds of CEIs, **absolute**, **threshold**, **duration** and **percentile**-based indices according to the ETCCDI standard with CDOs
- calculate running window percentiles according to the ETCCDI standard with CDOs
%% Cell type:markdown id: tags:
### Requirements
- intake
- cdo
%% Cell type:code id: tags:
``` python
from IPython.display import HTML
HTML('<iframe src="https://slides.com/wachsylon/cdoetccdi/embed" width="576" height="420" scrolling="no" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>')
```
%% Cell type:markdown id: tags:
## 0. Preparation
In the following,
- you will `import` packages required to run `cdo` in python. We use the `cdo` binary from the environment which was used to create the kernel. You might want to change that.
- you will get a time series of the variables `tasmin` and `pr` which can be used to calculate all types of indices. We choose a subset from the `historical` experiment form the recent model intercomparison project *CMIP6* for the time interval 1970-1989.
%% Cell type:code id: tags:
``` python
#set cdo binary to the one installed in the environment of the kernel
import sys
import os
cdobin=os.path.sep.join(sys.executable.split(os.path.sep)[:-1]+["cdo"])
#
#import python cdo
from cdo import *
cdo = Cdo(cdobin)
cdo.debug=True
#This prohibits that existing files are created a second time
cdo.forceOutput = False
```
%% Cell type:code id: tags:
``` python
import intake
# Path to master catalog on the DKRZ server
#dkrz_catalog=intake.open_catalog(["https://dkrz.de/s/intake"])
#
#only for the web page we need to take the original link:
dkrz_catalog=intake.open_catalog(["https://gitlab.dkrz.de/data-infrastructure-services/intake-esm/-/raw/master/esm-collections/cloud-access/dkrz_catalog.yaml"])
list(dkrz_catalog)
# Open the catalog with the intake package and name it "col" as short for "collection"
cols=dkrz_catalog.metadata["parameters"]["cmip6_columns"]["default"]+["opendap_url"]
cols=dkrz_catalog._entries["dkrz_cmip6_disk"]._open_args["csv_kwargs"]["usecols"]+["opendap_url"]
col=dkrz_catalog.dkrz_cmip6_disk(csv_kwargs=dict(usecols=cols))
```
%% Cell type:code id: tags:
``` python
# Store the name of the model we chose in a variable named "climate_model"
source_id = "MPI-ESM1-2-LR" # here we choose Max-Plack Institute's Earth Sytem Model in high resolution
query = dict(
source_id = source_id, # the model
variable_id = ["tasmin","pr"],
table_id = "day", # daily
experiment_id = "historical", #
member_id = "r10i1p1f1", # "r" realization, "i" initialization, "p" physics, "f" forcing
time_range = "19700101-19891231"
)
# Intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ
col.df["uri"]=col.df["uri"].str.replace("lustre/","lustre02/")
cat = col.search(**query)
# Show query results
cat.df
```
%% Cell type:code id: tags:
``` python
#download files
urlpr=cat.df[cat.df["variable_id"] == "pr"]["opendap_url"].values[0].replace("dodsC","fileServer")
urltas=cat.df[cat.df["variable_id"] == "tasmin"]["opendap_url"].values[0].replace("dodsC","fileServer")
!wget {urlpr}
!wget {urltas}
```
%% Cell type:code id: tags:
``` python
#define temporary output files
prorig = cat.df[cat.df["variable_id"] == "pr"]["opendap_url"].values[0].split('/')[-1]
tasorig = cat.df[cat.df["variable_id"] == "tasmin"]["opendap_url"].values[0].split('/')[-1]
prHamburg ="pr_hamburg.nc"
tasminHamburg="tasmin_hamburg.nc"
```
%% Cell type:code id: tags:
``` python
#Select a subarea because memory might not be large enough
#and change units
prep_ts="-sellonlatbox,9,10,53,54 "
prep_pr="-mulc,86400 "+prep_ts
pr=cdo.mulc("1",
input=prep_pr+prorig,
output=prHamburg)
tasmin=cdo.mulc("1",
input=prep_ts+tasorig,
output=tasminHamburg)
```
%% Cell type:markdown id: tags:
## 1. Absolute indices
***
- `txx`, `txn`, `tnx`, `tnn`
- daily temperature range `dtr`, intensity intdex `sdii`
- `rx1day`, `rx5day`, `prcptot`
***
%% Cell type:markdown id: tags:
While the other 3 categories of indices are defined in temporal *units*, the **absolute** indices have the same *units* as the input variable. The **absolute indices** are useful because its values
> can often be related to extreme events that affect human society and the natural environment
Absolute indices are easy to compute with basic CDOs. E.g., the minimum of daily minimum temperature can be calculated with
```bash
cdo yearmin tasmin tnn.nc
```
However, the `etccdi_` indices produces variables named according to the ETCCDI standard. Also, you can specify an output frequency. When it comes to the *precipiation related* absolute indices, you cannot work with basic cdos anymore. In the following, we show you an example for the *highest five day precipitation sum*:
%% Cell type:code id: tags:
``` python
#Highest 5day percipitation sum
!export CDO_TIMESTAT_DATE="last"
#$cdo eca_rx5day,50,freq=year -runsum,5
rx5day="rx5day_hamburg.nc"
rx5day_values = cdo.etccdi_rx5day(input="-runsum,5 "+prHamburg,
output="rx5day_hamburg.nc",
returnCdf=True).variables[
"rx5dayETCCDI"][:]
rx5day_values = rx5day_values.flatten()
```
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
plt.plot(rx5day_values)
plt.grid()
plt.xlabel("Year")
plt.ylabel("Precipitation sum over 5 days [mm]")
plt.show()
```
%% Cell type:markdown id: tags:
## 2. Threshold exceedances
***
- frost days `fd`, ice days `id`, summer days `su`, tropical nights `tr`
- `r1mm`, `r10mm`, `r20mm`
***
%% Cell type:markdown id: tags:
Threshold based CEIs are
> Indices based on the count of days crossing certain **fixed thresholds** (for example, the 0°C threshold as used in the frost days index FD) can also be related to observed impacts, in particular if the thresholds refer to values of physical, hydrological or biological significance.
We can calculate `frost days` with yearly frequency as follows:
%% Cell type:code id: tags:
``` python
#frost days
# $cdo eca_fd,freq=year
fd_values = cdo.etccdi_fd(input=tasminHamburg,
output="fd_hamburg.nc",
returnCdf=True).variables[
"fdETCCDI"][:]
fd_values = fd_values.flatten()
```
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
plt.plot(fd_values)
plt.grid()
plt.xlabel("Year")
plt.ylabel("Number of Frost days per year")
plt.show()
```
%% Cell type:markdown id: tags:
## 3. Create Percentiles
%% Cell type:markdown id: tags:
In contrast to fixed thresholds based indices, for some indices individual grid cell percentiles for each day are used as thresholds. Using only one day per day results in a rather small sample size which is equal to the lenght of the base period. In climate science, this base period is usually 20 or 30 years long. In order to construct percentiles that are more *meaningful*, we use a running window around the target day to enlarge the sample size. We will run `ydrunpctl` which also requires `ydrunmin` and `ydrunmax`.
%% Cell type:markdown id: tags:
The running minimum and running maximum are needed as input files for *ydrunpctl* so that the operator can create **bins** for a histogram. However, CEIs are based not on **bin-sorted** histograms but use a **value-sorted** histogram. Such a histogram can only be created if there is enough memory to save all values. This is only the case, if the environment parameter *CDO_PCTL_NBINS* can be set to
```bash
nbins="$((windowsize*(end_year-start_year+1)*2+2))"
```
which is in our case
```bash
nbins="$((5*(1989-1970+1)*2+2))"
nbins=202
#and
export CDO_PCTL_NBINS=202
```
Whether this is possible depends on the system.
%% Cell type:markdown id: tags:
Two new arguments have been introduced for the `ydrun`* operators in comparison to older cdo versions for matching the requirements defined by the ETCCDI standard calculation methods:
- **rm** : The **read_method** can be set to `c` for "circular" which takes into account the last time steps at the begin of the time period and vise versa. Otherwise, the first and last time steps are not used as often as the other time steps in the calculations.
- **pm** : Since a lot of methods exist to calculate a percentile, CDO will allow to set ***percentileMethod*** in the operator call. ETCCDI recommends a method implemented in the software language R as type8. This is right now the only option for the argument.
%% Cell type:markdown id: tags:
The resulting command line calls of these operators required for CEIs look like:
```bash
cdo ydrunmin,5,rm=c tasmin_hamburg.nc tasmin_runmin.nc
cdo ydrunmax,5,rm=c tasmin_hamburg.nc tasmin_runmax.nc
cdo ydrunpctl,5,rm=c,pm=r8 tasmin_runmin.nc tasmin_runmax.nc tn10thresh.nc
```
%% Cell type:code id: tags:
``` python
windowsize=5
readMethod="circular"
percentileMethod="rtype8"
```
%% Cell type:code id: tags:
``` python
# $cdo ydrunmin,5,rm=c $tasminMerged $tasminrunmin
cdo.ydrunmin(windowsize,"rm="+readMethod,
input=tasminHamburg,
output="tasmin_runmin.nc")
```
%% Cell type:code id: tags:
``` python
# $cdo ydrunmax,5,rm=c $tasminMerged $tasminrunmin
cdo.ydrunmax(windowsize,"rm="+readMethod,
input=tasminHamburg,
output="tasmin_runmax.nc")
```
%% Cell type:code id: tags:
``` python
#If you set this environment parameter,
#histograms are ordered values instead of bins
!export CDO_PCTL_NBINS=202
# $cdo subc,273.15 -ydrunpctl,10,5,pm=r8,rm=c $tasminMerged ${tasminrunmin} ${tasminrunmax} ${tn10thresh}
cdo.ydrunpctl(10,windowsize,"rm="+readMethod,"pm="+percentileMethod,
input=tasminHamburg+" tasmin_runmin.nc tasmin_runmax.nc",
output="tn10thresh.nc")
# $cdo subc,273.15 -ydrunpctl,90,5,pm=r8,rm=c $tasminMerged ${tasminrunmin} ${tasminrunmax} ${tn10thresh}
cdo.ydrunpctl(90,windowsize,"rm="+readMethod,"pm="+percentileMethod,
input=tasminHamburg+" tasmin_runmin.nc tasmin_runmax.nc",
output="tn90thresh.nc")
```
%% Cell type:markdown id: tags:
## 4. Duration indices
***
- cold spell duration index `csdi`, warm spell duration index `wsdi`
- consecutive dry days `cdd`, consecutive wet days `cwd`
- growing season lengths `gsl`
***
%% Cell type:markdown id: tags:
Duration indices are self-explanatory. They allow to characterize periods of extremes. In comparison to older `eca_` indices, the `etccdi_` indices implemented in `cdo` have two major changes. They
1. count periods over overlapping years (or month, depending on the output frequency) and the final period will get the time stamp of the last contributing day.
2. allow to have less time steps in the percentile threshold file than in the original time series file.
The corresponding command line call looks like
```bash
cdo eca_cwfi,6,freq=year tasminHamburg.nc tn10thresh.nc csdi_hamburg.nc
```
%% Cell type:code id: tags:
``` python
#Cold spell duration index (cold wave index)
csdi="csdi_hamburg.nc"
csdiValues = cdo.etccdi_csdi(6,"freq=year",
input=tasminHamburg+" tn10thresh.nc",
output=csdi,
returnCdf=True).variables['csdiETCCDI'][:]
csdiValues = csdiValues.flatten()
```
%% Cell type:code id: tags:
``` python
plt.plot(csdiValues)
plt.show()
```
%% Cell type:code id: tags:
``` python
#Consecutive Wet Days
#Precipitation threshold [mm]:
pt=1
#Minimum number of days exceeded for a second variable:
md=5
#!cdo eca_cwd,1,5,freq=year prHamburg.nc cwd_hamburg.nc
```
%% Cell type:code id: tags:
``` python
cwd_values = cdo.etccdi_cwd(input=prHamburg,
output="cwdHamburg.nc",
returnCdf=True).variables[
"cwdETCCDI"][:]
cwd_values = cwd_values.flatten()
```
%% Cell type:code id: tags:
``` python
plt.hist(cwd_values,bins= [5.5,6.5,7.5,8.5,
9.5,10.5,11.5,12.5,13.5])
plt.grid()
plt.xlabel("Largest number of consecutive"
" wet days per year")
plt.ylabel("Frequency")
plt.show()
```
%% Cell type:markdown id: tags:
## 5. Percentile based indices
***
tx10p tx90p tn10p tn90 r95p r99p
***
%% Cell type:markdown id: tags:
The reason for choosing mostly percentile thresholds rather than fixed thresholds is
> the number of days exceeding percentile thresholds is more evenly distributed in space and is meaningful in every region”
The temperature related percentile based indices (*tx90p, tx10p, tn90p, tn10p*) require a special percentile calculation method for years that lie *inside the base period*. For that years, **bootstrapping** must be applied where the base period is modified: the index year is taken from the base period and is replaced by another year. Then, the percentile as well as the index is calculated for the new 30 year base period. This is done 29 times so that each year from the base period will be accounted twice. In the end, the mean of 29 indices is taken.<br>
Therefore, the operators need input arguments:
1. The window size (5)
2. The start year of the bootstrapping interval (1970)
3. The end year of the bootstrapping interval (1989)
4. The output frequency
The corresponding command line call looks like:
```bash
cdo etccdi_tn10p,5,1970,1989,freq=year tasmin_hamburg.nc tasmin_runmin.nc tasmin_runmax.nc tn10p_hamburg.nc
```
%% Cell type:code id: tags:
``` python
!rm tn10p_hamburg.nc
!export CDO_PCTL_NBINS=302
tn10p_values = cdo.etccdi_tn10p(5,1970,1989,"freq=year",
input=tasminHamburg+" tasmin_runmin.nc tasmin_runmax.nc",
output="tn10p_hamburg.nc",
returnCdf=True).variables["tn10pETCCDI"][:]
tn10p_values = tn10p_values.flatten()
```
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
plt.plot(tn10p_values)
plt.grid()
plt.xlabel("Year")
plt.ylabel("Number of days with tmin < tmin90")
plt.show()
```
%% Cell type:markdown id: tags:
### Used data
- https://doi.org/10.22033/ESGF/CMIP6.6595
%% Cell type:markdown id: tags:
We acknowledge the CMIP community for providing the climate model data, retained and globally distributed in the framework of the ESGF. The CMIP data of this study were replicated and made available for this study by the DKRZ.”
%% Cell type:code id: tags:
``` python
```
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment