
AH minor edits to xarray_intro_part1.ipynb

Closed Angelika Heil requested to merge AH20241010_minor_edits into master
%% Cell type:markdown id:08a8d53e-167a-480b-beac-15bc6b378f94 tags:
***
<p align="right">
<img src="https://www.dkrz.de/@@site-logo/dkrz.svg" width="12%" align="right" title="DKRZlogo" hspace="20">
<img src="https://wr.informatik.uni-hamburg.de/_media/logo.png" width="12%" align="right" title="UHHLogo">
</p>
<div style="font-size: 20px" align="center"><b> Python Course for Geoscientists, 8-11 October 2024</b></div>
<div style="font-size: 15px" align="center">
<b>see also <a href="https://gitlab.dkrz.de/pythoncourse/material">https://gitlab.dkrz.de/pythoncourse/material</a></b>
</div>
***
%% Cell type:markdown id:f1b99208-9410-49dc-9d65-ba9290f86878 tags:
<p align="center">
<img src="https://docs.xarray.dev/en/stable/_static/Xarray_Logo_RGB_Final.svg" width="35%" align="right" title="xarraylogo" hspace="20">
</p>
%% Cell type:markdown id:814a8d6e-d0f5-4562-acab-75a2c957266b tags:
# xarray introduction I
%% Cell type:markdown id:77e159a0-19ec-4a69-a871-cee6379d7ae1 tags:
<br>
______________________________________________________________________
# A) What is xarray?
______________________________________________________________________
%% Cell type:markdown id:e81e31bd-f599-448f-93b8-8017e91133ae tags:
*xarray* is a Python package that simplifies the handling of *multi-dimensional datasets*. It offers a wide range of functions for data manipulation, visualization, and advanced analytics, building on the capabilities of *NumPy, pandas, and Matplotlib.*
The underlying data model of xarray is based on the network Common Data Form ([netCDF](https://www.unidata.ucar.edu/software/netcdf/)). netCDF is a standard file format widely used in climate science.
*xarray* enables *efficient and intuitive data analysis* of netCDF data, but it also supports other file formats like *GRIB, HDF5, and Zarr*.
*xarray* documentation: https://docs.xarray.dev/en/stable/index.html
%% Cell type:markdown id:775514a1-b245-46ea-bfc9-9c4037be7ee4 tags:
## A1) xarray's data structure
xarray provides two primary data structures for handling multi-dimensional data: **DataArray** and **Dataset**.
_see also https://tutorial.xarray.dev/fundamentals/01_datastructures.html_
%% Cell type:markdown id:a905fc01-5d69-411f-9e76-4505363928e2 tags:
### A1.1) DataArray
The DataArray class enhances multi-dimensional arrays by attaching dimension names, coordinates, and attributes. Essentially, a DataArray is a numpy ndarray enriched with additional metadata. The metadata describes the data using elements such as coordinate labels, named dimensions, units, attributes, and variable names.
<span style="font-size: 1.1em; color: blue; ">class xarray.DataArray(data=nan, dims=None, coords=None, attrs=None, name=None, ..., ...)</span>
<div style="margin-left: 20px;">
**data**:
Values of the data variable as numpy ndarray or ndarray-like.
**dims**:
Dimensions represent the axes along which the data is organized. In xarray, each dimension is assigned a unique name.
_Note: Commonly, the dimension order in multidimensional Earth Science data is (time, level, latitude, longitude)_
**coords**:
Coordinates ("tick labels") provide descriptive labels for the dimensions of your data, offering additional context and meaning to the data points along each dimension.
**attrs**:
Attributes, also known as metadata, are additional pieces of information attached to DataArrays, coordinate variables and/or Datasets. They include descriptive information such as units, CF standard_name, `_FillValue` attributes, and comments.
**name**:
Name of data variable.
</div>
%% Cell type:markdown id:a39d647e-dea2-4cd4-b426-2c4ba5ea1a39 tags:
![xr1_DataStructure.png](../images/xr1_DataStructure.png)
*Figure 1: An overview of xarray’s main data structures (adapted from https://docs.xarray.dev/en/stable/_images/dataset-diagram.png)*
%% Cell type:markdown id:9edeac20-70a5-4066-be92-84f07007fab1 tags:
### A1.2) Dataset
The Dataset class is a dictionary-like collection of one or more DataArray objects with aligned dimensions.
It serves as a container for organizing and managing multiple data variables within a coherent structure, mirroring the structure of a netCDF data file object.
<span style="font-size: 1.1em; color: blue; ">class xarray.Dataset(data_vars=None, coords=None, attrs=None)</span>
<div style="margin-left: 20px;">
**data_vars**:
A dictionary-like organization of different data variables within a dataset, allowing access to these variables by their respective names.
Each dimension must have the same length in all data variables in which it appears.
**coords**:
Dataset coordinates represent the coordinates of all data variables in the dataset, whether shared or not. They serve as a unified reference framework, helping users understand how the data is structured.
**attrs**:
Global attributes associated with this dataset, e.g. Conventions, creator, history, references.
</div>
__Note__: The dimensions (dims) are not passed directly as an argument when creating the dataset.
Instead, they are inferred from the shape of the data variables and coordinates that you provide.
<br>
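
The note above can be illustrated with a minimal sketch. The variable names (`temp`, `prec`, `ds_demo`) and the dummy values are assumptions made only for this illustration. No `dims` argument is passed, and the dimension sizes are still inferred from the inputs:

```python
import numpy as np
import xarray as xr

# Two dummy variables sharing the dimensions ("lat", "lon"); names are illustrative only
temp = np.zeros((2, 3))
prec = np.ones((2, 3))

ds_demo = xr.Dataset(
    data_vars={
        "temp": (("lat", "lon"), temp),
        "prec": (("lat", "lon"), prec),
    },
    coords={"lat": [10.0, 20.0], "lon": [0.0, 5.0, 10.0]},
)

# No dims argument was passed: the dimension sizes are inferred from the inputs
print(dict(ds_demo.sizes))
```

Each data variable is given as a `(dimension names, data)` tuple, so the Dataset can work out the name and length of every dimension by itself.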
%% Cell type:markdown id:6c3ec783-efd6-4f3e-bdac-b5744de290c9 tags:
_For more details on xarray’s main data structures, see also [Hoyer and Hamman (2017); DOI: 10.5334/jors.148](https://openresearchsoftware.metajnl.com/articles/10.5334/jors.148) Figure 2._
%% Cell type:markdown id:252d06c6-fa7c-46e3-8241-2c17d50d856d tags:
<br>
______________________________________________________________________
# B) Preparation: Configure the Notebook
______________________________________________________________________
%% Cell type:code id:77c14150-5cc1-42c2-b819-c0351ba9edfb tags:
``` python
import xarray as xr
import numpy as np
import pandas as pd
# Just in case: install missing packages into your local user directory (on a system kernel on Levante or elsewhere).
try:
    import cfgrib
except ImportError:
    import subprocess
    subprocess.run(["bash", "-c", "pip install --user ecmwflibs --quiet"])
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
# Set this to render all evaluated output of a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# Set default figure size and font size
mpl.rcParams.update({
"figure.figsize": (3.5, 2.5),
"font.size": 9
})
```
%% Cell type:markdown id:de21481f-f17b-4097-9259-496462a9ddfe tags:
__Note__: *xr* serves as an alias for the *imported xarray package*.
%% Cell type:markdown id:b87bed17-e651-4791-850e-688c75afb20c tags:
<br>
______________________________________________________________________
# C) xr.DataArray() Showcases and Exercises
______________________________________________________________________
%% Cell type:markdown id:7f87cdb2-6255-4e18-8299-08e004b96ded tags:
The `xr.DataArray()` constructor call in xarray is used to create a multi-dimensional array-like object that encapsulates a numpy ndarray or another ndarray-like data structure (referred to as data).
This constructor call accepts optional additional keyword arguments to provide **labeled dimensions (dims), coordinates (coords), a name (name), and metadata (attrs)**. Below is the general syntax:
```python
xr.DataArray(data,
coords=None,
dims=None,
name=None,
attrs=None
)
```
__Note__: *xr* serves as an alias for the *imported xarray package*.
The constructor call `xr.DataArray()` returns a data object with a default data structure, consisting of dimension names, coordinates, indexes, and attributes.
*If not provided with the `xr.DataArray()` constructor call, these presettings are empty.* The dimensions have the default names dim_0, dim_1,.... You can, however, modify them afterwards.
It's important to **configure coordinate values** properly not only for xarray but also for other software tools. **Labeled geospatial** information from coordinates is crucial for various tasks, including:
* **Plotting**: Mapping data onto a real-world grid for visualization.
* **Analysis**: Performing routines such as calculating area-weighted means.
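
As a rough sketch of why labeled coordinates matter for analysis, the example below computes an area-weighted mean with cosine-of-latitude weights. The grid, the data values, and the names `da_demo` and `weights` are assumptions chosen for illustration, and cos(lat) is only an approximation of the true grid-cell area:

```python
import numpy as np
import xarray as xr

lats = np.array([-60.0, 0.0, 60.0])
lons = np.array([0.0, 90.0, 180.0, 270.0])

# Dummy field that depends on latitude only: rows hold the values 1, 2 and 6
data = np.repeat(np.array([[1.0], [2.0], [6.0]]), lons.size, axis=1)
da_demo = xr.DataArray(
    data,
    dims=("lat", "lon"),
    coords={"lat": lats, "lon": lons},
    name="demo",
)

# Approximate area weights: proportional to the cosine of the latitude
weights = np.cos(np.deg2rad(da_demo.lat))
weights.name = "weights"

weighted_mean = da_demo.weighted(weights).mean(dim=("lat", "lon"))
plain_mean = da_demo.mean(dim=("lat", "lon"))
print(float(weighted_mean), float(plain_mean))
```

The weighted mean is lower than the plain mean here because the largest value sits at 60°N, where the weight is smaller than at the equator. This only works because the latitude values are attached to the data as coordinates.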
%% Cell type:markdown id:6f8dc295-da8b-43e3-84ed-b47c228f86b4 tags:
***
### C1 Showcase: Construct a "naked" xarray DataArray and modify it
%% Cell type:markdown id:bfd34065-71cd-4a28-be7b-41fb9d26f14b tags:
To create a basic xarray DataArray, you can simply pass a numpy ndarray to the `xr.DataArray()` constructor call.
%% Cell type:markdown id:4942104e-047f-42b5-856b-935399067740 tags:
#### C1.1: Construct DataArray
%% Cell type:code id:3744f16b-3b33-4fd2-a308-45ec4e121db4 tags:
``` python
#== Creating a numpy ndarray with shape (2, 4)
nlat, nlon = 2, 4
ndarray = np.random.rand(nlat, nlon) * 10 # or e.g. np.arange(1, nlat * nlon + 1).reshape(nlat,nlon)
ndarray
```
%% Cell type:code id:0754376b-82c2-4553-af59-f27a97e1147f tags:
``` python
#== Creating a plain xarray DataArray
da1 = xr.DataArray(ndarray)
```
%% Cell type:markdown id:23bb9bb7-d59f-4b0d-aed5-a611868c883f tags:
#### C1.2: Look at the DataArray
%% Cell type:code id:27c279b9-b3e3-44da-9e7d-cd129379f2d2 tags:
``` python
#== Look at da1 in plain text view
print(da1)
```
%% Cell type:code id:7af1bfe3-c055-45dc-887e-c7f1be838614 tags:
``` python
#== Look at da1 in HTML View
da1
```
%% Cell type:code id:fc7f81e5-bc97-4ef2-a128-71329022cf2e tags:
``` python
#== Displaying individual components of the DataArray
print("Data Array Components:")
print("=======================")
print(f"Data:\n{da1.data}\n")
print(f"Dimensions: {da1.dims}")
print(f"Attributes: {da1.attrs}")
print(f"Name: {da1.name}")
print(f"Shape: {da1.shape}")
print(f"Size: {da1.size}")
```
%% Cell type:markdown id:57870275-4ac6-4d50-8fe1-839232242194 tags:
#### C1.3: Rename the DataArray dimensions with rename()
%% Cell type:markdown id:ec1eba23-b48d-4d22-a6a8-fd15bedbde50 tags:
The DataArray class comes with many built-in methods.
To see all possible methods available for a DataArray object, you can e.g. use the `dir()` function:
%% Cell type:code id:5f5363f3-3f85-4a68-83d6-4d275aba3a38 tags:
``` python
#dir(da1)
```
%% Cell type:markdown id:bd7afd55-a11c-464a-ac9e-ca5789509628 tags:
One such method is `rename`, which allows you to change the names of dimensions and coordinates of the DataArray.
You can view the details of the `rename` method using e.g. `help(da1.rename)` or https://docs.xarray.dev/en/stable/generated/xarray.DataArray.rename.html.
%% Cell type:code id:3f6add14-a39f-41ea-b923-e27b446ef654 tags:
``` python
help(da1.rename) #--alternative: da1.rename?
```
%% Cell type:code id:2ef0defa-3e21-4292-bbc6-44d3700cd7f4 tags:
``` python
#== Renaming dimension names of the DataArray with .rename()
# Note: the rename method creates a new DataArray object, it does not modify da1 in-place.
da2 = da1.rename({"dim_0": "lat", "dim_1": "lon"})
print(da2)
```
%% Cell type:code id:2d4cd4aa-3e4b-4c40-81be-12e766dc0bae tags:
``` python
print(da2.coords)
```
%% Cell type:markdown id:9ed54b56-85f6-41a0-b097-07f64c41197a tags:
#### C1.4: Assign coordinate variables to the DataArray with assign_coords()
%% Cell type:code id:06c9b6d7-4902-454e-aaf6-ea8c0bd3f260 tags:
``` python
#== Creating coordinate variables lons and lats
lons, lats = np.linspace(0., 20., nlon), np.linspace(-3., 3., nlat)
# Note: the assign_coords method creates a new DataArray object, it does not modify da2 in-place.
da3 = da2.assign_coords({"lon": lons, "lat": lats})
print(da3)
```
%% Cell type:markdown id:cf95e961-7047-4f65-a555-27ff3fdd4b70 tags:
#### C1.5: Assign a name to the DataArray
%% Cell type:code id:a59dbc4b-b3d5-4c75-9425-6baf867ad93b tags:
``` python
da3.name = "var1"
print(da3)
```
%% Cell type:markdown id:e8e26eaf-2611-4fb6-bf17-308627b67b7e tags:
#### C1.6: Assign attributes to the data variable with attrs
%% Cell type:code id:30ccdbba-5860-42e0-b9af-77da0f5d26ee tags:
``` python
# Tip: myda.attrs["attribute_key"] = "attribute_value" #- attaches metadata attributes to the data array
da3.attrs["standard_name"] = "age_of_sea_ice" #- CF standard name for variable
da3.attrs["units"] = "year" #- canonical units of the CF standard name
```
%% Cell type:markdown id:a6d2c3eb-7565-42fc-b399-59bf772c758f tags:
_**Note**_: For CF standard names, see https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
%% Cell type:code id:e191937e-9571-49c9-8ed3-9dc85357b61f tags:
``` python
print(da3)
```
%% Cell type:markdown id:ce80854a-cb0c-43dc-bd14-282a3af840c2 tags:
#### C1.7: Assign attributes to the coordinate variable lon with attrs
%% Cell type:markdown id:e1836b8e-3827-4cc8-967e-60a77c02c439 tags:
_**Note**_: There are two options to access a coordinate variable:
* da3.MYVAR: attribute-style access
* da3["MYVAR"]: dictionary-style
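
A small sketch of the two access styles, using a throwaway DataArray (`da_demo` is an assumed name, not part of the course data):

```python
import numpy as np
import xarray as xr

da_demo = xr.DataArray(
    np.zeros((2, 3)),
    dims=("lat", "lon"),
    coords={"lat": [0.0, 1.0], "lon": [10.0, 20.0, 30.0]},
)

# Both styles return the same coordinate variable
same = da_demo.lon.identical(da_demo["lon"])
print(same)

# Dictionary-style access also works when the name is stored in a variable
coord_name = "lat"
print(da_demo[coord_name].values)
```

Dictionary-style access is the more general of the two: it also works for names held in a variable and for names that clash with existing DataArray methods or attributes.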
%% Cell type:code id:286d9f25-9854-41ad-af9d-4c4f9b3dc967 tags:
``` python
da3.lon.attrs["standard_name"] = "longitude" #- same as da3["lon"].attrs["standard_name"] = "longitude"
da3["lon"].attrs["units"] = "degrees_east"
```
%% Cell type:code id:f7368d01-ec89-4e49-adfc-794798ba3487 tags:
``` python
da3
```
%% Cell type:markdown id:b0fd80cc-237f-48cb-82f3-12d88228b53f tags:
#### C1.8: Access and inspect individual components of the DataArray
%% Cell type:code id:54cfef0e-4da2-4589-bda8-d231cb00e100 tags:
``` python
mycoord = "lon"
# Access the "units" attribute of the coordinate `mycoord`
da3[mycoord].attrs["units"]
# Check the type of the object corresponding to `mycoord`
type(da3[mycoord])
# Retrieve the raw numpy array of values associated with the coordinate `mycoord`
da3[mycoord].values
# Retrieve the data type (dtype) of the values in the `mycoord` coordinate
da3[mycoord].values.dtype
```
%% Cell type:markdown id:498f25f5-7d7d-4250-9731-114c711f0113 tags:
***
### C2 Exercise xr.DataArray()
%% Cell type:code id:ed37edda-48b4-4447-954f-e0779b19dd11 tags:
``` python
# 1. Generate a 3-dimensional xarray DataArray and look at it in plain text view.
```
%% Cell type:code id:3cc0a54c-4e88-4a2e-adf6-b2daea344bcd tags:
``` python
# 2. Change the default dimension names and add coordinate values.
# Tip: use myDataArray.assign_coords(t=tcoords, y=ycoords, x=xcoords)
```
%% Cell type:code id:6b22d453-a20b-4ad4-980a-ee4067236cf6 tags:
``` python
# 3. Print the coordinates and the dimensions.
# Tip: These are referred to as `coords` and `dims`.
# Inspect the type of these objects using the `type()` function and the `dtype` attribute.
# Also, check the type of the data object stored in the DataArray.
```
%% Cell type:code id:8af37379-0a87-450b-b811-9633fc4fdcae tags:
``` python
# 4. Add some attributes, including a standard_name and units attribute.
# Tip: myDataArray.attrs["attribute_key"] = "attribute_value"
# For CF standard names, see https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
```
%% Cell type:code id:5ab568b7-edc6-47d5-815b-c706a18a7c72 tags:
``` python
# 5. Look at the new DataArray in html-view
# Tip: also try out myDataArray.head(2) to subset the first two elements of your DataArray.
```
%% Cell type:markdown id:11c1a116-afaa-4aaa-8206-b9d3e65041bd tags:
<br>
#### C2 Solution Exercise xr.DataArray()
%% Cell type:code id:6783227d-9eda-4d61-98ab-a429873fe48a tags:
``` python
# 1. Generate a 3-dimensional xarray DataArray and print it in plain text view.
ntime, nlat, nlon = 3, 5, 2
mydata = np.random.random((ntime, nlat, nlon))
myda1 = xr.DataArray(mydata)
print(myda1)
```
%% Cell type:code id:2c832062-2276-4de6-9965-a34024315de2 tags:
``` python
# 2. Change the default dimension names and add coordinate values.
myda2 = myda1.rename({"dim_0":"time", "dim_1":"lat", "dim_2":"lon"})
tcoords = ["2022-03-01", "2022-03-02", "2022-03-03"]
# better alternatives:
#tcoords = xr.cftime_range(start="2022-03-01", periods=ntime, freq="D")
#tcoords = pd.date_range("2022-03-01", periods=ntime, freq="D") #-- with pandas
ycoords = [30, 31, 32, 33, 34]
xcoords = np.linspace(1, 4, nlon)
myda3 = myda2.assign_coords(time=tcoords, lat=ycoords, lon=xcoords)
```
%% Cell type:code id:4499e621-f38d-4d4f-b421-ccb4e7678846 tags:
``` python
# 3. Print the coordinates and the dimensions and their types.
myda3.coords
myda3.dims
print(f"object type of coords = {type(myda3.coords)}")
print(f"dtype of latitude coordinate = {myda3['lat'].values.dtype}")
print(f"object type of dims = {type(myda3.dims)}")
# print the type of the data object and its data type
print(f"object type of data = {type(myda3.data)}")
print(f"data type of data = {myda3.data.dtype}")
```
%% Cell type:code id:10c8e399-6de1-40c9-a8b0-b72237e477ee tags:
``` python
# 4. Add some attributes, including a standard_name and a units attribute.
# Tip: myDataArray.attrs["attribute_key"] = "attribute_value"
# For CF standard names, see https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
myda3.attrs["standard_name"] = "fire_temperature"
myda3.attrs["units"] = "K"
myda3.attrs["comment"] = "dummy data"
myda3.lat.attrs["standard_name"] = "latitude"
myda3.lat.attrs["units"] = "degrees_north"
```
%% Cell type:code id:b6262cdb-55ea-46f8-bb38-aa7edfb21992 tags:
``` python
# 5. Look at the new DataArray in html-view
# Tip: try also out myDataArray.head(2) to subset the first two elements of your DataArray.
myda3
myda3.head(2)
```
%% Cell type:markdown id:28343707-be55-4b67-9a46-66498a700a46 tags:
***
### C3 Showcase: Create a DataArray with metadata with a single DataArray() call
%% Cell type:markdown id:aab547d7-698f-4dc5-9622-cee17c94a1a3 tags:
When creating a DataArray with xr.DataArray(), you can directly include:
* name: Assign a name to the DataArray.
* dims: Specify dimension names (as tuple)
* coords: Define dimension values for alignment and indexing
(as dictionary with key=>dimension name; value=> array/list representing coordinate values)
* attrs: Attach metadata attributes to the DataArray (as dictionary)
%% Cell type:code id:8a1da962-beb0-4ac3-97cf-fdac4568bd9c tags:
``` python
da4 = xr.DataArray(data=ndarray,
name="var1",
dims=("lat","lon"),
coords={"lat": lats,
"lon": lons},
attrs={"standard_name":"age_of_sea_ice",
"units": "year"})
```
%% Cell type:markdown id:38911a76-501f-487a-b9e9-55b17a603414 tags:
Note: When coordinates are passed to xr.DataArray() as plain arrays or lists, as above, they carry no metadata of their own.
In that case, the metadata attributes for the coordinates have to be added afterwards.
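
As an alternative sketch, xarray also accepts coordinates in the tuple form `(dims, data, attrs)`, which allows coordinate attributes to be supplied in the same constructor call. The name `da_demo` and the zero-filled data are assumptions for this illustration:

```python
import numpy as np
import xarray as xr

lats = np.linspace(-3.0, 3.0, 2)
lons = np.linspace(0.0, 20.0, 4)

da_demo = xr.DataArray(
    data=np.zeros((lats.size, lons.size)),
    dims=("lat", "lon"),
    coords={
        # Each coordinate is given as (dimension name, values, attributes)
        "lat": ("lat", lats, {"standard_name": "latitude", "units": "degrees_north"}),
        "lon": ("lon", lons, {"standard_name": "longitude", "units": "degrees_east"}),
    },
)

print(da_demo["lon"].attrs)
```

Whether this is more readable than assigning the attributes afterwards is a matter of taste. The two-step approach keeps the constructor call shorter.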
%% Cell type:code id:95786b3a-9039-455d-9504-0341ca7da3b1 tags:
``` python
da4["lon"].attrs={"standard_name":"longitude", "units":"degrees_east"}
da4["lat"].attrs={"standard_name":"latitude", "units":"degrees_north"}
```
%% Cell type:code id:2cbd0c1d-f75c-4838-b451-03d9415c8477 tags:
``` python
da4
```
%% Cell type:markdown id:176dc1b3-ba4b-4472-95ee-843a6e138c39 tags:
***
%% Cell type:markdown id:4e2555e5-62a7-470c-a0ea-22154af2a8fa tags:
### C4 Exercise xr.DataArray()
%% Cell type:code id:676a6b1a-bf7c-4d00-b692-2b611177140e tags:
``` python
# 1. Create the same DataArray as in C2 with just one call of xr.DataArray
# except for the attachment of coordinate variable attributes
# TIP:
# da = xr.DataArray(
# data=ndarray, # The data array (e.g., numpy array)
# dims=(tuple...), # Tuple of dimension names
# coords={dictionary...}, # Dictionary of coordinates
# attrs={dictionary...} # Dictionary of attributes for the data array
# )
# 2. Try using a different dimension order when creating the DataArray.
# Does this work? What if no coordinates are specified?
```
%% Cell type:markdown id:5db43695-50eb-4175-8a83-2666fd4de808 tags:
<br>
#### C4 Solution Exercise xr.DataArray()
%% Cell type:code id:2b3a0fde-d734-465b-b16e-61e2018b6327 tags:
``` python
# 1. Create the same DataArray as in C2 with just one call of xr.DataArray,
myda4 = xr.DataArray(mydata,
dims=("time", "lat", "lon"),
coords={"time": tcoords,
"lat": ycoords,
"lon": xcoords},
attrs=myda3.attrs)
```
%% Cell type:code id:d497ee63-0818-4322-96e0-c5aa0333f4a2 tags:
``` python
# 2. Try using a different dimension order when creating the DataArray, does this work?
# To try it out, remove the triple quotes around the code block
code_block = '''
myda5 = xr.DataArray(mydata, dims=("time", "lon", "lat"),
attrs=myda3.attrs,
coords={"time": tcoords,
"lat": ycoords,
"lon": xcoords},)
'''
# Answer: raises an error because when coordinates are provided,
# Xarray checks that the shape of the data matches the size of the coordinates,
# and in this case, there is a mismatch
# If no coordinates are provided, dims just provides the names for the dimensions with no shape check.
myda5 = xr.DataArray(mydata, dims=("time", "lon", "lat"),
attrs=myda3.attrs)
myda5
```
%% Cell type:markdown id:f60b6143-b4dd-4abb-addc-d963986dfe1f tags:
***
%% Cell type:markdown id:cf9c606d-7fc9-4405-930d-2f41477371ab tags:
### C5 Showcase: Plotting functionality on DataArrays
%% Cell type:markdown id:6b0cb7af-b1d8-4fd2-84ff-3372ccbd4da6 tags:
xarray DataArray objects can be plotted by invoking the `plot()` method. <br>
In principle, it is possible to customize the plot by providing arguments. <br>
However, compared to more advanced plotting libraries like Matplotlib, there is only *a limited set of customization options*.
%% Cell type:markdown id:c0731464-5fc3-4a85-a159-231372c43e5a tags:
When plotting xarray DataArray objects using the `plot()` method, the default chart type is automatically determined based on the dimensionality of the data. <br>
You can also explicitly specify a chart type by using `plot.CHARTTYPE()`, with CHARTTYPE being line, hist, pcolormesh, scatter, etc...<br>
The default chart types are:
- 1-D data: Line2D plot &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; ==> equivalent to plot.line()
- 2-D data: QuadMesh plot (resembling a filled contour plot) &emsp;&emsp; ==> equivalent to plot.pcolormesh()
- other dimensionality: histogram plot &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; ==> equivalent to plot.hist()
see https://docs.xarray.dev/en/stable/generated/xarray.DataArray.plot.html
_**NOTE**_: By default, the `plot()` method call uses metadata provided by the DataArray for plotting the labels.
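
The following sketch shows the dimensionality-based defaults on a small dummy array (`da2d` and its values are assumptions for illustration). It also prints the x-axis label that xarray derives from the coordinate metadata:

```python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

da2d = xr.DataArray(
    np.arange(8.0).reshape(2, 4),
    dims=("lat", "lon"),
    coords={"lat": [-3.0, 3.0], "lon": [0.0, 10.0, 20.0, 30.0]},
    name="demo",
)
da2d["lon"].attrs["standard_name"] = "longitude"
da2d["lon"].attrs["units"] = "degrees_east"

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 2.5))

# 1-D input: plot() draws a line and returns a list of Line2D objects
lines = da2d.isel(lat=0).plot(ax=ax1)

# 2-D input: plot() draws a pcolormesh and returns a QuadMesh
mesh = da2d.plot(ax=ax2)

print(type(lines[0]).__name__, type(mesh).__name__)
print(ax2.get_xlabel())  # label built from the lon attributes
```

Passing `ax=` places each plot on an existing Matplotlib axes, which is handy when several default plots should share one figure.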
%% Cell type:markdown id:00f79a62-435a-4999-b038-cf9d863d4cac tags:
#### C5.1: Make a default chart type plot
%% Cell type:code id:84519740-713f-4810-a29f-c63fadaee760 tags:
``` python
#-- Check the dimensionality of the DataArray da4
da4.dims
```
%% Cell type:code id:9ca36b6c-2b96-4bba-9ab6-5f6bae5778bc tags:
``` python
#-- Create a default plot from DataArray da4 with plot()
da4.plot() #-- equivalent to da4.plot.pcolormesh() as da4 is 2-dim
```
%% Cell type:markdown id:64e58618-0db2-4e78-9b94-96d5768d1d1c tags:
#### C5.2: Make a histogram plot
%% Cell type:markdown id:6e3597a6-6a5c-45e3-b355-51a97aa26e79 tags:
The da.plot.hist() method returns a tuple of size 3, with
1. Values: counts for each bin
2. Edges: bin edges
3. Patches: the Matplotlib container holding the drawn bars (the same third value that Matplotlib's `hist` returns)
This allows you to extract the computed values of the histogram from the xarray plot.hist() method.
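
A minimal sketch of unpacking the return value, using fixed dummy values so that the counts are easy to check by hand (`da_demo` is an assumed name). The third element is passed through from Matplotlib's `hist` and holds the drawn bars:

```python
import numpy as np
import xarray as xr

values = np.array([[0.5, 1.5, 2.5, 3.5],
                   [4.5, 5.5, 6.5, 7.5]])
da_demo = xr.DataArray(values, dims=("lat", "lon"))

# Unpack counts per bin, bin edges, and the container of drawn bars
counts, edges, patches = da_demo.plot.hist(bins=[0, 2, 4, 6])

print(counts)  # number of values falling into each bin
print(edges)   # the bin edges that were passed in
```

Values outside the outermost bin edges (here 6.5 and 7.5) are not counted, so the counts do not necessarily sum to the array size.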
%% Cell type:code id:831f6e51-593d-4f66-918a-5cbf16bc24cb tags:
``` python
#-- Create a histogram plot from DataArray da4
da4.plot.hist()
```
%% Cell type:code id:293493da-b0e6-45c6-8d82-6d83221a3acc tags:
``` python
#-- Create a histogram plot from DataArray da4, specify the bin edges and extract computed values
hist_values, hist_edges, patches = da4.plot.hist(bins=[0, 2, 4, 6])
print(hist_values)
```
%% Cell type:markdown id:346988a6-3260-4301-82b2-7f89e65d6632 tags:
#### C5.3: Make a scatter plot
%% Cell type:code id:04095e5a-8fa3-45a1-ad2b-2f0872da7fc2 tags:
``` python
#-- Note: plot.scatter() method on DataArray has many limitations.
#-- E.g. it always plots the variable’s values on the y-axis
#-- You can, however, specify which dimension is plotted on the x-axis by passing the dimension name as an argument.
da4.plot.scatter() #-- same as da4.plot.scatter(x="lat") or da4.plot.scatter(y="lat")
```
%% Cell type:markdown id:0f46accb-9758-41a9-8a84-bcc12cfca39f tags:
#### C5.4: Make a pcolormesh plot and customize it
%% Cell type:code id:d4364096-076b-47d9-87d4-a76391503cce tags:
``` python
#da4.plot.pcolormesh?
```
%% Cell type:code id:2f6c3e6d-9cf6-4a10-9efb-0eec0f9daee5 tags:
``` python
da4.plot.pcolormesh(figsize=(3, 2), cmap="Reds", add_colorbar=True,
cbar_kwargs={"label": "This is the colorbar label",
"shrink": 1.1})
#== Additional customizations using Matplotlib functions
#== since these are not accessible with the plot() function.
plt.title("mytitle");
plt.ylabel("latitude");
plt.grid(True);
```
%% Cell type:markdown id:1e64dde4-1cd9-4dee-b136-401e1e884e9c tags:
_**NOTE**_: Alternatively, you could e.g. also do plot.contour() or plot.contourf().
For available plotting methods, see `dir(da4.plot)`.
%% Cell type:markdown id:d606ad5b-5221-4283-adc1-48f2eda317fd tags:
#### C5.5: Make a line plot
%% Cell type:code id:ffb9a6be-560f-409f-bdd2-6e3db3bf2f7b tags:
``` python
#== Create a 1D DataArray by slicing da4 and plot a line plot
da4.lat.values
```
%% Cell type:code id:941cc526-bc72-4f3c-8f7f-1052ff265b38 tags:
``` python
da4[0,:].plot(figsize=(3, 2), color="purple", marker="o");
```
%% Cell type:markdown id:a0ba3fef-1a8f-4dab-8d49-f64d9cef9920 tags:
<br>
______________________________________________________________________
# D) xr.Dataset() Showcases and Exercises
______________________________________________________________________
%% Cell type:markdown id:4ad1ddf4-041a-499f-a41d-68ee88ff147e tags:
***
### D1 Showcase: Convert an xarray.DataArray to an xarray.Dataset and modify metadata
%% Cell type:markdown id:c23d0e17-3729-4c3d-ba82-dfddcf33c82d tags:
#### D1.1: Create the Dataset with `.to_dataset()`
%% Cell type:markdown id:d0ec789f-0ee7-4cb1-b668-167616e7a899 tags:
You can transform an xarray DataArray into an xarray Dataset by using the DataArray.to_dataset() method.
%% Cell type:code id:8fadc3c7-8907-491d-92d8-aa38c6925e40 tags:
``` python
#== We use the xarray DataArray da4 from the C3 Showcase
ds1 = da4.to_dataset()
```
%% Cell type:markdown id:25311196-60b3-40c0-87f1-7482e39d61c0 tags:
_**NOTE**_: Converting an unnamed DataArray to a Dataset raises a ValueError
unless an explicit name is provided as an argument to the to_dataset() call:
`ds = da_unnamed.to_dataset(name="dsvarname")`
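
The behaviour described in this note can be checked with a short sketch. The names `da_unnamed` and `ds_demo` are placeholders for illustration:

```python
import numpy as np
import xarray as xr

da_unnamed = xr.DataArray(np.zeros((2, 3)), dims=("lat", "lon"))

# Without a name, the conversion is refused
try:
    da_unnamed.to_dataset()
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# With an explicit name, the DataArray becomes a data variable of that name
ds_demo = da_unnamed.to_dataset(name="dsvarname")
print(list(ds_demo.data_vars))
```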
%% Cell type:markdown id:4468f775-f55a-4930-bb1a-ab85d0a7a2e3 tags:
#### D1.2: Look at the Dataset
%% Cell type:code id:d665a100-55b9-4abe-885a-a5a0f0a1bc5f tags:
``` python
#== Print the xarray Dataset.
print(ds1)
```
%% Cell type:code id:b890f123-473b-46b0-b396-93bb8133c219 tags:
``` python
#== Print the input DataArray da4 for comparison
print(da4)
```
%% Cell type:markdown id:7b6033d9-cda5-4ac6-acf9-906ae6a22946 tags:
_**NOTE**_: The asterisk (*) signifies that a coordinate is a primary dimension coordinate.
%% Cell type:code id:5327da03-1de8-4bb9-90b7-cf9535f9b47a tags:
``` python
#== Show ds1 in Html view
```
%% Cell type:code id:f2196608-d203-4334-9d85-06aeed77ce51 tags:
``` python
ds1
```
%% Cell type:markdown id:5b994337-a640-401c-80bd-83f6e46fc2e7 tags:
_**NOTE**_: The indexes displayed in the HTML view highlight the internal structure and
are mainly useful for more advanced operations.
%% Cell type:code id:069f891b-71eb-464a-a81a-323a174df9a8 tags:
``` python
#== The .info() method is used to display basic information on ds1 in an "ncdump"-like fashion.
#== Note: The .info() method is not available for DataArrays.
ds1.info()
```
%% Cell type:markdown id:6148641c-fee3-4cf1-a99e-587e8bc4b48c tags:
#### D1.3: Access and inspect individual components of the Dataset
%% Cell type:code id:605586cc-3e1a-4f9d-a2a6-2542345ed134 tags:
``` python
#== The data_vars attribute returns a container holding all the data variables in the dataset.
ds1.data_vars
```
%% Cell type:code id:481ec3cc-9eef-4def-895a-13b0288b98c6 tags:
``` python
#== Retrieves the names of all data variables in the Dataset as a list
list(ds1.data_vars)
# alternative
list(ds1)
```
%% Cell type:code id:94a18f3f-ebd4-41db-afe8-a6c22ea14f20 tags:
``` python
#== Retrieve the name of the first data variable and print its attributes
first_varname = list(ds1)[0]
print(ds1[first_varname].attrs) #-- same result as hard-coded print(ds1.var1.attrs)
```
%% Cell type:code id:d5e5bb6c-fabc-44e4-9ee2-15cb5cc69d9c tags:
``` python
#== Print the attributes of the var1 data variable
print(ds1.var1.attrs)
```
%% Cell type:code id:a8001303-bac5-47c6-8f74-e898029b58f1 tags:
``` python
#== Print the global attributes of the Dataset
ds1.attrs
```
%% Cell type:markdown id:5d40d18b-729f-4376-b027-184006d4761c tags:
#### D1.4: Add some global attributes to the Dataset
%% Cell type:code id:7ceb8fc3-2a39-4fe7-a9f3-87a1467813ce tags:
``` python
# Mandatory CF global attribute
ds1.attrs = {"Conventions": "CF-1.8"}
# Recommended global attributes
global_attrs_recommended = {
"title": "Your Dataset Title",
"institution": "Your Institution Name",
"source": "Data source description",
"history": "Description of processing history",
"references": "References for data or methodology",
"comment": "Any additional comments regarding the dataset"
}
# Append ds1 attributes with recommended attributes
ds1.attrs.update(global_attrs_recommended)
# Display the updated Dataset
ds1.info()
```
%% Cell type:markdown id:5a4e7573-ac8d-4e51-b7eb-24bd4a7651d8 tags:
***
### D2 Showcase: Construct an xarray Dataset with xr.Dataset()
%% Cell type:markdown id:e76b3222-c5ee-4201-8e9f-651d94e28eb7 tags:
#### D2.1: Create some dummy data
%% Cell type:code id:759302ad-9024-4c35-bfcd-cb0c7d006896 tags:
``` python
ntime, nlat, nlon = 7, 41, 21
# Create some dummy data for the data variables
mydata1 = np.random.rand(ntime, nlat, nlon) * 100 + 195.
mydata2 = np.clip((6.e+4 - (np.square(mydata1))) * 2.e-3, 0, 2.e+2) #-- Values below 0 are set to 0, values above 200 are set to 200
# Create some dummy data for the coordinate variables (see also C2)
t_coords = xr.cftime_range(start="1990-01-01", periods=ntime, freq="M")
y_coords = np.linspace(10, 20, nlat)
x_coords = np.linspace(-160, -120, nlon)
```
%% Cell type:markdown id:8abbc8a4-8495-49ba-aa96-22ded0069c28 tags:
#### D2.2: Create individual components of the Dataset
%% Cell type:markdown id:9787c220-1fa7-40e2-bb69-d9ae29b3f313 tags:
For simplicity, we will first create the individual components of the Dataset.
%% Cell type:markdown id:6eb7f0a8-efe3-487a-aff7-a5ebea0dcc2b tags:
(a)
Create a dictionary named data_vars that defines the data variables for the xarray Dataset.
The data variables share the same dimensions.
The syntax is:
```python
my_data_vars = {
    "var_name_1": (dimensions_1, data_1 [, attrs_1]),
    "var_name_2": (dimensions_2, data_2 [, attrs_2]),
    ...}
```
%% Cell type:code id:3beceb26-74cb-488b-9339-7dec33d090ad tags:
``` python
data_vars = {"temp": (["time","lat","lon"], mydata1, {"long_name": "Temperature", "units":"K"}),
"prec": (["time","lat","lon"], mydata2, {"standard_name": "thickness_of_rainfall_amount", "units":"m"})}
```
%% Cell type:markdown id:3ea161cf-4aad-445e-bc7f-782544aad02b tags:
(b)
Create a dictionary named coords that defines the coordinate variables for the xarray Dataset.
%% Cell type:code id:b9c130ba-574e-4fb8-9b26-749e0b162333 tags:
``` python
coords = {"time": t_coords,
"lat": y_coords,
"lon": x_coords}
```
%% Cell type:markdown id:6d455782-37d6-4251-b31a-1791eeb8bf75 tags:
(c)
Create a dictionary named attrs that defines the global attributes of the Dataset.
%% Cell type:code id:eb04fce1-122a-4d30-ba33-3c2d1bdec00a tags:
``` python
attrs = {"Conventions": "CF-1.8",
"history": "created on YYYY-MM-DD",
"creator": "AH",}
```
%% Cell type:markdown id:1fffc34f-1710-40f4-88ed-1cd346e0d047 tags:
#### D2.3: Pass the individual components to the xr.Dataset() constructor call
%% Cell type:code id:14c482db-740d-46a8-8ee9-dfbf40ef4911 tags:
``` python
# Create the Dataset with data variables, coordinates, and dimensions
ds2 = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)
```
%% Cell type:markdown id:46b0cd85-681a-4956-b75d-40899e2bae01 tags:
#### D2.4: Inspect the Dataset
%% Cell type:code id:c238e08a-0361-4982-981e-f4567903d528 tags:
``` python
ds2
```
%% Cell type:markdown id:670920ca-d4d5-4e1b-b19a-9c12428e8788 tags:
#### D2.5: Access and inspect individual components of the Dataset
%% Cell type:code id:6261fd03-19cf-42b7-b343-12f111fdaeb4 tags:
``` python
#== List names of all data variables in the Dataset ds2.
list(ds2) #--same as list(ds2.data_vars)
```
%% Cell type:code id:79c008ca-77f3-4ad2-a87c-d3b32d2746e4 tags:
``` python
#== List names of all variables (i.e. data and coordinate variables) in the Dataset ds2.
list(ds2.variables)
```
%% Cell type:code id:81bb750f-ddc1-4c78-a063-a3eae89a0c96 tags:
``` python
#== Access the data variable temp and check object type
myvar = "temp"
ds2_temp = ds2[myvar]
type(ds2_temp)
```
%% Cell type:code id:a1dee167-53f6-4fdd-adfc-0bb2ac628224 tags:
``` python
#== Access the latitude coordinate variable associated with the temp variable.
ds2[myvar].lat #-- this is identical to ds2.lat since lat is a global coordinate variable within the dataset
```
%% Cell type:markdown id:b2d8c444-162b-420f-a1ed-c1018fc5065e tags:
#### D2.6: Add a data variable and remove it
%% Cell type:code id:7fb07107-1308-42de-9eef-cdd0c0beac00 tags:
``` python
ds2["newvar"] = ds2[myvar][:,0,:]
ds2
```
%% Cell type:code id:c5916c5e-760c-4b9b-8d9c-def930854d4f tags:
``` python
# ds2["newvar"].lat  #-- won't work since ds2["newvar"] doesn't have a lat dimension
```
%% Cell type:code id:048e197e-4e8d-4928-87e0-0a4ac93f3b68 tags:
``` python
#== Get specific information about the dimensions of a Dataset and the temp data variable
print(f"dimensions of Dataset: {ds2.dims}")
print(f"dimensions of data variable of Dataset (=> DataArray): {ds2[myvar].dims}")
```
%% Cell type:markdown id:55e34e30-74ad-47f5-a475-3230661d9e69 tags:
_**NOTE**_:
When you access the dimensions of the entire Dataset using ds2.dims, a Frozen dictionary is returned, which behaves like a regular dictionary (dimension name: size) but is immutable.
When you access the dimensions of a specific data variable (e.g., ds2[myvar].dims), a tuple of the dimension names for that particular variable is returned.
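
The difference can be sketched with a small made-up Dataset (the names `ds_demo`, `temp`, `x` and `y` are illustrative and not part of the course data). The sketch uses `Dataset.sizes`, which provides the mapping from dimension names to lengths in the same read-only form:

```python
import numpy as np
import xarray as xr

# Hypothetical toy Dataset, for illustration only
ds_demo = xr.Dataset({"temp": (["x", "y"], np.zeros((2, 3)))})

# Dataset level: read-only mapping from dimension name to length
ds_sizes = dict(ds_demo.sizes)
print(ds_sizes)

# DataArray level: tuple of dimension names
da_dims = ds_demo["temp"].dims
print(da_dims)

# The mapping is immutable: item assignment raises a TypeError
try:
    ds_demo.sizes["x"] = 10
    assignment_rejected = False
except TypeError:
    assignment_rejected = True
print(f"assignment rejected: {assignment_rejected}")
```

To change the size of a dimension, select or reindex the data instead of editing the mapping.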
%% Cell type:markdown id:55c0d329-a335-4623-8837-87ef5048cf66 tags:
**How to remove the new data variable? What method is available for ds2 to do so?**
%% Cell type:code id:7d7cfb4a-e8a3-4692-abd6-e083f7babac6 tags:
``` python
#dir(ds2)
# or more specific using string comparison with a search term on the dir(ds2) output
mysearch_term = "var"
[item for item in dir(ds2) if mysearch_term in item]
```
%% Cell type:code id:aa78a9cb-3037-4f68-8941-19e21308b603 tags:
``` python
ds2.drop_vars?
```
%% Cell type:code id:10764121-d800-430a-941a-0c2f3993b38c tags:
``` python
ds2 = ds2.drop_vars("newvar")
ds2
```
%% Cell type:markdown id:2fe9f2dd-c701-4137-b1c1-bde828643ef2 tags:
***
### D3 Showcase: Construct an xarray Dataset with xr.merge() by merging two DataArrays
%% Cell type:code id:472a4098-89fc-4ab9-94ee-51bf36aa1775 tags:
``` python
#== Generate two DataArrays that differ in dimensionality and dimension size.
#== Use the data and coordinate variable values from D2
myda6 = xr.DataArray(mydata1,
name="var1",
dims=["time", "lat", "lon"],
coords=[t_coords, y_coords, x_coords]) #-- 3D DataArray
myda7 = xr.DataArray(mydata2[-1,1:-4,:],
name="var2",
dims=["lat2", "lon"],
coords=[y_coords[1:-4], x_coords],
attrs={"units":"m"}) #-- 2D DataArray
```
%% Cell type:code id:dc62ad35-29b3-40ad-9218-b3f123adfff1 tags:
``` python
#== Merge them with xr.merge(). This function call returns an xarray Dataset.
ds3 = xr.merge([myda6, myda7])
#== NOTE: Merging only works if each DataArray has a name attribute!
# As an alternative, you can combine DataArrays with the xr.Dataset() constructor and specify the names explicitly, e.g.
# ds = xr.Dataset({"airtemp": da5, "var": myda6})
# In the resulting Dataset, the dictionary key replaces any name the DataArray already had.
```
%% Cell type:code id:af24037b-a2d6-4bdf-9dcc-a4d6c0a2b7f7 tags:
``` python
print(ds3)
```
%% Cell type:markdown id:1adb8c2a-4efa-42ab-a032-d32478fe9a70 tags:
_**NOTE**_: You can also merge a DataArray to an existing Dataset with `xr.merge()`.
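
The role of the name attribute when merging into an existing Dataset can be illustrated with a minimal sketch. All objects (`ds_demo`, `da_named`, `da_unnamed`) are made up for this purpose:

```python
import numpy as np
import xarray as xr

# Hypothetical toy objects, for illustration only
x = [0, 1, 2]
ds_demo = xr.Dataset({"a": ("x", np.arange(3.0))}, coords={"x": x})
da_named = xr.DataArray(np.ones(3), dims="x", coords={"x": x}, name="b")
da_unnamed = xr.DataArray(np.ones(3), dims="x", coords={"x": x})

# A named DataArray can be merged into an existing Dataset
ds_merged = xr.merge([ds_demo, da_named])
print(list(ds_merged.data_vars))

# An unnamed DataArray cannot be merged ...
try:
    xr.merge([ds_demo, da_unnamed])
    unnamed_ok = True
except ValueError:
    unnamed_ok = False
print(f"unnamed DataArray accepted: {unnamed_ok}")

# ... unless it is given a name first
ds_merged2 = xr.merge([ds_demo, da_unnamed.rename("c")])
print(list(ds_merged2.data_vars))
```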
%% Cell type:code id:7804e5ed-81d1-493c-959c-865fc234de08 tags:
``` python
#== You can also merge a DataArray to an existing Dataset with `xr.merge()`.
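print(list(ds3.data_vars)) #-- data variables already contained in ds3
print(myda6.name) #-- name of the DataArray before renaming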
ds4 = xr.merge([ds3, myda6.rename("var3")])
print(ds4)
```
%% Cell type:code id:c2a49623-f551-4bda-bfde-71e35e3c89e3 tags:
``` python
ds4b = xr.merge([ds4,myda6[:,-1,:-2].rename("var4")])
```
%% Cell type:code id:44ceae39-a892-4045-814b-a8147200e516 tags:
``` python
print(ds4b)
```
%% Cell type:markdown id:4e3ba38f-7d5a-4417-b454-2e7b84cc26c2 tags:
<br>
______________________________________________________________________
# E) Indexing and slicing with xarray data: Showcases and Exercises
______________________________________________________________________
%% Cell type:markdown id:8ca2cef7-79dd-4a3e-b4f2-72d415dca6b4 tags:
**Overview of the indexing options** <br>
see also: https://docs.xarray.dev/en/stable/user-guide/indexing.html#
| Dimension lookup | Index lookup | DataArray syntax | Dataset syntax |
|:-----------------|:--------------|:-------------------------------------------------|:-------------------------------------------------|
| Positional | By integer | `da[0, :, :]` | not available |
| Positional | By label | `da.loc["2001-01-01", :, :]` | not available |
| By name | By integer | `da.isel(time=0)` or <br> `da[dict(time=0)]` | `ds.isel(time=0)` or <br> `ds[dict(time=0)]` |
| By name | By label | `da.sel(time="2001-01-01")` or <br> `da.loc[dict(time="2001-01-01")]` | `ds.sel(time="2001-01-01")` or <br> `ds.loc[dict(time="2001-01-01")]` |
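
The four DataArray variants from the table can be compared side by side in a minimal sketch. The toy array `da_demo` and its coordinate values are made up for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical toy DataArray, for illustration only
da_demo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=["time", "lon"],
    coords={
        "time": np.array(["2001-01-01", "2001-01-02"], dtype="datetime64[ns]"),
        "lon": [10.0, 20.0, 30.0],
    },
)

v_pos_int = da_demo[0, 2]                               # positional, by integer
v_pos_lab = da_demo.loc["2001-01-01", 30.0]             # positional, by label
v_name_int = da_demo.isel(time=0, lon=2)                # by name, by integer
v_name_lab = da_demo.sel(time="2001-01-01", lon=30.0)   # by name, by label

# All four calls select the same element
selected = [v.item() for v in (v_pos_int, v_pos_lab, v_name_int, v_name_lab)]
print(selected)
```

The name-based variants do not depend on the order of the dimensions, which makes them more robust when the array layout changes.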
%% Cell type:markdown id:d2bf2c7b-8e77-43be-8a1b-87c4497c90a2 tags:
***
### E1 Showcase: Indexing and Slicing on xarray DataArrays
%% Cell type:markdown id:38b5c1cd-ffd4-4d0a-8998-4457c70413b6 tags:
For this showcase, we use the 2D DataArray myda7 from D3
%% Cell type:markdown id:11ddf0e1-b343-4f85-b693-91f0458ccb60 tags:
#### E1.1: Index positional, by integer
%% Cell type:markdown id:8a41bcec-889e-45c5-9917-bb364c0e04fd tags:
Indexing can be done by the index(es) of the dimension(s).
The general syntax for accessing a single value in a two-dimensional DataArray using two indices is:
`data_array[index_of_dim_0][index_of_dim_1]`
or
`data_array[index_of_dim_0, index_of_dim_1]`
<br>
<br>
The general syntax for accessing a range of indices along one or more dimensions (_slicing_) is:<br>
`data_array[start_index_0:end_index_0, start_index_1:end_index_1]`
<br>
Note that chained brackets such as `data_array[start_index_0:end_index_0][start_index_1:end_index_1]` are not equivalent for slices: the second slice is applied again to the first dimension of the already sliced array.
<br>
In two-dimensional arrays, the typical interpretation is that:
• _rows_ correspond to the _first dimension_ (dimension 0)
• _columns_ correspond to the _second dimension_ (dimension 1).
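
One caveat is worth illustrating with a minimal sketch (the toy array `da_demo` is made up for this purpose): for single indices, chained brackets and the comma notation return the same element, but for slices, chained brackets apply the second slice to the first dimension again.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 5 toy DataArray, for illustration only
da_demo = xr.DataArray(np.arange(20).reshape(4, 5), dims=["lat", "lon"])

# Single elements: both notations return the same scalar
same_element = da_demo[1][0].item() == da_demo[1, 0].item()

# Slices: the comma notation slices rows and columns ...
comma_shape = da_demo[0:2, 1:3].shape
# ... whereas chained brackets slice the first dimension twice
chained_shape = da_demo[0:2][1:3].shape

print(same_element, comma_shape, chained_shape)
```

For slicing along several dimensions, the comma notation (or `.isel()`) is therefore the safer choice.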
%% Cell type:code id:285652a6-8adf-477e-8ec0-5fcee1c3b65e tags:
``` python
#== Index the value from the myda7 DataArray at the index positions [1][0],
#== corresponding to the second row and the first column.
print(f"myda7.shape= {myda7.shape}, myda7.dims={myda7.dims}")
myda8 = myda7[1][0]
print(f"myda8.data= {myda8.data}")
print(f"myda8.shape= {myda8.shape}, since a scalar (0-dimensional) is returned")
```
%% Cell type:code id:22f7b17d-8451-41b8-8ba9-bf28bd356282 tags:
``` python
#== When a single index is provided for slicing, it operates along the first dimension.
#== Using myda7[0], the entire first row (i.e., the first dimension) of myda7 is selected,
#== which corresponds to extracting the first latitudinal band of the data.
myda7[0].shape
#== The same result can be obtained with a less efficient command due to redundant slicing:
myda7[0][:].shape
```
%% Cell type:markdown id:f3b60e2c-caf7-4e69-834a-a96deea1e0f6 tags:
#### E1.2: Index by name, by integer
%% Cell type:markdown id:5c7d1fe5-406a-4adc-abd1-d83d392f125b tags:
Indexing with the `.isel()` method uses the dimension name and the integer index.<br>
Slicing with the `.isel()` method uses the dimension name and the integer index range indicated with `slice`.
%% Cell type:code id:98d4c335-da25-4641-a265-2c44b8e4b0a7 tags:
``` python
#== Select (index) the second row and the first column of the DataArray myda7
myda9 = myda7.isel(lat2=1, lon=0) #-- equivalent to myda7[1][0]
print(f"myda9.data= {myda9.data}")
print(f"myda9.shape= {myda9.shape}")
```
%% Cell type:code id:ba382700-641d-43d0-bda2-109f2dcd232e tags:
``` python
#== Select (slice) the first and second row and the second and third column of myda7.
myda10 = myda7.isel(lat2=slice(0,2), lon=slice(1,3)) #-- equivalent to myda7[0:2, 1:3]
print(f"myda10.data=\n {myda10.data}")
print(f"myda10.shape= {myda10.shape}")
print(f"\nCompare the original and sliced longitude coordinate values:")
print(f"myda10.lon.values=\n {myda10.lon.values}")
print(f"myda7.lon.values=\n {myda7.lon.values}")
print(f"myda7.lat2.values=\n {myda7.lat2.values}")
```
%% Cell type:markdown id:55ef5f86-3894-4c27-8863-9af0a3f7592f tags:
#### E1.3: Index by name, by label
%% Cell type:markdown id:0805cc04-a591-4822-9f78-24004942adec tags:
With the `.sel()` method, you can select specific values from a DataArray based on _coordinate values_.<br>
This can also be done with the `.loc[]` method.
%% Cell type:code id:ff260cdf-bb0a-44d6-be0f-d3147074f4af tags:
``` python
#== Selecting a value from myda7 at the coordinates lat2=13.5 and lon=-156
myda11 = myda7.sel(lat2=13.5, lon=-156.) # By default, sel requires exact matches.
# == Alternatively, you can use .loc[]
#myda11 = myda7.loc[{"lat2": 13.5, "lon": -156}]
```
%% Cell type:code id:5ed7c3ec-01e9-4cc3-93d7-6cdfe62e64a6 tags:
``` python
print(myda11)
```
%% Cell type:code id:fc26f1be-c990-4c19-a321-4efcf02ee4d7 tags:
``` python
#== Select the value(s) from myda7 at the coordinate closest to lat2=14.69
# using a nearest-neighbor lookup (method="nearest") with an optional tolerance of 1.0.
# The tolerance restricts the maximum distance for inexact matches.
mylat2=14.69
myda12 = myda7.sel(lat2=mylat2, method="nearest", tolerance=1.0)
#-- a tolerance of 0.05 results in a KeyError since no coordinate value matches this criterion
print(f"myda12.shape= {myda12.shape}")
print(f"myda12.lat2.values= {myda12.lat2.values}")
```
%% Cell type:markdown id:74de761c-ab70-442c-b336-fa61d268bf91 tags:
_**Note**_: In contrast to `.sel()`, `.loc[]` does not accept a `method` argument!
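
The effect of `method="nearest"` and `tolerance` can be reproduced with a minimal, self-contained sketch. The toy array `da_demo` and its 0.25-degree latitude spacing are assumptions chosen for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical toy DataArray with a 0.25-degree latitude spacing
da_demo = xr.DataArray(
    np.arange(5.0),
    dims=["lat"],
    coords={"lat": [14.0, 14.25, 14.5, 14.75, 15.0]},
)

# Nearest-neighbor lookup: 14.69 is closest to 14.75
nearest = da_demo.sel(lat=14.69, method="nearest", tolerance=1.0)
nearest_lat = nearest.lat.item()
print(nearest_lat)

# With a tolerance smaller than the distance (0.06), the lookup fails
try:
    da_demo.sel(lat=14.69, method="nearest", tolerance=0.05)
    found = True
except KeyError:
    found = False
print(f"match within 0.05: {found}")
```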
%% Cell type:markdown id:8b485da1-d397-4c8a-9ef5-422ef880c49d tags:
#### E1.4: Index positional, by label
%% Cell type:markdown id:4feb56d0-ca44-421c-988f-b99d911a3cdd tags:
With the `.loc[]` method, you can select specific values from a DataArray based on coordinate values.
%% Cell type:code id:7bcb76c2-cc47-486d-9c3c-950f0d7f87b0 tags:
``` python
myda13 = myda7.loc[13.5, -156]
```
%% Cell type:code id:fe2a1b3f-cdf2-4621-a6ed-9ba84cad2693 tags:
``` python
print(myda13) #-- same as myda11
```
%% Cell type:markdown id:303ce799-6f25-4c6d-b78f-9b3f82c28242 tags:
***
### E2 Showcase: Indexing and Slicing on xarray Datasets
%% Cell type:markdown id:5f73845e-2463-49ca-9b1d-fc3590c742bf tags:
Indexing and slicing on xarray Datasets can only be done
* _by name| by integer_
* _by name| by label_
__Note__: Using these methods directly on the Dataset affects all contained data variables, applying the selection along the shared coordinates.
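
What "affects all contained data variables" means in practice can be sketched with a small made-up Dataset in which only one variable has a time dimension (the names `ds_demo`, `var1` and `var2` are illustrative):

```python
import numpy as np
import xarray as xr

# Hypothetical toy Dataset: var1 depends on time, var2 does not
ds_demo = xr.Dataset(
    {
        "var1": (["time", "lon"], np.zeros((3, 4))),
        "var2": (["lon"], np.ones(4)),
    },
    coords={"lon": [0.0, 1.0, 2.0, 3.0]},
)

# Selecting along lon affects both variables
sub_lon = ds_demo.isel(lon=slice(0, 2))
print(sub_lon["var1"].shape, sub_lon["var2"].shape)

# Selecting along time only affects var1; var2 is passed through unchanged
sub_time = ds_demo.isel(time=0)
print(sub_time["var1"].shape, sub_time["var2"].shape)
```

Variables that do not have the selected dimension are kept as they are.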
%% Cell type:markdown id:f631e9e9-2e90-4e12-8c75-61ed3e987551 tags:
For demonstration, we use the Dataset ds3 from Showcase D3.
%% Cell type:code id:d4a0dd89-9de5-4840-a4b9-d348c0a3d8eb tags:
``` python
print(ds3)
```
%% Cell type:markdown id:a63d7566-bf71-46ff-ad28-4b92b9dc7319 tags:
#### E2.1: By name, by integer with `.isel()`
%% Cell type:code id:420c438a-96e8-404d-a4d9-fb67941d1446 tags:
``` python
ds3.isel(time=2, lat=slice(3,6), lon=slice(0,1))
```
%% Cell type:markdown id:03f27e2e-48c4-44da-b391-374efbfe8ace tags:
#### E2.2: By name, by label with `.sel()`
%% Cell type:markdown id:b8f52aab-1557-4d8a-ac37-7e61d4087f6a tags:
Using the .sel() method directly on the Dataset affects all contained data variables, applying the selection along the shared coordinates.
%% Cell type:code id:5c9ad0ac-3342-45bb-a88e-4f7c0088d742 tags:
``` python
# Extract data for the time step corresponding to 1990-03-31 from all variables in ds3
ds3.sel(time="1990-03-31") #-- returns a Dataset
```
%% Cell type:markdown id:ec321a82-e334-4b93-8d5e-5be1e9c18d3a tags:
#### E2.3: By name, by label with `.loc[]`
%% Cell type:code id:8e4d9626-b458-4e8e-8d05-71406df71c69 tags:
``` python
ds3.loc[{"time":"1990-03-31", "lon": slice(-150,-140)}]
```
%% Cell type:markdown id:d97449e2-4fc7-4a05-bdb7-ae6d8c8e87b1 tags:
#### E2.4: Indexing and slicing applied to a DataArray within the Dataset.
%% Cell type:code id:2c2cbec2-118c-490a-976f-436cda185dfc tags:
``` python
# Extract data from the variable var1 for the time step 1990-03-31
# and certain longitude and latitude coordinate value ranges.
# The selection hence applies only to this DataArray within the Dataset.
ds3.var1.sel(time="1990-03-31", lat=slice(12,13), lon=slice(-150, -147))
```
%% Cell type:markdown id:62498905-f033-4c0c-a28c-3ec1bd6f3b95 tags:
### E3 Exercise: Indexing and Slicing
%% Cell type:markdown id:9c22ebd0-af37-4643-93a7-c7784d79d333 tags:
Extract some data from the Dataset ds3 from Showcase D3 using<br>
1. .isel()
2. .sel()
3. .loc[]
Apply these functions both on the entire Dataset and data variables.
Which method do you prefer, `.sel()` or `.loc[]`?
%% Cell type:code id:3370f2ba-13c4-4336-afd8-07d06099b77c tags:
``` python
# 1
```
%% Cell type:code id:56992a3b-c3ee-46c3-bd2d-ea6acae788ab tags:
``` python
# 2
```
%% Cell type:code id:9b6f873b-6cb1-40c0-9322-7212d938310c tags:
``` python
# 3
```
%% Cell type:markdown id:edcaf3ab-4bc3-4520-ab00-06b92cc3ead9 tags:
#### E3 Solution Exercise
%% Cell type:code id:c374864d-09fa-4c28-a124-e5f47a2f5f68 tags:
``` python
# 1 on the Dataset
ds3.isel(lat=2,lon=0,time=1) #-- returns a Dataset; use .values only on a DataArray
```
%% Cell type:code id:93112261-32ca-4907-a17b-1c1922aa42a1 tags:
``` python
# 1 on a data variable
ds3.var2.isel(lat2=2,lon=0).values
```
%% Cell type:code id:9818597a-f2b6-453f-88da-f1a9f752d0e3 tags:
``` python
# 2 on the Dataset
ds3.sel(lat2 = 18.5, time = "1990-02-28")
```
%% Cell type:code id:e452778c-d4db-4435-9e98-a9b3342529e9 tags:
``` python
# 2 on a data variable
ds3.var2.sel(lat2 = slice(18,19.2))
```
%% Cell type:code id:2e8be05b-6657-4c4f-856d-07b7e2d1fb6b tags:
``` python
# 3 on the Dataset
ds3.loc[{"lat2": [10.5,18.25], "time": "1990-02-28"}]
```
%% Cell type:markdown id:bdc2e17f-0636-4e7a-a4b1-b73a8661a257 tags:
***
## F) Save, open, and read files with xarray
***
%% Cell type:markdown id:566e78fe-7c7d-41c1-a645-74a9aefe15ce tags:
***
### F1 Showcase: Saving xarray DataArrays or Datasets as netCDF file
%% Cell type:markdown id:b68c908b-b3ec-45d2-9bfd-04b08d50b677 tags:
`to_netcdf()` allows you to save a Dataset or DataArray to a netCDF file.<br>
The default mode is "w" (write), which overwrites an existing file. The alternative is "a" (append), which adds to an existing file and overwrites existing variables of the same name. The method has no read mode.
%% Cell type:code id:fa43249d-bf4e-4196-8081-2d78f64a4d6c tags:
``` python
#== Save the xarray Dataset ds3 from Showcase D3.
ds3.to_netcdf("ds3.nc")
```
%% Cell type:code id:1c1afb8d-1433-4040-bdee-32ba1e9381fc tags:
``` python
#== Get an overview of the netCDF file content with ncdump (shell command available on Levante)
```
%% Cell type:code id:719e1bdc-33d9-4e7d-b18f-997efb317472 tags:
``` python
!ncdump -h ds3.nc
```
%% Cell type:code id:0d6f0c0c-e08f-49c9-90cd-996022f4a797 tags:
``` python
#== Slice ds3 before saving it to ds3.nc. Overwrite the existing ds3.nc
ds3.isel(lat = slice(3,9), lat2 = slice(10,14)).to_netcdf("ds3.nc", mode="w")
```
%% Cell type:code id:94c5b03e-b362-495c-b543-2787a8cd06a8 tags:
``` python
!ncdump -h ds3.nc
```
%% Cell type:markdown id:aaaf5cba-b948-4e3e-8c99-cb975d88a71e tags:
***
### F2 Showcase: Open and read a netCDF file with xarray
%% Cell type:markdown id:a0188112-5f2f-49a0-9bcc-46919e05c49d tags:
xarray provides the function `xr.open_dataset()` to open files in formats such as netCDF, GRIB, HDF5, or Zarr.
The default format is netCDF.<br>
`ds_in = xr.open_dataset("infile.nc")`<br>
is typically the same as<br>
`ds_in = xr.open_dataset("infile.nc", engine="netcdf4")`
%% Cell type:markdown id:61db211a-a491-45c7-a55b-6ac77386d5e0 tags:
<div class="alert alert-info">
<b>xarray is Lazy:</b> By default, xarray performs lazy loading: With open_dataset, it only loads the metadata <br>
(such as variable names, dimensions, and attributes) from the netCDF file into memory, without loading the actual data values.<br>
The data values are loaded into memory only when you explicitly access them or perform operations that require accessing the data.<br>
<b>Lazy loading helps to save memory and improve performance.</b>
</div>
%% Cell type:code id:7f2dc515-49b3-4cdc-89fe-81131bdf520b tags:
``` python
#== Open (lazy-loading) the netCDF file created in Showcase F1.
xr.open_dataset("ds3.nc")
```
%% Cell type:code id:15f07932-b524-4212-b8b0-4853f13f1709 tags:
``` python
#== Open the netCDF file created in Showcase F1 and load the data into memory.
ds5 = xr.open_dataset("ds3.nc").load()
```
%% Cell type:code id:87065f99-cffd-40b5-8831-7fad9686c293 tags:
``` python
ds5.info() #-- info is a method; it prints an ncdump-like summary
```
%% Cell type:code id:1c3ba127-ba60-4b15-bd73-5e1b4d88f791 tags:
``` python
#== In case you wish to remove the reference to the Dataset ds5, use:
# del(ds5)
```
%% Cell type:code id:1ad5e56e-190c-44b2-af52-2243f7b4c4b2 tags:
``` python
#== Open ds3.nc and lazily select only the variable "var1" from it.
ds6 = xr.open_dataset("ds3.nc")[["var1"]] #-- [["var1", "var2"]] if two variables
```
%% Cell type:code id:bd175f57-528d-4c01-b1fd-63c85cabdda4 tags:
``` python
ds6
```
%% Cell type:markdown id:2424d12e-3433-42f2-9d39-9edd24d23290 tags:
***
### F3 Showcase: Open and read a GRIB file with xarray
%% Cell type:markdown id:c496f4c5-f38e-44a9-80f1-ffc8ba003fce tags:
By default, the xr.open_dataset() function selects the _engine_ (_i.e. the library or backend used to read the file_)
based on the file name extension, or attempts to infer the file format from the file contents. <br>
For GRIB files, the engine is usually `cfgrib`.
%% Cell type:markdown id:b4a0b3a0-7cfa-4651-889b-4f00a283af94 tags:
_**NOTE**_: when opening a GRIB file with cfgrib as the engine in xarray, an index file (.idx) is typically created or used if available. If an available index file is incompatible with the GRIB file, xarray issues a warning indicating that the index file is being ignored.
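
Which engines are available depends on the libraries installed in the environment. A minimal sketch for checking this with `xr.backends.list_engines()` before opening a GRIB file (the variable names are illustrative):

```python
import xarray as xr

# Dictionary mapping engine names to backend entry points;
# its content depends on the libraries installed in the environment
engines = xr.backends.list_engines()
print(sorted(engines))

# Check whether the cfgrib backend can be used for GRIB files
has_cfgrib = "cfgrib" in engines
print(f"cfgrib backend available: {has_cfgrib}")
```

If `cfgrib` is missing from the list, opening a GRIB file will fail until the package is installed.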
%% Cell type:code id:81ffa90e-d4b8-46e9-a361-5a08bfa51f99 tags:
``` python
ds7 = xr.open_dataset("../data/MET9_IR108_cosmode_0909210000.grb2")
# same as
# ds7 = xr.open_dataset("../data/MET9_IR108_cosmode_0909210000.grb2", engine="cfgrib")
```
%% Cell type:code id:d11b846a-c077-4afa-ac08-8208bf8d9cfc tags:
``` python
#-- List the data variables
print(f"{ds7.data_vars} \n {ds7.coords}")
```
%% Cell type:markdown id:69443de1-d3cd-4f80-a76e-9a392dd73d11 tags:
***
### F4 Showcase: Open and read multiple netCDF files at once with xarray
%% Cell type:markdown id:43ffefd1-b40b-4c9e-8901-3f127d1b2e9e tags:
*xarray* provides the function `xr.open_mfdataset()` to read multiple files in one step as a single Dataset.<br>
When using `xr.open_mfdataset()`, it is recommended to have **dask** enabled (by having dask installed in your environment).<br>
**Dask** is utilized to load and process the data in a parallelized manner.
%% Cell type:markdown id:f5559513-bb2b-43d6-9054-4438261d2619 tags:
For demonstration, we open three netCDF files _precip_day01.nc, precip_day02.nc, and precip_day03.nc_, each containing the data of one day in 6 hour intervals.
%% Cell type:code id:1cc4ab67-5444-4c1b-bfe7-6733e94a96f8 tags:
``` python
#== Open the files with xr.open_mfdataset
ds8 = xr.open_mfdataset("../data/precip_day*.nc")
```
%% Cell type:markdown id:d3cc7d3b-b518-4c4c-90c1-8dae6025d314 tags:
_**NOTE**_: By default (`combine="by_coords"`), xr.open_mfdataset uses the coordinate values to order the Datasets and concatenates them along the dimension(s) in which their coordinates differ.
In this example, the data are concatenated along the time dimension.
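
The combination step can be sketched in memory with `xr.combine_by_coords()`, which corresponds to the default `combine="by_coords"` setting of `xr.open_mfdataset()`. The helper `make_day` and the integer time axis (hours since an arbitrary start) are assumptions made for illustration and do not reflect the content of the course files:

```python
import numpy as np
import xarray as xr

def make_day(start_hour):
    """Build a hypothetical one-day Dataset with four 6-hourly time steps."""
    hours = start_hour + np.arange(0, 24, 6)
    return xr.Dataset(
        {"precip": (["time", "lon"], np.zeros((4, 2)))},
        coords={"time": hours, "lon": [0.0, 1.0]},
    )

# Three "files", deliberately passed in the wrong order
parts = [make_day(48), make_day(0), make_day(24)]

# Order by coordinate values and concatenate along the differing dimension
combined = xr.combine_by_coords(parts)
print(dict(combined.sizes))
print(combined.time.values)
```

The lon coordinate is identical in all parts and is therefore left untouched; only the time dimension grows.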
%% Cell type:markdown id:8592569c-470c-4fa2-bbc1-428c30c1c366 tags:
<div class="alert alert-info">
<b>Dask is Very Lazy!</b> <br>
When Dask is used to load the data, the data are loaded as dask.array objects.
</div>
%% Cell type:code id:7ef53a23-a737-48a8-aaa3-f73467276e4e tags:
``` python
#== Look at the description of the data variable of ds8.
#== You will see that precip is represented as dask.array object.
ds8
```
%% Cell type:code id:f52cfbbb-e6f1-4beb-81f8-c5df80e9cbc4 tags:
``` python
#== Indexing dask.array objects in ds8 will not directly display the exact values,
#== but instead provide a preview of the data.
#== To access a specific point in the array, you need to load the data into memory first.
ds8.precip[1,4,5]
```
%% Cell type:code id:be82b529-91ec-4106-ab5f-296fae96a563 tags:
``` python
#== Loading the accessed data point as xarray DataArray.
ds8.precip[1,4,5].load()
```
%% Cell type:code id:0d740310-1c01-4360-a10e-e07287c7153e tags:
``` python
#== Alternative: loading the accessed data point as numpy array.
ds8.precip[1,4,5].values
```
%% Cell type:code id:74da3f4d-efa8-404a-8835-a34acb65ac04 tags:
``` python
#== You can also load the entire Dataset or individual data variables.
ds8.load().head(2)
# or
#ds8.precip.load().head(2)
```
%% Cell type:markdown id:e3e2cb78-16a5-408e-9972-009a24d6728e tags:
<br>
***
### F5 Exercise: Open a netCDF file with `xr.open_dataset()`
<br>
%% Cell type:code id:300aba36-5fea-4a1e-bf0a-cbb794cbe21e tags:
``` python
# 1. Open the file "../data/rectilinear_grid_2D.nc" with xarray
```
%% Cell type:code id:cfb0bb07-a02a-4dbf-8b90-993bba39b55b tags:
``` python
# 2. List all data variables of the Dataset as a list
```
%% Cell type:code id:83afc2ad-daf3-485b-bcc4-75ecf430eba0 tags:
``` python
# 3. List all data variables and coordinate variables of the Dataset as a list
```
%% Cell type:code id:d935b15e-23f0-4bbe-ac76-16d83c65eaee tags:
``` python
# 4. Lazy load the first variable of the list, print it with head(2)
# Then load it into memory while assigning it to a variable da_rg.
# Print it again with head(2)
```
%% Cell type:code id:429b43e4-2321-4cf0-bc58-48bd459468f3 tags:
``` python
# 5. Plot the first timestep of da_rg
```
%% Cell type:markdown id:60b1ce25-8fcb-49b4-a3a1-758b174363ad tags:
#### F5 Solution Exercise
%% Cell type:code id:e87e1214-104e-42ae-9d1d-98808a2e2270 tags:
``` python
# 1.
ds9 = xr.open_dataset("../data/rectilinear_grid_2D.nc")
```
%% Cell type:code id:8e257ce3-11b0-4384-be47-e53a9cd423d0 tags:
``` python
# 2.
list(ds9.data_vars)
list(ds9.keys())
list(ds9)
```
%% Cell type:code id:5890c3f2-9810-4657-a441-640efbd9c656 tags:
``` python
# 3.
list(ds9.variables)
```
%% Cell type:code id:9d657253-7029-48d8-a661-1b36bcc1fe4a tags:
``` python
# 4.
print(ds9[list(ds9)[0]].head(2)) #-- print explicitly, since only the last expression of a cell is displayed
print("*"*100)
da_rg = ds9[list(ds9)[0]].load()
da_rg.head(2)
```
%% Cell type:code id:f2b329a4-2bad-4960-8659-2377e99e6651 tags:
``` python
# 5.
da_rg.isel(time=0).plot();
```
%% Cell type:markdown id:10b0181a-9436-40fb-b771-9ff8ff8d9773 tags:
***
## G) Appendix
***
%% Cell type:markdown id:bfe73a3d-f6d1-4f2c-bfb8-0862cf88449f tags:
***
### G1 Showcase: From numpy arrays to xarray DataArrays
%% Cell type:markdown id:306dcc84-6851-412e-a682-d76ae5140e94 tags:
You can directly convert a numpy array into an xarray DataArray by using it as input for xarray's function `xr.DataArray`.
%% Cell type:markdown id:f97a43cf-b52c-435c-8d8c-7cce87240b24 tags:
We use the _atmosphere water vapor content_ data from the file `../data/prw.dat` by loading it with numpy.
%% Cell type:code id:56fb99e3 tags:
``` python
#== Show the first 5 lines of the ascii input file.
!head -5 ../data/prw.dat
```
%% Cell type:code id:a21e702a-ad00-4b37-a0fc-9396c9e831f5 tags:
``` python
#== Read columns 1 to 3 of the input file while skipping the header.
prw_data = np.loadtxt("../data/prw.dat", usecols=(1,2,3),skiprows=1) #-- data are read in as float64
prw_stations = np.loadtxt("../data/prw.dat", usecols=0, skiprows=1, dtype="U10") #-- data are read in as string
```
%% Cell type:code id:169713cf-7534-412a-9dbe-689273cc2ef4 tags:
``` python
#== Get the shape of the numpy ndarray prw_data.
prw_data.shape
```
%% Cell type:code id:0bb6b812-310d-4fcf-8bc7-f7e80d04349e tags:
``` python
#== Get the first 4 rows of the prw_data.
prw_data[:4, :]
```
%% Cell type:code id:c0e6134e-fdd4-4433-8541-e6ddf14bbe88 tags:
``` python
#== Convert the numpy ndarray prw_data into an xarray DataArray.
prw_da1 = xr.DataArray(prw_data)
```
%% Cell type:code id:dff469f0-9b16-4de3-9bc6-1359b2dead4e tags:
``` python
#== Get the first four elements (rows) of the xarray DataArray using slicing.
prw_da1[:4, :]
```
%% Cell type:markdown id:f74124c7-7eac-421c-a055-ac1aa8e16399 tags:
The comparison of `prw_da1[:4, :]` and `prw_data[:4, :]` illustrates that when slicing an xarray DataArray, the structure and metadata are preserved. In contrast, slicing a numpy array results in a new numpy array without additional metadata.
%% Cell type:code id:a3b2fbc0-e14a-4a39-b2f3-341ebda4a564 tags:
``` python
# As an alternative to prw_da1[:4, :], you can use the functionality `head` on the xarray DataArray.
# This functionality is not available for a numpy ndarray.
# prw_da1.head(4)
```
%% Cell type:code id:624a60e6-b2a9-4612-bcfd-60dcdc3f22a5 tags:
``` python
print(f"type of prw_data: {type(prw_data)}; type of prw_da1: {type(prw_da1)};")
print(f"the data structure wrapped in the xarray DataArray is: {type(prw_da1.data)}")
print(f"the data type of the elements of the wrapped data is: {prw_da1.data.dtype}")
```
%% Cell type:markdown id:6cbfc94d-9f3e-4a5d-b6d3-6a0881afc976 tags:
Unlike a numpy array, the xarray DataArray can differentiate the variable of interest as a *data variable* (`prw_da1.data`) from *coordinate* variables. This is because the xarray DataArray has the following components:
- **dimensions**: Named dimensions that define the structure of the array (`prw_da1.dims`).
- **coordinates**: Variables associated with each dimension, providing context and labeling (`prw_da1.coords`).
- **attributes**: Additional metadata describing the array or its components (`prw_da1.attrs`).
%% Cell type:code id:c18c8d09-e4f1-4ca5-a7cf-57705c8d2b59 tags:
``` python
#print(f"data variable: {prw_da1.data}")
print(f"coordinate variable: {prw_da1.coords}")
print(f"dimension names: {prw_da1.dims}")
print(f"data variable name: {prw_da1.name}")
print(f"data variable metadata: {prw_da1.attrs}")
```
%% Cell type:markdown id:0a613a89-fb8a-4726-9f9f-8f5f9c3720a2 tags:
When created without specific parameters in the xr.DataArray function call, prw_da1 defaults to empty coordinates, default dimension names ("dim_0", "dim_1"), and lacks a data variable name or associated metadata.
%% Cell type:markdown id:9bfca75f-a2a6-4435-bec2-29772d2d164a tags:
***
### G2 Showcase: Specify the structure/parameters during the `xarray.DataArray()` function call
%% Cell type:markdown id:945a83fa-dfb1-429f-80c1-5e410db2426f tags:
Create a new xarray DataArray called prw_da2:
1. The **data variable** corresponds to the third column of the numpy array `prw_data`.
2. We have one dimension (**dims**) which represents individual stations; we name it **_Station_**. <br>
By default, it is an index running from 0 to the length of a column minus 1.
3. The **coords** are the first and second columns of the numpy array `prw_data`. We want to call them `lat` and `lon`. <br>
They have the same dimensions as the data array, namely **_Station_**.
The general syntax for declaring the coordinate is: <br>
`coords = {coords_name: (dimension_name, coordinate_values)}`
4. The **name** of the data variable is **prw**.
5. In the **attrs**, we can store variable attributes like **_units_**. The **standard_name** of prw is **_atmosphere_mass_content_of_water_vapor_**; <br> the corresponding canonical units are **_kg m-2_**.
%% Cell type:code id:6f65ead5-64a1-48c4-b981-7cf253a7c748 tags:
``` python
prw_da2 = xr.DataArray(prw_data[:,2],
dims=["Station"],
coords={"lat":("Station", prw_data[:,0]),
"lon":("Station", prw_data[:,1])},
name="prw",
attrs={"units":"kg m-2",
"standard_name":"atmosphere_mass_content_of_water_vapor"})
```
%% Cell type:code id:9831c927-346e-4f1d-80d0-0f910be7856d tags:
``` python
prw_da2
```
%% Cell type:code id:588b23fa-9c50-40bc-9a58-2dd4d50cb7fa tags:
``` python
print(f"data variable (sliced): {prw_da2.data[:8]}")
print(f"dimension names: {prw_da2.dims}")
print(f"coordinate variable:\n {prw_da2.coords}")
print(f"name of DataArray (corresponding to data variable name): {prw_da2.name}")
print(f"metadata of DataArray (corresponding to data variable metadata):\n {prw_da2.attrs}")
```
%% Cell type:code id:39ea8ad8-345f-4b7c-84ea-d85f1b257918 tags:
``` python
print(f"shape of the DataArray: {prw_da2.shape}")
print(f"size of the DataArray: {prw_da2.sizes}")
# Note: sizes returns an immutable dictionary-like object, often referred to as a "frozen" object in xarray.
```
%% Cell type:code id:6689de19-4091-40d7-b32e-13e7fcf7f189 tags:
``` python
prw_da2.plot(figsize=(3, 2));
```
%% Cell type:markdown id:4e8900b7-b4ef-4512-b161-3513be782a78 tags:
***
### G3 Exercise: xr.DataArray
%% Cell type:code id:e0717073 tags:
``` python
# 1. Create a two-dimensional numpy array called prw_data_2d with the size `len(prw_data)` x `len(prw_data)`
# and assign `NaN` values to the entire array.
# Use np.nan and np.full() or np.empty().
```
%% Cell type:code id:4baf6447-377d-4abb-88ac-4dbc55efa73c tags:
``` python
# 2. On the diagonal of the square array, insert the values of the third column of prw_data. Use a for loop.
```
%% Cell type:code id:28e4d302-85ab-4d2f-a67c-0b0e474e4d96 tags:
``` python
# 3. Convert the numpy array to an xarray DataArray named prw_da3.
```
%% Cell type:code id:34840ed8-55d0-40ab-9e43-08cb415216cf tags:
``` python
# 4. Show prw_da3 by plotting.
```
%% Cell type:markdown id:7271d6a0-f926-4e0f-b08c-96acb3a392ec tags:
#### G3 Solution Exercise
%% Cell type:code id:2dbcd161-a5a2-4d98-a6fd-17bfb44ec015 tags:
``` python
# 1.
#prw_data.shape
prw_data_2d = np.full([len(prw_data), len(prw_data)], np.nan) #-- np.full creates an array with a specified value
# Alternatives:
# prw_data_2d = np.empty([len(prw_data),len(prw_data)]) * np.nan
```
%% Cell type:code id:daf21374-4bb0-4305-9479-853d90b422f3 tags:
``` python
# 2.
for i in range(0, len(prw_data)):
    prw_data_2d[i,i] = prw_data[i,2]
```
%% Cell type:code id:beb4c823-b603-4232-b4c6-cdf8cdce5533 tags:
``` python
# 3.
prw_da3 = xr.DataArray(prw_data_2d)
```
%% Cell type:code id:b97580ea-645d-4d83-924d-0d43dfce82b5 tags:
``` python
# 4.
prw_da3.plot();
```
%% Cell type:markdown id:a03fe6f2-dabe-4075-bb6b-babb461d0d21 tags:
***
### G4 Showcase: Add a new coordinate variable to an xarray DataArray
%% Cell type:code id:131bdc62-72fa-4855-9672-5c0d0e2c6168 tags:
``` python
#== Add a new coordinate variable named "station_name" in the prw_da2 DataArray.
#== This new coordinate variable contains the station names provided in the prw_stations array.
prw_da2["station_name"] = xr.DataArray(prw_stations, dims="Station")
prw_da2.head(5)
```
%% Cell type:code id:cdf8fc95-481d-479d-a3a5-07391939961f tags:
``` python
#== When plotting, we can now use the new coordinate variable to label the x-axis.
prw_da2.plot(x="station_name", figsize=(16,4));
plt.xticks(rotation=90);
```
%% Cell type:markdown id:a8140a54-ba7e-4ca2-b716-b33c0d5089f1 tags:
### G5 Showcase: Gridding station data with `xr.DataArray`
To put the station data prw_data from Showcase G1 on a regular lat-lon grid, use the following steps:
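
A side note on the index lookup used in these steps: `np.searchsorted` returns the insertion position, i.e. the index of the first grid value that is greater than or equal to the station coordinate. This is not always the nearest grid point. A minimal sketch of the difference, using a made-up grid and made-up station coordinates (the nearest-neighbor variant is one possible alternative, not the method used in this showcase):

```python
import numpy as np

# Hypothetical grid and station coordinates, for illustration only
grid = np.linspace(0.0, 4.0, num=5)   # grid values 0, 1, 2, 3, 4
stations = np.array([0.2, 1.9, 3.4])

# Insertion position: index of the first grid value >= station value
idx_insert = np.searchsorted(grid, stations)

# Nearest grid point: smallest absolute distance per station
distances = np.abs(grid[:, np.newaxis] - stations[np.newaxis, :])
idx_nearest = distances.argmin(axis=0)

print(idx_insert)
print(idx_nearest)
```

In this toy example, the stations at 0.2 and 3.4 are assigned to the next higher grid point by `np.searchsorted`, although the lower grid point is closer.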
%% Cell type:code id:fb50f9bb-2f3a-481f-bf84-71b557dc7126 tags:
``` python
#== 1. Define the latitudes and longitudes for the regular grid using np.linspace,
# with the minimum and maximum values of latitude and longitude from the station coordinates as bounds.
nlat, nlon = 25, 50
latitudes = np.linspace(min(prw_data[:,0]), max(prw_data[:,0]), num=nlat)
longitudes = np.linspace(min(prw_data[:,1]), max(prw_data[:,1]), num=nlon)
```
%% Cell type:code id:8df7b816-5ca3-4d6a-ae28-1589f18790cd tags:
``` python
#== 2. Create a 2D grid of NaN values with the size (nlat,nlon).
data_grid = np.full((nlat, nlon), np.nan)
```
%% Cell type:code id:adb6c672-dba4-49ec-b203-19c2a393fa8d tags:
``` python
#== 3. Find the indices for assigning data values to the grid.
lat_indices = np.searchsorted(latitudes, prw_data[:,0])
lon_indices = np.searchsorted(longitudes, prw_data[:,1])
```
%% Cell type:code id:cc4f13b1-52bd-4714-8dd5-c0cbaea621d4 tags:
``` python
#== 4. Assign the data values to the corresponding grid points.
data_grid[lat_indices, lon_indices] = prw_data[:,2]
```
%% Cell type:code id:5455c807-8de5-48f7-b63c-ef9f3dc836c2 tags:
``` python
#== 5. Create the DataArray with latitude and longitude coordinates.
pwr_da4 = xr.DataArray(data_grid,
                       coords={"lat": latitudes, "lon": longitudes},
                       dims=["lat", "lon"])
```
%% Cell type:code id:8bc4cfd3-0449-440e-a982-d4340aee0582 tags:
``` python
#== 6. Plot the new DataArray
pwr_da4.plot();
```
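Step 3 above relies on `np.searchsorted`, which returns an insertion index: with the default `side="left"`, this is the index of the first grid value that is greater than or equal to the station value, which is not necessarily the nearest grid point. The sketch below compares this with a nearest-neighbour lookup. The grid and station values are made up for illustration.

```python
import numpy as np

# Hypothetical grid and station latitudes (not the course data).
grid = np.linspace(0.0, 10.0, num=5)        # 0.0, 2.5, 5.0, 7.5, 10.0
stations = np.array([0.0, 2.6, 4.9, 10.0])

# side="left" (default): index of the first grid value >= station value.
idx_insert = np.searchsorted(grid, stations)

# Nearest grid point instead: smallest absolute distance per station.
idx_nearest = np.abs(grid[None, :] - stations[:, None]).argmin(axis=1)

print(idx_insert)
print(idx_nearest)
```

In this example, the station at 2.6 is assigned to the grid value 5.0 by `np.searchsorted`, but to 2.5 by the nearest-neighbour lookup. Note also that when several stations map to the same grid cell, an assignment like the one in step 4 keeps only one of their values.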
%% Cell type:markdown id:ca3fa0a8-05a5-4d3e-9004-bd8127dcf9f9 tags:
### G6 Showcase: Combining xarray plotting with more advanced matplotlib/cartopy features for creating maps
%% Cell type:markdown id:524efad7-29fd-4ce8-852b-a0417d7fe7d4 tags:
In this showcase, we use the regularly gridded station data pwr_da4 from Showcase G5.
%% Cell type:code id:50700c8c-a189-4ca3-9438-b123813c3321 tags:
``` python
import cartopy.crs as ccrs
proj = ccrs.PlateCarree() # choose map projection
fig, ax = plt.subplots(figsize=(7, 9), subplot_kw={"projection":proj})
ax.set_extent([-102, -90, 29, 41], proj)
img = ax.stock_img() # add satellite image as background
img.set_alpha(0.4) # adjust background image transparency
ax.gridlines(draw_labels=True, color="None", zorder=0) # turn on axis label, turn off gridlines
ax.coastlines() # add coastlines
pwr_da4.plot(cmap="turbo", cbar_kwargs={"shrink": 0.5, "pad": 0.1});
```
%% Cell type:markdown id:1bdff15d-30c9-4d11-af8a-4b3ec9b3cc77 tags:
<br>
***
### G7 Exercise: Modify the `xr.DataArray` pwr_da4
<br>
%% Cell type:code id:d8de744d-f91c-4d20-b3b5-96baba2facf2 tags:
``` python
# 1. print the variable attributes and add a variable long_name attribute
```
%% Cell type:code id:56cc7d97-ee01-4bbf-a0c6-65f1171574b4 tags:
``` python
# 2. print the variable name and change it
```
%% Cell type:code id:6ebad36e-a623-43ca-a8dd-c8316f1a84db tags:
``` python
# 3. print the dimension names,
# rename them to lat1 and lon1, and assign the returned object to pwr_da5
```
%% Cell type:code id:15c68af8-08cb-4cd5-89c6-c716140b800f tags:
``` python
# 4. add a standard_name and units attribute to the coordinate variable lon1 in pwr_da5
# and print the new attributes
```
%% Cell type:code id:7585ab2f-97c6-4ef5-a7a5-bc1f182965d4 tags:
``` python
# 5. replace the values of the lat1 coordinate variable with a numpy array;
# print the first 5 values of the new lat1 coordinate variable
```
%% Cell type:code id:3b92e684-221c-441d-86d6-28306e58e491 tags:
``` python
# 6. set the values of the first 6 rows and first 6 columns of
# the pwr_da5 DataArray to -50
```
%% Cell type:code id:80b66dc3-905e-45ed-8adf-481f0bb9e4b2 tags:
``` python
# 7. plot the modified DataArray
```
%% Cell type:markdown id:4987e40e-fcf0-4e50-8afd-a7c67ee5cb8d tags:
#### G7 Solution Exercise
%% Cell type:code id:7f3ba964-0336-4001-9c8c-d67acd4bc1e5 tags:
``` python
# 1.
print(f"variable attributes are: {pwr_da4.attrs}")
pwr_da4.attrs={"long_name": "water vapor content"}
```
%% Cell type:code id:f4bf07af-ad05-4c6c-9b86-6fc3e32e599a tags:
``` python
# 2.
print(f"variable name is: {pwr_da4.name}")
pwr_da4.name = "prw"
```
%% Cell type:code id:634519ad-778b-41fe-836d-41fa1b05e72a tags:
``` python
# 3.
print(f"dimension names are: {pwr_da4.dims}")
pwr_da5 = pwr_da4.rename({"lat": "lat1", "lon": "lon1"})
print(f"dimension names are: {pwr_da5.dims}")
```
%% Cell type:code id:7db4bf3e-a677-4bce-9e9b-3b9f59cec5a1 tags:
``` python
# 4.
pwr_da5["lon1"].attrs={"standard_name": "longitude", "units":"degrees_east"}
pwr_da5.lon1.attrs
```
%% Cell type:code id:666f635d-e91d-45da-8c2e-5e352fd4a35a tags:
``` python
# 5.
newlats = np.linspace(0,10, pwr_da5.sizes["lat1"])
pwr_da5.coords["lat1"] = newlats
print(f"first 5 values of new lat1 coordinate variable are: {pwr_da5.lat1.head(5).values}")
```
%% Cell type:code id:36e9732f-6e09-49ae-aecd-47be33c2485b tags:
``` python
# 6.
pwr_da5[:6, :6] = -50
```
%% Cell type:code id:b3e77203-d69e-4b36-9f6c-52bbad99310e tags:
``` python
# 7.
pwr_da5.plot();
```
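Two details of the solution may be worth a closer look: assigning a dictionary to `.attrs` replaces all existing attributes, and `rename` returns a new object. The following sketch illustrates both on a small made-up DataArray; the `units` value is only a placeholder for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical small array standing in for pwr_da4; the units are a placeholder.
da = xr.DataArray(np.zeros((2, 3)), dims=["lat", "lon"],
                  attrs={"units": "kg m-2"})

# Key assignment adds one attribute and keeps the existing ones.
da.attrs["long_name"] = "water vapor content"
print(da.attrs)

# Assigning a new dict replaces the whole attribute dictionary.
da.attrs = {"long_name": "water vapor content"}
print(da.attrs)

# rename returns a new object; the dimension names of `da` stay as they were.
renamed = da.rename({"lat": "lat1", "lon": "lon1"})
print(da.dims, renamed.dims)
```

If existing attributes should be preserved, key assignment (or `attrs.update(...)`) is therefore the safer choice.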
%% Cell type:markdown id:61d518b4-7410-46f3-9480-cee9bb1e06e5 tags:
***
### G8 Showcase: Expand the dimensions of an xarray DataArray
%% Cell type:markdown id:f109a858-3383-405f-8794-eaf160745328 tags:
With `DataArray.expand_dims()`, you can add a new dimension, e.g. time, to a DataArray A. The call returns a new DataArray object B and leaves A unchanged.
In the expanded DataArray B, the original values of A are repeated along the new dimension (e.g. time). In other words, the values of A are broadcast along this new dimension to match the shape of B.
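As a compact, self-contained illustration of this behaviour, the sketch below uses a made-up 2 x 3 array `a` in place of A. The names `a` and `b` and the values are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 array standing in for DataArray A.
a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=["lat", "lon"])

# Passing a dict gives the new dimension coordinate values; axis=0 puts it first.
b = a.expand_dims({"time": [1, 2]}, axis=0)

print(a.dims, a.shape)  # A itself is not modified
print(b.dims, b.shape)

# Compare the slices at the two time values element by element.
print(bool((b.sel(time=1) == b.sel(time=2)).all()))
```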
%% Cell type:code id:f1e51bca-4e86-42dc-9915-89054f10e6ec tags:
``` python
#== Add a time dimension with values 1 and 2 to the pwr_da5 DataArray.
time = [1, 2]
pwr_da6 = pwr_da5.expand_dims({"time":time}, axis=0)
# Note: axis=0 ensures that the new dimension is inserted as the first dimension,
# so that the resulting dimension order is time, lat1, lon1.
```
%% Cell type:code id:50af7478-0672-48b3-a5a8-6df1f8202aa8 tags:
``` python
#== Compare the dimensionality and shape of the old and new DataArray:
print(f" dims before: {pwr_da5.dims}\n dims after: {pwr_da6.dims}")
print("===="*20)
print(f" shape before: {pwr_da5.shape}\n shape after: {pwr_da6.shape}")
print("===="*20)
print(f" sizes before: {pwr_da5.sizes}\n sizes after: {pwr_da6.sizes}")
```
%% Cell type:code id:6da14987-2dcb-4291-bb57-efa2e88e7c7e tags:
``` python
#== Check whether the data values at the first time step equal those at the second time step.
import numpy.testing as npt
try:
    npt.assert_array_equal(pwr_da6.isel(time=0).data, pwr_da6.isel(time=1).data)
    print("Equal")
except AssertionError:
    print("Not equal")
```