Rechunking NetCDF data.
Rechunking of existing netcdf files to an optimal chunk size. This code provides a simple command line interface (cli) to rechunk existing netcdf data to an optimal chunk size of around 128 MB.
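To get a feeling for what a chunk of around 128 MB means in practice, the following minimal sketch works out how many values and how many time steps fit into one chunk. It does not reproduce the algorithm used by this package. The helper names, the reading of "128 MB" as 128 MiB, and the 192 x 384 grid of 4-byte floats are assumptions made for illustration only.

```python
# Assumption: "128 MB" is read as 128 MiB; the package may round differently.
TARGET_CHUNK_BYTES = 128 * 2**20


def elements_per_chunk(itemsize: int, target_bytes: int = TARGET_CHUNK_BYTES) -> int:
    """Number of values of ``itemsize`` bytes that fit into one chunk."""
    return target_bytes // itemsize


def time_steps_per_chunk(
    n_lat: int, n_lon: int, itemsize: int, target_bytes: int = TARGET_CHUNK_BYTES
) -> int:
    """Number of full (lat, lon) slices that fit into one chunk, at least 1."""
    bytes_per_step = n_lat * n_lon * itemsize
    return max(1, target_bytes // bytes_per_step)


# Hypothetical example: float32 data (4 bytes) on a 192 x 384 grid.
n_elements = elements_per_chunk(4)
n_steps = time_steps_per_chunk(192, 384, 4)
print(n_elements, n_steps)
```

Under these assumptions one chunk holds about 33.5 million float32 values, or 455 complete time steps of the example grid.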
Installation
To install the cli simply use the following pip command:
pip install (--user) https://gitlab.dkrz.de/ch1187/rechunk-data/-/archive/2206.0.2/rechunk-data-2206.0.2.zip
Use the --user flag if you do not have superuser rights and are not using anaconda, pipenv or virtualenv.
Usage
Using the python module
from rechunk_data import rechunk_dataset
import xarray as xr
dset = xr.open_mfdataset("/data/*", parallel=True, combine="by_coords")
new_data = rechunk_dataset(dset)
Using the command line interface:
rechunk-data --help
usage: rechunk-data [-h] [--output OUTPUT] [--netcdf-engine {h5netcdf,netcdf4}] [-v] [-V] input
Rechunk input netcdf data to optimal chunk-size. approx. 126 MB per chunk
positional arguments:
input Input file/directory. If a directory is given all ``.nc`` files in all sub directories will be processed
optional arguments:
-h, --help show this help message and exit
--output OUTPUT Output file/directory of the chunked netcdf file(s). Note: If ``input`` is a directory output should be a
directory. If None given (default) the ``input`` is overidden. (default: None)
--netcdf-engine {h5netcdf,netcdf4}
The netcdf engine used to create the new netcdf file. (default: h5netcdf)
-v
-V, --version show program's version number and exit
You can use the cli in one of three ways:
- specified input - output file pairs. Here input and output have to be files.
- all files within an input directory are stored in an output directory. Here input and output have to be directories.
- override a specified input file, or override all files within an input directory. Here omit the --output flag.
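The three ways of calling the cli can also be scripted from Python. The sketch below only assembles the argument list from the flags shown in the help text (--output and --netcdf-engine). The helper names build_rechunk_command and run_rechunk and all paths are hypothetical and not part of the package.

```python
import subprocess
from typing import List, Optional


def build_rechunk_command(
    input_path: str,
    output: Optional[str] = None,
    engine: str = "h5netcdf",
) -> List[str]:
    """Assemble the argument list for one ``rechunk-data`` call."""
    if engine not in ("h5netcdf", "netcdf4"):
        raise ValueError(f"unsupported netcdf engine: {engine}")
    cmd = ["rechunk-data", "--netcdf-engine", engine]
    # Without --output the input is overridden, as described in the help text.
    if output is not None:
        cmd += ["--output", output]
    cmd.append(input_path)
    return cmd


def run_rechunk(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# Hypothetical paths, one command per usage mode.
file_pair = build_rechunk_command("/data/in.nc", output="/scratch/out.nc")
dir_pair = build_rechunk_command("/data/in_dir", output="/scratch/out_dir")
in_place = build_rechunk_command("/data/in.nc")
print(" ".join(in_place))
```

Pass any of the three lists to run_rechunk to execute it. Keeping the command construction separate from the execution makes it possible to check the arguments before any file is overridden.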
Support
If you need help, submit an issue in the GitLab repository.