
Rechunking NetCDF data.

This code provides a simple command line interface (cli) to rechunk existing netcdf files to an optimal chunk size of around 128 MB.
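To give a sense of scale, the sketch below estimates how many values fit into a chunk of about 128 MB. It is an illustration only: reading 128 MB as 128 * 2**20 bytes, the helper names `values_per_chunk` and `time_steps_per_chunk`, and the 192 x 384 grid are assumptions, and the cli may choose chunk shapes differently.

```python
TARGET_CHUNK_BYTES = 128 * 2**20  # assumption: 128 MB interpreted as 128 MiB


def values_per_chunk(itemsize: int, target_bytes: int = TARGET_CHUNK_BYTES) -> int:
    """Return how many values of ``itemsize`` bytes fit into one chunk."""
    return target_bytes // itemsize


def time_steps_per_chunk(
    nlat: int, nlon: int, itemsize: int, target_bytes: int = TARGET_CHUNK_BYTES
) -> int:
    """Return how many complete (nlat x nlon) fields fit into one chunk (at least 1)."""
    return max(1, values_per_chunk(itemsize, target_bytes) // (nlat * nlon))


# float32 values (4 bytes each) on a hypothetical 192 x 384 lat/lon grid
print(values_per_chunk(4))                # 33554432
print(time_steps_per_chunk(192, 384, 4))  # 455
```

Under these assumptions, a chunk of such a float32 variable would hold roughly 455 time steps of the full horizontal grid.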

Installation

To install the cli simply use the following pip command:

pip install [--user] https://gitlab.dkrz.de/ch1187/rechunk-data/-/archive/2206.0.2/rechunk-data-2206.0.2.zip

Use the optional --user flag if you do not have superuser rights and are not using anaconda, pipenv or virtualenv.

Usage

Basic usage:

rechunk-data --help
usage: rechunk-data [-h] [--output OUTPUT] [--netcdf-engine {h5netcdf,netcdf4}] [-v] [-V] input

Rechunk input netcdf data to optimal chunk-size. approx. 126 MB per chunk

positional arguments:
  input                 Input file/directory. If a directory is given all ``.nc`` files in all sub directories will be processed

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT       Output file/directory of the chunked netcdf file(s). Note: If ``input`` is a directory output should be a
                        directory. If None given (default) the ``input`` is overidden. (default: None)
  --netcdf-engine {h5netcdf,netcdf4}
                        The netcdf engine used to create the new netcdf file. (default: h5netcdf)
  -v
  -V, --version         show program's version number and exit

You can use the cli in several ways:

  • rechunk a specified input file into a specified output file. Here input and output have to be files.
  • rechunk all files within an input directory and store them in an output directory. Here input and output have to be directories.
  • overwrite a specified input file, or overwrite all files within an input directory. Here omit the --output flag.
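The three modes above can be sketched as command lines assembled from Python, which may be handy when scripting the cli. This is a minimal sketch: the helper `build_command` and all file and directory names are hypothetical, and only the options shown in the help output above are used.

```python
import shlex
from typing import List, Optional


def build_command(
    input_path: str, output: Optional[str] = None, engine: Optional[str] = None
) -> List[str]:
    """Assemble a rechunk-data call; without ``output`` the input is overwritten."""
    cmd = ["rechunk-data"]
    if output is not None:
        cmd += ["--output", output]
    if engine is not None:
        cmd += ["--netcdf-engine", engine]
    cmd.append(input_path)
    return cmd


examples = [
    # 1. input file -> output file
    build_command("tas_2000.nc", output="tas_2000_rechunked.nc"),
    # 2. input directory -> output directory, using the netcdf4 engine
    build_command("data/raw", output="data/rechunked", engine="netcdf4"),
    # 3. overwrite the input file in place (no --output)
    build_command("tas_2000.nc"),
]

for cmd in examples:
    print(shlex.join(cmd))
```

Once the cli is installed, each list could be passed to `subprocess.run(cmd, check=True)` to run the command instead of printing it.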

Support

If you need help, submit an issue in the GitLab repository.