This notebook gives a guideline on how to use recent lossy and lossless compression methods for netCDF output via `xarray`. It is an implementation of [these slides](https://www.researchgate.net/publication/365006139_NetCDF_Compression_Improvements) using example data from the Earth System Model AWI-CM. We will compare the writing speed and the compression ratio of the different methods.
For *lossless* compression of netCDF files, we use the [hdf5plugin](https://github.com/silx-kit/hdf5plugin/blob/b53d70b702154a149a60f653c49b20045228aa16/doc/index.rst) package, which makes additional HDF5 filters available within Python.
For *lossy* compression, we use the [numcodecs](https://github.com/zarr-developers/numcodecs) library to apply bit rounding.
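As a minimal sketch of how bit rounding can be applied with numcodecs (the array and the `keepbits` value are made-up examples; the appropriate number of bits to keep depends on the real information content of your data):

```python
import numcodecs
import numpy as np

# Hypothetical float32 field standing in for model output.
data = np.random.rand(90, 180).astype("float32")

# Keep the 7 most significant mantissa bits and round the rest away.
codec = numcodecs.BitRound(keepbits=7)
rounded = codec.decode(codec.encode(data))
```

The rounded values contain long runs of zero bits in the mantissa, which is what makes the subsequent lossless compression much more effective.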
We use the [BitShuffle](https://github.com/kiyo-masui/bitshuffle) filter prior to compression whenever possible. It rearranges the binary data at the bit level so that it compresses better, as shown in the sketch below.
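A minimal sketch of how such a filter can be passed to `xarray` (assumptions: the `h5netcdf` engine, a hypothetical variable name `tas`, and a made-up dataset; the hdf5plugin filter objects expand into the `compression`/`compression_opts` encoding keys):

```python
import hdf5plugin
import numpy as np
import xarray as xr

# Hypothetical dataset standing in for the AWI-CM example data.
ds = xr.Dataset({"tas": (("y", "x"), np.random.rand(90, 180).astype("float32"))})

# Bitshuffle rearranges the bits before the (default LZ4) compression stage.
ds.to_netcdf(
    "tas_bitshuffle.nc",
    engine="h5netcdf",
    encoding={"tas": dict(hdf5plugin.Bitshuffle())},
)
```

Reading such a file back also requires `import hdf5plugin`, so that the filter is registered with the HDF5 library. The lossless sketches below follow the same pattern and only swap the filter object.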
## Requirements
- Lossy compression requires a large amount of memory.
- Reading files compressed losslessly with methods other than deflate requires netCDF version 4.9 or newer with access to the HDF5 filter plugins.
### [zlib](https://github.com/madler/zlib)
zlib has been the standard compression method since it was introduced in netCDF. It is based on the deflate algorithm; the other compression packages presented here use dictionaries.
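Deflate is available directly through xarray's standard encoding keys; a minimal sketch (the variable name `tas` and compression level 4 are placeholders):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"tas": (("y", "x"), np.random.rand(90, 180).astype("float32"))})

# zlib/deflate with the HDF5 byte-shuffle filter (not bitshuffle).
ds.to_netcdf(
    "tas_deflate.nc",
    encoding={"tas": {"zlib": True, "complevel": 4, "shuffle": True}},
)
```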
### [Zstandard](https://github.com/facebook/zstd)
Zstandard allows multithreading. It is used by package registries and is supported by Linux systems. Zstandard offers features beyond zlib/zlib-ng, including better performance and compression.
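A sketch of writing with Zstandard through hdf5plugin, following the pattern above (`clevel=3` is just an example level):

```python
import hdf5plugin
import numpy as np
import xarray as xr

ds = xr.Dataset({"tas": (("y", "x"), np.random.rand(90, 180).astype("float32"))})

# Zstandard compression via the registered HDF5 filter.
ds.to_netcdf(
    "tas_zstd.nc",
    engine="h5netcdf",
    encoding={"tas": dict(hdf5plugin.Zstd(clevel=3))},
)
```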
### [LZ4](https://github.com/lz4/lz4)
LZ4 focuses on very fast compression and decompression. It scales with multi-core CPUs.
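The same pattern with the LZ4 filter (`nbytes=0` keeps hdf5plugin's default block size):

```python
import hdf5plugin
import numpy as np
import xarray as xr

ds = xr.Dataset({"tas": (("y", "x"), np.random.rand(90, 180).astype("float32"))})

# LZ4 compression via the registered HDF5 filter.
ds.to_netcdf(
    "tas_lz4.nc",
    engine="h5netcdf",
    encoding={"tas": dict(hdf5plugin.LZ4(nbytes=0))},
)
```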
### [Blosc](https://github.com/Blosc/c-blosc)
Blosc uses the blocking technique to not only reduce the size of large datasets on disk or in memory, but also to accelerate memory-bound computations. Its internal compressors include LZ4 and Zstandard.
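A sketch with the Blosc filter, here choosing Zstandard as the internal compressor and enabling bit-shuffling (all parameter values are illustrative):

```python
import hdf5plugin
import numpy as np
import xarray as xr

ds = xr.Dataset({"tas": (("y", "x"), np.random.rand(90, 180).astype("float32"))})

# Blosc with Zstandard inside and bitshuffle applied per block.
ds.to_netcdf(
    "tas_blosc.nc",
    engine="h5netcdf",
    encoding={
        "tas": dict(
            hdf5plugin.Blosc(
                cname="zstd", clevel=5, shuffle=hdf5plugin.Blosc.BITSHUFFLE
            )
        )
    },
)
```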