# libaec - Adaptive Entropy Coding library

Libaec provides fast lossless compression of 1 to 32 bit wide
signed or unsigned integers (samples). The library achieves the best
results for low-entropy data, as often encountered in space imaging
instrument data or in numerical model output from weather or climate
simulations. While floating-point representations are not directly
supported, they can also be coded efficiently by grouping exponents
and mantissas.
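
One possible way to group a floating-point array is sketched below, assuming 32-bit IEEE 754 `float` values. The helpers `split_floats` and `merge_floats` are hypothetical and not part of libaec: they separate each value into an 8-bit exponent stream and a 24-bit stream holding the sign bit above the 23 mantissa bits, which could then be coded as two independent sample arrays.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split IEEE 754 single-precision floats into two
 * integer streams that can be coded separately:
 *   exps[i]  - the 8 exponent bits              (bits_per_sample = 8)
 *   mants[i] - sign bit above the 23 mantissa bits (bits_per_sample = 24)
 * Assumes float is a 32-bit IEEE 754 type. */
void split_floats(const float *in, size_t n, uint8_t *exps, uint32_t *mants)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits); /* well-defined type punning */
        exps[i] = (uint8_t)((bits >> 23) & 0xFFu);
        mants[i] = ((bits >> 31) << 23) | (bits & 0x7FFFFFu);
    }
}

/* Inverse of split_floats: reassemble the original floats bit for bit. */
void merge_floats(const uint8_t *exps, const uint32_t *mants, size_t n,
                  float *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t bits = ((mants[i] >> 23) << 31)     /* sign */
                      | ((uint32_t)exps[i] << 23)    /* exponent */
                      | (mants[i] & 0x7FFFFFu);      /* mantissa */
        memcpy(&out[i], &bits, sizeof bits);
    }
}
```

The exponent array would then be coded with `bits_per_sample = 8`, and the mantissa array, stored here in `uint32_t`, with `bits_per_sample = 24` and without `AEC_DATA_3BYTE`. Whether this particular grouping pays off depends on the data; it is one choice among several, not a method prescribed by libaec.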

## Scope

Libaec implements extended
[Golomb-Rice](http://en.wikipedia.org/wiki/Golomb_coding) coding as
defined in the CCSDS recommended standard [121.0-B-3][1]. The library
covers the adaptive entropy coder and the preprocessor discussed in
sections 1 to 5.2.6 of the [standard][1].
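
To illustrate the basic idea of Rice coding, the hypothetical function below encodes a single non-negative value with parameter `k`: the quotient `v >> k` is written as a fundamental-sequence codeword (that many zeros followed by a one), followed by the `k` low-order bits of `v`. The output is a string of `'0'`/`'1'` characters for readability. The standard additionally selects `k` per block, signals the chosen option with an identifier, and defines its own layout for a block's codewords, so this is not the bitstream libaec produces.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: Rice-code one value v with parameter k (k < 32),
 * writing '0'/'1' characters into out and NUL-terminating it.
 * out must hold at least (v >> k) + 1 + k + 1 characters.
 * Returns the codeword length in bits. */
size_t rice_encode(uint32_t v, unsigned k, char *out)
{
    size_t len = 0;
    uint32_t q = v >> k;                 /* high part, sent in unary */

    for (uint32_t i = 0; i < q; i++)
        out[len++] = '0';
    out[len++] = '1';                    /* terminates the unary part */

    for (unsigned b = k; b-- > 0; )      /* k low bits, MSB first */
        out[len++] = (char)('0' + ((v >> b) & 1u));

    out[len] = '\0';
    return len;
}
```

For example, `rice_encode(9, 2, buf)` stores `"00101"` in `buf`: the quotient 2 becomes `001` and the remainder 1 becomes `01`. Small values get short codewords, which is why low-entropy data compress well.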

## Downloads

Source code and binary installers can be downloaded [here](https://gitlab.dkrz.de/k202009/libaec/tags) or [here](https://github.com/MathisRosenhauer/libaec).

## Patent considerations

As stated in section A3 of the current [standard][1]:

> At time of publication, the specifications of this Recommended
> Standard are not known to be the subject of patent rights.

## Installation

See [INSTALL.md](INSTALL.md) for details.

## SZIP Compatibility

[Libaec can replace SZIP](README.SZIP).

## Encoding

In this context, efficiency refers to the size of the encoded
data. Performance refers to the time it takes to encode data.

Suppose you have an array of 32 bit signed integers you want to
compress. The pointer to the data is called `source`, and the
output goes into `dest`.

```c
#include <stdint.h>
#include <libaec.h>

...
    struct aec_stream strm;
    int32_t *source;
    unsigned char *dest;

    /* input data is 32 bits wide */
    strm.bits_per_sample = 32;

    /* define a block size of 16 */
    strm.block_size = 16;

    /* the reference sample interval is set to 128 blocks */
    strm.rsi = 128;

    /* input data is signed and needs to be preprocessed */
    strm.flags = AEC_DATA_SIGNED | AEC_DATA_PREPROCESS;

    /* pointer to input */
    strm.next_in = (unsigned char *)source;

    /* length of input in bytes */
    strm.avail_in = source_length * sizeof(int32_t);

    /* pointer to output buffer */
    strm.next_out = dest;

    /* length of output buffer in bytes */
    strm.avail_out = dest_length;

    /* initialize encoding */
    if (aec_encode_init(&strm) != AEC_OK)
        return 1;

    /* Perform encoding in one call and flush output. */
    /* In this example you must be sure that the output */
    /* buffer is large enough for all compressed output */
    if (aec_encode(&strm, AEC_FLUSH) != AEC_OK)
        return 1;

    /* free all resources used by encoder */
    aec_encode_end(&strm);
...
```

`block_size` can vary from 8 to 64 samples (the standard allows 8,
16, 32, or 64). Smaller blocks allow the compression to adapt more
rapidly to changing source statistics. Larger blocks create less
overhead but can be less efficient if source statistics change across
the block.

`rsi` sets the reference sample interval in blocks. A large RSI will
improve performance and efficiency. It will also increase memory
requirements since internal buffering is based on RSI size. A smaller
RSI may be desirable in situations where errors could occur in the
transmission of encoded data and the resulting propagation of errors
in decoded data has to be minimized.
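
A rough sketch of how `block_size` and `rsi` scale the amount of data covered by one reference sample interval, assuming a fixed storage size per sample. Both functions are hypothetical helpers; libaec's actual internal buffer sizes may differ from this estimate.

```c
#include <stddef.h>

/* Input bytes covered by one RSI: rsi blocks of block_size samples,
 * each sample occupying storage_bytes bytes. */
size_t rsi_input_bytes(size_t block_size, size_t rsi, size_t storage_bytes)
{
    return block_size * rsi * storage_bytes;
}

/* Number of RSIs needed for n_samples samples; the last one may be
 * only partially filled. */
size_t rsi_count(size_t n_samples, size_t block_size, size_t rsi)
{
    size_t per_rsi = block_size * rsi;
    return (n_samples + per_rsi - 1) / per_rsi;
}
```

With the settings from the encoding example (`block_size = 16`, `rsi = 128`, 32-bit samples), one interval spans 2048 samples, or 8192 bytes of input.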

### Flags:

* `AEC_DATA_SIGNED`: input data are signed integers. Specifying this
  correctly increases compression efficiency. Default is unsigned.

* `AEC_DATA_PREPROCESS`: preprocessing input will improve compression
  efficiency if data samples are correlated. If the data are already
  uncorrelated, preprocessing only costs performance without any gain
  in efficiency.

* `AEC_DATA_MSB`: input data are stored most significant byte first,
  i.e. big endian. Default is little endian on all architectures.

* `AEC_DATA_3BYTE`: 17 to 24 bit input data are stored in three
  bytes. This flag has no effect for other sample sizes.

* `AEC_RESTRICTED`: use a restricted set of code options. This option is
  only valid for `bits_per_sample` <= 4.
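
The effect of `AEC_DATA_PREPROCESS` can be pictured with the unit-delay predictor and error mapping described in the CCSDS standard. The sketch below assumes unsigned `n`-bit samples, uses the previous sample as the prediction, and maps each prediction error to a non-negative integer that still fits in `n` bits. It is a simplified illustration, not libaec's internal code: signed data use a different sample range, and new reference samples at RSI boundaries are ignored here. The names `map_residual` and `preprocess` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Map the prediction error of sample x (predicted as pred) to a
 * non-negative integer, for unsigned n-bit samples (1 <= n <= 32).
 * theta is the distance from pred to the nearer end of [0, 2^n - 1]. */
uint32_t map_residual(uint32_t x, uint32_t pred, unsigned n)
{
    uint64_t xmax = ((uint64_t)1 << n) - 1;
    uint64_t theta = (pred < xmax - pred) ? pred : xmax - pred;
    int64_t delta = (int64_t)x - (int64_t)pred;
    uint64_t mag = (uint64_t)(delta < 0 ? -delta : delta);

    if (mag <= theta)              /* small error: interleave the signs */
        return (uint32_t)(delta >= 0 ? 2 * mag : 2 * mag - 1);
    return (uint32_t)(theta + mag); /* large error: only one sign fits */
}

/* Unit-delay prediction over a whole array: the first sample is passed
 * through as the reference, each later one is predicted by its
 * predecessor. */
void preprocess(const uint32_t *x, size_t len, unsigned n, uint32_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (i == 0) ? x[0] : map_residual(x[i], x[i - 1], n);
}
```

For slowly varying 8-bit input such as `{100, 101, 101, 99}`, the mapped values after the reference sample are `2, 0, 3`: much smaller than the raw samples and therefore cheaper to entropy-code.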

### Data size:

The following rules apply for deducing storage size from sample size
(`bits_per_sample`):

 **sample size**  | **storage size**
--- | ---
 1 -  8 bits  | 1 byte
 9 - 16 bits  | 2 bytes
17 - 24 bits  | 3 bytes (only if `AEC_DATA_3BYTE` is set)
25 - 32 bits  | 4 bytes (if `AEC_DATA_3BYTE` is set)
17 - 32 bits  | 4 bytes (if `AEC_DATA_3BYTE` is not set)
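
The table and the byte-order flags can be expressed as two small hypothetical helpers: `storage_bytes` returns the storage size for a given `bits_per_sample`, with the `three_byte` argument standing in for whether `AEC_DATA_3BYTE` is set, and `read_sample` assembles one unsigned sample either most significant byte first (as with `AEC_DATA_MSB`) or least significant byte first (the default). Neither function is part of the libaec API.

```c
#include <stddef.h>
#include <stdint.h>

/* Storage size in bytes for bits_per_sample, following the table.
 * three_byte is nonzero if AEC_DATA_3BYTE would be set.
 * Returns 0 for sample sizes outside 1..32. */
size_t storage_bytes(unsigned bits_per_sample, int three_byte)
{
    if (bits_per_sample == 0 || bits_per_sample > 32)
        return 0;
    if (bits_per_sample <= 8)
        return 1;
    if (bits_per_sample <= 16)
        return 2;
    if (bits_per_sample <= 24 && three_byte)
        return 3;
    return 4;
}

/* Read one unsigned sample of `size` bytes from buf, most significant
 * byte first if msb is nonzero, otherwise least significant byte first. */
uint32_t read_sample(const unsigned char *buf, size_t size, int msb)
{
    uint32_t v = 0;
    for (size_t i = 0; i < size; i++) {
        size_t idx = msb ? i : size - 1 - i;
        v = (v << 8) | buf[idx];
    }
    return v;
}
```

For the three bytes `01 02 03`, `read_sample` yields `0x030201` in the default little-endian order and `0x010203` with the MSB-first setting.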

If a sample requires fewer bits than the storage size provides, then
you have to make sure that unused bits are not set. Libaec does not
enforce this for performance reasons and will produce undefined output
if unused bits are set. All input data must be a multiple of the
storage size in bytes. Remaining bytes which do not form a complete
sample will be ignored.
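
A minimal sketch of preparing input that meets these rules, assuming samples held in 32-bit storage. The hypothetical `clear_unused_bits` clears every bit above `bits_per_sample`; applied to signed data, it keeps the low `bits_per_sample` bits of the two's complement value. The equally hypothetical `complete_samples` shows how many whole samples a byte count contains.

```c
#include <stddef.h>
#include <stdint.h>

/* Clear all bits above bits_per_sample (1..32) in 32-bit storage so
 * that no unused bits are set. */
void clear_unused_bits(uint32_t *samples, size_t n, unsigned bits_per_sample)
{
    uint32_t mask = bits_per_sample >= 32
                  ? UINT32_MAX
                  : (UINT32_C(1) << bits_per_sample) - 1u;
    for (size_t i = 0; i < n; i++)
        samples[i] &= mask;
}

/* Number of complete samples in avail_bytes; trailing bytes that do
 * not form a whole sample are left over. */
size_t complete_samples(size_t avail_bytes, size_t storage_bytes)
{
    return avail_bytes / storage_bytes;
}
```

For instance, with 20-bit samples the value `0xFFFFFFFF` becomes `0x000FFFFF`, and 10 bytes of input in 4-byte storage contain 2 complete samples; the remaining 2 bytes would be ignored.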

Libaec accesses `next_in` and `next_out` buffers only bytewise. There
are no alignment requirements for these buffers.

### Flushing:

`aec_encode` can be used in a streaming fashion by chunking input and
output and specifying `AEC_NO_FLUSH`. The function returns when either
the input runs empty or the output buffer is full. The calling
function can check `avail_in` and `avail_out` to see which of the two
occurred. The last call to `aec_encode()` must set `AEC_FLUSH` to
drain all output. [aec.c](src/aec.c) is an example of streaming usage
of encoding and decoding.

### Output:

Encoded data will be written to the buffer submitted with
`next_out`. The length of the compressed data is `total_out`.

See `libaec.h` for a detailed description of all relevant structure
members and constants.

## Decoding

Decoding works much like encoding; only the roles of input and output
are reversed.

```c
#include <stdint.h>
#include <libaec.h>

...
    struct aec_stream strm;
    /* this is now the compressed data */
    unsigned char *source;
    /* here goes the uncompressed result */
    int32_t *dest;

    strm.bits_per_sample = 32;
    strm.block_size = 16;
    strm.rsi = 128;
    strm.flags = AEC_DATA_SIGNED | AEC_DATA_PREPROCESS;
    strm.next_in = source;
    strm.avail_in = source_length;
    strm.next_out = (unsigned char *)dest;
    strm.avail_out = dest_length * sizeof(int32_t);
    if (aec_decode_init(&strm) != AEC_OK)
        return 1;
    if (aec_decode(&strm, AEC_FLUSH) != AEC_OK)
        return 1;
    aec_decode_end(&strm);
...
```

It is strongly recommended that the size of the output buffer
(`next_out`) be a multiple of the storage size in bytes. If the buffer
is not a multiple of the storage size and the buffer gets filled to
the last sample, the error code `AEC_MEM_ERROR` is returned.
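
If the available output memory is not already a multiple of the storage size, one option is to round `avail_out` down before decoding. The helper below is a hypothetical sketch of that arithmetic.

```c
#include <stddef.h>

/* Largest length not exceeding capacity that is a multiple of the
 * storage size, so the decoder never stops in the middle of a sample. */
size_t usable_output_bytes(size_t capacity, size_t storage_bytes)
{
    return capacity - capacity % storage_bytes;
}
```

For example, `strm.avail_out = usable_output_bytes(dest_capacity, sizeof(int32_t));`, where `dest_capacity` is a hypothetical variable holding the allocated buffer size in bytes, turns a 1001-byte buffer into a 1000-byte request.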

It is essential for decoding that parameters like `bits_per_sample`,
`block_size`, `rsi`, and `flags` are exactly the same as they were for
encoding. Libaec does not store these parameters in the coded stream,
so it is up to the calling program to keep the correct parameters
between encoding and decoding.

The actual values of coding parameters are in fact only relevant for
efficiency and performance. Data integrity only depends on consistency
of the parameters.

The exact length of the original data is not preserved and must also be
transmitted out of band. The decoder can produce additional output
depending on whether the original data ended on a block boundary or on
zero blocks. The output data must therefore be truncated to the
correct length. This can also be achieved by providing an output
buffer of just the correct length.
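
Since neither the coding parameters nor the original length are stored in the coded stream, the calling program has to keep them somewhere. The sketch below shows one hypothetical side-car record written as 24 little-endian bytes; the layout, the `side_params` struct, and the function names are illustrative assumptions, not a libaec format.

```c
#include <stdint.h>

/* Hypothetical record of what the decoder needs but libaec does not
 * store in the coded stream. */
struct side_params {
    uint32_t bits_per_sample;
    uint32_t block_size;
    uint32_t rsi;
    uint32_t flags;
    uint64_t original_bytes;   /* length of the uncompressed data */
};

#define SIDE_PARAMS_SIZE 24    /* 4 x 4 bytes + 8 bytes */

/* Store the low n bytes of v, least significant byte first. */
static void put_le(unsigned char *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Load n bytes stored least significant byte first. */
static uint64_t get_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

void side_params_write(const struct side_params *s, unsigned char *out)
{
    put_le(out, s->bits_per_sample, 4);
    put_le(out + 4, s->block_size, 4);
    put_le(out + 8, s->rsi, 4);
    put_le(out + 12, s->flags, 4);
    put_le(out + 16, s->original_bytes, 8);
}

void side_params_read(const unsigned char *in, struct side_params *s)
{
    s->bits_per_sample = (uint32_t)get_le(in, 4);
    s->block_size = (uint32_t)get_le(in + 4, 4);
    s->rsi = (uint32_t)get_le(in + 8, 4);
    s->flags = (uint32_t)get_le(in + 12, 4);
    s->original_bytes = get_le(in + 16, 8);
}
```

After decoding, `original_bytes` tells the caller how much of the output to keep; alternatively it can be used as `avail_out` so that the output buffer has exactly the correct length.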

## References

[Lossless Data Compression. Recommendation for Space Data System
Standards, CCSDS 121.0-B-3. Blue Book. Issue 3. Washington, D.C.:
CCSDS, August 2020.][1]

[1]: https://public.ccsds.org/Pubs/121x0b3.pdf