# Introduction

This project provides scripts to build and install the libraries
needed for ICON I/O into working configurations with as little effort
as possible.

The intention is, on the one hand, to gather recipes that allowed for
a successful build at multiple sites to highlight the specific and
general issues involved and keep track of required patches. On the
other hand it is hoped that this way, should a new, desirable version
of some library arise, recompiling the whole stack becomes an easier
affair. Since the libraries need to be consistent with each other, the
aspect of rapidly building the whole stack and running the
corresponding test suites should improve confidence that the resulting
installation can actually be used for ICON.

# Scope

This script facilitates a consistent from-source installation for the
following packages:

+ __libaec__ <https://gitlab.dkrz.de/k202009/libaec>
  Adaptive entropy encoding data compression library, used
  by __hdf5__ and __eccodes__.
+ __HDF5__ <https://www.hdfgroup.org/solutions/hdf5>
  Data container library commonly used by netcdf version 4 and
  later.
+ __Pnetcdf__, also known as __parallel-netcdf__ <https://parallel-netcdf.github.io/>
  A library for parallel access to those file formats of the netcdf
  library not using the HDF5 library.
+ __NetCDF-C__ <https://github.com/Unidata/netCDF-C>
  A library to access a data file format common in a number of
  physical sciences.
+ __netcdf-fortran__ <https://github.com/Unidata/netCDF-Fortran>
  A Fortran wrapper library for netcdf-c.
+ __eccodes__ <https://confluence.ecmwf.int/display/ECC/ecCodes+Home>
  A library to read and/or write data in the WMO GRIB1 and GRIB2
  formats. Provided by ECMWF.
+ __YAXT__ <https://www.dkrz.de/redmine/projects/yaxt>
  Facilitates various data exchanges for arrays distributed over
  multiple MPI tasks. Effectively removes the necessity to code
  individual MPI message passing calls.
+ __PPM__ <https://www.dkrz.de/redmine/projects/scales-ppm>
  Partitioning and Parallelization Module, a library to aid in various
  recurring tasks of parallel programs.
+ __CDI__ <https://code.mpimet.mpg.de/projects/cdi>
  Provides an abstraction for multiple data formats to ease switching
  between COARDS conforming netCDF and WMO GRIB formats.
  Includes the *CDI-PIO* parallelization layer needed for parallel
  output from a number of climate/weather models.

In various parts of the scripts, each package is referred to by its
name in all lower-case. This so-called package key is instrumental in
referencing the various associative arrays storing information on each
package.
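
As a minimal sketch of how such a key can index an associative array, consider the following bash fragment. The array `pkg_homepage` and the helper `pkg_key` are hypothetical names chosen for illustration and are not taken from the scripts; the URLs are the project pages listed above.

```shell
#!/usr/bin/env bash
# Illustration only: pkg_homepage and pkg_key are hypothetical names,
# not identifiers used by build-cdi-pio-stack.sh.
# Associative arrays and ${var,,} require bash 4 or later.

declare -A pkg_homepage=(
  [libaec]='https://gitlab.dkrz.de/k202009/libaec'
  [hdf5]='https://www.hdfgroup.org/solutions/hdf5'
  [yaxt]='https://www.dkrz.de/redmine/projects/yaxt'
)

# Derive the package key: the package name in all lower-case.
pkg_key() {
  printf '%s\n' "${1,,}"
}

key=$(pkg_key HDF5)
printf '%s -> %s\n' "$key" "${pkg_homepage[$key]}"
```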

## Scripts Overview

The supplied scripts provide the following:

+ __build-cdi-pio-stack.sh__
  This is the main driver that is used by all the system-specific
  scripts. It provides the basic recipes and takes care of the
  dependencies and general handling of static and/or dynamic
  libraries. It provides various seams (see below) to customize its
  operation to the specific properties of the target system.
+ __build-cdi-pio-stack-daint-cce.sh__
  Builds the library stack for Piz Daint at CSCS with the current Cray
  compiler.
+ __build-cdi-pio-stack-daint-cce-10.0.2.sh__
  Builds the library stack for Piz Daint at CSCS with the older Cray
  compiler version 10.0.2.
+ __build-cdi-pio-stack-daint-pgi.sh__
  Builds the library stack for Piz Daint at CSCS with the PGI
  compiler version 20.1.
+ __build-cdi-pio-stack-juwels-booster-nvhpc-20.11.sh__
  Builds the libraries for JSC JUWELS Booster with the NVIDIA
  compilers, version 20.11.
+ __build-cdi-pio-stack-juwels-booster-nvhpc-21.5.sh__
  Builds the libraries for JSC JUWELS Booster with the NVIDIA
  compilers, version 21.5.
+ __build-cdi-pio-stack-mistral-intel-openmpi2.sh__
  Builds the libraries on DKRZ Mistral with the Intel compiler, version
  17.0.6 for compatibility with more user setups.
+ __build-cdi-pio-stack-mistral-nag-openmpi2.sh__
  Builds the libraries on DKRZ Mistral with the NAG compiler version 6.2.
+ __build-cdi-pio-stack-vader-gcc-10.2-ompi-4.0.5.sh__
  Builds the whole stack on the DKRZ ML cluster with gcc 10.2.0 and
  OpenMPI 4.0.5.
+ __build-cdi-pio-stack-vader-icc-impi.sh__
  Builds the whole stack on the DKRZ ML cluster with Intel compiler
  2021.1 and Intel MPI 2021.1.
+ __build-cdi-pio-stack-vader-icc-impi-O0.sh__
  Builds the whole stack on the DKRZ ML cluster with Intel compiler
  2021.1 and Intel MPI 2021.1 in a debugging configuration.

After some initial difficulties, most of the wrappers can now perform
the build in a tmpfs filesystem for much improved speed.

## Invoking the main driver script

The main driver can be passed the following settings via
command-line arguments (preferred) or via environment variables:

+ __PAR_BUILD__
  Number of parallel make jobs to use. Default is to use 22 tasks.
+ __EXTRA_MAKE_ARGS__
  Extra arguments to pass to make. This can e.g. be used to set some
  common make variables not treated explicitly below.
+ __EXTRA_MAKE_CHECK_ARGS__
  Extra arguments to pass to make check. This can e.g. be used to set some
  common make variables not treated explicitly below.
+ __make__
  Name of make program to invoke. This defaults to the value of the
  MAKE variable, or, if unset, to make, or gmake when make cannot be
  found in the PATH.
+ __build__
  A tag to represent the build target, usually by the name and version
  of compiler and MPI library.
+ __stages__
  build-cdi-pio-stack.sh runs the following steps for each package it
  considers to be built:
  + *download* Download the source package archives and/or git clones.
  + *unpack* Expands archived sources and applies patches. Note that
    patches are only applied at the initial unpacking. Once the source
    directory exists, no patches are applied.
  + *build* Performs the configure && make steps of the build for
    every package with unfinished installation.
  + *check* Runs make check in each build directory.
  + *install* Performs a `make install` for each package not already
    installed.
  Optionally the *recheck* stage re-runs the test suite of packages
  already successfully installed.

  Separating the stages is most useful for situations where the
  download must be carried out on another system because the HPC
  system blocks some network connections or for systems where the
  environment for running make check is significantly different from
  the build environment, e.g. because a batch allocation must be
  provided for testing.
+ __CC__, __FC__, __CXX__, __F77__
  Commands to invoke the C, Fortran and C++ compilers,
  respectively. These default to mpicc, mpifort and mpic++.
  F77 is special in that it defaults to the value of FC.
+ __CPPFLAGS__, __CFLAGS__, __FCFLAGS__, __FFLAGS__, __CXXFLAGS__
  Initial flag variables for the C preprocessor and compiler, the Fortran and
  Fortran 77 compilers and the C++ compiler. These default to -g -O2
  but it is recommended to at least add flags to adjust code
  generation for the target architecture, e.g. -march=native for gcc.
+ __AR__
  Command to create standard Unix (library) archive, defaults to ar.
+ __RANLIB__
  Command to index standard Unix (library) archive, defaults to
  ranlib.
+ __CC_PIC_FLAGS__
  Flag(s) to __CC__ to produce object files that can be incorporated
  into dynamic shared objects. The default value is -fPIC.
+ __LIBS__ and __LDFLAGS__
  These variables are meant to hold flags and library specifications
  in case some special library is needed (e.g. a custom malloc
  library) or some parts of the library stack are already installed by
  other means.
+ __MPI_LAUNCH__
  Program to start MPI-parallelized programs, defaults to what the
  configuration scripts of the libraries pick up by automatic
  configuration, typically `mpirun`. When using the Slurm batch
  scheduler, `srun` is usually preferred, on Cray systems `aprun`
  might be needed.
+ __CMAKE_EXTRA_ARGS__
  The contents of this variable are expanded into multiple arguments to
  be added to the invocation of cmake for eccodes.
+ __libtype__
  Should take one of the values `static`, `shared`, or `both`. Default is to
  build shared objects only.
+ __packages_dl__
  Is an associative array mapping each package to its downloadable
  archive URL.
+ __CC_rpath_flag__, __FC_rpath_flag__
  These default to `-Wl,-rpath,` and are meant to be the prefix for
  additions to the RUNPATH and/or RPATH entries of shared objects via
  compiler link step flags. Notably, FC_rpath_flag must be set to
  -Wl,-Wl,,-rpath,, for the NAG Fortran compiler.
+ __package_git__, __package_git_branch__
  URL and branch to use for packages to retrieve via `git clone`.
+ __basedir__
  Path at which to root the following by default:
  + __archivedir__
  Where to store downloaded archive files, defaults to `$basedir/archive`.
  + __srcdir__
  Where to unpack the source archives to. When set to e.g.
  `/some/path`, netcdf-c 4.7.4 would be unpacked to
  `/some/path/netcdf-c-4.7.4`.
  Defaults to `$basedir/src`.
  + __builddir__
  Where to perform compilations. If at all possible, it is suggested
  to put this on tmpfs to reduce total build time significantly.
  Defaults to `$basedir/build/$build` with individual packages being
  built in a subdirectory corresponding to the package name.
  + __prefix__
  Where to install packages to, defaults to `${basedir}/opt/${build}`.
  If the `multi_installs` variable is set, each package is installed
  into an individual sub-directory, and the following substitutions
  are performed on `$prefix`:
    + %k is replaced by the package base name, e.g. libaec
    + %n is replaced by the package base name followed by the version or
      git commit hash, separated by a dash.
    + %b is replaced by `$build`.
    + %v is replaced by the package version
+ __NC_H5_CACHE_SIZE__, __NC_H5_CHUNK_CACHE_NELEMS__
  Tunables for netcdf-c default chunking parameters. The default
  values are 4194304 and 1009 respectively.
+ __SCRATCH__
  A directory with sufficient free space for the large file tests of
  the various test suites. Must be writable for programs started by
  __MPI_LAUNCH__.
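
The `%k`, `%n`, `%b` and `%v` substitutions described for __prefix__ can be sketched in bash as follows. The function `expand_prefix` is a hypothetical helper written for this illustration and is not part of the driver; the template path and build tag in the example call are likewise made up.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the documented substitutions on $prefix.
# expand_prefix is a hypothetical helper, not a function of the driver.

expand_prefix() {
  local template=$1 key=$2 version=$3 build=$4
  local out=$template
  # The percent sign is escaped so it is matched literally.
  out=${out//\%n/"$key-$version"}   # base name, dash, version (or commit hash)
  out=${out//\%k/"$key"}            # package base name
  out=${out//\%b/"$build"}          # build tag
  out=${out//\%v/"$version"}        # package version
  printf '%s\n' "$out"
}

# Example call with an illustrative template and build tag.
expand_prefix '/sw/%b/%n' netcdf-c 4.7.4 gcc-10.2-ompi-4.0.5
```

With the arguments shown, the call prints a path ending in `netcdf-c-4.7.4`, matching the naming used for the unpacked source directory.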

Additionally, variables can be provided for each package by
prefixing the following suffixes with the canonicalized package
name: upper-case letters are converted to lower case and every
character that is neither alphanumeric nor an underscore is replaced
by an underscore, e.g. NetCDF-C becomes the `netcdf_c` prefix.
Optionally, the prefix can be extended with the canonicalized version
to provide for fixes known to be needed for a specific package
version only.

+ *prefix*_configure
  Contains extra arguments to pass to the configure step of the
  package denoted by *prefix*
+ *prefix*_configure_env
  This variable undergoes variable expansion as a prefix of the
  configure command and can for example be used to run configure with
  a different shell or via `salloc`.
+ *prefix*_check_env
  This variable serves the same purpose as *prefix*_configure_env but
  for the stage running the test suite.
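
The canonicalization rule and the resulting variable names can be sketched as follows. The helper `canonicalize` is hypothetical and only illustrates the rule as stated in this document; how the driver itself derives and reads these variables is an assumption here. The value `--disable-dap` is merely an example of a netcdf-c configure option.

```shell
#!/usr/bin/env bash
# Sketch only: canonicalize is a hypothetical helper illustrating the
# documented rule (lower case; other characters become underscores).

canonicalize() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_' '_'
  printf '\n'
}

prefix=$(canonicalize NetCDF-C)

# A per-package setting following the <prefix>_configure naming scheme.
netcdf_c_configure='--disable-dap'

# Look the variable up indirectly via the computed prefix.
var="${prefix}_configure"
printf '%s=%s\n' "$var" "${!var}"
```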

Presuming a package with canonical name pkg is already installed in
the system and the various settings adjusted to make use of it, adding
the following argument suppresses building the package:
`--use-from-system=pkg`
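
A hedged sketch of a complete invocation follows. It passes a few of the settings described in this document through the environment and marks HDF5 as provided by the system. The concrete values (8 jobs, `srun`, the base directory) are illustrative assumptions, and the command is only printed rather than executed.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assembles an invocation and prints it, nothing is built.
# All values below are illustrative assumptions, not recommended settings.

cmd=(
  env
  PAR_BUILD=8                        # number of parallel make jobs
  libtype=both                       # build static and shared libraries
  MPI_LAUNCH=srun                    # launcher for MPI test programs
  basedir="$HOME/cdi-pio-stack"      # root for archive, src, build, opt
  ./build-cdi-pio-stack.sh
  --use-from-system=hdf5             # HDF5 assumed to be installed already
)

# Print the command in a form that could be pasted into a shell.
printf '%q ' "${cmd[@]}"
printf '\n'
```

Replacing the final `printf` lines with `"${cmd[@]}"` would run the command from the directory containing the driver script.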