Author: Florian Prill, DWD (2012-06-19).
Case Setup “ICON_r2B06_10d”
Current restrictions

- The ICON runs employ ECMWF's scheduler "SMS", cf. http://www.ecmwf.int/publications/manuals/sms/ for detailed documentation.
- The suite runs under the user account "deia" on ecgate and on the compute cluster c1a (IBM Power6 cluster).
- svn version control system: trunk/icon-dev/schedulers/ecmwf
- cf. the BCeps suite of user zde, see /home/ms/de/zde/BCeps/bceps.def.
- The SMS directory structure is necessary on the compute cluster file system, $PERM, as well as on ecgate.
- Important files and directories: def/case_setup; $HOME/ICON_R2B06_10d/def/smsfiles/icon.def; $HOME/sms_server and $HOME/sms; $TMPDIR on ecgate and $TEMP/sms on the cluster file system.
All script files and the model setup are located in a directory $HOME/ICON_R2B06_10d
of the following layout:
  $HOME/ICON_R2B06_10d
    def            ! scripts for operating the suite
      smsfiles
      include
      icon_scripts ! checkout from ICON SVN: trunk/icon-dev/scripts
      bin          ! additional script files
    output
      model        ! current model output
      log          ! SMS suite logfiles
    doc            ! this documentation
    cluster        ! [only on compute cluster]
      icon-dev     ! latest build of SVN trunk [only on compute cluster]
      input        ! static input (time-independent)
        extpar
        grids
        radiation
        ifs2icon   ! model input from IFS
          cellweights
Note: Not all necessary contents are stored in the Redmine repository

  https://svn.zmaw.de/svn/icon/trunk/icon-dev/schedulers/ecmwf/ICON_r2B06_10d

After the initial check-out one must add on ecgate and cluster c1a:

- def/icon_scripts (from https://svn.zmaw.de/svn/icon/trunk/icon-dev/scripts)
- cluster/icon-dev
The complete run of the ICON model is split into different tasks. These tasks are grouped into families and sub-families, for example init and init/build.
Each task corresponds to a *.sms file, stored in the subdirectory smsfiles/def/icon/…, for example smsfiles/def/icon/init/get_data.sms. Each *.sms file is an (almost) ordinary script file.
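For orientation, a minimal sketch of such a task file is shown below; the include file names and the echoed variables are placeholders and not taken from the actual suite:

  #!/bin/ksh
  %include <head.h>
  # ordinary shell commands of the task; %VAR% placeholders are
  # substituted by the SMS preprocessor when the job is generated
  echo "running suite %SUITE_NAME%, date %SMSDATE%"
  %include <tail.h>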
There are basically three types of *.sms files:

- ecgate: Post-processing etc.
- hpc_serial: Scripts running on the c1a frontend: data retrieval for the cluster c1a.
- hpc_parallel: Scripts running in parallel on the compute cluster.

File names and output directories depend on several variables:

- %SMSDATE%
- %SUITE_NAME% (user-defined)
- $REQUEST_DATE (user-defined)
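As an illustration of how these variables enter the file names, a sketch (assumed, not a verbatim excerpt from the suite scripts) of composing an output directory inside a task file:

  # $REQUEST_DATE is set by the suite scripts, e.g. derived from %SMSDATE%
  REQUEST_DATE=%SMSDATE%
  OUTDIR=$TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE
  mkdir -p $OUTDIR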
Objectives: | Dummy task, triggered by IFS, initiates ICON SMS suite. |
Platform: | ecgate |
Requires input: | None. |
Provides output: | None. |
The whole SMS suite is restarted (repeat date
keyword) on a regular basis.
The suite contains an artificial dummy task, fct_ifs
, whose completion is the starting condition for the whole suite.
There exists a small shell script def/smsfiles/job.trigger which is triggered by the IFS framework itself, i.e. via ecaccess-job-submit. This shell script calls a CDP command:

  force complete fct_ifs

and thus activates the whole ICON SMS suite.
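A plausible sketch of job.trigger is given below; only the force complete command is documented here, while the server number, user name and node path are assumptions:

  #!/bin/ksh
  # submitted via ecaccess-job-submit and executed when the IFS event fires
  cdp << EOF
  set SMS_PROG 901730
  login ecgate deia 1
  force complete fct_ifs
  EOF
  # note: the node may need to be fully qualified, e.g. /icon/fct_ifs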
Currently, the task is triggered by the event ID 167, which means “at this stage, the analysis at 00UTC is complete”. The task has been registered with
ecaccess-job-submit -noDirectives -eventIds 167 -queueName ecgate $HOME/ICON_r2B06_10d/def/smsfiles/job.trigger
More events can be obtained from the list generated by the command
ecaccess-event-list
Finally, use

  ecaccess-job-list
  ecaccess-job-delete <jobID>

to list the registered jobs and to remove a job trigger from the list.
Objectives: | Set some initial variables and flags. |
Platform: | ecgate |
Requires input: | None. |
Provides output: | sets events “enable_build” and “enable_dumpstate” based on current date. |
Some parts of the SMS suite are not run in a daily fashion but less frequently. For example, dump states (coefficient tables) must be pre-computed only once and a new binary should be compiled only about once a week.
- init/setup:enable_dumpstate controls if a dump state is computed.
- init/setup:enable_build controls if a new binary is compiled.
Currently (test phase of the SMS suite), the variable enable_dumpstate
is always enabled, while enable_build
is always disabled.
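For illustration, the weekly-build logic described above might be implemented along the following lines; this is a sketch, not the actual init/setup task, and smsevent is the standard SMS child command for raising an event:

  # intended behaviour: rebuild the binary only about once a week (e.g. Mondays);
  # during the current test phase this event is simply never raised
  dow=$(date +%u)            # day of week of the current run
  if [ "$dow" -eq 1 ]; then
    smsevent enable_build
  fi
  # the dump state is (currently) always recomputed
  smsevent enable_dumpstate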
Objectives: | Retrieve IFS data from MARS database. |
Platform: | hpc_serial |
Requires input: | None. |
Provides output: | Retrieved *.grb file is stored in $PERM/%SUITE_NAME%/cluster/input/ifs2icon |
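For orientation, a minimal MARS retrieval of the kind issued by this task could look like the sketch below; the actual request is built by the mars4icon_smi script (see below), and the parameter list, level range and target file name here are assumptions:

  mars << EOF
  retrieve,
    class    = od,
    stream   = oper,
    expver   = 1,
    type     = an,
    date     = $REQUEST_DATE,
    time     = 00,
    levtype  = ml,
    levelist = 1/to/91,
    param    = u/v/t/q/lnsp,
    target   = "$PERM/%SUITE_NAME%/cluster/input/ifs2icon/ifs_oper_an.grb"
  EOF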
Objectives: | Run converter IFS2ICON . |
Platform: | hpc_serial |
Requires input: | GRIB file stored in $PERM/%SUITE_NAME%/cluster/input/ifs2icon (moved by this task) |
Requires input: | IFS2ICON config file in $PERM/%SUITE_NAME%/def/ifs2icon_1279.conf |
Requires input: | Grid files in $PERM/%SUITE_NAME%/cluster/input/grids |
Provides output: | Conversion result (NetCDF format) and the GRIB file in $TEMP/sms/input/%SUITE_NAME%/$REQUEST_DATE/ |
Notes:

- init_data comprises the computation of cell weights, but only if they have not been previously computed. This is indicated by the existence of a file cluster/input/ifs2icon/cellweights/initialized.flag.
- init_data loads the Climate Data Operators (cdo), version 1.5.0. Note that the default version 1.4.6 installed on c1a cannot be used with ifs2icon, while for the newer 1.5.3 NetCDF support has not been compiled in.
- The ifs2icon script contains a call to cdo merge which consumes a considerable amount of memory (for R2B06, the default of ~780 MB is not sufficient).

Objectives: | Update ICON sources. |
Platform: | ecgate |
Provides output: | Source code in directory c1a:$PERM/%SUITE_NAME%/cluster/icon-dev |
There are two modes, depending on the user-defined %SVN_UPDATE% flag:

Subversion update
  Requires an existing source directory on compute cluster c1a. Apply only svn update and gmake distclean to retrieve the source code from the repository.

Tar-ball export
  SVN export of a source copy in $TMPDIR/icon-dev (on ecgate), copy to the compute cluster c1a, where the files will be extracted (in task init/build/init_build).
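The following sketch illustrates the two modes; it is an assumption rather than the actual update task, and the cluster host name in particular is a placeholder:

  if [ "%SVN_UPDATE%" = "yes" ]; then
    # mode 1: Subversion update of the existing checkout
    cd $PERM/%SUITE_NAME%/cluster/icon-dev
    svn update
    gmake distclean
  else
    # mode 2: fresh export on ecgate, shipped to the cluster as a tar-ball
    cd $TMPDIR
    svn export https://svn.zmaw.de/svn/icon/trunk/icon-dev icon-dev
    tar cf icon-dev.tar icon-dev
    # host name "c1a" is a placeholder; extraction happens later in init/build/init_build
    ecrcp icon-dev.tar c1a:$PERM/%SUITE_NAME%/cluster/
  fi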
Objectives: | Build ICON binary. |
Platform: | hpc_parallel |
Requires input: | Updated source directory, c1a:$PERM/%SUITE_NAME%/cluster/icon-dev |
Provides output: | Binary in c1a:$PERM/%SUITE_NAME%/cluster/icon-dev/build/…/bin/ |
The build process always starts from a clean directory, i.e. the complete source is recompiled.
Note: The binary is configured with -O3
optimization enabled, which requires a considerable amount of time on the PWR6.
Objectives: | Clear directory with model run and output directory. |
Platform: | hpc_serial |
Provides output: | Clears output directory $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ |
Objectives: | Create dump state (containing coefficient tables for ICON run). |
Platform: | hpc_parallel |
Requires input: | Case setup in $PERM/%SUITE_NAME%/def. |
Requires input: | IFS data in $TEMP/sms/input/. |
Requires input: | control_model binary in $PERM/%SUITE_NAME%/cluster/icon-dev/build/…/bin/. |
Provides output: | Dump state in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ |
Objectives: | Model run. |
Platform: | hpc_parallel |
Requires input: | Case setup in $PERM/%SUITE_NAME%/def. |
Requires input: | IFS data in $TEMP/sms/input/. |
Requires input: | control_model binary in $PERM/%SUITE_NAME%/cluster/icon-dev/build/…/bin/. |
Requires input: | Dump state in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ |
Provides output: | Model output in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ |
On abort or completion: terminates the SMS task "check_progress" by calling the utility script cancel_check_progress.
Objectives: | Check for available output files. |
Platform: | ecgate |
Provides output: | Increases SMS meter “progress” . |
When using the XCDP graphical user interface the current number of output files is displayed as a graphical progress bar. Please note that the timing might be inaccurate because the output directory is polled infrequently. The value of the progress meter is used to trigger the post-processing tasks of the suite.
Terminates if all NetCDF output files are available in c1a:$TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ or if a file $HOME/sms/model_complete.flag has been created on ecgate.
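A sketch of the polling loop is given below; this is an assumption rather than the actual script, smsmeter is the standard SMS child command for updating a meter, and the host name and the file count of 80 (taken from the post-processing description below) are placeholders:

  OUTDIR=$TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE
  while true; do
    n=$(rsh c1a "ls $OUTDIR/*.nc 2>/dev/null | wc -l")
    smsmeter progress $n                            # drives the XCDP progress bar and the triggers
    [ $n -ge 80 ] && break                          # all output files available
    [ -f $HOME/sms/model_complete.flag ] && break   # model signalled completion
    sleep 300                                       # poll infrequently
  done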
This task is called for each output step separately.
Not yet implemented, only a "dummy" call to grib_ls!
Objectives: | Extract output, generate plots. |
Platform: | hpc_serial |
Requires input: | Model output in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ |
The calling frequency of this post-processing task is probably too high: it is invoked each time a new output file has been written to disk. That is, 80 post-processing tasks are launched on hpc_serial (c1a).
One could also design a post-processing mechanism which is invoked only once, after the final output file has been created.
For this, adjust the repeat and trigger keywords in the SMS suite definition file $HOME/ICON_R2B06_10d/def/smsfiles/icon.def.
See ECMWF's SMS documentation on how to modify this setting.
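For illustration, such a once-only post-processing task could be triggered on the meter roughly as in the following sketch; the node names and the file count are assumptions, and the exact trigger syntax should be checked against the SMS manual:

  family post
    task postproc
      # run once, after check_progress has registered the last output file
      trigger ../forecast/check_progress:progress eq 80
  endfamily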
Not yet implemented!
Objectives: | Copy output and/or store in database and/or trigger transfer to DWD. |
Platform: | ecgate |
Requires input: | Model output in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ |
Note: Check for jobs still queued or running on the compute cluster (cancel them with llcancel)! Check also for shell scripts running in the background on ecgate (especially the utility script cancel_check_progress).
The ICON SMS suite of user deia can be viewed and controlled from your own ECMWF account via the graphical frontend "xcdp":
Launch xcdp and open the menu

  Edit -> Preferences -> Servers

There, add a new server by typing

  Name:   deia
  Host:   ecgate
  Number: 901730

(where "901730" denotes the SMS ID provided by sms_start). Then close the preferences dialog and log into the SMS suite by the menu item

  Servers -> deia
Note: Currently, the SMS suite is started at about 7:30 CEST and runs less than 60 minutes. Therefore, for most of the day, the XCDP status of the suite is “queued” (blue-gray color)!
The SMS server is started with the command

  sms_start

which gives:

  User "3455" attempting to start sms server on "ecgate" using PROGNUM = "903455" and with SMSHOME of "$HOME/sms_server"
  Checking if the SMS is already running on ecgate ...
Then write host and number to .cdprc:
alias myalias set SMS_PROG 903455 \; login ecgate UID 1
where ”903455
” should be replaced by the provided PROGNUM
value.
Initially create a directory structure for the SMS log files:
  cd
  mkdir sms
  mkdir sms/icon
  mkdir sms/icon/init
  mkdir sms/icon/forecast
  mkdir sms/icon/post
  mkdir sms/icon/init/build
  mkdir sms/icon/forecast/prepare
Similarly, on the compute cluster c1a
:
  cd $TEMP
  mkdir sms
  mkdir sms/icon
  mkdir sms/icon/init
  mkdir sms/icon/forecast
  mkdir sms/icon/post
  mkdir sms/icon/init/build
  mkdir sms/icon/forecast/prepare
In the $HOME/.cdprc
file, several utility routines have been defined:
Start the suite with
  cdp
  CDP> myalias
  CDP> play icon.def
  CDP> begin icon
If there exists already a running SMS suite, you can relaunch the whole setup with
  cdp
  CDP> total_restart
Check the status of the suite with
  cdp
  CDP> myalias
  CDP> status -f /icon
On the compute cluster c1a a special process is in charge of copying log files back to ecgate so that they can be viewed in xcdp:
  cd $PERM/ICON_r2B06_10d/def/bin
  ./logserver
Now a log server daemon must be running: the command line

  ps -ef | grep $USER

yields something like
... 418144 1 0 10:34:29 pts/172 0:00 /usr/bin/perl /usr/local/bin/logsvr.pl
Enable rcp
and rsh
between the compute cluster and ecgate
by setting the $HOME/.rhosts
file:
echo "ecga02 $USER" >> ~/.rhosts
For copying files between ecgate and compute cluster use the commands rcp
or ecrcp
.
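For example (host, suite and file names are placeholders):

  ecrcp c1a:$TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/ICON_output.nc $SCRATCH/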
SVN access is required
Please note: For direct access to ZMAW's subversion repository, changes in $HOME/.subversion/servers are necessary:

  http-proxy-host = ******
  http-proxy-port = ******

(Unofficial access, ask L. Kornblueh or F. Prill for details.)
Note that on c1a no SSL support is available; therefore type

  svn co http://svn.zmaw.de/svn/icon/trunk/icon-dev/schedulers/ecmwf/ICON_r2B06_10d

on the command line on c1a.
The script language ruby has to be installed locally (user account). It is required by the IFS2ICON task.

Prerequisites:

- ruby source tar-ball (ruby-1.9.3-p125.tar) in $PERM/software
- extcsv gem package in $PERM/software/packages
- gnuplot gem package in $PERM/software/packages

Installation process:
  tar xvf ruby-1.9.3-p125.tar
  cd ruby-1.9.3-p125
  ./configure --prefix=$PERM/software/ruby-1.9.3-p125_build
  gmake -j6
  gmake install
  cd ../packages
  $PERM/software/ruby-1.9.3-p125_build/bin/gem install extcsv
  $PERM/software/ruby-1.9.3-p125_build/bin/gem install gnuplot
Patch required: In $PERM/software/ruby-1.9.3-p125_build/lib/ruby/gems/1.9.1/gems/extcsv-0.12.0/lib/extcsv_diagram.rb
comment out the require statement:
#require 'extcsv_units'
It is necessary to compile a local version of the Climate Data Operators, which is newer than v1.5.0,
because the system installation of the CDOs on c1a
has not been compiled with NetCDF support.
The configure script must be provided with the correct system paths for NetCDF, Jasper and GRIB_API:
  ls -rlt
  gunzip cdo-1.5.4.tar.gz
  tar xvf cdo-1.5.4.tar
  cd cdo-1.5.4
  CC=xlc_r CFLAGS="-g -O3 -q64 -qhot -qarch=auto -qtune=auto -qsmp=omp -DHAVE_MMAP" \
    ./configure --with-netcdf=/usr/local/apps/netcdf/3.6.3/LP64 \
      --prefix=$PERM/software/cdo-1.5.4_build \
      --with-grib_api=/usr/local/lib/metaps/lib/grib_api/1.9.9 \
      --with-jasper=/usr/local/apps/jasper/1.900.1/LP64 \
      --with-threads=yes
  gmake -j2
  gmake install
After successful compilation, all IFS2ICON processes must run with

  $PERM/software/cdo-1.5.4_build/bin/cdo

To this end, the task init_data.sms exports an environment variable CDOBIN which is then used by the Ruby script ifs2icon.rb.
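For illustration, the relevant line could look like the following sketch (the exact statement in init_data.sms may differ):

  # point ifs2icon.rb to the locally built CDO binary instead of the system cdo
  export CDOBIN=$PERM/software/cdo-1.5.4_build/bin/cdo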
We want to use the same IFS2ICON configuration settings for ECMWF as well as for DWD, therefore the GRIB2 short names must be taken from DWD's definition files:
export GRIB_DEFINITION_PATH=%SCPERM%/software/usr/local/grib_api/release/share-1.9.9/definitions.edzw:/usr/local/lib/metaps/lib/grib_api/1.9.9/share/definitions
The directory ”%SCPERM%/software/usr/local/grib_api/release/share-1.9.9/definitions.edzw” can be tar'ed and copied from DWD's HPC file system.
Alternative approach: DWD GRIB definitions are already installed under
~dwd/grib_api/definitions.edzw-${my_api_version}
One can set them using the script /home/ms/de/dw7/bin/grib_def
.
The MARS4ICON script must be executable:
chmod +x $PERM/ICON_r2B06_10d/def/icon_scripts/preprocessing/mars4icon_smi
MARS4ICON requires DWD's date conversion script datconv
, which must be in PERL's search path:
  cd $HOME/bin
  ln -s ~dfr/routfox/bin/datconv .
  ln -s ~dfr/routfox $HOME/
  export PERL5LIB="$HOME/routfox/perl"
Note: get_data.sms uses /home/ms/de/dw7/bin/grib_def for the GRIB definitions.