Documentation - Scheduled runs of the ICON model at ECMWF

Author: Florian Prill, DWD (2012-06-19).

General Notes

Case Setup “ICON_r2B06_10d”

Current restrictions

SMS suite ICON_R2B06_10d

The ICON runs employ ECMWF's scheduler “SMS”, cf. http://www.ecmwf.int/publications/manuals/sms/ for detailed documentation.

Directory layout

The SMS directory structure must exist both on the compute cluster file system ($PERM) and on ecgate.

All script files and the model setup are located in a directory $HOME/ICON_R2B06_10d of the following layout:

$HOME/ICON_R2B06_10d
    def                    ! scripts for operating the suite
        smsfiles
        include
        icon_scripts       ! checkout from ICON SVN: trunk/icon-dev/scripts
        bin                ! additional script files
    output
        model              ! current model output
        log                ! SMS suite logfiles
    doc                    ! this documentation
    cluster                ! [only on compute cluster]
        icon-dev           ! latest build of SVN trunk [only on compute cluster]
        input              ! static input (time-independent)
            extpar
            grids
            radiation
            ifs2icon       ! model input from IFS
               cellweights

Note: Not all necessary contents are stored in the Redmine repository

 https://svn.zmaw.de/svn/icon/trunk/icon-dev/schedulers/ecmwf/ICON_r2B06_10d

After the initial check-out one must add

Suite definition

The complete run of the ICON model is split into different tasks. These tasks are grouped into families and sub-families, for example init and init/build. Each task corresponds to a *.sms file, stored in the subdirectory smsfiles/def/icon/…, for example smsfiles/def/icon/init/get_data.sms. Each *.sms file is an (almost) ordinary script file. There are basically three types of *.sms files:

File names and output directories depend on several variables (see note 2):

Task ifs_fct/fct_ifs

Objectives: Dummy task, triggered by IFS, initiates ICON SMS suite.
Platform: ecgate
Requires input: None.
Provides output: None.

The whole SMS suite is restarted (repeat date keyword) on a regular basis. The suite contains an artificial dummy task, fct_ifs, whose completion is the starting condition for the whole suite.

There exists a small shell script def/smsfiles/job.trigger which is triggered by the IFS framework itself, i.e. via

 ecaccess-job-submit

This shell script calls a CDP command:

 force complete fct_ifs

and thus activates the whole ICON SMS suite.

Currently, the task is triggered by the event ID 167, which means “at this stage, the analysis at 00UTC is complete”. The task has been registered with

 ecaccess-job-submit -noDirectives -eventIds 167 -queueName ecgate $HOME/ICON_r2B06_10d/def/smsfiles/job.trigger 

More events can be obtained from the list generated by the command

 ecaccess-event-list 

Finally, use

 ecaccess-job-list
 ecaccess-job-delete <jobID>

to remove a job trigger from the list.

Task init/setup

Objectives: Set some initial variables and flags.
Platform: ecgate
Requires input: None.
Provides output: sets events “enable_build” and “enable_dumpstate” based on current date.

Some parts of the SMS suite are not run in a daily fashion but less frequently. For example, dump states (coefficient tables) must be pre-computed only once and a new binary should be compiled only about once a week.

Currently (test phase of the SMS suite), the variable enable_dumpstate is always enabled, while enable_build is always disabled.
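
A date-based decision of this kind could be sketched as follows. This is only an illustration: the helper name build_due and the choice of Monday as build day are assumptions, and the call that actually sets the SMS event is left out.

```shell
# Decide from the day of the week whether the weekly build is due.
# Assumption: builds happen on Mondays (weekday 1); the real rule used by
# init/setup is not documented here. build_due is a hypothetical helper.
build_due() {
    weekday="$1"            # 1 = Monday ... 7 = Sunday, as printed by `date +%u`
    build_day="${2:-1}"     # optional second argument overrides the build day
    [ "$weekday" -eq "$build_day" ]
}

if build_due "$(date +%u)"; then
    echo "enable_build: set"
else
    echo "enable_build: not set"
fi
```

Passing the weekday as an argument keeps the rule testable independently of the current date.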

Task init/get_data

Objectives: Retrieve IFS data from MARS database.
Platform: hpc_serial
Requires input: None.
Provides output: Retrieved *.grb file is stored in $PERM/%SUITE_NAME%/cluster/input/ifs2icon

Task init/init_data

Objectives: Run converter IFS2ICON.
Platform: hpc_serial
Requires input: GRIB file stored in $PERM/%SUITE_NAME%/cluster/input/ifs2icon (moved by this task)
Requires input: IFS2ICON config file in $PERM/%SUITE_NAME%/def/ifs2icon_1279.conf
Requires input: Grid files in $PERM/%SUITE_NAME%/cluster/input/grids
Provides output: Conversion result (NetCDF format) and the GRIB file in $TEMP/sms/input/%SUITE_NAME%/$REQUEST_DATE/
        cluster/input/ifs2icon/cellweights/initialized.flag

Task init/build/init_svn

Objectives: Update ICON sources.
Platform: ecgate
Provides output: Source code in directory c1a:$PERM/%SUITE_NAME%/cluster/icon-dev

There are two modes, depending on user-defined %SVN_UPDATE% flag:

Subversion update

Requires an existing source directory on compute cluster c1a. Apply only svn update and gmake distclean to retrieve the source code from the repository.

Tar-ball export

Performs an SVN export of a source copy into $TMPDIR/icon-dev (on ecgate) and copies it to the compute cluster c1a, where the files are extracted (in task init/build/init_build).

Task init/build/init_build

Objectives: Build ICON binary.
Platform: hpc_parallel
Requires input: Updated source directory, c1a:$PERM/%SUITE_NAME%/cluster/icon-dev
Provides output: Binary in c1a:$PERM/%SUITE_NAME%/cluster/icon-dev/build/…/bin/

The build process always starts from a clean directory, i.e. the complete source code is recompiled.

Note: The binary is configured with -O3 optimization enabled, which requires a considerable amount of time on the PWR6.

Task forecast/prepare/pre_clean

Objectives: Clear directory with model run and output directory.
Platform: hpc_serial
Provides output: Clears output directory %TEMP%/sms/output/%SUITE_NAME%/$REQUEST_DATE/

Task forecast/prepare/dumpstate

Objectives: Create dump state (containing coefficient tables for ICON run).
Platform: hpc_parallel
Requires input: Case setup in $PERM/%SUITE_NAME%/def.
Requires input: IFS data in $TEMP/sms/input/.
Requires input: control_model binary in $PERM/%SUITE_NAME%/cluster/icon-dev/build/…/bin/.
Provides output: Dump state in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/

Task forecast/model

Objectives: Model run.
Platform: hpc_parallel
Requires input: Case setup in $PERM/%SUITE_NAME%/def.
Requires input: IFS data in $TEMP/sms/input/.
Requires input: control_model binary in $PERM/%SUITE_NAME%/cluster/icon-dev/build/…/bin/.
Requires input: Dump state in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/
Provides output: Model output in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/

On abort or completion: Terminates the SMS task “check_progress” by calling the utility script cancel_check_progress.

Task forecast/check_progress

Objectives: Check for available output files.
Platform: ecgate
Provides output: Increases SMS meter “progress”.

When using the XCDP graphical user interface the current number of output files is displayed as a graphical progress bar. Please note that the timing might be inaccurate because the output directory is polled infrequently. The value of the progress meter is used to trigger the post-processing tasks of the suite.

The task terminates if all NetCDF output files are available in c1a:$TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/, or if a file $HOME/sms/model_complete.flag has been created on ecgate.
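
The polling logic described above can be sketched as follows. This is a minimal illustration, not the actual check_progress script: the function names, the polling interval and the way the directory is listed are assumptions, and the update of the SMS meter is replaced by a plain echo.

```shell
# Count the NetCDF files currently present in an output directory.
count_output_files() {
    # grep -c prints 0 (and returns non-zero) when nothing matches
    ls "$1" 2>/dev/null | grep -c '\.nc$' || true
}

# Return 0 when polling should stop: either the flag file exists
# or the expected number of output files has been reached.
progress_done() {
    outdir="$1"; expected="$2"; flag="$3"
    [ -e "$flag" ] && return 0
    [ "$(count_output_files "$outdir")" -ge "$expected" ]
}

# Poll the output directory until progress_done succeeds.
watch_progress() {
    outdir="$1"; expected="$2"; flag="$3"; interval="${4:-60}"
    until progress_done "$outdir" "$expected" "$flag"; do
        # the real task would update the SMS meter "progress" here
        echo "progress: $(count_output_files "$outdir")/$expected"
        sleep "$interval"
    done
    echo "progress: finished"
}

# Hypothetical usage (the count of 80 files is an assumption):
#   watch_progress "$TEMP/sms/output/ICON_r2B06_10d/$REQUEST_DATE" 80 "$HOME/sms/model_complete.flag"
```

Because the directory is only inspected once per interval, the reported count can lag behind the model, which matches the inaccuracy noted above.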

Task post/post_process

This task is called for each output step separately.

Not yet implemented, only a “dummy” call to grib_ls!

Objectives: Extract output, generate plots.
Platform: hpc_serial
Requires input: Model output in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/

The calling frequency of this post-processing task is probably too high: it is invoked each time a new output file has been written to disk. That is, 80 post-processing tasks are launched on hpc_serial (c1a). One could also design a post-processing mechanism that is invoked only once, after the final output file has been created. To do so, adjust the repeat and trigger keywords in the SMS suite definition file

 $HOME/ICON_R2B06_10d/def/smsfiles/icon.def

See ECMWF's SMS documentation on how to modify this setting.

Task post/post_archive

Not yet implemented!

Objectives: Copy output and/or store in database and/or trigger transfer to DWD.
Platform: ecgate
Requires input: Model output in $TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/

Caveats

Technical infrastructure

Setting up your XCDP client

deia's ICON SMS suite can be viewed and controlled from your own ECMWF account via the graphical frontend “xcdp”: Launch

xcdp

and open the menu

Edit -> Preferences -> Servers

There, add a new server by typing

Name:   deia
Host:   ecgate
Number: 901730

(where “901730” denotes the SMS ID provided by sms_start). Then, close the preferences dialog and log into the SMS suite by the menu item

Servers -> deia

Note: Currently, the SMS suite is started at about 7:30 CEST and runs less than 60 minutes. Therefore, for most of the day, the XCDP status of the suite is “queued” (blue-gray color)!

Diagram of the ICON SMS suite

Installation of the SMS suite

sms_start

The command

 sms_start

gives

User "3455" attempting to start sms server on "ecgate" using PROGNUM = "903455" and with SMSHOME of "$HOME/sms_server"

Checking if the SMS is already running on ecgate
...

Then write host and number to .cdprc:

 alias myalias set SMS_PROG 903455 \; login ecgate UID 1

where ”903455” should be replaced by the provided PROGNUM value.

Running the suite

Initially create a directory structure for the SMS log files:

 cd
 mkdir sms
 mkdir sms/icon
 mkdir sms/icon/init
 mkdir sms/icon/forecast
 mkdir sms/icon/post    
 mkdir sms/icon/init/build
 mkdir sms/icon/forecast/prepare

Similarly, on the compute cluster c1a:

 cd $TEMP
 mkdir sms
 mkdir sms/icon
 mkdir sms/icon/init
 mkdir sms/icon/forecast
 mkdir sms/icon/post    
 mkdir sms/icon/init/build
 mkdir sms/icon/forecast/prepare
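
The same directory trees can be created in one step with mkdir -p, which also creates missing parent directories and does not fail if a directory already exists. The helper name make_sms_logdirs below is hypothetical and not part of the suite.

```shell
# Create the SMS log directory tree below a given base directory.
# make_sms_logdirs is a hypothetical helper name.
make_sms_logdirs() {
    base="$1"
    # the three leaf directories imply all intermediate ones
    mkdir -p \
        "$base/sms/icon/init/build" \
        "$base/sms/icon/forecast/prepare" \
        "$base/sms/icon/post"
}

# On ecgate:                   make_sms_logdirs "$HOME"
# On the compute cluster c1a:  make_sms_logdirs "$TEMP"
```

Since the function is idempotent, it can be rerun safely after a partial setup.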

Several utility routines have been defined in the $HOME/.cdprc file. Start the suite with

 cdp
 
 CDP> myalias    
 CDP> play icon.def
 CDP> begin icon   

If an SMS suite is already running, you can relaunch the whole setup with

 cdp
 
 CDP> total_restart

Check the status of the suite with

 cdp
 
 CDP> myalias 
 CDP> status -f
      /{sub}   icon{sub}   t1[sub]   

Logserver

On the compute cluster c1a a special process is in charge of copying log files back to ecgate so that they can be viewed in xcdp:

  cd $PERM/ICON_r2B06_10d/def/bin
  ./logserver

A log server daemon should now be running. The command line

  ps -ef | grep $USER

yields something like

  ...  418144       1   0 10:34:29 pts/172  0:00 /usr/bin/perl /usr/local/bin/logsvr.pl 

.rhosts

Enable rcp and rsh between the compute cluster and ecgate by setting the $HOME/.rhosts file:

  echo "ecga02 $USER" >> ~/.rhosts
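
Because the echo command above appends a new line on every call, a repeated setup would leave duplicate entries. A minimal sketch of an idempotent variant follows; the helper name add_rhosts_entry is hypothetical, and the chmod reflects the common requirement that rsh-style tools reject .rhosts files writable by other users.

```shell
# Append "host user" to an rhosts-style file unless the exact line is already present.
# add_rhosts_entry is a hypothetical helper name.
add_rhosts_entry() {
    file="$1"; entry="$2 $3"
    touch "$file"
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF "$entry" "$file" || echo "$entry" >> "$file"
    # restrict permissions to the owner
    chmod 600 "$file"
}

# Usage on the real file:
#   add_rhosts_entry "$HOME/.rhosts" ecga02 "$USER"
```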

For copying files between ecgate and compute cluster use the commands rcp or ecrcp.

SVN access

SVN access is required

Please note: For direct access to ZMAW's subversion repository, changes in $HOME/.subversion/servers are necessary:

  http-proxy-host = ******
  http-proxy-port = ******

(Unofficial access, ask L. Kornblueh, F. Prill for details).

Note that on c1a no SSL support is available, therefore type

  svn co http://svn.zmaw.de/svn/icon/trunk/icon-dev/schedulers/ecmwf/ICON_r2B06_10d

on the command line for c1a.

Local Ruby installation

The scripting language Ruby must be installed locally (in the user account). It is required by the IFS2ICON task.

Prerequisites:

  1. Ruby source code downloaded from http://www.ruby-lang.org/de/downloads/ to $PERM/software.
  2. Package “extcsv” required; download from http://rubygems.org/gems/extcsv to $PERM/software/packages.
  3. Package “gnuplot” required; download from http://rubygems.org/gems/gnuplot to $PERM/software/packages.

Installation process:

tar xvf ruby-1.9.3-p125.tar
cd ruby-1.9.3-p125
./configure --prefix=$PERM/software/ruby-1.9.3-p125_build
gmake -j6
gmake install
cd ../packages
$PERM/software/ruby-1.9.3-p125_build/bin/gem install extcsv
$PERM/software/ruby-1.9.3-p125_build/bin/gem install gnuplot

Patch required: In $PERM/software/ruby-1.9.3-p125_build/lib/ruby/gems/1.9.1/gems/extcsv-0.12.0/lib/extcsv_diagram.rb comment out the require statement:

 #require 'extcsv_units'
 

Local CDO installation

It is necessary to compile a local version of the Climate Data Operators, which is newer than v1.5.0, because the system installation of the CDOs on c1a has not been compiled with NetCDF support. The configure script must be provided with the correct system paths for NetCDF, Jasper and GRIB_API:

ls -rlt
gunzip cdo-1.5.4.tar.gz
tar xvf cdo-1.5.4.tar
cd cdo-1.5.4
CC=xlc_r CFLAGS="-g -O3 -q64 -qhot -qarch=auto -qtune=auto -qsmp=omp -DHAVE_MMAP" \
./configure --with-netcdf=/usr/local/apps/netcdf/3.6.3/LP64 \
 --prefix=$PERM/software/cdo-1.5.4_build \
 --with-grib_api=/usr/local/lib/metaps/lib/grib_api/1.9.9 \
 --with-jasper=/usr/local/apps/jasper/1.900.1/LP64 \
 --with-threads=yes
gmake -j2
gmake install

After successful compilation, all IFS2ICON processes must run with

 $PERM/software/cdo-1.5.4_build/bin/cdo

To this end, the task init_data.sms exports an environment variable CDOBIN which is then used by the Ruby script ifs2icon.rb.
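
How such an export might look is sketched below. This is not the content of init_data.sms; the function name require_cdo is hypothetical. The sketch fails early instead of silently falling back to the system installation, since that one lacks NetCDF support.

```shell
# Export CDOBIN only if the given cdo binary exists and is executable.
# require_cdo is a hypothetical helper name.
require_cdo() {
    if [ -x "$1" ]; then
        CDOBIN="$1"
        export CDOBIN
    else
        echo "cdo not found or not executable: $1" >&2
        return 1
    fi
}

# Usage with the local build described above:
#   require_cdo "$PERM/software/cdo-1.5.4_build/bin/cdo" || exit 1
```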

GRIB_API settings

We want to use the same IFS2ICON configuration settings for ECMWF as well as for DWD, therefore the GRIB2 short names must be taken from DWD's definition files:

 export GRIB_DEFINITION_PATH=%SCPERM%/software/usr/local/grib_api/release/share-1.9.9/definitions.edzw:/usr/local/lib/metaps/lib/grib_api/1.9.9/share/definitions

The directory ”%SCPERM%/software/usr/local/grib_api/release/share-1.9.9/definitions.edzw” can be tar'ed and copied from DWD's HPC file system.

Alternative approach: DWD GRIB definitions are already installed under

 ~dwd/grib_api/definitions.edzw-${my_api_version}

One can set them using the script /home/ms/de/dw7/bin/grib_def.

Local settings

The MARS4ICON script must be executable:

   chmod +x $PERM/ICON_r2B06_10d/def/icon_scripts/preprocessing/mars4icon_smi

MARS4ICON requires DWD's date conversion script datconv, which must be in Perl's search path:

   cd $HOME/bin
   ln -s ~dfr/routfox/bin/datconv .
   ln -s ~dfr/routfox $HOME/
   export PERL5LIB="$HOME/routfox/perl"

Still TODO

1) Important note: The SMS suite is targeted at the compute cluster c1a which will go out of service in 2012.
2) Keep in mind that there exist two sets of variables: System environment variables are denoted by the dollar (”$”) sign, while SMS variables are denoted by the percent (”%”) sign.
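
The difference between the two kinds of variables can be illustrated with a small emulation. SMS replaces %...% variables when it generates the job file from the *.sms script, whereas $... variables are expanded by the shell only when the job runs. The sed call below merely imitates the first step for a single variable; expand_sms_vars is a hypothetical helper, not an SMS command.

```shell
# Imitate the SMS substitution of %SUITE_NAME% in script text read from stdin.
# Illustration only: SMS performs this substitution itself.
expand_sms_vars() {
    # $1: value to insert for %SUITE_NAME%
    sed "s|%SUITE_NAME%|$1|g"
}

# Single quotes keep the shell from expanding $TEMP and $REQUEST_DATE here.
line='outdir=$TEMP/sms/output/%SUITE_NAME%/$REQUEST_DATE/'
printf '%s\n' "$line" | expand_sms_vars ICON_r2B06_10d
# prints: outdir=$TEMP/sms/output/ICON_r2B06_10d/$REQUEST_DATE/
```

The shell variables remain untouched in the result and are resolved later, at run time of the job.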