Commit 2b00fe07 authored by Oliver Heidmann

update docu

parent 5f2e54d6
Pipeline #4965 passed with stages
in 17 minutes and 24 seconds
......@@ -21,7 +21,13 @@
\usepackage[pdftex]{hyperref}
\fi
\usepackage{subcaption}
\usepackage{textcomp}
\usepackage{float}
\usepackage{fancyvrb}
% \usepackage{minted}
%\usemintedstyle{borland}
\usepackage{transparent}
......
\subsection{Operator chaining}\label{generalChaining}
\emph{Operator chaining} combines two or more operators on the command line into a single
CDO call. This makes it possible to build complex operations out of simpler ones: reductions over
several dimensions, file merges and all kinds of analysis processes. All operators with a fixed
number of input streams and one output stream can pass their result directly to another operator.
To distinguish operators from files, every operator must be written with a prepended "-"
when chaining.
\begin{verbatim}
cdo -monmean -add -mulc,2.0 infile1 -daymean infile2 outfile (CDO example call)
\end{verbatim}
Here \texttt{monmean} receives the output of \texttt{add}, while \texttt{add} takes the output of
\texttt{mulc,2.0} and \texttt{daymean}. \texttt{infile1} and \texttt{infile2} are the inputs of the operators directly preceding them.
Extra care is needed when mixing in operators with an arbitrary number of
input streams. The following examples illustrate why.
\begin{enumerate}
\item \texttt{cdo info -timavg infile?}
\item \texttt{cdo info -timavg infile1 infile2}
\item \texttt{cdo timavg infile1 tmpfile} \\
\texttt{cdo info tmpfile infile2}
\end{enumerate}
All three examples produce identical results. The time average will be computed only on the first
input file.\\\\
% In the following example we want \emph{infileA} to be assigned to \emph{add}. The example
%will show another pitfall that the arbitrary inputs introduce.
%\begin{verbatim}
%cdo -add -merge infileB infileC infileA outfile (NOT OK)
%\end{verbatim}
%Here \emph{infileA} is assigned to \emph{merge} since the merge command greedily takes all inputs
%to its right.
%\begin{verbatim}
%cdo -add infileA -merge infileB infileC outfile (ok)
%\end{verbatim}
%Here \emph{infileA} is correctly assigned.\\\\
\textbf{Note(1):}
In section \ref{argGroups} we introduce argument groups which
will make this a lot easier and less error prone.
\\\\
\textbf{Note(2):}
Operator chaining is implemented over POSIX Threads (pthreads).
Therefore this {\CDO} feature is not available on operating systems without POSIX Threads
support!
\subsection{Chaining Benefits}
Combining operators can have several benefits. The most obvious is a
performance increase through reducing disk I/O:\\
\begin{verbatim}
cdo sub -dayavg infile2 -timavg infile1 outfile
\end{verbatim}
instead of
\begin{verbatim}
cdo timavg infile1 tmp1
cdo dayavg infile2 tmp2
cdo sub tmp2 tmp1 outfile
rm tmp1 tmp2
\end{verbatim}
Especially with large input files the reading and writing of intermediate
files can have a big influence on the overall performance.\\
A second aspect is the execution of operators: as far as the algorithms permit,
all operators of a chain can run in parallel.
\section{Advanced Usage}
In this section we introduce advanced features of CDO. These include operator grouping, which
allows writing more complex CDO calls, the apply keyword, which shortens calls that
need an operator to be executed on multiple files, and wildcards, which search paths
for file name patterns. These features have several restrictions and follow rules that depend on the
input/output properties of the operators involved. These properties can be queried with the
following command:
\begin{verbatim}
cdo --attribs [obase/arbitrary/filesOnly/onlyFirst/noOutput] [operatorName]?
\end{verbatim}
\begin{itemize}
\item \emph{arbitrary} describes all operators where the number of inputs is not defined.
\item \emph{filesOnly} are operators that cannot have other operators as input.
\item \emph{onlyFirst} shows which operators can only be at the leftmost position of the Polish notation
argument chain.
\item \emph{noOutput} are all operators that do not write to any file (e.g. \texttt{info}).
\item \emph{obase} describes an operator that does not use the output argument as a file but, e.g., as a file
name base (output base). This is almost exclusively used for operators that split input files.
\begin{verbatim}
cdo -splithour infile baseName_
could result in: baseName_1 baseName_2 ... baseName_N
\end{verbatim}
\end{itemize}
\subsection{Wildcards}
Wildcards are a standard feature of command line interpreters (shells)
on many operating systems. They are placeholder characters used in file paths that are expanded by
the interpreter into file lists. For further information see the
\href{https://tldp.org/LDP/abs/html}{Advanced Bash-Scripting Guide}.
Handling of input is a central issue for CDO,
and in some circumstances it is not enough to use the wildcards from the shell.
That is why CDO can handle them on its own.\newline
\begin{tabular}{|l|l|}
\hline
\textbf{all files} &
2020-2-01.txt 2020-2-11.txt 2020-2-15.txt 2020-3-01.txt 2020-3-02.txt \\
& 2020-3-12.txt 2020-3-13.txt 2020-3-15.txt 2021.grb 2022.grb \\
\hline
\hline
\textbf{wildcard} & \textbf{filelist results} \\
\hline
2020-3* and 2020-3-??.txt & 2020-3-01.txt 2020-3-02.txt 2020-3-12.txt 2020-3-13.txt 2020-3-15.txt \\
\hline
2020-2-?1.txt & 2020-2-01.txt 2020-2-11.txt \\
\hline
*.grb & 2021.grb 2022.grb \\
\hline
\end{tabular}\newline
\\
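The expansion rules can be tried out directly in a shell. The following is a minimal sketch,
assuming a POSIX shell with \texttt{mktemp} available; it recreates the file names from the table
in a scratch directory and lets the shell expand three patterns. The variable names are only
illustrative.
\begin{verbatim}
# Recreate the file list from the table in an empty scratch directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch 2020-2-01.txt 2020-2-11.txt 2020-2-15.txt \
      2020-3-01.txt 2020-3-02.txt 2020-3-12.txt 2020-3-13.txt 2020-3-15.txt \
      2021.grb 2022.grb

# Unquoted patterns are expanded by the shell before the command starts.
march=$(echo 2020-3-??.txt)   # all five 2020-3-*.txt files
firsts=$(echo 2020-2-?1.txt)  # 2020-2-01.txt 2020-2-11.txt
gribs=$(echo *.grb)           # 2021.grb 2022.grb

echo "$march"
echo "$firsts"
echo "$gribs"
\end{verbatim}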
Use single quotes if the input stream names match a single wildcard expression. In this case
CDO will do the pattern matching and the output can be combined with other operators. Here is an
example for this feature:
\begin{verbatim}
cdo timavg -select,name=temperature 'infile?' outfile
\end{verbatim}
In earlier versions of CDO this was necessary to have the right files passed to the right
operator. Newer versions support this with the argument grouping
feature (see section \ref{argGroups}). We advise using the grouping mechanism instead of single-quoted wildcards since this
feature could be deprecated in future versions. \newline\newline
\textbf{Note:}
Wildcard expansion is not available on operating
systems without the \textit{glob()} function!\newline
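The difference between shell expansion and CDO's own expansion comes down to what arrives on the
command line. The following sketch, assuming a POSIX shell and two hypothetical files
\texttt{infile1} and \texttt{infile2}, counts the arguments a program would receive with and
without single quotes.
\begin{verbatim}
dir=$(mktemp -d) && cd "$dir" || exit 1
touch infile1 infile2

# Unquoted: the shell expands the pattern, the program sees two arguments.
set -- infile?
unquoted_count=$#

# Quoted: the pattern is passed through literally as a single argument,
# so the program itself (here this would be CDO) has to expand it.
set -- 'infile?'
quoted_count=$#
quoted_arg=$1

echo "$unquoted_count $quoted_count $quoted_arg"   # prints: 2 1 infile?
\end{verbatim}
In the quoted case the program receives the pattern itself and has to expand it internally.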
\subsection{Argument Groups}\label{argGroups}
In section \ref{generalChaining} we described the pitfalls of chaining operators
with an arbitrary number of inputs. In this section we show
how such chains can be written safely through the use of \emph{operator grouping} with
square brackets \texttt{[]}. Using these brackets CDO can assign the inputs
to their corresponding operators during the execution of the command line. The
ability to write operator combinations in a parenthesis-free way is partly given
up in favor of allowing operators with an arbitrary number of inputs. This allows
a much more compact way to handle a large number of input files.\\ The following
example will be transformed from a non-working solution to
a working one.
\begin{verbatim}
cdo -infov -div -fldmean -cat infile1 -mulc,2.0 infile2 -fldmax infile3
\end{verbatim}
This example will throw the following error:
\begin{verbatim}
cdo -infov -div -fldmean -cat infile1 -mulc,2.0 infile2 -fldmax infile3
cdo (Warning): Did you forget to use '[' and/or ']' for multiple variable input operators?
cdo (Warning): use option --variableInput, for description
cdo (Abort): Too few streams specified! Operator div needs 2 input streams and 1 output stream!
\end{verbatim}
The error is raised by the operator \emph{div}. This operator needs two input
streams and one output stream, but the \emph{cat} operator has claimed all
possible streams on its right-hand side as input because it accepts an
arbitrary number of inputs. Hence it did not leave anything for the remaining
input or output streams of \emph{div}. To resolve this we can declare a group which will be
passed to the operator left of the group.
\begin{verbatim}
cdo -infov -div -fldmean -cat [ infile1 -mulc,2.0 infile2 ] -fldmax infile3
\end{verbatim}
For full flexibility it is possible to have groups inside groups:
\begin{verbatim}
cdo -infov -div -fldmean -cat [ fileA1 infileC2 -merge [ infileB1 infileB2 ] ] -fldmax infileD
\end{verbatim}
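Because the brackets are ordinary command line arguments, the shell has to hand them over as
separate words. The sketch below, assuming a POSIX shell, uses a hypothetical helper
\texttt{count\_args} to show how many words the spaced form used in this section produces
compared to a form with the brackets glued to the file names. That CDO requires the spaced form is
an assumption based on the examples in this section, which all write the brackets as separate
words.
\begin{verbatim}
# Hypothetical helper: report how many arguments it received.
count_args() { echo $#; }

# Brackets separated by spaces: each bracket is its own argument.
spaced=$(count_args -cat [ infile1 -mulc,2.0 infile2 ] -fldmax infile3)

# Brackets glued to the names: they become part of the file arguments.
glued=$(count_args -cat [infile1 -mulc,2.0 infile2] -fldmax infile3)

echo "$spaced $glued"   # prints: 8 6
\end{verbatim}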
\subsection{Apply Keyword}\label{applykeyword}
When working with a medium or large number of similar files there is a common
problem of a processing step (often a reduction) which needs to be performed on
all of them before a more
specific analysis can be applied. Usually this can be done in two ways: One
option is to use
\texttt{merge} to glue everything together and chain the reduction step
after it. The second option is to write a for-loop over all inputs which performs
the basic processing on each of the files separately and to call \texttt{merge} on
the results. Unfortunately both options
have side effects: The first one needs a lot of memory because all files are
read in completely and reduced afterwards, while the latter one creates a lot of
temporary files. Both memory and disk I/O can be bottlenecks and should be
avoided.\\
The \emph{apply} keyword was introduced for that purpose. It can be used like an
operator, but it needs at least one operator as a parameter, which is applied in
parallel to all related input streams before all streams are
passed to the next operator in the chain.\\
The following is an example with three input files:\\
\begin{figure}[H]
\begin{verbatim}
cdo -merge -apply,-daymean [ file1 file2 file3 ] outfile
\end{verbatim}
would result in:
\begin{verbatim}
cdo -merge -daymean file1 -daymean file2 -daymean file3 outfile
\end{verbatim}
\caption{Usage and result of apply keyword}
\end{figure}
Apply is especially useful when combined with wildcards. The previous example can be shortened
further.
\begin{verbatim}
cdo -merge -apply,-daymean [ file? ] outfile
\end{verbatim}
As shown, this feature simplifies commands with a medium number of files and moves reductions further
back in the chain. This can also have a positive impact on performance.
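
To make the rewriting step concrete, the following shell sketch builds the expanded command line
by hand for the three-file example shown above. It only mimics the textual result and says nothing
about how CDO implements apply internally; the variable names are illustrative, and the command is
printed rather than executed.
\begin{verbatim}
# Operator that apply would put in front of every input file.
op="-daymean"

# Prefix each input with the operator, as apply does conceptually.
expanded=""
for f in file1 file2 file3; do
  expanded="$expanded $op $f"
done

cmd="cdo -merge$expanded outfile"
echo "$cmd"
# prints: cdo -merge -daymean file1 -daymean file2 -daymean file3 outfile
\end{verbatim}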
\begin{figure}[H]
An example where performance can take a hit.
\begin{verbatim}
cdo -yearmean -daymean -merge [ f1 ... f40 ]
\end{verbatim}
An improved but ugly to write example.
\begin{verbatim}
cdo -yearmean -merge [ -daymean f1 -daymean f2 ... -daymean f40 ]
\end{verbatim}
Apply saves the day: it creates the call above with much less typing.
\begin{verbatim}
cdo -yearmean -merge [ -apply,-daymean [ f1 ... f40 ] ]
\end{verbatim}
\caption{Apply keyword simplifies command and execution}
\label{simpApply}
\end{figure}
In the example in figure \ref{simpApply} the resulting call considerably reduces process
interaction as well as execution time since the reduction (daymean) is applied to the files first. That means
that the merge operator receives the already reduced files, which saves the work of merging the whole
data. For other CDO calls further improvements can be made by adding more arguments to
apply (see figure \ref{multiArgApply}).
\begin{figure}[H]
A less performant example.
\begin{verbatim}
cdo -aReduction -anotherReduction -daymean -merge [ f1 ... f40 ]
\end{verbatim}
The same reductions passed to apply as multiple arguments.
\begin{verbatim}
cdo -merge -apply,"-aReduction -anotherReduction -daymean" [ f1 ... f40 ]
\end{verbatim}
\caption{Multi argument apply}
\label{multiArgApply}
\end{figure}
\paragraph{Restrictions:}While the apply keyword can be extremely helpful, it has several restrictions (for now!).
\begin{itemize}
\item Apply inputs can only be files, wildcards and operators that have 0 inputs and 1 output.
\item Apply cannot be used as the first CDO operator.
\item Apply arguments can only be operators with 1 input and 1 output.
\item Grouping inside the Apply argument or input is not allowed.
\end{itemize}
pdflatex cdo_libdep.tex
pdflatex cdo.tex
pdflatex cdo.tex
pdflatex --shell-escape cdo.tex
pdflatex --shell-escape cdo.tex
cat > cdo.ist << 'EOF'
delim_0 "{\\idxdotfill} "
headings_flag 1
......@@ -9,6 +9,6 @@ heading_prefix "{\\centerline {\\Large \\textbf{ "
heading_suffix "}}}"
EOF
makeindex -s cdo.ist cdo.idx
pdflatex cdo
pdflatex --shell-escape cdo
#thumbpdf cdo
pdflatex cdo
pdflatex --shell-escape cdo
......@@ -155,66 +155,7 @@ There are more than 700 operators available.
A detailed description of all operators can be found in the
\textbf{\htmlref{Reference Manual}{refman}} section.
\subsection{Operator chaining}
All operators with a fixed number of input streams and one output stream can pipe the result directly to another operator.
The operator must begin with "-", in order to combine it with others.
This can improve the performance by:
\begin{itemize}
\item reducing unnecessary disk I/O
\item parallel processing
\end{itemize}
Use
\begin{verbatim}
cdo sub -dayavg infile2 -timavg infile1 outfile
\end{verbatim}
instead of
\begin{verbatim}
cdo timavg infile1 tmp1
cdo dayavg infile2 tmp2
cdo sub tmp2 tmp1 outfile
rm tmp1 tmp2
\end{verbatim}
All operators with one input stream will process only one input stream!
You need to take care when mixing those operators with an operator with an arbitrary number of input streams.
The following examples illustrate this problem.
\begin{enumerate}
\item \texttt{cdo info -timavg infile?}
\item \texttt{cdo info -timavg infile1 infile2}
\item \texttt{cdo timavg infile1 tmpfile} \\
\texttt{cdo info tmpfile infile2}
\end{enumerate}
All three examples produce identical results.
The time average will be computed only on the first input file.
All operators with an arbitrary number of input streams (\texttt{infiles})
can't be combined with other operators if these operators are used
with more than one input stream.
Here is an incomplete list of these operators:
\textbf{\htmlref{copy}{copy}},
\textbf{\htmlref{cat}{cat}},
\textbf{\htmlref{merge}{merge}},
\textbf{\htmlref{mergetime}{mergetime}},
\textbf{\htmlref{select}{select}},
\textbf{\htmlref{ens$<\!STAT\!>$}{ENSSTAT}} \\
Use single quotes if the input stream names match a single wildcard expression.
In this case CDO will do the pattern matching and the
output can be combined with other operators. Here is an example for
this feature:
\begin{verbatim}
cdo timavg -select,name=temperature 'infile?' outfile
\end{verbatim}
The CDO internal wildcard expansion is using the \textit{glob()} function.
Therefore internal wildcard expansion is not available on operating
systems without the \textit{glob()} function!
The following wildcards are supported: \begin{math}*, ?, []\end{math}
\textbf{Note:}
Operator chaining is implemented over POSIX Threads (pthreads).
Therefore this {\CDO} feature is not available on operating systems without POSIX Threads support!
\input{cdo_advanced_usage}
\subsection{Parallelized operators}
......