Commit b9dd9ad8 authored by Uwe Schulzweida

Docu update.

parent 2b00fe07
Pipeline #4970 passed with stages
in 16 minutes and 51 seconds
\subsection{Operator chaining}\label{generalChaining}
\emph{Operator chaining} makes it possible to combine two or more operators on the command line into a single
CDO call. This allows the creation of complex operations out of more simple ones: reductions over
{\CDO} call. This allows the creation of complex operations out of more simple ones: reductions over
several dimensions, file merges and all kinds of analysis processes. All operators with a fixed
number of input streams and one output stream can pass their result directly to another operator.
To distinguish operators from files, every operator must be written with a prepended "-"
......@@ -9,14 +9,15 @@ when chaining.
cdo -monmean -add -mulc,2.0 infile1 -daymean infile2 outfile (CDO example call)
\end{verbatim}
Here \texttt{monmean} receives the output of \texttt{add}, while \texttt{add} takes the output of
\texttt{mulc,2.0} and \texttt{daymean}. \texttt{infile 1} and \texttt{infile 2} are inputs for their predecessor.
\texttt{mulc,2.0} and \texttt{daymean}. \texttt{infile1} and \texttt{infile2} are inputs for their predecessor.
When mixing in operators with an arbitrary number of
input streams, extra care needs to be taken. The following examples illustrate why.
\begin{enumerate}
\item \texttt{cdo info -timavg infile?}
\item \texttt{cdo info -timavg infile1 infile2}
\item \texttt{cdo info -timavg infile?}
\item \texttt{cdo timavg infile1 tmpfile} \\
\texttt{cdo info tmpfile infile2}
\texttt{cdo info tmpfile infile2} \\
\texttt{rm tmpfile}
\end{enumerate}
All three examples produce identical results. The time average will be computed only on the first
input file.\\\\
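That the two calls without a temporary file are the same command can be checked without {\CDO}, because the shell expands an unquoted wildcard before the program starts. The following is a minimal sketch for a POSIX shell, in which \texttt{echo} stands in for {\CDO}; the scratch directory and the two empty placeholder files are assumptions made for illustration only.
\begin{verbatim}
# Assumption: a scratch directory with two empty placeholder files.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch infile1 infile2
# The shell expands the unquoted pattern before the command starts,
# so echo (standing in for cdo) receives two separate file names.
expanded=$(echo cdo info -timavg infile?)
echo "$expanded"
\end{verbatim}
The printed line is \texttt{cdo info -timavg infile1 infile2}, which is exactly the call with both file names written out.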
......@@ -43,7 +44,7 @@ support!
\subsection{Chaining Benefits}
Combining operators can have several benefits. The most obvious is a
performance increase through reducing disk I/O:\\
performance increase through reducing disk I/O:
\begin{verbatim}
cdo sub -dayavg infile2 -timavg infile1 outfile
\end{verbatim}
......@@ -62,22 +63,20 @@ all operators of a chain can run in parallel.
\section{Advanced Usage}
In this section we will introduce advanced features of CDO. These include operator grouping which
allows to write more complex CDO calls and the apply keyword which allows to shorten calls that
In this section we will introduce advanced features of {\CDO}. These include operator grouping which
allows to write more complex {\CDO} calls and the apply keyword which allows to shorten calls that
need an operator to be executed on multiple files, as well as wildcards which make it possible to search paths
for file signatures. These features have several restrictions and follow rules that depend on the
input/output properties. These required properties of operators can be investigated with the
following command:
\begin{verbatim}
cdo --attribs [obase/arbitrary/filesOnly/onlyFirst/noOutput] [operatorName]?
cdo --attribs [arbitrary/filesOnly/onlyFirst/noOutput/obase] [operatorName]?
\end{verbatim}
\begin{itemize}
\item \emph{arbitrary} describes all operators where the number of inputs is not defined.
\item \emph{filesOnly} are operators that accept only files as input, i.e. they cannot have other operators as input.
OnlyFirst shows which operators can only be at the most left position of the polish notation
argument chain.
\item \emph{onlyFirst} shows which operators can only be at the most left position of the polish notation argument chain.
\item \emph{noOutput} are all operators that do not write to any file (e.g. \texttt{info}).
\item \emph{obase} describes an operator that does not use the output argument as a file but, for example, as a file
name base (output base). This is almost exclusively used for operators that split input files.
......@@ -93,9 +92,9 @@ Wildcards are a standard feature of command line interpreters (shells)
on many operating systems. They are placeholder characters used in file paths that are expanded by
the interpreter into file lists. For further information, the
\href{https://tldp.org/LDP/abs/html}{Advanced Bash-Scripting Guide} is a
valuable source of information. Handling of input is a central issue for CDO
valuable source of information. Handling of input is a central issue for {\CDO}
and in some circumstances it is not enough to use the wildcards from the shell.
That's why CDO can handle them on its own.\newline
That's why {\CDO} can handle them on its own.\newline
\begin{tabular}{|l|l|}
\hline
\textbf{all files} &
......@@ -107,19 +106,19 @@ That's why CDO can handle them on its own.\newline
\hline
2020-3* and 2020-3-??.txt & 2020-3-01.txt 2020-3-02.txt 2020-3-12.txt 2020-3-13.txt 2020-3-15.txt \\
\hline
2020-2-?1.txt & 2020-3-01.txt \\
2020-3-?1.txt & 2020-3-01.txt \\
\hline
*.grb & 2021.grb 2020.grb \\
\hline
\end{tabular}\newline
\\
Use single quotes if the input stream names are matched by a single wildcard expression. In this case
CDO will do the pattern matching and the output can be combined with other operators. Here is an
{\CDO} will do the pattern matching and the output can be combined with other operators. Here is an
example for this feature:
\begin{verbatim}
cdo timavg -select,name=temperature 'infile?' outfile
\end{verbatim}
In earlier versions of CDO this was necessary to have the right files parsed to the right
In earlier versions of {\CDO} this was necessary to have the right files parsed to the right
operator. Newer versions support this with the argument grouping
feature (see \ref{argGroups}). We advise using the grouping mechanism instead of single-quoted wildcards, since this
feature could be deprecated in future versions. \newline\newline
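The difference between a quoted and an unquoted wildcard can be seen at shell level. The following is a minimal sketch for a POSIX shell; the helper function \texttt{count\_args} and the three empty placeholder files are hypothetical and only serve the illustration.
\begin{verbatim}
# Assumption: a scratch directory with three empty placeholder files.
tmpdir=$(mktemp -d)
cd "$tmpdir"
touch infile1 infile2 infile3
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo $#; }
unquoted=$(count_args infile?)    # expanded by the shell
quoted=$(count_args 'infile?')    # passed on as one literal string
echo "unquoted: $unquoted, quoted: $quoted"
\end{verbatim}
With the unquoted pattern the program receives three separate file names. With single quotes it receives the single literal string \texttt{infile?} and has to do the pattern matching itself, which is the case {\CDO} handles on its own.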
......@@ -131,7 +130,7 @@ systems without the \textit{glob()} function!\newline
In section \ref{generalChaining} we described that it is not possible to chain operators
with an arbitrary number of inputs. In this section we want to show
how this can be achieved through the use of \emph{operator grouping} with
angled brackets \texttt{[]}. Using these brackets CDO can assigned the inputs
square brackets \texttt{[]}. Using these brackets {\CDO} can assign the inputs
to their corresponding operators during the execution of the command line. The
ability to write operator combinations in a parenthesis-free way is partly given
up in favor of allowing operators with an arbitrary number of inputs. This allows
......@@ -139,27 +138,15 @@ a much more compact way to handle large number of input files.\\ The following
example shows a call which we will transform from a non-working solution to
a working one.
\begin{verbatim}
cdo -infov -div -fldmean -cat infile1 -mulc,-1 infile2 -fldmax infile3
cdo -infov -div -fldmean -cat infile1 -mulc,2.0 infile2 -fldmax infile3
\end{verbatim}
This exmple will throw the following error:
This example will throw the following error:
\begin{verbatim}
cdo -infov -div -fldmean -cat infile1 -mulc,2.0 infile2 -fldmax infile3
cdo (Warning): Did you forget to use '[' and/or ']' for multiple variable input operators?
cdo (Warning): use option --variableInput, for description
cdo (Abort): Too few streams specified! Operator div needs 2 input streams and 1 output stream!
\end{verbatim}
The error comes from -div. This operator needs two input streams and one output stream, but -cat has claimed all
possible streams on its right hand side as input and didn't leave anything for the remaining input
or output stream of -div.
For this we can declare a group which will be passed to the operator in front of the group.
\begin{verbatim}
cdo -infov -div -fldmean -cat [ infile1 -mulc,2.0 infile2 ] -fldmax infile3
\end{verbatim}
It is possible to have groups inside groups:
\begin{verbatim}
cdo -infov -div -fldmean -cat [ fileA1 infileC2 -merge [ infileB1 infileB2 ] ] -fldmax infileD
\end{verbatim}
The error is raised by the operator \emph{div}. This operator needs two input
streams and one output stream, but the \emph{cat} operator has claimed all
possible streams on its right hand side as input because it accepts an
......@@ -178,9 +165,9 @@ When working with medium or large number of similar files there is a common
problem of a processing step (often a reduction) which needs to be performed on
all of them before a more
specific analysis can be applied. Usually this can be done in two ways: One
options is to use
option is to use
\texttt{merge} to glue everything together and chain the reduction step
after it. The second options is to write a for-loop over all inputs which perform
after it. The second option is to write a for-loop over all inputs which performs
the basic processing on each of the files separately and call \texttt{merge} on
the results. Unfortunately both options
have side-effects: The first one needs a lot of memory because all files are
......@@ -202,11 +189,10 @@ The following is an example with three input files:\\
\end{verbatim}
\caption{Usage and result of apply keyword}
\end{figure}
Apply is especially useful when combined with wildcards. The previous example can be shortened
further.
Apply is especially useful when combined with wildcards. The previous example can be shortened further.
\begin{verbatim}
cdo -merge -apply,-daymean [ file? ] outfile
\end{verbatim}
\end{verbatim}
As shown, this feature simplifies commands with a medium number of files and moves reductions further
back. This can also have a positive impact on the performance.
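For comparison, the for-loop variant that apply replaces could be sketched as follows. This is a dry run for a POSIX shell: the variable \texttt{CDO} is set to \texttt{echo cdo} so that the commands are only printed, and the file names \texttt{file1} to \texttt{file3} as well as the \texttt{tmp\_} prefix are assumptions for illustration.
\begin{verbatim}
# Dry run: print the commands instead of executing them.
# Replace "echo cdo" by "cdo" to execute them on existing files.
CDO="echo cdo"
plan=$(
  for f in file1 file2 file3; do
    # reduce each file separately into a temporary file
    $CDO -daymean "$f" "tmp_$f"
  done
  # merge the reduced temporary files
  $CDO -merge tmp_file1 tmp_file2 tmp_file3 outfile
)
echo "$plan"
\end{verbatim}
The loop needs four separate calls and three temporary files that have to be removed afterwards, whereas the apply call above does the same work in a single command line.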
\begin{figure}[H]
......@@ -230,7 +216,7 @@ back. This can also have a positive impact on the performance.
In the example in figure \ref{simpApply} the resulting call will considerably reduce process
interaction as well as execution time since the reduction (\texttt{daymean}) is applied to the files first. That means
that the merge operator will receive the reduced files and the operations for merging the whole
data is saved. For other CDO calls further improvements can be made by adding more arguments to
data is saved. For other {\CDO} calls further improvements can be made by adding more arguments to
apply (see \ref{multiArgApply}).
\begin{figure}[H]
A less performant example.
......@@ -246,7 +232,7 @@ apply (\ref{multiArgApply})
\paragraph{Restrictions:} While the apply keyword can be extremely helpful, it has several restrictions (for now!).
\begin{itemize}
\item Apply inputs can only be files, wildcards and operators that have 0 inputs and 1 output.
\item Apply can not be used as the first cdo operator.
\item Apply can not be used as the first {\CDO} operator.
\item Apply arguments can only be operators with 1 input and 1 output.
\item Grouping inside the Apply argument or input is not allowed.
\end{itemize}
......