'\" t
.\" Title: pegasus-cluster
.\" Author: [see the "Authors" section]
.\" Generator: DocBook XSL Stylesheets v1.79.1
.\" Date: 11/09/2018
.\" Manual: Pegasus Manual
.\" Source: Pegasus 4.4.0
.\" Language: English
.\"
.TH "PEGASUS\-CLUSTER" "1" "11/09/2018" "Pegasus 4\&.4\&.0" "Pegasus Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
pegasus-cluster \- run a list of applications
.SH "SYNOPSIS"
.sp
.nf
\fBpegasus\-cluster\fR [\fB\-d\fR] [\fB\-e\fR | \fB\-f\fR] [\fB\-S ec\fR] [\fB\-s fn\fR] [\fB\-R fn\fR] [\fB\-n nr\fR] [\fBinputfile\fR]
.fi
.SH "DESCRIPTION"
.sp
The \fBpegasus\-cluster\fR tool executes a list of applications in the order specified (assuming sequential mode)\&. It is generally used for horizontal clustering of independent applications, and does not care about any application failures\&. Such failures should be caught by using \fBpegasus\-kickstart\fR to start each application\&.
.sp
In vertical clustering mode, the \fIhard failure\fR mode is encouraged, ending execution as soon as one application fails\&. When running a complex workflow through \fBpegasus\-cluster\fR, the order of applications in the input file must be topologically sorted\&.
.sp
Applications are usually run through \fBpegasus\-kickstart\fR\&. In that case, all invocations of \fBpegasus\-kickstart\fR except the first should add the \fBpegasus\-kickstart\fR option \fI\-H\fR to suppress the repeated XML preamble and certain other headers that are of no interest when repeated\&.
.sp
\fBpegasus\-cluster\fR permits shell\-style quoting\&. One level of quoting is removed from the arguments\&. Please note that \fBpegasus\-kickstart\fR will also remove one level of quoting\&.
.SH "ARGUMENTS"
.PP
\fB\-d\fR
.RS 4
This option increases the debug level\&. Debug messages are generated on
\fIstdout\fR\&. By default, debugging is minimal\&.
.RE
.PP
\fB\-e\fR
.RS 4
This flag turns on the old behavior of
\fBpegasus\-cluster\fR
to always run everything
\fIand\fR
return success no matter what\&. The
\fB\-e\fR
flag is mutually exclusive with the
\fB\-f\fR
flag\&. By default, all applications are executed regardless of failures\&. Any detected application failure results in a non\-zero exit status from
\fBpegasus\-cluster\fR\&.
.RE
.PP
\fB\-f\fR
.RS 4
In hard failure mode, as soon as one application fails, either through a non\-zero exit code, or by dying on a signal, further execution is stopped\&. In parallel execution mode, one or more other applications later in the sequence file may have been started already by the time failure is detected\&.
\fBpegasus\-cluster\fR
will wait for the completion of these applications, but not start new ones\&. The
\fB\-f\fR
flag is mutually exclusive with the
\fB\-e\fR
flag\&. By default, all applications are executed regardless of failures\&. Any detected application failure results in a non\-zero exit status from
\fBpegasus\-cluster\fR\&.
.RE
.PP
\fB\-h\fR
.RS 4
This option prints the help message and exits the program\&.
.RE
.PP
\fB\-s fn\fR
.RS 4
This option sends protocol messages (for Mei) to the specified file\&. By default, all messages are written to
\fIstdout\fR\&.
.RE
.PP
\fB\-R fn\fR
.RS 4
The progress reporting feature, if turned on, writes one event record whenever an application is started, and one event record whenever an application finishes\&. This enables tracking of jobs in progress\&. By default, track logs are not written, unless the environment variable
\fISEQEXEC_PROGRESS_REPORT\fR
is set\&. If set, progress reports are appended to the file pointed to by the environment variable\&.
.RE
.PP
\fB\-S ec\fR
.RS 4
This option may be given multiple times\&. Each given non\-zero exit code of an application is treated as a form of success\&. In
\fB\-f\fR
mode, this means that
\fBpegasus\-cluster\fR
will not fail when seeing this exit code from any application it runs\&. By default, all non\-zero exit codes constitute failure\&.
.RE
.PP
\fB\-n nr\fR
.RS 4
This option determines the amount of parallel execution\&. Typically, parallel execution is only recommended on multi\-core systems, and must be deployed rather carefully, i\&.e\&. only jobs that are completely independent of each other across the whole
\fIinputfile\fR
should ever be run in parallel\&. The argument
\fBnr\fR
is the number of parallel jobs that should be used\&. In addition to a non\-negative integer, the word
\fIauto\fR
is also understood\&. When
\fIauto\fR
is specified,
\fBpegasus\-cluster\fR
will attempt to automatically determine the number of cores available in the system\&. Strictly sequential execution, as if
\fInr\fR
was 1, is the default\&. If the environment variable
\fISEQEXEC_CPUS\fR
is set, it will determine the default number of CPUs\&.
.RE
.PP
\fBinputfile\fR
.RS 4
The input file specifies a list of applications to run, one per line\&. Comments and empty lines are permitted\&. The comment character is the octothorpe (#), and a comment extends to the end of the line\&. By default,
\fBpegasus\-cluster\fR
uses
\fIstdin\fR
to read the list of applications to execute\&.
.RE
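.sp
The input file format described above can be approximated in a few lines of Python\&. This is a minimal sketch, not the parser that \fBpegasus\-cluster\fR itself uses: it assumes that the standard \fIshlex\fR module is close enough to the shell\-style quoting of the tool, and the function name \fIparse_input\fR is made up for illustration\&. Note that \fIshlex\fR may treat an unquoted # differently from \fBpegasus\-cluster\fR in edge cases\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import shlex


def parse_input(text):
    """Return (line number, argument list) for every task line."""
    tasks = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # comments=True drops everything from an unquoted "#" onwards;
        # one level of quoting is removed from the arguments.
        argv = shlex.split(line, comments=True)
        if argv:  # skip empty lines and pure comment lines
            tasks.append((lineno, argv))
    return tasks


sample = """#
# mkdir
/bin/mkdir -p split-corpus

/bin/echo "hello world" # trailing comment
"""

tasks = parse_input(sample)
for lineno, argv in tasks:
    print(lineno, argv)
```
.fi
.if n \{\
.RE
.\}
.sp
The line numbers kept here correspond to the \fBline\fR attribute of the task summary, which counts comment and empty lines as well\&.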
.SH "RETURN VALUE"
.sp
The \fBpegasus\-cluster\fR tool returns 1, if an illegal option was used\&. It returns 2, if the status file from option \fB\-s\fR cannot be opened\&. It returns 3, if the input file cannot be opened\&. It does \fInot\fR return any failure for failed applications in old\-exit \fB\-e\fR mode\&. In \fIdefault\fR and hard failure \fB\-f\fR mode, it will return 5 for true failure\&. The determination of failure is modified by the \fB\-S\fR option\&.
.sp
All other internal errors being absent, \fBpegasus\-cluster\fR will always return 0 when run with \fB\-e\fR\&. In \fIdefault\fR mode, it will execute all applications and return 5 if any application failed\&. Unlike a shell, it will \fInot\fR return the last application\(cqs exit code in either mode\&. In \fB\-f\fR mode, \fBpegasus\-cluster\fR returns 0 if all main sequence applications succeeded, or 5 if one failed (or more than one in parallel execution mode)\&. It will run only as long as applications are successful\&. In both modes, the \fB\-S\fR flag determines what constitutes a failure\&.
.sp
The \fBpegasus\-cluster\fR application will also write a small summary to \fIstdout\fR for each job, and one for itself, reporting success and failure\&. The field \fBfailed\fR reports any exit code that was not zero, as well as any termination by a signal\&. It does \fInot\fR include non\-zero exit codes that were marked as success using the \fB\-S\fR option\&.
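.sp
The rules for exit status 0 versus 5 can be summarized as a small Python model\&. It is only a sketch of the behavior described in this section, under stated assumptions: the function name \fIcluster_exit_code\fR and the mode names are hypothetical, each task is represented by its already decoded exit code, and a negative number stands for death by signal\&. The option errors 1, 2 and 3 are not modeled\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def cluster_exit_code(exit_codes, mode="default", success_codes=()):
    """Model the documented exit status of a run.

    mode is "default", "e" (old behavior) or "f" (hard failure).
    success_codes plays the role of repeated -S options.
    """
    ok = {0} | set(success_codes)
    any_failure = False
    for code in exit_codes:
        if code not in ok:
            any_failure = True
            if mode == "f":
                break  # hard failure: no further tasks are started
    if mode == "e":
        return 0  # old behavior: success no matter what
    return 5 if any_failure else 0


print(cluster_exit_code([0, 1, 0]))
print(cluster_exit_code([0, 1, 0], mode="e"))
print(cluster_exit_code([0, 3, 0], mode="f", success_codes=[3]))
```
.fi
.if n \{\
.RE
.\}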
.SH "TASK SUMMARY"
.sp
Each task executed by \fBpegasus\-cluster\fR generates a record bracketed by square brackets like this (each entry is broken over two lines for readability):
.sp
.if n \{\
.RS 4
.\}
.nf
[cluster\-task id=1, start="2011\-04\-27T14:31:25\&.340\-07:00", duration=0\&.521,
status=0, line=1, pid=18543, app="/bin/usleep"]
[cluster\-task id=2, start="2011\-04\-27T14:31:25\&.342\-07:00", duration=0\&.619,
status=0, line=2, pid=18544, app="/bin/usleep"]
[cluster\-task id=3, start="2011\-04\-27T14:31:25\&.862\-07:00", duration=0\&.619,
status=0, line=3, pid=18549, app="/bin/usleep"]
.fi
.if n \{\
.RE
.\}
.sp
Each record is introduced by the string \fIcluster\-task\fR with the following constituents, where strings are quoted:
.PP
\fBid\fR
.RS 4
This is a numerical value for main sequence applications, indicating the application\(cqs place in the sequence file\&. The setup task uses the string
\fIsetup\fR, and the cleanup task uses the string
\fIcleanup\fR\&.
.RE
.PP
\fBstart\fR
.RS 4
is the ISO 8601 time stamp, with millisecond resolution, when the application was started\&. This string is quoted\&.
.RE
.PP
\fBduration\fR
.RS 4
is the application wall\-time duration in seconds, with millisecond resolution\&.
.RE
.PP
\fBstatus\fR
.RS 4
is the
\fIraw\fR
exit status as returned by the
\fIwait\fR
family of system calls\&. Typically, the exit code is found in the high byte, and the signal of death in the low byte; 0 indicates a successful execution, and any other value a problem\&. However, details could differ between systems, and exit codes are only meaningful on the same OS and architecture\&.
.RE
.PP
\fBline\fR
.RS 4
is the line number where the task was found in the main sequence file\&. Setup and cleanup tasks don\(cqt have this attribute\&.
.RE
.PP
\fBpid\fR
.RS 4
is the process id under which the application ran\&.
.RE
.PP
\fBapp\fR
.RS 4
is the path to the application that was started\&. As with the progress record, any
\fBpegasus\-kickstart\fR
invocation is parsed out so that you see the true application\&.
.RE
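.sp
A task record in the format shown above can be taken apart with the Python standard library\&. The following is a minimal sketch under assumptions: the helper names \fIparse_record\fR and \fIdecode_status\fR are made up, the record is expected on a single line, and the status decoding follows the typical layout described under \fBstatus\fR (exit code in the high byte, signal number in the low seven bits), which may not hold on every system\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import shlex


def parse_record(record):
    """Split a bracketed record into its tag and a dict of string fields."""
    body = record.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("not a bracketed record")
    # shlex removes the double quotes around quoted values
    tokens = shlex.split(body[1:-1])
    tag, fields = tokens[0], {}
    for token in tokens[1:]:
        key, _, value = token.rstrip(",").partition("=")
        fields[key] = value
    return tag, fields


def decode_status(raw):
    """Return (exit code, signal number) for a typical raw wait status."""
    return (raw >> 8) & 0xFF, raw & 0x7F


record = (
    """[cluster-task id=1, start="2011-04-27T14:31:25.340-07:00", duration=0.521, """
    """status=0, line=1, pid=18543, app="/bin/usleep"]"""
)

tag, fields = parse_record(record)
exit_code, signal_number = decode_status(int(fields["status"]))
print(tag, fields["app"], exit_code, signal_number)
```
.fi
.if n \{\
.RE
.\}
.sp
All field values are kept as strings; converting \fBduration\fR or \fBpid\fR to numbers is left to the caller\&.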
.SH "PEGASUS\-CLUSTER SUMMARY"
.sp
The final summary of counts is a record bracketed by square brackets like this (broken over two lines for readability):
.sp
.if n \{\
.RS 4
.\}
.nf
[cluster\-summary stat="ok", lines=3, tasks=3, succeeded=3, failed=0, extra=0,
duration=1\&.143, start="2011\-04\-27T14:31:25\&.338\-07:00", pid=18542, app="\&./seqexec"]
.fi
.if n \{\
.RE
.\}
.sp
The record is introduced by the string \fIcluster\-summary\fR with the following constituents:
.PP
\fBstat\fR
.RS 4
is the string
\fIfail\fR
when
\fBpegasus\-cluster\fR
would return with an exit status of 5\&. Concretely, this is any failure in
\fIdefault\fR
mode, and the first failure in
\fB\-f\fR
mode\&. Otherwise, if the record is produced, it is always the string
\fIok\fR\&.
.RE
.PP
\fBlines\fR
.RS 4
is the line number of the input sequence file at which processing stopped, indicating how far processing got\&. In
\fB\-f\fR
mode, up to as many additional lines as there are cores may have been parsed\&.
.RE
.PP
\fBtasks\fR
.RS 4
is the number of tasks processed\&.
.RE
.PP
\fBsucceeded\fR
.RS 4
is the number of main sequence jobs that succeeded\&.
.RE
.PP
\fBfailed\fR
.RS 4
is the number of main sequence jobs that failed\&. The failure condition depends on the
\fB\-S\fR
settings, too\&.
.RE
.PP
\fBextra\fR
.RS 4
is 0, 1 or 2, depending on the existence of setup and cleanup jobs\&.
.RE
.PP
\fBduration\fR
.RS 4
is how long \fBpegasus\-cluster\fR ran, in seconds with millisecond resolution\&.
.RE
.PP
\fBstart\fR
.RS 4
is the start time of
\fBpegasus\-cluster\fR
as an ISO 8601 time stamp\&.
.RE
.SH "SEE ALSO"
.sp
\fBpegasus\-kickstart(1)\fR
.SH "CAVEATS"
.sp
The \fB\-S\fR option sets success codes globally\&. It is not possible to activate success codes for only one specific application, and doing so would break shell compatibility\&. Because of their global nature, use success codes sparingly, as a last\-resort emergency handler\&. In environments that can be planned better, you should use an application wrapper instead\&.
.SH "EXAMPLE"
.sp
The following shows an example input file to \fBpegasus\-cluster\fR making use of \fBpegasus\-kickstart\fR to track applications\&.
.sp
.if n \{\
.RS 4
.\}
.nf
#
# mkdir
/path/to/pegasus\-kickstart \-R HPC \-n mkdir /bin/mkdir \-m 2755 \-p split\-corpus split\-ne\-corpus
#
# drop\-dian
/path/to/pegasus\-kickstart \-H \-R HPC \-n drop\-dian \-o \*(Aq^f\-new\&.plain\*(Aq /path/to/drop\-dian /path/to/f\-tok\&.plain /path/to/f\-tok\&.NE
#
# split\-corpus
/path/to/pegasus\-kickstart \-H \-R HPC \-n split\-corpus /path/to/split\-seq\-new\&.pl 23 f\-new\&.plain split\-corpus/corpus\&.
#
# split\-corpus
/path/to/pegasus\-kickstart \-H \-R HPC \-n split\-corpus /path/to/split\-seq\-new\&.pl 23 /path/to/f\-tok\&.NE split\-ne\-corpus/corpus\&.
.fi
.if n \{\
.RE
.\}
.SH "ENVIRONMENT VARIABLES"
.sp
A number of environment variables influence the behavior of \fBpegasus\-cluster\fR at run time\&.
.PP
\fBSEQEXEC_PROGRESS_REPORT\fR
.RS 4
If this variable is set, and points to a writable file location, progress report records are appended to the file\&. While care is taken to atomically append records to the log file, in case concurrent instances of
\fBpegasus\-cluster\fR
are running, broken Linux NFS may still garble some content\&.
.RE
.PP
\fBSEQEXEC_CPUS\fR
.RS 4
If this variable is set to a non\-negative integer,
\fBpegasus\-cluster\fR
attempts to use that many CPUs\&. The special value
\fIauto\fR
makes
\fBpegasus\-cluster\fR
auto\-detect the number of CPUs available on the system\&.
.RE
.PP
\fBSEQEXEC_SETUP\fR
.RS 4
If this variable is set, and contains a single fully\-qualified path to an executable and arguments, this executable will be run before any jobs are started\&. The exit code of this setup job will have no effect upon the main job sequence\&. Success or failure will not be counted towards the summary\&.
.RE
.PP
\fBSEQEXEC_CLEANUP\fR
.RS 4
If this variable is set, and contains a single fully\-qualified path to an executable and arguments, this executable will be run before
\fBpegasus\-cluster\fR
quits\&. Failure of any previous job will have no effect on the ability to run this job\&. The exit code of the cleanup job will have no effect on the overall success or failure state\&. Success or failure will not be counted towards the summary\&.
.RE
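.sp
When \fBpegasus\-cluster\fR is started from a script, the variables above can be prepared together with the command line\&. The following Python sketch only builds the argument list and environment; it does not run anything\&. The function name \fIbuild_invocation\fR, the input file name and the paths are hypothetical, and it assumes that \fBpegasus\-cluster\fR is found on the search path\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import os


def build_invocation(inputfile, cpus=None, progress_log=None,
                     setup=None, cleanup=None, hard_failure=False,
                     base_env=None):
    """Return (argv, env) for a run configured through SEQEXEC_* variables."""
    env = dict(os.environ if base_env is None else base_env)
    if cpus is not None:
        env["SEQEXEC_CPUS"] = str(cpus)  # an integer or "auto"
    if progress_log is not None:
        env["SEQEXEC_PROGRESS_REPORT"] = progress_log
    if setup is not None:
        env["SEQEXEC_SETUP"] = setup
    if cleanup is not None:
        env["SEQEXEC_CLEANUP"] = cleanup
    argv = ["pegasus-cluster"]
    if hard_failure:
        argv.append("-f")
    argv.append(inputfile)
    return argv, env


argv, env = build_invocation(
    "jobs.txt",
    cpus="auto",
    progress_log="/tmp/progress.log",
    cleanup="/bin/rmdir /tmp/scratch",
    hard_failure=True,
    base_env={},
)
print(argv)
print(sorted(env))
```
.fi
.if n \{\
.RE
.\}
.sp
The resulting pair can be handed to \fIsubprocess\&.run(argv, env=env)\fR; an exit status of 5 then indicates a failed application, as described in the RETURN VALUE section\&.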
.SH "HISTORY"
.sp
As you may have noticed, \fBpegasus\-cluster\fR had the name \fBseqexec\fR in previous incarnations\&. We are slowly moving to the new name to avoid clashes in a larger OS installation setting\&. However, there is no pressing need to change the internal name, too, as no name clashes are expected\&.
.SH "AUTHORS"
.sp
Jens\-S\&. Vöckler
.sp
Pegasus \fBhttp://pegasus\&.isi\&.edu/\fR