'\" t
.\" Title: pegasus-monitord
.\" Author: [see the "Authors" section]
.\" Generator: DocBook XSL Stylesheets v1.79.1
.\" Date: 11/09/2018
.\" Manual: Pegasus Manual
.\" Source: Pegasus 4.4.0
.\" Language: English
.\"
.TH "PEGASUS\-MONITORD" "1" "11/09/2018" "Pegasus 4\&.4\&.0" "Pegasus Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
pegasus-monitord \- tracks a workflow\(cqs progress and mines information
.SH "SYNOPSIS"
.sp
.nf
\fBpegasus\-monitord\fR [\fB\-\-help\fR|\fB\-h\fR] [\fB\-\-verbose\fR|\fB\-v\fR]
[\fB\-\-adjust\fR|\fB\-a\fR \fIi\fR] [\fB\-\-foreground\fR|\fB\-N\fR]
[\fB\-\-no\-daemon\fR|\fB\-n\fR] [\fB\-\-job\fR|\fB\-j\fR \fIjobstate\&.log file\fR]
[\fB\-\-log\fR|\fB\-l\fR \fIlogfile\fR] [\fB\-\-conf\fR \fIproperties file\fR]
[\fB\-\-no\-recursive\fR] [\fB\-\-no\-database\fR | \fB\-\-no\-events\fR]
[\fB\-\-replay\fR|\fB\-r\fR] [\fB\-\-no\-notifications\fR]
[\fB\-\-notifications\-max\fR \fImax_notifications\fR]
[\fB\-\-notifications\-timeout\fR \fItimeout\fR]
[\fB\-\-sim\fR|\fB\-s\fR \fImillisleep\fR] [\fB\-\-db\-stats\fR]
[\fB\-\-skip\-stdout\fR] [\fB\-\-force\fR|\fB\-f\fR]
[\fB\-\-socket\fR] [\fB\-\-output\-dir\fR | \fB\-o\fR \fIdir\fR]
[\fB\-\-dest\fR|\fB\-d\fR \fIPATH\fR or \fIURL\fR] [\fB\-\-encoding\fR|\fB\-e\fR \fIbp\fR | \fIbson\fR]
\fIDAGMan output file\fR
.fi
.SH "DESCRIPTION"
.sp
This program follows a workflow by parsing DAGMan\(cqs dagman\&.out file\&. In addition to generating the jobstate\&.log file, \fBpegasus\-monitord\fR can also be used to mine information from the workflow dag file and jobs\*(Aq submit and output files, and either populate a database or write a NetLogger events file with that information\&. \fBpegasus\-monitord\fR can also perform notifications when tracking a workflow\(cqs progress in real time\&.
.SH "OPTIONS"
.PP
\fB\-h\fR, \fB\-\-help\fR
.RS 4
Prints a usage summary with all the available command\-line options\&.
.RE
.PP
\fB\-v\fR, \fB\-\-verbose\fR
.RS 4
Increases the log level of
\fBpegasus\-monitord\fR\&. If omitted, the default
\fIlevel\fR
is
\fIWARNING\fR\&. When this option is given once, the log level is changed to
\fIINFO\fR\&. If this option is repeated, the log level is changed to
\fIDEBUG\fR\&.
.sp
The log level in
\fBpegasus\-monitord\fR
can also be adjusted interactively by sending the process the
\fIUSR1\fR
signal to increment the log level or the
\fIUSR2\fR
signal to decrement it\&.
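.sp
As a minimal sketch, assuming the process ID of the running \fBpegasus\-monitord\fR has been stored in a hypothetical shell variable named MONITORD_PID, the signals could be sent with the standard kill command:
.sp
.if n \{\
.RS 4
.\}
.nf
$ kill \-USR1 "$MONITORD_PID"   # increment the log level
$ kill \-USR2 "$MONITORD_PID"   # decrement the log level
.fi
.if n \{\
.RE
.\}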
.RE
.PP
\fB\-a\fR \fIi\fR, \fB\-\-adjust\fR \fIi\fR
.RS 4
Adjusts for time zone differences by
\fIi\fR
seconds\&. The default is 0\&.
.RE
.PP
\fB\-N\fR, \fB\-\-foreground\fR
.RS 4
Do not daemonize
\fBpegasus\-monitord\fR, but go through the motions as if it were (for use under Condor)\&.
.RE
.PP
\fB\-n\fR, \fB\-\-no\-daemon\fR
.RS 4
Do not daemonize
\fBpegasus\-monitord\fR, keep it in the foreground (for debugging)\&.
.RE
.PP
\fB\-j\fR \fIjobstate\&.log file\fR, \fB\-\-job\fR \fIjobstate\&.log file\fR
.RS 4
Alternative location for the
\fIjobstate\&.log\fR
file\&. The default is to write a
\fIjobstate\&.log\fR
in the workflow directory\&. An absolute file name should only be used if the workflow does not have any sub\-workflows, as each sub\-workflow will generate its own
\fIjobstate\&.log\fR
file\&. If an alternative, non\-absolute, filename is given with this option,
\fBpegasus\-monitord\fR
will create one file in each workflow (and sub\-workflow) directory with the filename provided by the user with this option\&. If an absolute filename is provided and sub\-workflows are found, a warning message will be printed and
\fBpegasus\-monitord\fR
will not track any sub\-workflows\&.
.RE
.PP
\fB\-\-log\fR \fIlogfile\fR, \fB\-\-log\-file\fR \fIlogfile\fR
.RS 4
Specifies an alternative
\fIlogfile\fR
to use instead of the
\fImonitord\&.log\fR
file in the main workflow directory\&. Unlike the
\fIjobstate\&.log\fR
file above,
\fBpegasus\-monitord\fR
only generates one
\fIlogfile\fR
per execution (and not one per workflow and sub\-workflow it tracks)\&.
.RE
.PP
\fB\-\-conf\fR \fIproperties_file\fR
.RS 4
Specifies an alternative file containing properties in the
\fIkey=value\fR
format, and allows users to override values read from the
\fIbraindump\&.txt\fR
file\&. This option has precedence over the properties file specified in the
\fIbraindump\&.txt\fR
file\&. Please note that these properties will apply not only to the main workflow, but also to all sub\-workflows found\&.
.RE
.PP
\fB\-\-no\-recursive\fR
.RS 4
This option prevents
\fBpegasus\-monitord\fR
from automatically following any sub\-workflows that are found\&.
.RE
.PP
\fB\-\-nodatabase\fR, \fB\-\-no\-database\fR, \fB\-\-no\-events\fR
.RS 4
Turns off generating events (when this option is given,
\fBpegasus\-monitord\fR
will only generate the jobstate\&.log file)\&. The default is to automatically log information to a SQLite database (see the
\fB\-\-dest\fR
option below for more details)\&. This option overrides any parameter given by the
\fB\-\-dest\fR
option\&.
.RE
.PP
\fB\-r\fR, \fB\-\-replay\fR
.RS 4
This option is used to replay the output of an already finished workflow\&. It should only be used after the workflow is finished (not necessarily successfully)\&. If a
\fIjobstate\&.log\fR
file is found, it will be rotated\&. However, when using a database, all previous references to that workflow (and all its sub\-workflows) will be erased from it\&. When outputting to a bp file, the file will be deleted\&. When running in replay mode,
\fBpegasus\-monitord\fR
will always run with the
\fB\-\-no\-daemon\fR
option, and any errors will be output directly to the terminal\&. Also,
\fBpegasus\-monitord\fR
will not process any notifications while in replay mode\&.
.RE
.PP
\fB\-\-no\-notifications\fR
.RS 4
This option disables notifications completely, making
\fBpegasus\-monitord\fR
ignore all the \&.notify files for all workflows it tracks\&.
.RE
.PP
\fB\-\-notifications\-max\fR \fImax_notifications\fR
.RS 4
This option sets the maximum number of concurrent notifications that
\fBpegasus\-monitord\fR
will start\&. When the
\fImax_notifications\fR
limit is reached,
\fBpegasus\-monitord\fR
will queue notifications and wait for a pending notification script to finish before starting a new one\&. If
\fImax_notifications\fR
is set to 0, notifications will be disabled\&.
.RE
.PP
\fB\-\-notifications\-timeout\fR \fItimeout\fR
.RS 4
Normally,
\fBpegasus\-monitord\fR
will start a notification script and wait indefinitely for it to finish\&. This option allows users to set up a maximum
\fItimeout\fR
that
\fBpegasus\-monitord\fR
will wait for a notification script to finish before terminating it\&. Notification scripts that do not finish in a reasonable amount of time can cause other notification scripts to be queued, because of the maximum number of concurrent scripts allowed by
\fBpegasus\-monitord\fR\&. Additionally, until all notification scripts finish,
\fBpegasus\-monitord\fR
will not terminate\&.
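.sp
As an illustration only, the two notification limits could be combined as follows\&. The values 5 and 60 are arbitrary assumptions, the unit of
\fItimeout\fR
is not stated in this page, and the workflow file name is borrowed from the examples section:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-\-notifications\-max 5 \-\-notifications\-timeout 60 diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}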
.RE
.PP
\fB\-s\fR \fImillisleep\fR, \fB\-\-sim\fR \fImillisleep\fR
.RS 4
This option simulates delays between reads, by sleeping
\fImillisleep\fR
milliseconds\&. This option is mainly used by developers\&.
.RE
.PP
\fB\-\-db\-stats\fR
.RS 4
This option causes the database module to collect and print database statistics at the end of the execution\&. It has no effect if the
\fB\-\-no\-database\fR
option is given\&.
.RE
.PP
\fB\-\-skip\-stdout\fR
.RS 4
This option causes
\fBpegasus\-monitord\fR
not to populate jobs\*(Aq stdout and stderr into the BP file or the Stampede database\&. It should be used to avoid increasing the database size substantially in cases where jobs are very verbose in their output\&.
.RE
.PP
\fB\-f\fR, \fB\-\-force\fR
.RS 4
This option causes
\fBpegasus\-monitord\fR
to skip checking for another instance of itself already running on the same workflow directory\&. The default behavior prevents two or more
\fBpegasus\-monitord\fR
instances from starting and running simultaneously (which would cause the bp file and database to be left in an unstable state)\&. This option should only be used when the user knows the previous instance of
\fBpegasus\-monitord\fR
is
\fBNOT\fR
running anymore\&.
.RE
.PP
\fB\-\-socket\fR
.RS 4
This option causes
\fBpegasus\-monitord\fR
to start a socket interface that can be used for advanced debugging\&. The port number for connecting to
\fBpegasus\-monitord\fR
can be found in the
\fImonitord\&.sock\fR
file in the workflow directory (the file is deleted when
\fBpegasus\-monitord\fR
finishes)\&. If not already started, the socket interface is also created when
\fBpegasus\-monitord\fR
receives a
\fIUSR1\fR
signal\&.
.RE
.PP
\fB\-o\fR \fIdir\fR, \fB\-\-output\-dir\fR \fIdir\fR
.RS 4
When this option is given,
\fBpegasus\-monitord\fR
will create all its output files in the directory specified by
\fIdir\fR\&.
This option is useful for allowing a user to debug a workflow in a directory where the user does not have write permission\&. In this case, all files generated by
\fBpegasus\-monitord\fR
will have the workflow
\fIwf_uuid\fR
as a prefix so that files from multiple sub\-workflows can be placed in the same directory\&. This option is mainly used by
\fBpegasus\-analyzer\fR\&. It is important to note that the location for the output BP file or database is not changed by this option and should be set via the
\fB\-\-dest\fR
option\&.
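.sp
A possible invocation, sketched under the assumption that the workflow lives in a read\-only directory /path/to/run and that /tmp/monitord\-debug is a writable directory (both paths are hypothetical), sends the output files and the BP file to the writable location:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-r \-o /tmp/monitord\-debug \-\-dest /tmp/monitord\-debug/diamond\-0\&.bp /path/to/run/diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}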
.RE
.PP
\fB\-d\fR \fIURL\fR \fIparams\fR, \fB\-\-dest\fR \fIURL\fR \fIparams\fR
.RS 4
This option allows users to specify the destination for the log events generated by
\fBpegasus\-monitord\fR\&. If this option is omitted,
\fBpegasus\-monitord\fR
will create a SQLite database in the workflow\(cqs run directory with the same name as the workflow, but with a
\fI\&.stampede\&.db\fR
suffix\&. For an
\fIempty\fR
scheme,
\fIparams\fR
are a file path with
\fB\-\fR
meaning standard output\&. For an
\fIx\-tcp\fR
scheme,
\fIparams\fR
are
\fITCP_host[:port=14380]\fR\&. For a database scheme,
\fIparams\fR
are a
\fISQLAlchemy engine URL\fR
with a database connection string that can be used to specify different database engines\&. Please see the examples section below for more information on how to use this option\&. Note that when using a database engine other than
\fBsqlite\fR, the necessary Python database drivers will need to be installed\&.
.RE
.PP
\fB\-e\fR \fIencoding\fR, \fB\-\-encoding\fR \fIencoding\fR
.RS 4
This option specifies how to encode log events\&. The two available possibilities are
\fIbp\fR
and
\fIbson\fR\&. If this option is not specified, events will be generated in the
\fIbp\fR
format\&.
.RE
.PP
\fIDAGMan_output_file\fR
.RS 4
The
\fIDAGMan_output_file\fR
is the only required command\-line argument of
\fBpegasus\-monitord\fR
and must have the
\fI\&.dag\&.dagman\&.out\fR
extension\&.
.RE
.SH "RETURN VALUE"
.sp
On success, \fBpegasus\-monitord\fR returns with an exit code of 0\&. In case of error, it returns a non\-zero exit code, and the \fIlogfile\fR should contain additional information about the error condition\&.
.SH "ENVIRONMENT VARIABLES"
.sp
\fBpegasus\-monitord\fR does not require any environment variables to be set\&. It locates its required Python modules based on its own location, and therefore should not be moved outside of Pegasus\*(Aq bin directory\&.
.SH "EXAMPLES"
.sp
Usually, \fBpegasus\-monitord\fR is invoked automatically by \fBpegasus\-run\fR and tracks the workflow progress in real time, producing the \fIjobstate\&.log\fR file and a corresponding SQLite database\&. When a workflow fails and is re\-submitted with a rescue DAG, \fBpegasus\-monitord\fR will automatically pick up where it left off and continue the \fIjobstate\&.log\fR file and the database\&.
.sp
If users need to create the \fIjobstate\&.log\fR file after a workflow is already finished, the \fB\-\-replay | \-r\fR option should be used when running \fBpegasus\-monitord\fR manually\&. For example:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-r diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}
.sp
will launch \fBpegasus\-monitord\fR in replay mode\&. In this case, if a \fIjobstate\&.log\fR file already exists, it will be rotated and a new file will be created\&. If a \fIdiamond\-0\&.stampede\&.db\fR SQLite database already exists, \fBpegasus\-monitord\fR will purge all references to the workflow id specified in the \fIbraindump\&.txt\fR file, including all sub\-workflows associated with that workflow id\&.
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-r \-\-no\-database diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}
.sp
will do the same thing, but without generating any log events\&.
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-r \-\-dest `pwd`/diamond\-0\&.bp diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}
.sp
will create the file \fIdiamond\-0\&.bp\fR in the current directory, containing NetLogger events with all the workflow data\&. This is in addition to the \fIjobstate\&.log\fR file\&.
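.sp
Events can also be sent over the network instead of to a file\&. As a sketch, assuming a hypothetical event receiver is listening on \fIlocalhost\fR at the default port 14380, and assuming the \fIx\-tcp\fR destination is written as a URL of the form x\-tcp://host:port:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-r \-\-dest x\-tcp://localhost:14380 \-\-encoding bson diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}
.sp
Here the \fB\-\-encoding\fR option selects the \fIbson\fR format; omitting it would leave the events in the default \fIbp\fR format\&.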
.sp
To use a database, users should provide a database connection string in the format:
.sp
.if n \{\
.RS 4
.\}
.nf
dialect://username:password@host:port/database
.fi
.if n \{\
.RE
.\}
.sp
Where \fIdialect\fR is the name of the underlying driver (\fImysql\fR, \fIsqlite\fR, \fIoracle\fR, \fIpostgres\fR) and \fIdatabase\fR is the name of the database running on the server at the \fIhost\fR computer\&.
.sp
If users want to use a different \fISQLite\fR database, \fBpegasus\-monitord\fR requires them to specify the absolute path of the alternate file\&. For example:
.sp
.if n \{\
.RS 4
.\}
.nf
$ pegasus\-monitord \-r \-\-dest sqlite:////home/user/diamond_database\&.db diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}
.sp
Here are docs with details for all of the supported drivers: \m[blue]\fBhttp://www\&.sqlalchemy\&.org/docs/05/reference/dialects/index\&.html\fR\m[]
.sp
Additional per\-database options that can be included in the connection string are outlined there\&.
.sp
It is important to note that the appropriate database interface library must be installed\&. \fISQLAlchemy\fR is a wrapper around the underlying interface library (the MySQL one, for instance); it does not provide a \fIMySQL\fR driver itself\&. The \fBPegasus\fR distribution includes both \fBSQLAlchemy\fR and the \fBSQLite\fR Python driver\&.
.sp
As a final note, unlike with \fISQLite\fR databases, when using \fBSQLAlchemy\fR with other database servers, e\&.g\&. \fIMySQL\fR or \fIPostgres\fR, the target database must already exist\&. So, if a user wanted to connect to:
.sp
.if n \{\
.RS 4
.\}
.nf
mysql://pegasus\-user:supersecret@localhost:localport/diamond
.fi
.if n \{\
.RE
.\}
.sp
the user would first need to connect to the server at \fIlocalhost\fR and issue the appropriate create database command before running \fBpegasus\-monitord\fR\&. \fBSQLAlchemy\fR will then take care of creating the tables and indexes if they do not already exist\&.
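.sp
As a sketch of that sequence, assuming the standard mysql command\-line client is installed and that the account pegasus\-user is allowed to create databases (localport remains a placeholder for the actual port number):
.sp
.if n \{\
.RS 4
.\}
.nf
$ mysql \-h localhost \-u pegasus\-user \-p \-e "CREATE DATABASE diamond"
$ pegasus\-monitord \-r \-\-dest mysql://pegasus\-user:supersecret@localhost:localport/diamond diamond\-0\&.dag\&.dagman\&.out
.fi
.if n \{\
.RE
.\}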
.SH "SEE ALSO"
.sp
pegasus\-run(1)
.SH "AUTHORS"
.sp
Gaurang Mehta
.sp
Fabio Silva
.sp
Karan Vahi
.sp
Jens\-S\&. Vöckler
.sp
Pegasus Team \m[blue]\fBhttp://pegasus\&.isi\&.edu\fR\m[]