PEGASUS-MONITORD(1) | Pegasus Manual | PEGASUS-MONITORD(1)
NAME
pegasus-monitord - tracks a workflow's progress, mining information

SYNOPSIS
pegasus-monitord [--help|-h] [--verbose|-v] [--adjust|-a i] [--foreground|-N] [--no-daemon|-n] [--job|-j jobstate.log file] [--log|-l logfile] [--conf properties file] [--no-recursive] [--no-database | --no-events] [--replay|-r] [--no-notifications] [--notifications-max max_notifications] [--notifications-timeout timeout] [--sim|-s millisleep] [--db-stats] [--skip-stdout] [--force|-f] [--socket] [--output-dir|-o dir] [--dest|-d PATH or URL] [--encoding|-e bp | bson] DAGMan output file
DESCRIPTION
This program follows a workflow, parsing the output of DAGMan's dagman.out file. In addition to generating the jobstate.log file, pegasus-monitord can also be used to mine information from the workflow's DAG file and the jobs' submit and output files, and either populate a database or write a NetLogger events file with that information. pegasus-monitord can also perform notifications when tracking a workflow's progress in real time.

OPTIONS
-h, --help
Prints a usage summary with all the available
command-line options.
-v, --verbose
Sets the log level for pegasus-monitord. If
omitted, the default level will be set to WARNING. When this
option is given, the log level is changed to INFO. If this option is
repeated, the log level will be changed to DEBUG.
The log level in pegasus-monitord can also be adjusted interactively, by
sending the USR1 and USR2 signals to the process, respectively
for incrementing and decrementing the log level.
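The signal-driven mechanism described above can be sketched in a few lines of Python. This is an illustration of the general technique, not pegasus-monitord's actual implementation; the handler names and level table are hypothetical.

```python
# Illustrative sketch: a process stepping its log level up on USR1 and
# down on USR2, as pegasus-monitord does. Not the tool's actual code.
import logging
import os
import signal

# hypothetical level ladder; pegasus-monitord defaults to WARNING
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]
state = {"idx": 1}  # start at WARNING

def _more_verbose(signum, frame):
    # USR1: increment verbosity (WARNING -> INFO -> DEBUG)
    state["idx"] = min(state["idx"] + 1, len(LEVELS) - 1)
    logging.getLogger().setLevel(LEVELS[state["idx"]])

def _less_verbose(signum, frame):
    # USR2: decrement verbosity
    state["idx"] = max(state["idx"] - 1, 0)
    logging.getLogger().setLevel(LEVELS[state["idx"]])

signal.signal(signal.SIGUSR1, _more_verbose)
signal.signal(signal.SIGUSR2, _less_verbose)

# sending the process USR1 bumps the level from WARNING to INFO
os.kill(os.getpid(), signal.SIGUSR1)
```

In practice one would send the signal from another shell, e.g. `kill -USR1 <pid>`.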
-a i, --adjust i
For adjusting time zone differences by i seconds,
default is 0.
-N, --foreground
Do not daemonize pegasus-monitord; run it in the
foreground while otherwise going through the motions of a daemon
(used, for example, when running under Condor).
-n, --no-daemon
Do not daemonize pegasus-monitord, keep it in the
foreground (for debugging).
-j jobstate.log file, --job jobstate.log file
Alternative location for the jobstate.log file.
The default is to write a jobstate.log in the workflow directory. An
absolute file name should only be used if the workflow does not have any
sub-workflows, as each sub-workflow will generate its own jobstate.log
file. If an alternative, non-absolute, filename is given with this option,
pegasus-monitord will create one file in each workflow (and
sub-workflow) directory with the filename provided by the user with this
option. If an absolute filename is provided and sub-workflows are found, a
warning message will be printed and pegasus-monitord will not track any
sub-workflows.
-l logfile, --log logfile, --log-file logfile
Specifies an alternative logfile to use instead of
the monitord.log file in the main workflow directory. Unlike
the jobstate.log file above, pegasus-monitord only generates one
logfile per execution (and not one per workflow and sub-workflow it
tracks).
--conf properties_file
Specifies an alternative file containing properties in the
key=value format, and allows users to override values read from the
braindump.txt file. This option has precedence over the properties file
specified in the braindump.txt file. Please note that these properties
will apply not only to the main workflow, but also to all sub-workflows
found.
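The key=value format and the precedence rule above can be illustrated with a minimal sketch. This is not Pegasus' actual property parser; the helper name and the sample property keys are assumptions for illustration.

```python
# Minimal sketch of the key=value property format and the precedence rule:
# values from a --conf file override those read from braindump.txt.
# Illustrative only; not Pegasus' actual parser.

def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# hypothetical contents of the two sources
braindump = parse_properties("pegasus.monitord.events = true\nwf_uuid = 1234")
conf_file = parse_properties("pegasus.monitord.events = false")

# the --conf file takes precedence over braindump.txt
effective = {**braindump, **conf_file}
```

Note that, per the description above, the merged values apply to the main workflow and to every sub-workflow found.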
--no-recursive
This option prevents pegasus-monitord from
automatically following any sub-workflows that are found.
--nodatabase, --no-database, --no-events
Turns off generating events (when this option is given,
pegasus-monitord will only generate the jobstate.log file). The default
is to automatically log information to a SQLite database (see the
--dest option below for more details). This option overrides any
parameter given by the --dest option.
-r, --replay
This option is used to replay the output of an already
finished workflow. It should only be used after the workflow is finished (not
necessarily successfully). If a jobstate.log file is found, it will be
rotated. However, when using a database, all previous references to that
workflow (and all its sub-workflows) will be erased from it. When outputting to
a bp file, the file will be deleted. When running in replay mode,
pegasus-monitord will always run with the --no-daemon option,
and any errors will be output directly to the terminal. Also,
pegasus-monitord will not process any notifications while in replay
mode.
--no-notifications
This option disables notifications completely, making
pegasus-monitord ignore all the .notify files for all workflows it
tracks.
--notifications-max max_notifications
This option sets the maximum number of concurrent
notifications that pegasus-monitord will start. When the
max_notifications limit is reached, pegasus-monitord will queue
notifications and wait for a pending notification script to finish before
starting a new one. If max_notifications is set to 0, notifications
will be disabled.
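The queueing behavior described above (at most max_notifications scripts running at once, with further notifications waiting for a free slot) can be sketched with a bounded worker pool. This is an illustration of the concurrency limit only; the names and timings are hypothetical, not pegasus-monitord's internals.

```python
# Sketch of a concurrency cap like --notifications-max: at most
# max_notifications "scripts" run at once; the rest wait in a queue.
from concurrent.futures import ThreadPoolExecutor
import threading
import time

max_notifications = 2
running = 0   # scripts currently executing
peak = 0      # highest concurrency observed
lock = threading.Lock()

def notification_script(event):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # stand-in for the script's actual work
    with lock:
        running -= 1

# six notifications arrive, but only two may run concurrently
with ThreadPoolExecutor(max_workers=max_notifications) as pool:
    for event in range(6):
        pool.submit(notification_script, event)
```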
--notifications-timeout timeout
Normally, pegasus-monitord will start a
notification script and wait indefinitely for it to finish. This option allows
users to set up a maximum timeout that pegasus-monitord will
wait for a notification script to finish before terminating it. If
notification scripts do not finish in a reasonable amount of time, it can
cause other notification scripts to be queued due to the maximum number of
concurrent scripts allowed by pegasus-monitord. Additionally, until all
notification scripts finish, pegasus-monitord will not terminate.
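The timeout behavior can be sketched with a child process that is terminated once it exceeds the limit. The script command and timeout value here are placeholders for illustration; this is not pegasus-monitord's code.

```python
# Sketch of --notifications-timeout: start a notification script and
# terminate it if it runs longer than the timeout. Illustrative only.
import subprocess

timeout = 1  # seconds; pegasus-monitord waits indefinitely by default

try:
    # "sleep 5" stands in for a notification script that hangs
    subprocess.run(["sleep", "5"], timeout=timeout)
    timed_out = False
except subprocess.TimeoutExpired:
    # the child is killed and the monitor can move on to queued scripts
    timed_out = True
```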
-s millisleep, --sim millisleep
This option simulates delays between reads, by sleeping
millisleep milliseconds. This option is mainly used by
developers.
--db-stats
This option causes the database module to collect and
print database statistics at the end of the execution. It has no effect if the
--no-database option is given.
--skip-stdout
This option causes pegasus-monitord not to
populate jobs' stdout and stderr into the BP file or the Stampede database. It
should be used to avoid increasing the database size substantially in cases
where jobs are very verbose in their output.
-f, --force
This option causes pegasus-monitord to skip
checking for another instance of itself already running on the same workflow
directory. The default behavior prevents two or more pegasus-monitord
instances from starting and running simultaneously (which would cause the bp
file and database to be left in an unstable state). This option should only be
used when the user knows the previous instance of pegasus-monitord is
NOT running anymore.
--socket
This option causes pegasus-monitord to start a
socket interface that can be used for advanced debugging. The port number for
connecting to pegasus-monitord can be found in the monitord.sock
file in the workflow directory (the file is deleted when
pegasus-monitord finishes). If not already started, the socket
interface is also created when pegasus-monitord receives a USR1
signal.
-o dir, --output-dir dir
When this option is given, pegasus-monitord will
create all its output files in the directory specified by dir. This
option is useful for allowing a user to debug a workflow in a directory
where the user does not have write permissions. In this case, all files generated by
pegasus-monitord will have the workflow wf_uuid as a prefix so
that files from multiple sub-workflows can be placed in the same directory.
This option is mainly used by pegasus-analyzer. It is important to note
that the location for the output BP file or database is not changed by this
option and should be set via the --dest option.
-d URL params, --dest URL params
This option allows users to specify the destination for
the log events generated by pegasus-monitord. If this option is
omitted, pegasus-monitord will create a SQLite database in the
workflow’s run directory with the same name as the workflow, but with a
.stampede.db extension. For an empty scheme, params are a
file path with - meaning standard output. For a x-tcp scheme,
params are TCP_host[:port=14380]. For a database scheme,
params are a SQLAlchemy engine URL with a database connection
string that can be used to specify different database engines. Please see the
examples section below for more information on how to use this option. Note
that when using a database engine other than sqlite, the necessary
Python database drivers will need to be installed.
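The default destination described above (a SQLite database in the run directory, named after the workflow with a .stampede.db extension) can be sketched as a small path computation. The helper name is hypothetical and this is only an illustration of the naming rule, not Pegasus' code.

```python
# Sketch of the default --dest: derive the SQLite URL from the location
# and name of the DAGMan output file. Illustrative only.
from pathlib import Path

def default_dest(dagman_out):
    p = Path(dagman_out)
    # strip the required .dag.dagman.out extension to get the workflow name
    workflow = p.name.replace(".dag.dagman.out", "")
    db = p.parent / (workflow + ".stampede.db")
    # four slashes: sqlite:/// scheme plus an absolute path
    return "sqlite:///" + str(db)

url = default_dest("/runs/diamond-0/diamond-0.dag.dagman.out")
```

An explicit `--dest sqlite:////home/user/diamond_database.db` overrides this default, as shown in the EXAMPLES section.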
-e encoding, --encoding encoding
This option specifies how to encode log events. The two
available possibilities are bp and bson. If this option is not
specified, events will be generated in the bp format.
DAGMan_output_file
The DAGMan_output_file is the only required
command-line argument in pegasus-monitord and must have the
.dag.dagman.out extension.
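The extension requirement above amounts to a simple filename check, sketched here with a hypothetical helper for illustration.

```python
# Sanity check mirroring the rule that the required argument must end
# in .dag.dagman.out. Illustrative only.
def is_dagman_output(filename):
    return filename.endswith(".dag.dagman.out")
```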
RETURN VALUE
If pegasus-monitord is able to track the workflow, it returns with an exit code of 0. However, in case of error, a non-zero exit code indicates problems. In that case, the logfile should contain additional information about the error condition.

ENVIRONMENT VARIABLES
pegasus-monitord does not require that any environmental variables be set. It locates its required Python modules based on its own location, and therefore should not be moved outside of Pegasus' bin directory.

EXAMPLES
Usually, pegasus-monitord is invoked automatically by pegasus-run and tracks the workflow progress in real time, producing the jobstate.log file and a corresponding SQLite database. When a workflow fails and is re-submitted with a rescue DAG, pegasus-monitord will automatically pick up from where it left off previously and continue the jobstate.log file and the database. If users need to create the jobstate.log file after a workflow has already finished, the --replay | -r option should be used when running pegasus-monitord manually. For example:

$ pegasus-monitord -r diamond-0.dag.dagman.out
$ pegasus-monitord -r --no-database diamond-0.dag.dagman.out
$ pegasus-monitord -r --dest `pwd`/diamond-0.bp diamond-0.dag.dagman.out
$ pegasus-monitord -r --dest sqlite:////home/user/diamond_database.db diamond-0.dag.dagman.out
SEE ALSO
pegasus-run(1)

AUTHORS
Gaurang Mehta <gmehta at isi dot edu>
Fabio Silva <fabio at isi dot edu>
Karan Vahi <vahi at isi dot edu>
Jens-S. Vöckler <voeckler at isi dot edu>
Pegasus Team http://pegasus.isi.edu

07/30/2014 | Pegasus 4.4.0