Name
condor_dagman - meta scheduler of the jobs submitted as the nodes of a DAG or
DAGs
Synopsis
condor_dagman -f -t -l . -help
condor_dagman -version
condor_dagman -f -l . [-debug level] [-maxidle numberOfJobs] [-maxjobs
numberOfJobs] [-maxpre NumberOfPREscripts] [-maxpost NumberOfPOSTscripts]
[-noeventchecks] [-allowlogerror] [-usedagdir] -lockfile filename
[-waitfordebug] [-autorescue 0|1] [-dorescuefrom number] -csdversion
version_string [-allowversionmismatch] [-DumpRescue] [-verbose] [-force]
[-notification value] [-suppress_notification] [-dont_suppress_notification]
[-dagman DagmanExecutable] [-outfile_dir directory] [-update_submit]
[-import_env] [-DontAlwaysRunPost] -dag dag_file [-dag dag_file_2 ... -dag
dag_file_n]
Description
condor_dagman is a meta scheduler for the HTCondor jobs within a DAG (directed
acyclic graph) or multiple DAGs. In typical usage, a submitter of jobs that
are organized into a DAG submits the DAG using condor_submit_dag.
condor_submit_dag does error checking on aspects of the DAG and then submits
condor_dagman as an HTCondor job. condor_dagman uses log files to coordinate
the further submission of the jobs within the DAG.
All command line arguments to the DaemonCore library functions work for
condor_dagman. When invoked from the command line, condor_dagman requires the
arguments -f -l . to appear first on the command line, to be processed by
DaemonCore. The -t argument must also be present for the -help option, so that
output is sent to the terminal.
Arguments to condor_dagman are either set automatically by condor_submit_dag or
specified as command-line arguments to condor_submit_dag and passed on
to condor_dagman. The method by which each argument is set is given in its
description below.
condor_dagman can run multiple, independent DAGs. This is done by specifying
multiple -dag arguments. Pass multiple DAG input files as command-line
arguments to condor_submit_dag.
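As a minimal sketch of how several DAG input files map onto repeated -dag arguments, the following Python helper builds that part of the argument list. The function name build_dag_arguments and the file names are hypothetical and used only for illustration; in normal use condor_submit_dag constructs these arguments itself.

```python
from typing import List


def build_dag_arguments(dag_files: List[str]) -> List[str]:
    """Return one '-dag <file>' pair per DAG input file, in the order given."""
    args: List[str] = []
    for dag_file in dag_files:
        # Each DAG input file gets its own -dag argument.
        args.extend(["-dag", dag_file])
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_dag_arguments(["first.dag", "second.dag"]))
```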
Debugging output may be obtained by using the -debug level option. Level values
and what they produce are described below:
- level = 0; never produce output, except for usage info
- level = 1; very quiet, output severe errors
- level = 2; normal output, errors and warnings
- level = 3; output errors, as well as all warnings
- level = 4; internal debugging output
- level = 5; internal debugging output; outer loop debugging
- level = 6; internal debugging output; inner loop debugging; output DAG
  input file lines as they are parsed
- level = 7; internal debugging output; rarely used; output DAG input file
  lines as they are parsed
Options
-debug level
-
- Sets the level of debugging output. level is an integer from 0 to 7
inclusive, where 7 produces the most verbose output. This command-line
option to condor_submit_dag is passed to condor_dagman or defaults to the
value 3.
-
-maxidle NumberOfJobs
-
- Sets the maximum number of idle jobs allowed before condor_dagman stops
submitting more jobs. If DAG nodes have a cluster with more than one job
in it, each job in the cluster is counted individually. Once idle jobs
start to run, condor_dagman will resume submitting jobs. NumberOfJobs is a
positive integer. This command-line option to condor_submit_dag is passed
to condor_dagman. If not specified, the number of idle jobs is unlimited.
Note that nothing special is done to the submit description file. Setting
queue 5000 in the submit description file, where -maxidle is set to 250,
will result in a cluster of 5000 new jobs being submitted to the
condor_schedd. In this case, condor_dagman will resume submitting jobs
when the number of idle jobs falls below 250.
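The throttling behavior described for -maxidle can be sketched in Python as follows. This is a simplified model and not condor_dagman's actual implementation: the function name may_submit is hypothetical, and the strict "less than" comparison is an assumption inferred from the phrase "falls below 250".

```python
from typing import Optional


def may_submit(idle_jobs: int, max_idle: Optional[int]) -> bool:
    """Return True if another node's cluster may be submitted.

    max_idle=None models the default, in which idle jobs are unlimited.
    The strict '<' is an assumption based on "falls below".
    """
    return max_idle is None or idle_jobs < max_idle


if __name__ == "__main__":
    max_idle = 250
    idle = 0
    if may_submit(idle, max_idle):
        # The whole cluster from 'queue 5000' is submitted at once; the
        # submit description file is not altered to respect the limit.
        idle += 5000
    print(idle, may_submit(idle, max_idle))
    # Submission resumes only once enough jobs have left the idle state.
    idle = 249
    print(idle, may_submit(idle, max_idle))
```

The model shows why the idle count can greatly exceed the limit: the check happens before a cluster is submitted, and each job in the cluster then counts individually.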
-
-maxjobs numberOfJobs
-
- Sets the maximum number of clusters within the DAG that will be submitted
to HTCondor at one time. numberOfJobs is a positive integer. This
command-line option to condor_submit_dag is passed to condor_dagman. If
not specified, the default number of clusters is unlimited. If a cluster
contains more than one job, only the cluster is counted for purposes of
-maxjobs.
-
-maxpre NumberOfPREscripts
-
- Sets the maximum number of PRE scripts within the DAG that may be running
at one time. NumberOfPREscripts is a positive integer. This command-line
option to condor_submit_dag is passed to condor_dagman. If not specified,
the default number of PRE scripts is unlimited.
-
-maxpost NumberOfPOSTscripts
-
- Sets the maximum number of POST scripts within the DAG that may be running
at one time. NumberOfPOSTscripts is a positive integer. This command-line
option to condor_submit_dag is passed to condor_dagman. If not specified,
the default number of POST scripts is unlimited.
-
-noeventchecks
-
- This argument is no longer used; it is now ignored. Its functionality is
now implemented by the DAGMAN_ALLOW_EVENTS configuration variable.
-
-allowlogerror
-
- This optional argument has condor_dagman try to run the specified DAG,
even in the case of detected errors in the job event log specification. As
of version 7.3.2, this argument has an effect only on DAGs containing
Stork job nodes.
-
-usedagdir
-
- This optional argument causes condor_dagman to run each specified DAG as
if the directory containing that DAG file were the current working
directory. This option is most useful when running multiple DAGs in a
single condor_dagman.
-
-lockfile filename
-
- Names the file created and used as a lock file. The lock file prevents
two instances of the same DAG, as defined by a DAG input file, from
executing at the same time. A default lock file ending with the suffix
.dag.lock is passed to condor_dagman by condor_submit_dag.
-
-waitfordebug
-
- This optional argument causes condor_dagman to wait at startup until
someone attaches to the process with a debugger and sets the
wait_for_debug variable in main_init() to false.
-
-autorescue 0|1
-
- Specifies whether to automatically run the newest rescue DAG for the given
DAG file, if one exists (0 = false, 1 = true).
-
-dorescuefrom number
-
- Forces condor_dagman to run the specified rescue DAG number for the given
DAG. A value of 0 is the same as not specifying this option. Specifying a
nonexistent rescue DAG is a fatal error.
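The interaction of -autorescue and -dorescuefrom can be sketched as a small selection function. This is an illustrative model under stated assumptions, not condor_dagman's code: the name select_rescue_dag is hypothetical, "newest" is assumed to mean the highest rescue DAG number, -dorescuefrom is assumed to take precedence because it "forces" a specific rescue DAG, and the fatal error is modeled as a Python exception.

```python
from typing import Optional, Set


def select_rescue_dag(existing: Set[int], autorescue: bool,
                      dorescuefrom: int = 0) -> Optional[int]:
    """Return the rescue DAG number to run, or None to run the original DAG."""
    if dorescuefrom != 0:
        # A value of 0 is the same as not specifying -dorescuefrom.
        if dorescuefrom not in existing:
            # Modeled as an exception; the real program treats this as fatal.
            raise ValueError(f"rescue DAG {dorescuefrom} does not exist")
        return dorescuefrom
    if autorescue and existing:
        # Assumption: the newest rescue DAG has the highest number.
        return max(existing)
    return None


if __name__ == "__main__":
    print(select_rescue_dag({1, 2, 3}, autorescue=True))
    print(select_rescue_dag({1, 2, 3}, autorescue=True, dorescuefrom=2))
    print(select_rescue_dag(set(), autorescue=True))
```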
-
-csdversion version_string
-
- version_string is the version of the condor_submit_dag program. At startup,
condor_dagman checks for a version mismatch with the
condor_submit_dag version in this argument.
-
-allowversionmismatch
-
- This optional argument causes condor_dagman to allow a version mismatch
between condor_dagman itself and the .condor.sub file produced by
condor_submit_dag (or, in other words, between condor_submit_dag and
condor_dagman). WARNING! This option should be used only if absolutely
necessary. Allowing version mismatches can cause subtle problems when
running DAGs. (Note that, starting with version 7.4.0, condor_dagman no
longer requires an exact version match between itself and the .condor.sub
file. Instead, a "minimum compatible version" is defined, and
any .condor.sub file of that version or newer is accepted.)
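The "minimum compatible version" rule can be illustrated with a short Python sketch. It assumes that version strings are plain dotted integers such as 7.4.0; the actual format of the -csdversion string and the actual minimum compatible version are not given here, so the function names and the values in the usage example are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '7.4.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_accepted(csd_version: str, minimum_compatible: str,
                allow_mismatch: bool = False) -> bool:
    """Accept the given version if it is the minimum or newer.

    allow_mismatch models -allowversionmismatch, which skips the check.
    """
    if allow_mismatch:
        return True
    return parse_version(csd_version) >= parse_version(minimum_compatible)


if __name__ == "__main__":
    # Hypothetical minimum compatible version, for illustration only.
    minimum = "7.4.0"
    for version in ("7.3.2", "7.4.0", "7.10.1"):
        print(version, is_accepted(version, minimum))
```

Comparing integer tuples rather than raw strings keeps 7.10.1 newer than 7.4.0, which a plain string comparison would get wrong.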
-
-DumpRescue
-
- This optional argument causes condor_dagman to immediately dump a Rescue
DAG and then exit, as opposed to actually running the DAG. This feature is
mainly intended for testing. The Rescue DAG file is produced whether or
not there are parse errors reading the original DAG input file. The name
of the file differs if there was a parse error.
-
-verbose
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) Causes condor_submit_dag to
give verbose error messages.
-
-force
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) Requires
condor_submit_dag to overwrite the files that it produces, if the files
already exist. Note that dagman.out will be appended to, not overwritten.
If new-style rescue DAG mode is in effect, and any new-style rescue DAGs
exist, the -force flag will cause them to be renamed, and the original DAG
will be run. If old-style rescue DAG mode is in effect, any existing
old-style rescue DAGs will be deleted, and the original DAG will be run.
See the HTCondor manual section on Rescue DAGs for more information.
-
-notification value
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) Sets the e-mail
notification for DAGMan itself. This information will be used within the
HTCondor submit description file for DAGMan. This file is produced by
condor_submit_dag. The notification option is described in the
condor_submit manual page.
-
-dagman DagmanExecutable
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) Allows the specification
of an alternate condor_dagman executable to be used instead of the one
found in the user's path. This must be a fully qualified path.
-
-outfile_dir directory
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) Specifies the directory
in which the .dagman.out file will be written. The directory may be
specified relative to the current working directory in which
condor_submit_dag is executed, or specified with an absolute path. Without
this option, the .dagman.out file is placed in the same directory as the
first DAG input file listed on the command line.
-
-update_submit
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) This optional argument
causes an existing .condor.sub file to not be treated as an error; rather,
the .condor.sub file will be overwritten, but the existing values of
-maxjobs, -maxidle, -maxpre, and -maxpost will be preserved.
-
-import_env
-
- (This argument is included only to be passed to condor_submit_dag if lazy
submit file generation is used for nested DAGs.) This optional argument
causes condor_submit_dag to import the current environment into the
environment command of the .condor.sub file it generates.
-
-dag filename
-
- filename is the name of the DAG input file that is set as an argument to
condor_submit_dag, and passed to condor_dagman.
-
-DontAlwaysRunPost
-
- This option causes condor_dagman to observe the exit status of the PRE
script when deciding whether or not to run the POST script. Versions of
condor_dagman previous to HTCondor version 7.7.2 would not run the POST
script if the PRE script exited with a nonzero status, but this default
has been changed such that the POST script will run, regardless of the
exit status of the PRE script. Using this option restores the previous
behavior, in which condor_dagman will not run the POST script if the PRE
script fails.
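The decision described for -DontAlwaysRunPost can be written as a small predicate. This Python sketch only restates the rule above; the function name should_run_post is hypothetical and the model ignores every other factor that may influence whether a POST script runs.

```python
def should_run_post(pre_exit_status: int, dont_always_run_post: bool) -> bool:
    """Decide whether the POST script runs, given the PRE script's exit status.

    By default the POST script runs regardless of the PRE script's result.
    With -DontAlwaysRunPost, a nonzero PRE exit status suppresses it.
    """
    if dont_always_run_post and pre_exit_status != 0:
        return False
    return True


if __name__ == "__main__":
    for status in (0, 1):
        for flag in (False, True):
            print(status, flag, should_run_post(status, flag))
```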
-
-suppress_notification
-
- Causes jobs submitted by condor_dagman to not send email notification for
events. The same effect can be achieved by setting the configuration
variable DAGMAN_SUPPRESS_NOTIFICATION to True. This command line option
is independent of the -notification command line option, which controls
notification for the condor_dagman job itself. This flag is generally
superfluous, as DAGMAN_SUPPRESS_NOTIFICATION defaults to True.
-
-dont_suppress_notification
-
- Causes jobs submitted by condor_dagman to defer to content within the
submit description file when deciding to send email notification for
events. The same effect can be achieved by setting the configuration
variable DAGMAN_SUPPRESS_NOTIFICATION to False. This command line flag is
independent of the -notification command line option, which controls
notification for the condor_dagman job itself. If both
-dont_suppress_notification and -suppress_notification are specified within
the same command line, the last argument is used.
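The "last argument is used" rule for the two notification flags can be sketched as a single pass over the argument list. This is an illustrative Python model rather than condor_dagman's parser: the function name resolve_suppress is hypothetical, and exact, case-sensitive flag matching is an assumption. The default of True mirrors the stated default of DAGMAN_SUPPRESS_NOTIFICATION.

```python
from typing import List


def resolve_suppress(argv: List[str], config_default: bool = True) -> bool:
    """Return True if notification from submitted jobs is suppressed."""
    suppress = config_default
    for arg in argv:
        # Later occurrences overwrite earlier ones, so the last flag wins.
        if arg == "-suppress_notification":
            suppress = True
        elif arg == "-dont_suppress_notification":
            suppress = False
    return suppress


if __name__ == "__main__":
    print(resolve_suppress(["-suppress_notification",
                            "-dont_suppress_notification"]))
    print(resolve_suppress(["-dont_suppress_notification",
                            "-suppress_notification"]))
    print(resolve_suppress([]))
```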
-
Exit Status
condor_dagman will exit with a status value of 0 (zero) upon success, and it will
exit with the value 1 (one) upon failure.
Examples
condor_dagman is normally not run directly, but submitted as an HTCondor job by
running condor_submit_dag. See the condor_submit_dag manual page for examples.
Author
Center for High Throughput Computing, University of Wisconsin-Madison
Copyright
Copyright (C) 1990-2014 Center for High Throughput Computing, Computer Sciences
Department, University of Wisconsin-Madison, Madison, WI. All Rights Reserved.
Licensed under the Apache License, Version 2.0.