NAME¶
pbs_mom - start a pbs batch execution mini-server
SYNOPSIS¶
pbs_mom [-a alarm] [-C chkdirectory] [-c config]
[-d directory] [-H hostname] [-L logfile] [-M MOMport]
[-R RPPport] [-S port] [-p|-P|-q|-r] [-x]
DESCRIPTION¶
The pbs_mom command starts the operation of a batch Machine Oriented
Mini-server, MOM, on the local host. Typically, this command will be in a
local boot file such as /etc/rc.local. To ensure that the pbs_mom command is
not runnable by the general user community, the server will only execute if
its real and effective uid is zero.
One function of pbs_mom is to place jobs into execution as directed by the
server, establish resource usage limits, monitor the job's usage, and notify
the server when the job completes. If they exist, pbs_mom will execute a
prologue script before executing a job and an epilogue script after executing
the job. A second function of pbs_mom is to respond to resource monitor
requests. This was done by a separate process in previous versions of PBS but
has now been combined into one process. The resource monitor function is
provided mainly for the PBS scheduler. It provides information about the
status of running jobs, available memory, and so on. A third function of
pbs_mom is to respond to task manager requests. This involves communicating
with running tasks over a TCP socket as well as communicating with other MOMs
within a job (also known as a "sisterhood").
pbs_mom records a diagnostic message in a log file whenever an error occurs.
The log files are maintained in the mom_logs directory below the home
directory of the server. If the log file cannot be opened, the diagnostic
message is written to the system console.
OPTIONS¶
- -a alarm
- Used to specify the alarm timeout in seconds for computing
a resource. Every time a resource request is processed, an alarm is set
for the given amount of time. If the request has not completed before the
given time, an alarm signal is generated. The default is 5 seconds.
- -C chkdirectory
- Specifies the path of the directory used to hold
checkpoint files. [Currently this is only valid on Cray systems.] The
default directory is PBS_HOME/spool/checkpoint, see the -d option. The
directory specified with the -C option must be owned by root and
accessible (rwx) only by root to protect the security of the checkpoint
files.
- -c config
- Specify an alternative configuration file, see description
below. If this is a relative file name it will be relative to
PBS_HOME/mom_priv, see the -d option. If the specified file cannot be
opened, pbs_mom will abort. If the -c option is not supplied, pbs_mom will
attempt to open the default
configuration file "config" in PBS_HOME/mom_priv. If this file is
not present, pbs_mom will log the fact and continue.
- -H hostname
- Set MOM's hostname. This can be useful on multi-homed
networks.
- -d directory
- Specifies the path of the directory which is the home of
the server's working files, PBS_HOME. This option is typically used along
with -M when debugging MOM. The default directory is given by
$PBS_SERVER_HOME which is typically
- -L logfile
- Specify an absolute path name for use as the log file. If
not specified, MOM will open a file named for the current date in the
PBS_HOME/mom_logs directory, see the option.
- -M port
- Specifies the port number on which the mini-server (MOM)
will listen for batch requests.
- -R port
- Specifies the port number on which the mini-server (MOM)
will listen for resource monitor requests, task manager requests and
inter-MOM messages. Both a UDP and a TCP port of this number will be
used.
- -p
- (Default after version 2.4.0) (Preserve running jobs) --
Specifies the impact on jobs which were in execution when the mini-server
shut-down. The -p option tries to preserve any running jobs when the MOM
restarts. The new mini-server will not be the parent of any running jobs;
MOM has lost control of her offspring (not a new situation for a mother).
The MOM will allow the jobs to continue to run and monitor them indirectly
via polling. All recovered jobs will report an exit code of 0 when they
are complete. The -p option is mutually exclusive with the -r, -P and -q
options.
- -P
- (Terminate all jobs and remove them from the queue) --
Specifies the impact on jobs which were in execution when the mini-server
shut-down. With the -P option, it is assumed that either the entire system
has been restarted or the MOM has been down so long that it can no longer
guarantee that the pid of any running process is the same as the recorded
job process pid of a recovering job. Unlike the -p option no attempt is
made to try and preserve or recover running jobs. All jobs are terminated
and removed from the queue. The -P option is mutually exclusive with the
-p, -q and -r options.
- -q
- (Requeue all jobs - This is the default behavior in
versions prior to 2.4.0) -- Specifies the impact on jobs which were in
execution when the mini-server shut-down. Do not terminate running
processes. With the -q option, it is assumed that either the entire system
has been restarted or the MOM has been down so long that it can no longer
guarantee that the pid of any running process is the same as the recorded
job process pid of a recovering job. No attempt is made to kill job
processes. The MOM will mark the jobs as terminated and notify the batch
server which owns the job. Re-runnable jobs will be requeued. The -q
option is mutually exclusive with the -p, -P and -r options.
- -r
- (Terminate running processes and requeue all jobs) --
Specifies the impact on jobs which were in execution when the mini-server
shut-down. With the -r option, MOM will kill any processes belonging to
running jobs, mark the jobs as terminated and notify the batch server that
owns the job. Re-runnable jobs are reset to a queued state so they can be
run again. The -r option is mutually exclusive with the -p, -P and -q
options.
- If the -r option is used following a reboot, process IDs
(pids) may be reused and MOM may kill a process that is not a batch
session.
- -S port
- Specifies the port number on which the pbs_server is
listening for requests. If pbs_server is started with a -p option, pbs_mom
will need to use the -S option and match the port value which was used to
start pbs_server.
- -x
- Disables the check for privileged port resource monitor
connections. This is used mainly for testing since the privileged port is
the only mechanism used to prevent any ordinary user from connecting.
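As an illustration of how the options above combine, the following sketch
assembles a debugging invocation that uses a private home directory and
non-default ports. The directory and the port numbers are hypothetical values
chosen for the example, not PBS defaults. The command is only printed so that
it can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch: build a pbs_mom command line for a debugging session.
# The path and port numbers below are assumptions for illustration only.
debug_home=/tmp/pbs_debug   # hypothetical PBS_HOME for testing (-d)
mom_port=25002              # hypothetical batch request port (-M)
rm_port=25003               # hypothetical resource monitor port (-R)

# -p asks MOM to try to preserve jobs that were running before the restart.
cmd="pbs_mom -d $debug_home -M $mom_port -R $rm_port -p"

# Print rather than execute; pbs_mom itself must be started as root.
echo "$cmd"
```

Only one of -p, -P, -q and -r should appear on the command line, since the
four options are mutually exclusive.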
CONFIGURATION FILE¶
The configuration file may be specified on the command line at program start
with the -c flag. The file provides several types of run time information to
pbs_mom: static resource names and values, external resources provided by a
program to be run on request via a shell escape, and values to pass to
internal set up functions at initialization (and re-initialization).
Each item type is on a single line with the component parts separated by white
space. If the line starts with a hash mark (pound sign, #), the line is
considered to be a comment and is skipped.
- Static Resources
- For static resource names and values, the configuration
file contains a list of resource name/value pairs, one pair per line and
separated by white space. An example of static resource names and values
could be the number of tape drives of different types and could be
specified by
-
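A minimal sketch of such a fragment follows. The resource names and counts
are hypothetical, and the fragment is written to a temporary file rather than
to PBS_HOME/mom_priv/config so that the example is safe to run anywhere.

```shell
#!/bin/sh
# Sketch: write a MOM config fragment containing static resources.
# Resource names and counts are hypothetical examples.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# static resources: name, white space, value
tape3480  4
tape8mm   1
EOF

# Show the non-comment entries as name=value pairs.
entries=$(grep -v '^#' "$conf" | awk '{print $1 "=" $2}')
echo "$entries"
rm -f "$conf"
```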
- Shell Commands
- If the first character of the value is an exclamation mark
(!), the entire rest of the line is saved to be executed through the
services of the system(3) standard library routine.
- The shell escape provides a means for the resource monitor
to yield arbitrary information to the scheduler. Parameter substitution is
done such that the value of any qualifier sent with the query, as
explained below, replaces a token with a percent sign (%) followed by the
name of the qualifier. For example, here is a configuration file line
which gives a resource name of "escape":
- If a query for "escape" is sent with no
qualifiers, the command executed would be "echo %xxx %yyy". If
one qualifier is sent, "escape[xxx=hi there]", the command
executed would be "echo hi there %yyy". If two qualifiers are
sent, "escape[xxx=hi][yyy=there]", the command executed would be
"echo hi there". If a qualifier is sent with no matching token
in the command line, "escape[zzz=snafu]", an error is
reported.
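The substitution rule can be emulated in a few lines of shell. This is only a
sketch of the behavior described above, not MOM's own code; the helper name
"substitute" is hypothetical, and the config line shown in the comment is
inferred from the commands quoted in the text.

```shell
#!/bin/sh
# Sketch: emulate the %name substitution described above.
# A config line consistent with the text would be:
#   escape !echo %xxx %yyy
template='echo %xxx %yyy'

# substitute NAME VALUE COMMAND: replace the token %NAME with VALUE.
# Values containing "|", "&" or "\" would need escaping for sed.
substitute() {
    printf '%s\n' "$3" | sed "s|%$1|$2|g"
}

none=$template
one=$(substitute xxx 'hi there' "$template")
two=$(substitute yyy 'there' "$(substitute xxx 'hi' "$template")")

echo "$none"   # echo %xxx %yyy
echo "$one"    # echo hi there %yyy
echo "$two"    # echo hi there
```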
- size[fs=<FS>]
- Specifies that the available and configured disk space in
the <FS> filesystem is to be reported to the pbs_server and
scheduler. NOTE: To request disk space on a per job basis, specify the
file resource as in 'qsub -l nodes=1,file=1000kb'. For example, the
available and configured disk space in the /localscratch filesystem will
be reported:
-
- Initialization Value
- An initialization value directive has a name which starts
with a dollar sign ($) and must be known to MOM via an internal table. The
entries in this table now are:
- pbsserver
- which defines hostnames running pbs_server that will be
allowed to submit jobs, issue Resource Monitor (RM) requests, and get
status updates. MOM will continually attempt to contact all server hosts
for node status and state updates. Like $PBS_SERVER_HOME/server_name, the
hostname may be followed by a colon and a port number. This parameter
replaces the oft-confused $clienthost parameter from TORQUE 2.0.0p0 and
earlier. Note that the hostname in $PBS_SERVER_HOME/server_name is used if
no $pbsserver parameters are found.
- pbsclient
- which causes a host name to be added to the list of hosts
which will be allowed to connect to MOM as long as they are using a
privileged port for the purposes of resource monitor requests. For
example, here are two configuration file lines which will allow the hosts
"fred" and "wilma" to connect:
-
- Two host names are always allowed to connect to pbs_mom,
"localhost" and the name returned to pbs_mom by the system call
gethostname(). These names need not be specified in the configuration
file. The hosts listed as "clients" can issue Resource Monitor
(RM) requests. Other MOM nodes and servers do not need to be listed as
clients.
- restricted
- which causes a host name to be added to the list of hosts
which will be allowed to connect to MOM without needing to use a
privileged port. These names allow for wildcard matching. For example,
here is a configuration file line which will allow queries from any host
from the domain "ibm.com".
- The restriction which applies to these connections is that
only internal queries may be made. No resources from a config file will be
found. This is to prevent any shell commands from being run by a non-root
process.
This parameter is generally not required except for some versions of
OSX.
- logevent
- which sets the mask that determines which event types are
logged by pbs_mom. For example:
-
- The first example would set the log event mask to 0x1ff
(511) which enables logging of all events including debug events. The
second example would set the mask to 0x0ff (255) which enables all events
except debug events.
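The two mask values can be checked with shell arithmetic. The sketch below
assumes the masks are given as plain integers in a line of the form
"$logevent 0x1ff"; the conclusion that the debug events correspond to bit
0x100 is an inference from the two values quoted above, not a statement from
the PBS documentation.

```shell
#!/bin/sh
# Sketch: the two masks mentioned above, in decimal, and the bit that
# separates them.
all_events=$((0x1ff))
no_debug=$((0x0ff))
debug_bit=$((all_events & ~no_debug))

printf '%d %d 0x%x\n' "$all_events" "$no_debug" "$debug_bit"
```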
- cputmult
- which sets a factor used to adjust cpu time used by a job.
This is provided to allow adjustment of time charged and limits enforced
where the job might run on systems with different cpu performance. If
MOM's system is faster than the reference system, set cputmult to a
decimal value greater than 1.0. If MOM's system is slower, set cputmult to
a value between 0.0 and 1.0. For example:
-
- usecp
- specifies which directories should be staged with cp
instead of rcp/scp. If a shared filesystem is available on all hosts in a
cluster, this directive is used to make these filesystems known to MOM.
For example, if /home is NFS mounted on all nodes in a cluster:
-
- wallmult
- which sets a factor used to adjust wall time usage by a
job to a common reference system. The factor is used for walltime
calculations and limits in the same way that cputmult is used for cpu time.
- configversion
- specifies the version of the config file data, a
string.
- check_poll_time
- specifies the MOM interval in seconds. MOM checks each job
for updated resource usages, exited processes, over-limit conditions, etc.
once per interval. This value should be less than or equal to pbs_server's
job_stat_rate. High values result in stale information reported to
pbs_server. Low values result in increased system usage by MOM. Default is
45 seconds.
- down_on_error
- causes MOM to report itself as state "down" to
pbs_server in the event of a failed health check. This feature is
EXPERIMENTAL and likely to be removed in the future. See HEALTH CHECK
below.
- ideal_load
- ideal processor load. Represents a low water mark for the
load average. A node that is currently busy will consider itself free
after its load average falls below ideal_load.
- auto_ideal_load
- if jobs are running, sets ideal_load based on a simple
expression. An expression starts with the variable 't' (total assigned
CPUs) or 'c' (existing CPUs), followed by an operator (+ - / *) and a
float constant.
-
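One possible reading of this expression format is sketched below. The
function name "eval_load_expr", the sample expressions, and the CPU counts
are all hypothetical; the sketch only shows how a variable, an operator and a
float constant might combine into a load threshold, and is not taken from the
MOM source.

```shell
#!/bin/sh
# Sketch: evaluate a "<t|c><operator><constant>" expression.
# eval_load_expr EXPR TOTAL_ASSIGNED_CPUS EXISTING_CPUS
eval_load_expr() {
    awk -v e="$1" -v t="$2" -v c="$3" 'BEGIN {
        v  = (substr(e, 1, 1) == "t") ? t : c   # pick the variable
        op = substr(e, 2, 1)                    # single-character operator
        k  = substr(e, 3) + 0                   # float constant
        if (op == "+") r = v + k
        else if (op == "-") r = v - k
        else if (op == "*") r = v * k
        else if (op == "/") r = v / k
        print r
    }'
}

# Assume 4 CPUs assigned to jobs on a node with 8 CPUs.
r1=$(eval_load_expr 't-0.2' 4 8)
r2=$(eval_load_expr 'c*0.9' 4 8)
echo "$r1 $r2"
```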
- loglevel
- specifies the verbosity of logging with higher numbers
specifying more verbose logging. Values may range between 0 and 7.
- log_file_max_size
- If this is set to a value > 0 then pbs_mom will roll the
current log file to log-file-name.1 when its size is greater than or equal
to the value of log_file_max_size. This value is interpreted as
kilobytes.
- log_file_roll_depth
- If this is set to a value >=1 and log_file_max_size is
set then pbs_mom will continue rolling the log files to
log-file-name.log_file_roll_depth.
- max_load
- maximum processor load. Nodes over this load average are
considered busy (see ideal_load above).
- auto_max_load
- if jobs are running, sets max_load based on a simple
expression. An expression starts with the variable 't' (total assigned
CPUs) or 'c' (existing CPUs), followed by an operator (+ - / *) and a
float constant.
- enablemomrestart
- enable automatic restarts of MOM. If enabled, MOM will
check if its binary has been updated and restart itself at a safe point
when no jobs are running; thus making upgrades easier. The check is made
by comparing the mtime of the pbs_mom executable. Command-line args, the
process name, and the PATH env variable are preserved across restarts. It
is recommended that this not be enabled in the config file, but enabled
when desired with momctl (see RESOURCES for more information.)
- node_check_script
- specifies the fully qualified pathname of the health check
script to run (see HEALTH CHECK for more information).
- node_check_interval
- specifies when to run the MOM health check. The check can
be either periodic, event-driven, or both. The value starts with an
integer specifying the number of MOM intervals between subsequent
executions of the specified health check. After the integer is an optional
comma-separated list of event names. Currently supported are
"jobstart" and "jobend". This value defaults to 1 with
no events, indicating the check is run every MOM interval (see HEALTH
CHECK for more information).
-
- prologalarm
- Specifies the maximum duration (in seconds) which the MOM will
wait for the job prolog or job epilog to complete. This parameter
defaults to 300 seconds (5 minutes).
- rcpcmd
- Specify the full path and argument to be used for
remote file copies. This overrides the compile-time default found in
configure. This must contain 2 words: the full path to the command and the
switches. The copy command must be able to recursively copy files to the
remote host and accept arguments of the form "user@host:files".
For example:
-
- remote_checkpoint_dirs
- Specifies what server checkpoint directories are remotely
mounted. This directive is used to tell the MOM which directories are
shared with the server. Using remote checkpoint directories eliminates the
need to copy the checkpoint files back and forth between the MOM and the
server. This parameter is available in 2.4.1 and later.
- remote_reconfig
- Enables the ability to remotely reconfigure pbs_mom with a
new config file. Default is disabled. This parameter accepts various forms
of true, yes, and 1.
- timeout
- Specifies the number of seconds before TCP messages will
time out. TCP messages include job obituaries, and TM requests if RPP is
disabled. Default is 60 seconds.
- tmpdir
- Sets the directory basename for a per-job temporary
directory. Before job launch, MOM will append the jobid to the tmpdir
basename and create the directory. After the job exits, MOM will
recursively delete it. The env variable TMPDIR will be set for all
pro/epilog scripts, the job script, and TM tasks.
Directory creation and removal is done as the job owner and group, so the
owner must have write permission to create the directory. If the directory
already exists and is owned by the job owner, it will not be deleted after
the job. If the directory already exists and is NOT owned by the job
owner, the job start will be rejected.
- status_update_time
- Specifies (in seconds) how often MOM updates its status
information to pbs_server. This value should correlate with the server's
scheduling interval. Low values increase the load on pbs_server and the
network. High values cause pbs_server to report stale information. Default
is 45 seconds.
- varattr
- This is similar to a shell escape above, but includes a
TTL. The command will only be run every TTL seconds. A TTL of -1 will
cause the command to be executed only once. A TTL of 0 will cause the
command to be run every time varattr is requested. This parameter may be
used multiple times, but all output will be grouped into a single
"varattr" attribute in the request and status output. The
command should output data in the form of
The configuration file must be executable and "secure". It must be
owned by a user id and group id less than 10 and not be world writable. Output
from this file must be in the format $VAR=$VAL, i.e.,
-
- xauthpath
- Specifies the path to the xauth binary to enable X11
forwarding.
- ignvmem
- If set to true, then pbs_mom will ignore vmem/pvmem limit
enforcement.
- ignwalltime
- If set to true, then pbs_mom will ignore walltime limit
enforcement.
- mom_host
- Sets the local hostname as used by pbs_mom.
RESOURCES¶
Resource Monitor queries can be made with momctl's -q option to retrieve and set
pbs_mom options. Any configured static resource may be retrieved with a
request of the same name. These are resource requests not otherwise documented
in the PBS ERS.
- cycle
- forces an immediate MOM cycle
- status_update_time
- retrieve or set the $status_update_time parameter
- check_poll_time
- retrieve or set the $check_poll_time parameter
- configversion
- retrieve the config version
- jobstartblocktime
- retrieve or set the $jobstartblocktime parameter
- enablemomrestart
- retrieve or set the $enablemomrestart parameter
- loglevel
- retrieve or set the $loglevel parameter
- down_on_error
- retrieve or set the EXPERIMENTAL $down_on_error
parameter
- diag0 - diag4
- retrieves various diagnostic information
- rcpcmd
- retrieve or set the $rcpcmd parameter
- version
- retrieves the pbs_mom version
HEALTH CHECK¶
The health check script is executed directly by the pbs_mom daemon under the
root user id. It must be accessible from the compute node and may be a script
or compiled executable program. It may make any needed system calls and
execute any combination of system utilities but should not execute resource
manager client commands. Also, as of TORQUE 1.0.1, the pbs_mom daemon blocks
until the health check is completed and does not possess a built-in timeout.
Consequently, it is advisable to keep the launch script execution time short
and verify that the script will not block even under failure conditions.
If the script detects a failure, it should return the keyword 'ERROR' to stdout
followed by an error message. The message (up to 256 characters) immediately
following the ERROR string will be assigned to the node attribute 'message' of
the associated node.
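A minimal sketch of a check that follows this convention is shown below. The
helper name "check_free_kb" and the thresholds are hypothetical. A real
script would obtain the measured value from a system utility such as df;
fixed numbers are used here so that the example is deterministic.

```shell
#!/bin/sh
# Sketch of a health check in the style described above. The threshold
# and the helper name are assumptions, not part of TORQUE.
# check_free_kb AVAILABLE_KB MINIMUM_KB: print an ERROR line on failure.
check_free_kb() {
    if [ "$1" -lt "$2" ]; then
        echo "ERROR only $1 kB free, need at least $2 kB"
    fi
}

# A healthy node prints nothing; a failing node prints the ERROR keyword
# followed by a short message (kept well under 256 characters).
healthy=$(check_free_kb 500000 100000)
failing=$(check_free_kb 2048 100000)

echo "healthy: '$healthy'"
echo "failing: '$failing'"
```

Because pbs_mom blocks until the script finishes, a check of this kind should
avoid commands that can hang, for example on an unresponsive network
filesystem.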
If the script detects a failure when run from "jobstart", then the job
will be rejected. This should probably only be used with advanced schedulers
like Moab so that the job can be routed to another node.
TORQUE currently ignores ERROR messages by default, but advanced schedulers like
Moab can be configured to react appropriately.
If the experimental $down_on_error MOM setting is enabled, MOM will set itself
to state down and report to pbs_server; and pbs_server will report the node as
"down". Additionally, the experimental "down_on_error"
server attribute can be enabled which has the same effect but moves the
decision to pbs_server. It is redundant to have both MOM's $down_on_error and
pbs_server's down_on_error features enabled. See "down_on_error" in
pbs_server_attributes(7B).
FILES¶
- $PBS_SERVER_HOME/server_name
- contains the hostname running pbs_server.
- $PBS_SERVER_HOME/mom_priv
- the default directory for configuration files, typically
(/usr/spool/pbs)/mom_priv.
- $PBS_SERVER_HOME/mom_logs
- directory for log files recorded by the server.
- $PBS_SERVER_HOME/mom_priv/prologue
- the administrative script to be run before job
execution.
- $PBS_SERVER_HOME/mom_priv/epilogue
- the administrative script to be run after job
execution.
SIGNAL HANDLING¶
pbs_mom handles the following signals:
- SIGHUP
- causes pbs_mom to re-read its configuration file, close and
reopen the log file, and reinitialize resource structures.
- SIGALRM
- results in a log file entry. The signal is used to limit
the time taken by certain child processes, such as the prologue and
epilogue.
- SIGINT and SIGTERM
- results in pbs_mom exiting without terminating any running
jobs. This is the action for the following signals as well: SIGXCPU,
SIGXFSZ, SIGCPULIM, and SIGSHUTDN.
- SIGUSR1, SIGUSR2
- causes MOM to increase and decrease logging levels,
respectively.
- SIGPIPE, SIGINFO
- are ignored.
- SIGBUS, SIGFPE, SIGILL, SIGTRAP, and SIGSYS
- cause a core dump if the PBSCOREDUMP environment variable
is defined.
All other signals have their default behavior installed.
EXIT STATUS¶
If the mini-server command fails to begin operation, the server exits with a
value greater than zero.
SEE ALSO¶
pbs_server(8B), pbs_scheduler_basl(8B), pbs_scheduler_tcl(8B), the PBS External
Reference Specification, and the PBS Administrator's Guide.