SCONTROL(1) | Slurm components | SCONTROL(1) |
NAME
scontrol - Used to view and modify Slurm configuration and state.
SYNOPSIS
scontrol [OPTIONS...] [COMMAND...]
DESCRIPTION
scontrol is used to view or modify Slurm configuration including: job, job step, node, partition, reservation, and overall system configuration. Most of the commands can only be executed by user root. If an attempt to view or modify configuration information is made by an unauthorized user, an error message will be printed and the requested action will not occur. If no command is entered on the execute line, scontrol will operate in an interactive mode and prompt for input. It will continue prompting for input and executing commands until explicitly terminated. If a command is entered on the execute line, scontrol will execute that command and terminate. All commands and options are case-insensitive, although node names, partition names, and reservation names are case-sensitive (node names "LX" and "lx" are distinct). All commands and options can be abbreviated to the extent that the specification is unique.
OPTIONS
- -a, --all
- When the show command is used, display all partitions, their jobs and job steps. This causes information to be displayed about partitions that are configured as hidden and partitions that are unavailable to the user's group.
- -d, --details
- Causes the show command to provide additional details where available.
- -h, --help
- Print a help message describing the usage of scontrol.
- --hide
- Do not display information about hidden partitions, their jobs and job steps. This is the default behavior: neither partitions configured as hidden nor partitions unavailable to the user's group are displayed.
- -M, --clusters=<string>
- The cluster to issue commands to. Only one cluster name may
be specified.
- -o, --oneliner
- Print information one line per record.
- -Q, --quiet
- Print no warning or informational messages, only fatal error messages.
- -v, --verbose
- Print detailed event logging. Multiple -v's will
further increase the verbosity of logging. By default only errors will be
displayed.
- -V, --version
- Print version information and exit.
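The --oneliner form lends itself to scripting. As a minimal sketch, assuming each record is a whitespace-separated run of Key=Value pairs and that no value contains a space, a record can be split into a dictionary in Python. The helper name parse_oneliner and the sample line are illustrative only; the sample is not captured scontrol output.

```python
def parse_oneliner(line):
    """Split one 'Key=Value Key=Value ...' record into a dict.

    Assumption: values contain no spaces. A free-text field such as a
    multi-word Reason would need smarter handling than this sketch has.
    """
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not Key=Value pairs
            record[key] = value
    return record


# Hypothetical record using field names that appear in this page.
sample = "JobId=1234 Name=demo JobState=PENDING Priority=0"
job = parse_oneliner(sample)
print(job["JobState"])
```

Splitting on the first "=" only means a value that itself contains "=" is kept intact.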
- COMMANDS
-
- all
- Show all partitions, their jobs and job steps. This causes
information to be displayed about partitions that are configured as hidden
and partitions that are unavailable to the user's group.
- abort
- Instruct the Slurm controller to terminate immediately and
generate a core file. See "man slurmctld" for information about
where the core file will be written.
- checkpoint CKPT_OP ID
- Perform a checkpoint activity on the job step(s) with the specified identification. ID can be used to identify a specific job (e.g. "<job_id>", which applies to all of its existing steps) or a specific job step (e.g. "<job_id>.<step_id>"). Acceptable values for CKPT_OP include:
- able
- Test if presently not disabled, report start time if checkpoint in progress
- create
- Create a checkpoint and continue the job or job step
- disable
- Disable future checkpoints
- enable
- Enable future checkpoints
- error
- Report the result for the last checkpoint request, error code and message
- restart
- Restart execution of the previously checkpointed job or job step
- requeue
- Create a checkpoint and requeue the batch job, combines vacate and restart operations
- vacate
- Create a checkpoint and terminate the job or job step
- MaxWait=<seconds>
- Maximum time for checkpoint to be written. Default value is 10 seconds. Valid with create and vacate options only.
- ImageDir=<directory_name>
- Location of checkpoint file. Valid with create, vacate and restart options only. This value takes precedence over any --checkpoint-dir value specified at job submission time.
- StickToNodes
- If set, resume the job on the same nodes as previously used. Valid with the restart option only.
- cluster CLUSTER_NAME
- The cluster to issue commands to. Only one cluster name may
be specified.
- create SPECIFICATION
- Create a new partition or reservation. See the full list of
parameters below. Include the tag "res" to create a reservation
without specifying a reservation name.
- completing
- Display all jobs in a COMPLETING state along with
associated nodes in either a COMPLETING or DOWN state.
- delete SPECIFICATION
- Delete the entry with the specified SPECIFICATION.
The two SPECIFICATION choices are PartitionName=<name>
and Reservation=<name>. On dynamically laid out BlueGene
systems BlockName=<name> also works. Reservations and
partitions should have no associated jobs at the time of their deletion
(modify the jobs first). If the specified partition is in use, the
request is denied.
- details
- Causes the show command to provide additional
details where available. Batch job information will include the batch
script for jobs the user is authorized to view. Job information will
include CPUs and NUMA memory allocated on each node. Note that on
computers with hyperthreading enabled and SLURM configured to allocate
cores, each listed CPU represents one physical core. Each hyperthread on
that core can be allocated a separate task, so a job's CPU count and task
count may differ. See the --cpu_bind and --mem_bind option
descriptions in srun man pages for more information. The details
option is currently only supported for the show job command.
- exit
- Terminate the execution of scontrol. This is an independent
command with no options meant for use in interactive mode.
- help
- Display a description of scontrol options and commands.
- hide
- Do not display partition, job or job step information for
partitions that are configured as hidden or partitions that are
unavailable to the user's group. This is the default behavior.
- hold job_id
- Prevent a pending job from being started (sets its
priority to 0). Use the release command to permit the job to be
scheduled. Note that when a job is held by a system administrator using
the hold command, only a system administrator may release the job
for execution (also see the uhold command). When the job is held by
its owner, it may also be released by the job's owner.
- notify job_id message
- Send a message to standard error of the salloc or srun
command or batch job associated with the specified job_id.
- oneliner
- Print information one line per record.
- pidinfo proc_id
- Print the Slurm job id and scheduled termination time
corresponding to the supplied process id, proc_id, on the current
node. This will work only with processes on the node on which scontrol is run,
and only for those processes spawned by SLURM and their descendants.
- listpids [job_id[.step_id]] [NodeName]
- Print a listing of the process IDs in a job step (if
JOBID.STEPID is provided), or all of the job steps in a job (if
job_id is provided), or all of the job steps in all of the jobs on
the local node (if job_id is not provided or job_id is
"*"). This will work only with processes on the node on which
scontrol is run, and only for those processes spawned by SLURM and their
descendants. Note that some SLURM configurations (ProctrackType
value of pgid or aix) are unable to identify all processes
associated with a job or job step.
- ping
- Ping the primary and secondary slurmctld daemon and report
if they are responding.
- quiet
- Print no warning or informational messages, only fatal
error messages.
- quit
- Terminate the execution of scontrol.
- reconfigure
- Instruct all Slurm daemons to re-read the configuration
file. This command does not restart the daemons. This mechanism would be
used to modify configuration parameters (Epilog, Prolog, SlurmctldLogFile,
SlurmdLogFile, etc.), register the physical addition or removal of nodes
from the cluster, or recognize the change of a node's configuration, such
as the addition of memory or processors. The Slurm controller (slurmctld)
forwards the request to all other daemons (slurmd daemon on each compute
node). Running jobs continue execution. Most configuration parameters can
be changed by just running this command; however, SLURM daemons should be
shut down and restarted if any of these parameters are to be changed:
AuthType, BackupAddr, BackupController, ControlAddr, ControlMach,
PluginDir, StateSaveLocation, SlurmctldPort or SlurmdPort.
- release job_id
- Release a previously held job to begin execution. Also see
hold.
- requeue job_id
- Requeue a running or pending SLURM batch job.
- resume job_id
- Resume a previously suspended job. Also see suspend.
- schedloglevel LEVEL
- Enable or disable scheduler logging. LEVEL may be
"0", "1", "disable" or "enable".
"0" has the same effect as "disable". "1"
has the same effect as "enable". This value is temporary and
will be overwritten when the slurmctld daemon reads the slurm.conf
configuration file (e.g. when the daemon is restarted or scontrol
reconfigure is executed) if the SlurmSchedLogLevel parameter is
present.
- setdebug LEVEL
- Change the debug level of the slurmctld daemon.
LEVEL may be an integer value between zero and nine (using the same
values as SlurmctldDebug in the slurm.conf file) or the name
of the most detailed message type to be printed: "quiet",
"fatal", "error", "info",
"verbose", "debug", "debug2",
"debug3", "debug4", or "debug5". This value
is temporary and will be overwritten whenever the slurmctld daemon reads
the slurm.conf configuration file (e.g. when the daemon is restarted or
scontrol reconfigure is executed).
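A small Python sketch of validating a LEVEL argument before handing it to setdebug. It assumes the ten names listed above correspond, in the order given, to the integers zero through nine; that ordering is an assumption to confirm against the SlurmctldDebug description in slurm.conf. The helper name normalize_debug_level is hypothetical.

```python
# Assumed mapping: list position equals the integer level (0 through 9).
DEBUG_LEVELS = ["quiet", "fatal", "error", "info", "verbose",
                "debug", "debug2", "debug3", "debug4", "debug5"]


def normalize_debug_level(level):
    """Return the level name for an integer 0-9 or a known level name."""
    text = str(level).strip().lower()
    if text.isdecimal():
        number = int(text)
        if 0 <= number <= 9:
            return DEBUG_LEVELS[number]
        raise ValueError(f"level out of range 0-9: {level!r}")
    if text in DEBUG_LEVELS:
        return text
    raise ValueError(f"unknown debug level: {level!r}")


print(normalize_debug_level("DEBUG2"))
```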
- setdebugflags [+|-]FLAG
- Add or remove DebugFlags of the slurmctld daemon. See
"man slurm.conf" for a list of supported DebugFlags. NOTE:
Changing the value of some DebugFlags will have no effect without
restarting the slurmctld daemon, which would set DebugFlags based upon the
contents of the slurm.conf configuration file.
- show ENTITY ID
- Display the state of the specified entity with the
specified identification. ENTITY may be aliases,
config, daemons, frontend, job, node,
partition, reservation, slurmd, step,
topology, hostlist or hostnames (also block or
subbp on BlueGene systems). ID can be used to identify a
specific element of the identified entity: the configuration parameter
name, job ID, node name, partition name, reservation name, or job step ID
for config, job, node, partition, or
step respectively. For an ENTITY of topology, the
ID may be a node or switch name. If one node name is specified, all
switches connected to that node (and their parent switches) will be shown.
If more than one node name is specified, only switches that connect to all
named nodes will be shown. aliases will return all NodeName
values associated to a given NodeHostname (useful to get the list
of virtual nodes associated with a real node in a configuration where
multiple slurmd daemons execute on a single compute node). config
displays parameter names from the configuration files in mixed case (e.g.
SlurmdPort=7003) while derived parameters names are in upper case only
(e.g. SLURM_VERSION). hostnames takes an optional hostlist
expression as input and writes a list of individual host names to standard
output (one per line). If no hostlist expression is supplied, the contents
of the SLURM_NODELIST environment variable are used. For example
"tux[1-3]" is mapped to "tux1","tux2" and
"tux3" (one hostname per line). hostlist takes a list of
host names and prints the hostlist expression for them (the inverse of
hostnames). hostlist can also take the absolute pathname of
a file (beginning with the character '/') containing a list of hostnames.
Multiple node names may be specified using simple node range expressions
(e.g. "lx[10-20]"). All other ID values must identify a
single element. The job step ID is of the form "job_id.step_id",
(e.g. "1234.1"). slurmd reports the current status of the
slurmd daemon executing on the same node from which the scontrol command
is executed (the local host). It can be useful to diagnose problems. By
default, all elements of the entity type specified are printed. For an
ENTITY of job, if the job does not specify socket-per-node,
cores-per-socket or threads-per-core then it will display '*' in
ReqS:C:T=*:*:* field.
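The mapping that show hostnames performs on a simple node range expression can be sketched in Python as follows. This is an illustration under assumptions, not Slurm's own hostlist code: it handles a single bracketed group only, and its support for comma-separated ranges and zero-padded numbers goes beyond what this page states. The function name expand_hostlist is hypothetical.

```python
import re


def expand_hostlist(expr):
    """Expand a simple expression such as 'tux[1-3]' into host names."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if not match:
        return [expr]  # no bracket group: treat as a single host name
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        low, sep, high = part.partition("-")
        if not sep:
            high = low  # a single number rather than a range
        width = len(low)  # preserve zero padding such as 08
        for number in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


# One host name per line, as described for "hostnames" above.
for host in expand_hostlist("tux[1-3]"):
    print(host)
```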
- shutdown OPTION
- Instruct Slurm daemons to save current state and terminate.
By default, the Slurm controller (slurmctld) forwards the request to all
other daemons (slurmd daemon on each compute node). An OPTION of
slurmctld or controller results in only the slurmctld daemon
being shut down and the slurmd daemons remaining active.
- suspend job_id
- Suspend a running job. Use the resume command to
resume its execution. User processes must stop on receipt of SIGSTOP
signal and resume upon receipt of SIGCONT for this operation to be
effective. Not all architectures and configurations support job
suspension.
- takeover
- Instruct SLURM's backup controller (slurmctld) to take over
system control. SLURM's backup controller requests control from the
primary and waits for its termination. After that, it switches from backup
mode to controller mode. If the primary controller cannot be contacted, it
directly switches to controller mode. This can be used to speed up the
SLURM controller fail-over mechanism when the primary node is down. This
can be used to minimize disruption if the computer executing the primary
SLURM controller is scheduled down. (Note: SLURM's primary controller will
take the control back at startup.)
- uhold job_id
- Prevent a pending job from being started (sets its
priority to 0). Use the release command to permit the job to be
scheduled. This command is designed for a system administrator to hold a
job so that the job owner may release it rather than requiring the
intervention of a system administrator (also see the hold command).
- update SPECIFICATION
- Update job, step, node, partition, or reservation
configuration per the supplied specification. SPECIFICATION is in
the same format as the Slurm configuration file and the output of the
show command described above. It may be desirable to execute the
show command (described above) on the specific entity you wish to
update, then use cut-and-paste tools to enter updated configuration values
to the update. Note that while most configuration values can be
changed using this command, not all can be changed using this mechanism.
In particular, the hardware configuration of a node or the physical
addition or removal of nodes from the cluster may only be accomplished
through editing the Slurm configuration file and executing the
reconfigure command (described above).
- verbose
- Print detailed event logging. This includes time-stamps on
data structures, record counts, etc.
- version
- Display the version number of scontrol being executed.
- wait_job job_id
- Wait until a job and all of its nodes are ready for use or
the job has entered some termination state. This option is particularly
useful in the SLURM Prolog or in the batch script itself if nodes are
powered down and restarted automatically as needed.
- !!
- Repeat the last command executed.
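The DESCRIPTION states that commands are case-insensitive and may be abbreviated as long as the abbreviation is unique. A minimal Python sketch of that rule over the command names listed above follows. It illustrates the stated rule only and is not scontrol's actual parser, whose matching may differ in detail; the function name resolve_command is hypothetical.

```python
# Command names taken from the COMMANDS list above.
COMMANDS = [
    "abort", "all", "checkpoint", "cluster", "create", "completing",
    "delete", "details", "exit", "help", "hide", "hold", "notify",
    "oneliner", "pidinfo", "listpids", "ping", "quiet", "quit",
    "reconfigure", "release", "requeue", "resume", "schedloglevel",
    "setdebug", "setdebugflags", "show", "shutdown", "suspend",
    "takeover", "uhold", "update", "verbose", "version", "wait_job",
]


def resolve_command(word):
    """Return the full command for a unique prefix, or None if ambiguous."""
    lowered = word.lower()
    if lowered in COMMANDS:
        return lowered  # an exact match wins (setdebug vs setdebugflags)
    matches = [name for name in COMMANDS if name.startswith(lowered)]
    return matches[0] if len(matches) == 1 else None


print(resolve_command("SHO"))  # unique prefix of "show"
print(resolve_command("re"))   # ambiguous, so None
```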
- SPECIFICATIONS FOR UPDATE COMMAND, JOBS
- Account=<account>
- Account name to be changed for this job's resource use. Value may be cleared with blank data value, "Account=".
- Conn-Type=<type>
- Reset the node connection type. Possible values on Blue Gene are "MESH", "TORUS" and "NAV" (mesh else torus).
- Contiguous=<yes|no>
- Set the job's requirement for contiguous (consecutive) nodes to be allocated. Possible values are "YES" and "NO".
- Dependency=<dependency_list>
- Defer job's initiation until specified job dependency specification is satisfied. Cancel dependency with an empty dependency_list (e.g. "Dependency="). <dependency_list> is of the form <type:job_id[:job_id][,type:job_id[:job_id]]>. Many jobs can share the same dependency and these jobs may even belong to different users.
- after:job_id[:jobid...]
- This job can begin execution after the specified jobs have begun execution.
- afterany:job_id[:jobid...]
- This job can begin execution after the specified jobs have terminated.
- afternotok:job_id[:jobid...]
- This job can begin execution after the specified jobs have terminated in some failed state (non-zero exit code, node failure, timed out, etc).
- afterok:job_id[:jobid...]
- This job can begin execution after the specified jobs have successfully executed (ran to completion with an exit code of zero).
- singleton
- This job can begin execution after any previously launched jobs sharing the same job name and user have terminated.
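To make the dependency_list form concrete, here is a hedged Python sketch that splits a list of the form type:job_id[:job_id][,type:job_id[:job_id]] into its parts and checks the type names against those listed above. It assumes job IDs are plain integers and that singleton takes no job IDs; the function name parse_dependency is hypothetical.

```python
VALID_TYPES = {"after", "afterany", "afternotok", "afterok", "singleton"}


def parse_dependency(spec):
    """Parse a dependency_list into (type, [job_ids]) tuples."""
    if not spec:
        return []  # an empty list cancels the dependency
    parsed = []
    for item in spec.split(","):
        kind, *ids = item.split(":")
        if kind not in VALID_TYPES:
            raise ValueError(f"unknown dependency type: {kind!r}")
        if kind == "singleton":
            if ids:
                raise ValueError("singleton takes no job IDs")
        elif not ids:
            raise ValueError(f"{kind} needs at least one job ID")
        parsed.append((kind, [int(job_id) for job_id in ids]))
    return parsed


print(parse_dependency("afterok:100:101,afterany:200"))
```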
- EligibleTime=<time_spec>
- See StartTime.
- ExcNodeList=<nodes>
- Set the job's list of excluded nodes. Multiple node names may be specified using simple node range expressions (e.g. "lx[10-20]"). Value may be cleared with blank data value, "ExcNodeList=".
- Features=<features>
- Set the job's required node features. The list of features
may include multiple feature names separated by ampersand (AND) and/or
vertical bar (OR) operators. For example:
Features="opteron&video" or
Features="fast|faster". In the first example, only nodes
having both the feature "opteron" AND the feature
"video" will be used. There is no mechanism to specify that you
want one node with feature "opteron" and another node with
feature "video" in case no node has both features. If only one
of a set of possible options should be used for all allocated nodes, then
use the OR operator and enclose the options within square brackets. For
example: "Features=[rack1|rack2|rack3|rack4]" might be
used to specify that all nodes must be allocated on a single rack of the
cluster, but any of those four racks can be used. A request can also
specify the number of nodes needed with some feature by appending an
asterisk and count after the feature name. For example
"Features=graphics*4" indicates that at least four allocated
nodes must have the feature "graphics". Constraints with node
counts may only be combined with AND operators. Value may be cleared with
blank data value, for example "Features=".
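The AND and OR operators can be illustrated with a short Python sketch that tests one node's feature set against a constraint. It is a simplification under stated assumptions: it accepts an expression using only ampersands or only vertical bars, and it does not handle square brackets or node counts. The function name node_matches is hypothetical and this is not how Slurm evaluates constraints internally.

```python
def node_matches(expr, node_features):
    """Check whether a node's features satisfy a simple constraint."""
    features = set(node_features)
    if "&" in expr and "|" in expr:
        raise ValueError("mixed operators are outside this sketch")
    if "&" in expr:
        # AND: the node must carry every named feature.
        return all(name in features for name in expr.split("&"))
    # OR (or a single name): any one named feature is enough.
    return any(name in features for name in expr.split("|"))


print(node_matches("opteron&video", {"opteron", "video"}))
print(node_matches("opteron&video", {"opteron"}))
```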
- Geometry=<geo>
- Reset the required job geometry. On Blue Gene the value
should be three digits separated by "x" or ",". The
digits represent the allocation size in X, Y and Z dimensions (e.g.
"2x3x4").
- Gres=<list>
- Specifies a comma delimited list of generic consumable
resources. The format of each entry on the list is
"name[:count[*cpu]]". The name is that of the consumable
resource. The count is the number of those resources with a default value
of 1. The specified resources will be allocated to the job on each node
allocated unless "*cpu" is appended, in which case the resources
will be allocated on a per cpu basis. The available generic consumable
resources are configurable by the system administrator. A list of available
generic consumable resources will be printed and the command will exit if
the option argument is "help". Examples of use include
"Gres=gpus:2*cpu,disk=40G" and "Gres=help".
- JobId=<id>
- Identify the job to be updated. This specification is required.
- Licenses=<name>
- Specification of licenses (or other resources available on all nodes of the cluster) as described in salloc/sbatch/srun man pages.
- MinCPUsNode=<count>
- Set the job's minimum number of CPUs per node to the specified value.
- MinMemoryCPU=<megabytes>
- Set the job's minimum real memory required per allocated CPU to the specified value. Either MinMemoryCPU or MinMemoryNode may be set, but not both.
- MinMemoryNode=<megabytes>
- Set the job's minimum real memory required per node to the specified value. Either MinMemoryCPU or MinMemoryNode may be set, but not both.
- MinTmpDiskNode=<megabytes>
- Set the job's minimum temporary disk space required per node to the specified value.
- Name=<name>
- Set the job's name to the specified value.
- Nice[=delta]
- Adjust job's priority by the specified value. Default value is 100. The adjustment range is from -10000 (highest priority) to 10000 (lowest priority). Nice value changes are not additive, but overwrite any prior nice value and are applied to the job's base priority. Only privileged users can specify a negative adjustment.
- NodeList=<nodes>
- Change the nodes allocated to a running job to shrink its size. The specified list of nodes must be a subset of the nodes currently allocated to the job. Multiple node names may be specified using simple node range expressions (e.g. "lx[10-20]"). After a job's allocation is reduced, subsequent srun commands must explicitly specify node and task counts which are valid for the new allocation.
- NumCPUs=<min_count>[-<max_count>]
- Set the job's minimum and optionally maximum count of CPUs to be allocated.
- NumNodes=<min_count>[-<max_count>]
- Set the job's minimum and optionally maximum count of nodes to be allocated. If the job is already running, use this to specify a node count less than currently allocated and resources previously allocated to the job will be relinquished. After a job's allocation is reduced, subsequent srun commands must explicitly specify node and task counts which are valid for the new allocation. Also see the NodeList parameter above.
- NumTasks=<count>
- Set the job's count of required tasks to the specified value.
- Partition=<name>
- Set the job's partition to the specified value.
- Priority=<number>
- Set the job's priority to the specified value. Note that a job priority of zero prevents the job from ever being scheduled. By setting a job's priority to zero it is held. Set the priority to a non-zero value to permit it to run. Explicitly setting a job's priority clears any previously set nice value.
- QOS=<name>
- Set the job's QOS (Quality Of Service) to the specified value. Value may be cleared with blank data value, "QOS=".
- ReqCores=<count>
- Set the job's count of cores per socket to the specified value.
- ReqNodeList=<nodes>
- Set the job's list of required nodes. Multiple node names may be specified using simple node range expressions (e.g. "lx[10-20]"). Value may be cleared with blank data value, "ReqNodeList=".
- ReqSockets=<count>
- Set the job's count of sockets per node to the specified value.
- ReqThreads=<count>
- Set the job's count of threads per core to the specified value.
- Requeue=<0|1>
- Stipulates whether a job should be requeued after a node failure: 0 for no, 1 for yes.
- ReservationName=<name>
- Set the job's reservation to the specified value. Value may be cleared with blank data value, "ReservationName=".
- Rotate=<yes|no>
- Permit the job's geometry to be rotated. Possible values are "YES" and "NO".
- Shared=<yes|no>
- Set the job's ability to share nodes with other jobs. Possible values are "YES" and "NO".
- StartTime=<time_spec>
- Set the job's earliest initiation time. It accepts times of the form HH:MM:SS to run a job at a specific time of day (seconds are optional). (If that time is already past, the next day is assumed.) You may also specify midnight, noon, or teatime (4pm) and you can have a time-of-day suffixed with AM or PM for running in the morning or the evening. You can also say what day the job will be run, by specifying a date of the form MMDDYY or MM/DD/YY or MM.DD.YY, or a date and time as YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now + count time-units, where the time-units can be minutes, hours, days, or weeks and you can tell SLURM to run the job today with the keyword today and to run the job tomorrow with the keyword tomorrow.
Notes on date/time specifications:
- although the 'seconds' field of the HH:MM:SS time specification is allowed by the code, note that the poll time of the SLURM scheduler is not precise enough to guarantee dispatch of the job on the exact second. The job will be eligible to start on the next poll following the specified time. The exact poll interval depends on the SLURM scheduler (e.g., 60 seconds with the default sched/builtin).
- if no time (HH:MM:SS) is specified, the default is (00:00:00).
- if a date is specified without a year (e.g., MM/DD) then the current year is assumed, unless the combination of MM/DD and HH:MM:SS has already passed for that year, in which case the next year is used.
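The relative form "now + count time-units" described for StartTime can be sketched in Python with the standard datetime module. The sketch covers only that one form with the four units named above (minutes, hours, days, weeks). The exact spellings and spacing that scontrol accepts are not specified here, so the pattern below is an assumption, and parse_relative_start is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# Assumed syntax: "now", "+", a count, then one of the four unit names.
_RELATIVE = re.compile(r"now\s*\+\s*(\d+)\s*(minutes|hours|days|weeks)")


def parse_relative_start(spec, now):
    """Return the datetime described by 'now + count time-units'."""
    match = _RELATIVE.fullmatch(spec.strip().lower())
    if not match:
        raise ValueError(f"not a relative time specification: {spec!r}")
    count, unit = int(match.group(1)), match.group(2)
    # timedelta accepts minutes, hours, days and weeks as keywords.
    return now + timedelta(**{unit: count})


reference = datetime(2024, 1, 1, 12, 0, 0)
print(parse_relative_start("now+2hours", reference))
```

Passing the reference time in explicitly keeps the function deterministic, which makes it easy to test.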
- Switches=<count>[@<max-time-to-wait>]
- When a tree topology is used, this defines the maximum
count of switches desired for the job allocation. If SLURM finds an
allocation containing more switches than the count specified, the job
remains pending until it either finds an allocation with the desired switch
count or the time limit expires. By default there is no switch count limit
and no time limit delay. Set the count to zero in order to clear any
previously set count (disabling the limit). The job's maximum time delay
may be limited by the system administrator using the
SchedulerParameters configuration parameter with the
max_switch_wait parameter option. Also see wait-for-switch.
- TimeLimit=<time>
- The job's time limit. Output format is
[days-]hours:minutes:seconds or "UNLIMITED". Input format (for the
update command) is minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and second
values are rounded up to the next minute. If changing the time limit of a
job, either specify a new time limit value or precede the time with a
"+" or "-" to increment or decrement the current time
limit (e.g. "TimeLimit=+30"). In order to increment or decrement
the current time limit, the JobId specification must precede the
TimeLimit specification.
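The TimeLimit input formats and the round-up rule can be worked through with a short Python sketch. It converts each of the six input forms listed above to whole minutes, rounding any seconds up to the next minute, and applies a leading "+" or "-" relative to a current limit. The "UNLIMITED" value is not handled, and both function names are hypothetical.

```python
def time_limit_minutes(text):
    """Convert an input-format time limit to whole minutes."""
    days = 0
    if "-" in text:
        # days-hours[:minutes[:seconds]]
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        fields = [int(field) for field in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"too many fields: {text!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(field) for field in text.split(":")]
        if len(fields) == 1:    # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:  # minutes:seconds
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:  # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"too many fields: {text!r}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return (total_seconds + 59) // 60  # round seconds up to a minute


def apply_time_limit(current_minutes, spec):
    """Apply an absolute or +/- relative TimeLimit value."""
    if spec.startswith(("+", "-")):
        delta = time_limit_minutes(spec[1:])
        if spec[0] == "+":
            return current_minutes + delta
        return current_minutes - delta
    return time_limit_minutes(spec)


print(time_limit_minutes("1-1:30"))   # one day, one hour, thirty minutes
print(apply_time_limit(60, "+30"))    # as in "TimeLimit=+30"
```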
- wait-for-switch=<max-time-to-wait>
- When a tree topology is used, this defines the maximum time to wait for the desired count of switches. If SLURM finds an allocation containing more switches than the count specified, the job remains pending until it either finds an allocation with the desired switch count or the time limit expires. By default there is no switch count limit and no time delay. Set the time to zero in order to clear any previously set time limit (disabling the limit). The job's maximum time delay may be limited by the system administrator using the SchedulerParameters configuration parameter with the max_switch_wait parameter option. Also see Switches.
- WCKey=<key>
- Set the job's workload characterization key to the
specified value.
- NOTE: The "show" command, when used with the "job" or "job <jobid>"
- entity displays detailed information about a job or jobs.
Much of this information may be modified using the "update job"
command as described above. However, the following fields displayed by the
show job command are read-only and cannot be modified:
- AllocNode:Sid
- Local node and system id making the resource allocation.
- EndTime
- The time the job is expected to terminate based on the job's time limit. When the job ends sooner, this field will be updated with the actual end time.
- ExitCode=<exit>:<sig>
- Exit status reported for the job by the wait() function. The first number is the exit code, typically as set by the exit() function. The second number is the signal that caused the process to terminate if it was terminated by a signal.
- JobState
- The current state of the job.
- NodeList
- The list of nodes allocated to the job.
- NodeListIndices
- The NodeIndices expose the internal indices into the node table associated with the node(s) allocated to the job.
- PreemptTime
- Time at which job was signaled that it was selected for preemption. (Meaningful only for PreemptMode=CANCEL and the partition or QOS with which the job is associated has a GraceTime value designated.)
- PreSusTime
- Time the job ran prior to last suspend.
- Reason
- The reason job is not running: e.g., waiting "Resources".
- SubmitTime
- The time and date stamp (in Coordinated Universal Time, UTC)
the job was submitted. The format of the output is identical to that of
the EndTime field.
- SuspendTime
- Time the job was last suspended or resumed.
- UserId GroupId
- The user and group under which the job was submitted.
- NOTE on information displayed for various job states:
- When you submit a request for the "show job"
function the scontrol process makes an RPC request call to slurmctld with
a REQUEST_JOB_INFO message type. If the state of the job is PENDING, then
it returns some detail information such as: min_nodes, min_procs,
cpus_per_task, etc. If the state is other than PENDING the code assumes
that it is in a further state such as RUNNING, COMPLETE, etc. In these
cases the code explicitly returns zero for these values. These values are
meaningless once the job resources have been allocated and the job has
started.
- SPECIFICATIONS FOR UPDATE COMMAND, STEPS
- StepId=<job_id>[.<step_id>]
- Identify the step to be updated. If the job_id is given, but no step_id is specified then all steps of the identified job will be modified. This specification is required.
- TimeLimit=<time>
- The step's time limit. Output format is
[days-]hours:minutes:seconds or "UNLIMITED". Input format (for the
update command) is minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes or
days-hours:minutes:seconds. Time resolution is one minute and second
values are rounded up to the next minute. If changing the time limit of a
step, either specify a new time limit value or precede the time with a
"+" or "-" to increment or decrement the current time
limit (e.g. "TimeLimit=+30"). In order to increment or decrement
the current time limit, the StepId specification must precede the
TimeLimit specification.
- SPECIFICATIONS FOR UPDATE COMMAND, NODES
- NodeName=<name>
- Identify the node(s) to be updated. Multiple node names may be specified using simple node range expressions (e.g. "lx[10-20]"). This specification is required.
- Features=<features>
- Identify feature(s) to be associated with the specified
node. Any previously defined feature(s) will be overwritten with the new
value. Features assigned via scontrol will only persist across the
restart of the slurmctld daemon with the -R option and state files
preserved or slurmctld's receipt of a SIGHUP. Update slurm.conf with any
changes meant to be persistent across normal restarts of slurmctld or the
execution of scontrol reconfig.
- Gres=<gres>
- Identify generic resources to be associated with the
specified node. Any previously defined generic resources will be
overwritten with the new value. Specifications for multiple generic
resources should be comma separated. Each resource specification consists
of a name followed by an optional colon with a numeric value (default
value is one) (e.g. "Gres=bandwidth:10000,gpus"). Generic
resources assigned via scontrol will only persist across the
restart of the slurmctld daemon with the -R option and state files
preserved or slurmctld's receipt of a SIGHUP. Update slurm.conf with any
changes meant to be persistent across normal restarts of slurmctld or the
execution of scontrol reconfig.
- Reason=<reason>
- Identify the reason the node is in a "DOWN",
"DRAINED", "DRAINING", "FAILING" or
"FAIL" state. Use quotes to enclose a reason having more than
one word.
- State=<state>
- Identify the state to be assigned to the node. Possible
values are "NoResp", "ALLOC", "ALLOCATED",
"DOWN", "DRAIN", "FAIL",
"FAILING", "IDLE", "MIXED",
"MAINT", "POWER_DOWN", "POWER_UP", or
"RESUME". If a node is in a "MIXED" state it usually
means the node is in multiple states. For instance if only part of the
node is "ALLOCATED" and the rest of the node is "IDLE"
the state will be "MIXED". If you want to remove a node from
service, you typically want to set its state to "DRAIN".
"FAILING" is similar to "DRAIN" except that some
applications will seek to relinquish those nodes before the job completes.
"RESUME" is not an actual node state, but will return a
"DRAINED", "DRAINING", or "DOWN" node to
service, either "IDLE" or "ALLOCATED" state as
appropriate. Setting a node "DOWN" will cause all running and
suspended jobs on that node to be terminated. "POWER_DOWN" and
"POWER_UP" will use the configured SuspendProg and
ResumeProg programs to explicitly place a node in or out of a power
saving mode. The "NoResp" state will only set the
"NoResp" flag for a node without changing its underlying state.
While all of the above states are valid, some of them are not valid new
node states given their prior state. Generally only "DRAIN",
"FAIL" and "RESUME" should be used. NOTE: The scontrol
command should not be used to change node state on Cray systems. Use Cray
tools such as xtprocadmin instead.
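The following Python sketch shows one way to assemble the argument list for draining and later resuming a set of nodes, using only the NodeName, State and Reason specifications described above. The helper node_update_argv and the example reason text are hypothetical; the sketch prints the command rather than executing it.

```python
import shlex
from typing import Optional

def node_update_argv(nodes: str, state: str,
                     reason: Optional[str] = None) -> list[str]:
    """Build an argument list for "scontrol update" on nodes."""
    argv = ["scontrol", "update", f"NodeName={nodes}", f"State={state}"]
    if reason is not None:
        # As a list element, a multi-word reason needs no extra quoting.
        argv.append(f"Reason={reason}")
    return argv

drain = node_update_argv("lx[10-20]", "DRAIN", "disk replacement")
resume = node_update_argv("lx[10-20]", "RESUME")

# shlex.join shows how the same command would be quoted in a shell.
print(shlex.join(drain))
print(shlex.join(resume))
```

Passing such a list to a process launcher instead of a shell string avoids having to quote the brackets and the multi-word reason by hand.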
- Weight=<weight>
- Identify weight to be associated with specified nodes. This
allows dynamic changes to weight associated with nodes, which will be used
for the subsequent node allocation decisions. Weight assigned via
scontrol will only persist across the restart of the slurmctld
daemon with the -R option and state files preserved or slurmctld's
receipt of a SIGHUP. Update slurm.conf with any changes meant to be
persistent across normal restarts of slurmctld or the execution of
scontrol reconfig.
- SPECIFICATIONS FOR UPDATE COMMAND, FRONTEND
- FrontendName=<name>
- Identify the front end node to be updated. This
specification is required.
- Reason=<reason>
- Identify the reason the node is in a "DOWN" or
"DRAIN" state. Use quotes to enclose a reason having more than
one word.
- State=<state>
- Identify the state to be assigned to the front end node.
Possible values are "DOWN", "DRAIN" or
"RESUME". If you want to remove a front end node from service,
you typically want to set its state to "DRAIN".
"RESUME" is not an actual node state, but will return a
"DRAINED", "DRAINING", or "DOWN" front end
node to service, either "IDLE" or "ALLOCATED" state as
appropriate. Setting a front end node "DOWN" will cause all
running and suspended jobs on that node to be terminated.
- SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, PARTITIONS
- AllowGroups=<name>
- Identify the user groups which may use this partition.
Multiple groups may be specified in a comma separated list. To permit all
groups to use the partition specify "AllowGroups=ALL".
- AllocNodes=<name>
- Comma separated list of nodes from which users can execute
jobs in the partition. Node names may be specified using the node range
expression syntax described above. The default value is "ALL".
- Alternate=<partition name>
- Alternate partition to be used if the state of this
partition is "DRAIN" or "INACTIVE". The value
"NONE" will clear a previously set alternate partition.
- Default=<yes|no>
- Specify if this partition is to be used by jobs which do
not explicitly identify a partition to use. Possible output values are
"YES" and "NO". In order to change the default
partition of a running system, use the scontrol update command and set
Default=yes for the partition that you want to become the new default.
- DefaultTime=<time>
- Run time limit used for jobs that don't specify a value. If
not set then MaxTime will be used. Format is the same as for MaxTime.
- DefMemPerCPU=<MB>
- Set the default memory to be allocated per CPU for jobs in this partition. The memory size is specified in megabytes.
- DefMemPerNode=<MB>
- Set the default memory to be allocated per node for jobs in
this partition. The memory size is specified in megabytes.
- DisableRootJobs=<yes|no>
- Specify if jobs can be executed as user root. Possible
values are "YES" and "NO".
- GraceTime=<seconds>
- Specifies, in units of seconds, the preemption grace time
to be extended to a job which has been selected for preemption. The
default value is zero, meaning that no preemption grace time is allowed on
this partition or QOS. (Meaningful only for PreemptMode=CANCEL)
- Hidden=<yes|no>
- Specify if the partition and its jobs should be hidden from
view. Hidden partitions will by default not be reported by SLURM APIs or
commands. Possible values are "YES" and "NO".
- MaxMemPerCPU=<MB>
- Set the maximum memory to be allocated per CPU for jobs in this partition. The memory size is specified in megabytes.
- MaxMemPerNode=<MB>
- Set the maximum memory to be allocated per node for jobs in
this partition. The memory size is specified in megabytes.
- MaxNodes=<count>
- Set the maximum number of nodes which will be allocated to
any single job in the partition. Specify a number, "INFINITE" or
"UNLIMITED". (On a Bluegene type system this represents a c-node
count.) Changing the MaxNodes of a partition has no effect upon
jobs that have already begun execution.
- MaxTime=<time>
- The maximum run time for jobs. Output format is
[days-]hours:minutes:seconds or "UNLIMITED". Input format (for
update command) is minutes, minutes:seconds, hours:minutes:seconds,
days-hours, days-hours:minutes or days-hours:minutes:seconds. Time
resolution is one minute and second values are rounded up to the next
minute. Changing the MaxTime of a partition has no effect upon jobs
that have already begun execution.
- MinNodes=<count>
- Set the minimum number of nodes which will be allocated to
any single job in the partition. (On a Bluegene type system this
represents a c-node count.) Changing the MinNodes of a partition
has no effect upon jobs that have already begun execution.
- Nodes=<name>
- Identify the node(s) to be associated with this partition.
Multiple node names may be specified using simple node range expressions
(e.g. "lx[10-20]"). Note that jobs may only be associated with
one partition at any time. Specify a blank data value to remove all nodes
from a partition: "Nodes=". Changing the Nodes in a
partition has no effect upon jobs that have already begun execution.
- PartitionName=<name>
- Identify the partition to be updated. This specification is
required.
- PreemptMode=<mode>
- Reset the mechanism used to preempt jobs in this partition
if PreemptType is configured to preempt/partition_prio. The
default preemption mechanism is specified by the cluster-wide
PreemptMode configuration parameter. Possible values are
"OFF", "CANCEL", "CHECKPOINT",
"REQUEUE" and "SUSPEND".
- Priority=<count>
- Jobs submitted to a higher priority partition will be
dispatched before pending jobs in lower priority partitions and if
possible they will preempt running jobs from lower priority partitions.
Note that a partition's priority takes precedence over a job's priority.
The value may not exceed 65533.
- RootOnly=<yes|no>
- Specify if only allocation requests initiated by user root
will be satisfied. This can be used to restrict control of the partition
to some meta-scheduler. Possible values are "YES" and
"NO".
- Shared=<yes|no|exclusive|force>[:<job_count>]
- Specify if nodes in this partition can be shared by
multiple jobs. Possible values are "YES", "NO",
"EXCLUSIVE" and "FORCE". An optional job count
specifies how many jobs can be allocated to use each resource.
- State=<up|down|drain|inactive>
- Specify if jobs can be allocated nodes or queued in this partition. Possible values are "UP", "DOWN", "DRAIN" and "INACTIVE".
- UP
- Designates that new jobs may be queued on the partition, and that jobs may be allocated nodes and run from the partition.
- DOWN
- Designates that new jobs may be queued on the partition, but queued jobs may not be allocated nodes and run from the partition. Jobs already running on the partition continue to run. The jobs must be explicitly canceled to force their termination.
- DRAIN
- Designates that no new jobs may be queued on the partition (job submission requests will be denied with an error message), but jobs already queued on the partition may be allocated nodes and run. See also the "Alternate" partition specification.
- INACTIVE
- Designates that no new jobs may be queued on the partition, and jobs already queued may not be allocated nodes and run. See also the "Alternate" partition specification.
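The four partition states differ along two axes: whether new jobs may be queued and whether queued jobs may be allocated nodes and run. The Python sketch below tabulates this as described above; the names PARTITION_STATES and partition_policy are hypothetical and exist only for illustration.

```python
# state: (new jobs may be queued, queued jobs may be allocated nodes and run)
PARTITION_STATES = {
    "UP": (True, True),
    "DOWN": (True, False),
    "DRAIN": (False, True),
    "INACTIVE": (False, False),
}

def partition_policy(state: str) -> str:
    """Describe what a partition state permits."""
    name = state.upper()  # option values are case-insensitive
    can_queue, can_run = PARTITION_STATES[name]
    queue_text = "yes" if can_queue else "no"
    run_text = "yes" if can_run else "no"
    return f"{name}: queue new jobs={queue_text}, start queued jobs={run_text}"

for state in PARTITION_STATES:
    print(partition_policy(state))
```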
- SPECIFICATIONS FOR CREATE, UPDATE, AND DELETE COMMANDS, RESERVATIONS
- Reservation=<name>
- Identify the name of the reservation to be created,
updated, or deleted. This parameter is required for update and is the only
parameter for delete. For create, if you do not want to give a reservation
name, use "scontrol create res ..." and a name will be created
automatically.
- Accounts=<account list>
- List of accounts permitted to use the reserved nodes. E.g.
Accounts=physcode1,physcode2. A user in any of the accounts may use the
reserved nodes. A new reservation must specify Users and/or Accounts.
- Licenses=<license>
- Specification of licenses (or other resources available on
all nodes of the cluster) which are to be reserved. License names can be
followed by an asterisk and count (the default count is one). Multiple
license names should be comma separated (e.g.
"Licenses=foo*4,bar"). A new reservation must specify one or
more resources to be included: NodeCnt, Nodes and/or Licenses.
- NodeCnt=<num>
- Identify number of nodes to be reserved. On BlueGene
systems, this number represents a cnode (compute node) count and will be
rounded up as needed to represent whole nodes (midplanes). A new
reservation must specify one or more resources to be included: NodeCnt,
Nodes and/or Licenses.
- Nodes=<name>
- Identify the node(s) to be reserved. Multiple node names
may be specified using simple node range expressions (e.g.
"Nodes=lx[10-20]"). Specify a blank data value to remove all
nodes from a reservation: "Nodes=". A new reservation must
specify one or more resources to be included: NodeCnt, Nodes and/or
Licenses.
- StartTime=<time_spec>
- The start time for the reservation. A new reservation must
specify a start time. It accepts times of the form HH:MM:SS for a
specific time of day (seconds are optional). (If that time is already
past, the next day is assumed.) You may also specify midnight,
noon, or teatime (4pm) and you can have a time-of-day
suffixed with AM or PM for running in the morning or the
evening. You can also say what day the reservation will start, by specifying a
date of the form MMDDYY or MM/DD/YY or MM.DD.YY, or a
date and time as YYYY-MM-DD[THH:MM[:SS]]. You can also give times
like now + count time-units, where the time-units can be
minutes, hours, days, or weeks and you can
tell SLURM to start the reservation today with the keyword today and to start
it tomorrow with the keyword tomorrow.
- EndTime=<time_spec>
- The end time for the reservation. A new reservation must
specify an end time or a duration. Valid formats are the same as for
StartTime.
- Duration=<time>
- The length of a reservation. A new reservation must specify
an end time or a duration. Valid formats are minutes, minutes:seconds,
hours:minutes:seconds, days-hours, days-hours:minutes,
days-hours:minutes:seconds, or UNLIMITED. Time resolution is one minute
and second values are rounded up to the next minute. Output format is
always [days-]hours:minutes:seconds.
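Since the time resolution is one minute, a duration can be held as a whole number of minutes and rendered in the [days-]hours:minutes:seconds output form. The Python sketch below shows that conversion; the function name format_minutes is hypothetical and the two-digit zero padding is an assumption about the exact output layout.

```python
def format_minutes(total_minutes: int) -> str:
    """Render minutes as [days-]hours:minutes:seconds."""
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    # Seconds are always zero because resolution is one minute.
    clock = f"{hours:02d}:{minutes:02d}:00"
    return f"{days}-{clock}" if days else clock

print(format_minutes(90))
print(format_minutes(2160))
```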
- PartitionName=<name>
- Identify the partition to be reserved.
- Flags=<flags>
- Flags associated with the reservation. In order to remove a flag with the update option, precede the name with a minus sign. For example: Flags=-DAILY (NOTE: this option is not supported for all flags). Currently supported flags include:
- LICENSE_ONLY
- This is a reservation for licenses only and not compute nodes. If this flag is set, a job using this reservation may use the associated licenses and any compute nodes. If this flag is not set, a job using this reservation may use only the nodes and licenses associated with the reservation.
- MAINT
- Maintenance mode, receives special accounting treatment. This reservation is permitted to use resources that are already in another reservation.
- OVERLAP
- This reservation can be allocated resources that are already in another reservation.
- IGNORE_JOBS
- Ignore currently running jobs when creating the reservation. This can be especially useful when reserving all nodes in the system for maintenance.
- DAILY
- Repeat the reservation at the same time every day.
- WEEKLY
- Repeat the reservation at the same time every week.
- SPEC_NODES
- Reservation is for specific nodes (output only)
- Features=<features>
- Set the reservation's required node features. Multiple
values may be "&" separated if all features are required
(AND operation) or separated by "|" if any of the specified
features are required (OR operation). Value may be cleared with blank data
value, "Features=".
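The AND and OR forms can be illustrated with a small Python sketch that checks a node's feature set against a Features value. The function name features_match and the feature names in the usage lines are hypothetical; expressions mixing "&" and "|" are rejected because their evaluation order is not described here.

```python
def features_match(expr: str, node_features: set[str]) -> bool:
    """Check a node's features against an "&" or "|" separated list."""
    if not expr:
        return True  # a blank value clears the requirement
    if "&" in expr and "|" in expr:
        raise ValueError("mixed operators are not handled in this sketch")
    if "|" in expr:
        # OR operation: any one of the listed features is enough.
        return any(f in node_features for f in expr.split("|"))
    # AND operation (also covers a single feature name).
    return all(f in node_features for f in expr.split("&"))

print(features_match("gpu&bigmem", {"gpu", "bigmem", "ib"}))
print(features_match("gpu|bigmem", {"ib"}))
```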
- Users=<user list>
- List of users permitted to use the reserved nodes. E.g.
Users=jones1,smith2. A new reservation must specify Users and/or Accounts.
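The requirements for a new reservation are spread across several of the entries above: Users and/or Accounts; one or more of NodeCnt, Nodes and Licenses; a StartTime; and an EndTime or a Duration. The Python sketch below collects them into one check. The function name missing_for_new_reservation and the example values are hypothetical.

```python
def missing_for_new_reservation(params: dict[str, str]) -> list[str]:
    """List the requirements a new reservation does not yet meet."""
    keys = {key.lower() for key in params}  # options are case-insensitive
    missing = []
    if not keys & {"users", "accounts"}:
        missing.append("Users and/or Accounts")
    if not keys & {"nodecnt", "nodes", "licenses"}:
        missing.append("NodeCnt, Nodes and/or Licenses")
    if "starttime" not in keys:
        missing.append("StartTime")
    if not keys & {"endtime", "duration"}:
        missing.append("EndTime or Duration")
    return missing

request = {"StartTime": "now", "Duration": "60", "Users": "jones1"}
print(missing_for_new_reservation(request))
```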
- SPECIFICATIONS FOR UPDATE, BLOCK
- Bluegene systems only!
- BlockName=<name>
- Identify the bluegene block to be updated. This specification is required.
- State=<free|error|remove>
- Update the state of a bluegene block to either FREE or ERROR (e.g. "update BlockName=RMP0 STATE=ERROR"). A block in the ERROR state will not allow jobs to run on it. WARNING: This will cancel any running job on the block! On dynamically laid out systems, REMOVE will free and remove the block from the system. If the block is smaller than a midplane, every block on that midplane will be removed.
- SubBPName=<name>
- Identify the bluegene ionodes to be updated (e.g. bg000[0-3]). This specification is required.
ENVIRONMENT VARIABLES¶
Some scontrol options may be set via environment variables. These environment variables, along with their corresponding options, are listed below. (Note: Commandline options will always override these settings.)
- SCONTROL_ALL
- -a, --all
- SLURM_CLUSTERS
- Same as --clusters
- SLURM_CONF
- The location of the SLURM configuration file.
- SLURM_TIME_FORMAT
- Specify the format used to report time stamps. A value of
standard, the default value, generates output in the form
"year-month-dateThour:minute:second". A value of relative
returns only "hour:minute:second" for time stamps on the current day. For other
dates in the current year it prints the "hour:minute" preceded
by "Tomorr" (tomorrow), "Ystday" (yesterday), the name
of the day for the coming week (e.g. "Mon", "Tue",
etc.), otherwise the date (e.g. "25 Apr"). For other years it
returns a date, month and year without a time (e.g. "6 Jun
2012"). Another suggested value is "%a %T" for a day of
week and time stamp (e.g. "Mon 12:34:56"). All of the time
stamps use a 24 hour format.
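The "%a %T" value looks like a strftime-style format string. Assuming that is how custom values are interpreted, the Python sketch below renders one arbitrary example time stamp in the standard layout and in the day-of-week layout. The date is made up for illustration, and "%T" is written out as "%H:%M:%S", its POSIX equivalent, for portability.

```python
from datetime import datetime

# Arbitrary example time stamp (4 June 2012 was a Monday).
stamp = datetime(2012, 6, 4, 12, 34, 56)

# Layout of the "standard" value: year-month-dateThour:minute:second.
standard = stamp.strftime("%Y-%m-%dT%H:%M:%S")

# Day of week and time, as with the suggested "%a %T" value.
day_and_time = stamp.strftime("%a %H:%M:%S")

print(standard)
print(day_and_time)
```

The abbreviated day name produced by "%a" depends on the locale in effect.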
AUTHORIZATION¶
When using the SLURM db, users who have AdminLevels defined (Operator or Admin) and users who are account coordinators are given the authority to view and modify jobs, reservations, nodes, etc., as defined in the following table - regardless of whether a PrivateData restriction has been defined in the slurm.conf file.
EXAMPLES¶
# scontrol
AllocNodes=ALL AllowGroups=ALL Default=YES
DefaultTime=NONE DisableRootJobs=NO Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
Nodes=snowflake[0-48]
Priority=1 RootOnly=NO Shared=YES:4
State=UP TotalCPUs=694 TotalNodes=49
UserId=da(1000) GroupId=da(1000)
Priority=66264 Account=none QOS=normal WCKey=*123
JobState=COMPLETED Reason=None Dependency=(null)
TimeLimit=UNLIMITED Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
SubmitTime=2010-01-05T10:58:40 EligibleTime=2010-01-05T10:58:40
StartTime=2010-01-05T10:58:40 EndTime=2010-01-05T10:58:40
SuspendTime=None SecsPreSuspend=0
Partition=debug AllocNode:Sid=snowflake:4702
ReqNodeList=(null) ExcNodeList=(null)
NodeList=snowflake0
NumNodes=1 NumCPUs=10 CPUs/Task=2 ReqS:C:T=1:1:1
MinCPUsNode=2 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
COPYING¶
Copyright (C) 2002-2007 The Regents of the University of California. Copyright (C) 2008-2010 Lawrence Livermore National Security. Portions Copyright (C) 2010 SchedMD <http://www.schedmd.com>. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All rights reserved. This file is part of SLURM, a resource management program. For details, see <http://www.schedmd.com/slurmdocs/>. SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
FILES¶
/etc/slurm.conf
SEE ALSO¶
scancel(1), sinfo(1), squeue(1), slurm_checkpoint(3), slurm_create_partition(3), slurm_delete_partition(3), slurm_load_ctl_conf(3), slurm_load_jobs(3), slurm_load_node(3), slurm_load_partitions(3), slurm_reconfigure(3), slurm_requeue(3), slurm_resume(3), slurm_shutdown(3), slurm_suspend(3), slurm_takeover(3), slurm_update_job(3), slurm_update_node(3), slurm_update_partition(3), slurm.conf(5), slurmctld(8)
July 2011 | scontrol 2.3 |