NAME¶
cgroup.conf - Slurm configuration file for the cgroup support
DESCRIPTION¶
cgroup.conf is an ASCII file which defines parameters used by Slurm's
Linux cgroup related plugins. The file location can be modified at system
build time using the DEFAULT_SLURM_CONF parameter or at execution time by
setting the SLURM_CONF environment variable. The file will always be located in the same directory as the slurm.conf file.
Parameter names are case insensitive. Any text following a "#" in the
configuration file is treated as a comment through the end of that line. The
size of each line in the file is limited to 1024 characters. Changes to the
configuration file take effect upon restart of SLURM daemons, daemon receipt
of the SIGHUP signal, or execution of the command "scontrol
reconfigure" unless otherwise noted.
Two cgroup plugins are currently available in SLURM: the first is a proctrack plugin, the second a task plugin.
The following cgroup.conf parameters are defined to control the general behavior
of Slurm cgroup plugins.
- CgroupMountpoint=PATH
- Specify the PATH under which cgroups should be
mounted. This should be a writable directory which will contain cgroups
mounted one per subsystem. The default PATH is /cgroup.
- CgroupAutomount=<yes|no>
- Slurm cgroup plugins require a valid and functional cgroup subsystem to be mounted under /cgroup/<subsystem_name>. When launched, each plugin checks the availability of its subsystem. If the subsystem is not available, the plugin fails to launch unless CgroupAutomount is set to yes, in which case the plugin will first try to mount the required subsystems.
- CgroupReleaseAgentDir=<path_to_release_agent_directory>
- Used to tune the cgroup system behavior. This parameter
identifies the location of the directory containing Slurm cgroup
release_agent files. A release_agent file is required for each mounted
subsystem. The release_agent file name must have the following format:
release_<subsystem_name>. For instance, the release_agent file for
the cpuset subsystem must be named release_cpuset. See also CLEANUP OF
CGROUPS below.
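Taken together, the general parameters above might appear in cgroup.conf as follows; the values are illustrative, not recommendations:

```conf
###
# General cgroup plugin behavior (illustrative values)
###
CgroupMountpoint=/cgroup
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
```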
PROCTRACK/CGROUP PLUGIN¶
The Slurm proctrack/cgroup plugin is used to track processes using the freezer cgroup subsystem. It creates a hierarchical set of directories for each step, placing the step's tasks in the leaf. The directory structure is like the following:
/cgroup/freezer/uid_%uid/job_%jobid/step_%stepid
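For illustration, the step path can be derived from the template above with a small shell helper; this is a hypothetical sketch, not part of Slurm, and it assumes the default CgroupMountpoint of /cgroup:

```shell
# Hypothetical helper (not part of Slurm): build the freezer cgroup
# path the proctrack/cgroup plugin would use for a given step.
freezer_step_path() {
    # $1 = uid, $2 = job id, $3 = step id
    printf '/cgroup/freezer/uid_%s/job_%s/step_%s\n' "$1" "$2" "$3"
}

freezer_step_path 1000 4242 0
# → /cgroup/freezer/uid_1000/job_4242/step_0
```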
The proctrack/cgroup plugin is enabled with the following parameter in
slurm.conf:
ProctrackType=proctrack/cgroup
No cgroup.conf parameters are currently defined to control the behavior of this plugin.
TASK/CGROUP PLUGIN¶
The Slurm task/cgroup plugin is used to enforce allocated resource constraints, thus preventing tasks from using unallocated resources. It currently uses only the cpuset subsystem, but may also use the memory and devices subsystems in the near future.
It creates a hierarchical set of directories for each task and subsystem. The
directory structure is like the following:
/cgroup/%subsys/uid_%uid/job_%jobid/step_%stepid/task_%taskid
The task/cgroup plugin is enabled with the following parameter in slurm.conf:
TaskPlugin=task/cgroup
The following cgroup.conf parameters are defined to control the behavior of this
particular plugin:
- ConstrainCores=<yes|no>
- If configured to "yes" then constrain allowed
cores to the subset of allocated resources. It uses the cpuset subsystem.
The default value is "no".
- TaskAffinity=<yes|no>
- If configured to "yes" then set a default task
affinity to bind each step task to a subset of the allocated cores using
sched_setaffinity. The default value is "no".
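The cpuset and affinity masks these parameters control are typically written as CPU range lists (the format used by cpuset.cpus and Cpus_allowed_list). As a purely illustrative helper, not part of Slurm, such a list can be expanded like this:

```shell
# Hypothetical helper: expand a CPU range list such as "0-3,8"
# into individual CPU ids, one per line.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        # a bare "8" has no upper bound, so reuse the lower one
        seq "$lo" "${hi:-$lo}"
    done
}

expand_cpulist "0-3,8"
# → 0 1 2 3 8 (one per line)
```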
The following cgroup.conf parameters may be defined to control the behavior of this plugin in a future version, when memory and devices support is added:
- AllowedRAMSpace=<number>
- Constrain the job cgroup RAM to this percentage of the allocated memory. The default value is 100. The percentage supplied may be expressed as a floating point number, e.g. 98.5. If SLURM is not allocating memory to jobs, MaxRAMPercent applies instead. If the AllowedRAMSpace limit is exceeded, the job steps will be killed and a warning message will be written to standard error. Also see ConstrainRAMSpace.
- AllowedSwapSpace=<number>
- Constrain the job cgroup swap space to this percentage of
the allocated memory. The default value is 0, which means that RAM+Swap
will be limited to AllowedRAMSpace. The supplied percentage may be
expressed as a floating point number, e.g. 50.5. If the limit is exceeded,
the job steps will be killed and a warning message will be written to
standard error. Also see ConstrainSwapSpace.
- ConstrainRAMSpace=<yes|no>
- If configured to "yes" then constrain the job's
RAM usage. The default value is "no". Also see
AllowedRAMSpace.
- ConstrainSwapSpace=<yes|no>
- If configured to "yes" then constrain the job's
swap space usage. The default value is "no". Also see
AllowedSwapSpace.
- MaxRAMPercent=PERCENT
- Set an upper bound in percent of total RAM on the RAM
constraint for a job. This will be the memory constraint applied to jobs
that are not explicitly allocated memory by SLURM. The PERCENT may
be an arbitrary floating point number. The default value is 100.
- MaxSwapPercent=PERCENT
- Set an upper bound (in percent of total RAM) on the amount
of RAM+Swap that may be used for a job. This will be the swap limit
applied to jobs on systems where memory is not being explicitly allocated to jobs. The PERCENT may be an arbitrary floating point number
between 0 and 100. The default value is 100.
- MinRAMSpace=<number>
- Set a lower bound (in MB) on the memory limits defined by
AllowedRAMSpace and AllowedSwapSpace. This prevents
accidentally creating a memory cgroup with such a low limit that
slurmstepd is immediately killed due to lack of RAM. The default limit is
30M.
- ConstrainDevices=<yes|no>
- If configured to "yes" then constrain the job's
allowed devices based on GRES allocated resources. It uses the devices
subsystem for that. The default value is "no".
- AllowedDevicesFile=<path_to_allowed_devices_file>
- If the ConstrainDevices field is set to "yes", this file must be used to declare the devices allowed by default for all jobs. The current implementation of the cgroup devices subsystem works as a whitelist of entries: to isolate a job's access to particular devices, access must first be allowed to all the devices supported by default and then denied on those the job does not have permission to use. The default value is "/etc/slurm/cgroup_allowed_devices_file.conf". The file accepts one device per line and permits wildcard entries such as /dev/sda* or /dev/cpu/*/*. See also the example file etc/allowed_devices_file.conf.example.
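For illustration only, the arithmetic implied by AllowedRAMSpace, AllowedSwapSpace and MinRAMSpace can be sketched as the following hypothetical helper (integer megabytes; not Slurm code):

```shell
# Hypothetical sketch of the memory limit arithmetic described above:
# the RAM limit comes from AllowedRAMSpace, the combined RAM+Swap
# limit from AllowedRAMSpace + AllowedSwapSpace, and both are
# floored at MinRAMSpace.
cgroup_mem_limits() {
    alloc_mb=$1 ram_pct=$2 swap_pct=$3 min_mb=$4
    ram=$(( alloc_mb * ram_pct / 100 ))
    ramswap=$(( alloc_mb * (ram_pct + swap_pct) / 100 ))
    if [ "$ram" -lt "$min_mb" ]; then ram=$min_mb; fi
    if [ "$ramswap" -lt "$min_mb" ]; then ramswap=$min_mb; fi
    echo "$ram $ramswap"
}

cgroup_mem_limits 4096 100 0 30   # defaults → "4096 4096"
cgroup_mem_limits 4096 90 10 30   # → "3686 4096"
```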
EXAMPLE¶
###
# Slurm cgroup support configuration file
###
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
#
NOTES¶
Only one instance of a cgroup subsystem is valid at a time in the kernel. If you try to mount another cgroup hierarchy that uses the same cpuset subsystem, the mount will fail. However, you can mount another cgroup hierarchy for a different subsystem.
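It can therefore help to check which subsystems are already attached to a hierarchy before mounting another one. A hypothetical check, parsing /proc/mounts-style entries (a sample file is used here so the sketch stays self-contained):

```shell
# Hypothetical helper: list cgroup subsystems found in
# /proc/mounts-style input (field 3 is the fs type, field 4 the
# comma-separated mount options naming the subsystems).
mounted_subsystems() {
    awk '$3 == "cgroup" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(freezer|cpuset|memory|devices)$/)
                print opts[i]
    }' "$@"
}

# Sample input instead of the real /proc/mounts:
cat > /tmp/sample_mounts <<'EOF'
cgroup /cgroup/freezer cgroup rw,freezer 0 0
cgroup /cgroup/cpuset cgroup rw,cpuset 0 0
EOF

mounted_subsystems /tmp/sample_mounts
# → freezer
#   cpuset
```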
CLEANUP OF CGROUPS¶
To allow cgroups to be removed automatically when they are no longer in use the
notify_on_release flag is set in each cgroup when the cgroup is instantiated.
The release_agent file for each subsystem is set up when the subsystem is
mounted. The name of each release_agent file is release_<subsystem
name>. The directory is specified via the CgroupReleaseAgentDir parameter
in cgroup.conf. A simple release agent mechanism to remove slurm cgroups when
they become empty may be set up by creating the release agent files for each
required subsystem as symbolic links to a common release agent script, as
shown in the example below:
[sulu] (slurm) etc> cat cgroup.conf | grep CgroupReleaseAgentDir
CgroupReleaseAgentDir="/etc/slurm/cgroup"
[sulu] (slurm) etc> ls -al /etc/slurm/cgroup
total 12
drwxr-xr-x 2 root root 4096 2010-04-23 14:55 .
drwxr-xr-x 4 root root 4096 2010-07-22 14:48 ..
-rwxrwxrwx 1 root root 234 2010-04-23 14:52 release_common
lrwxrwxrwx 1 root root 32 2010-04-23 11:04 release_cpuset -> /etc/slurm/cgroup/release_common
lrwxrwxrwx 1 root root 32 2010-04-23 11:03 release_freezer -> /etc/slurm/cgroup/release_common
[sulu] (slurm) etc> cat /etc/slurm/cgroup/release_common
#!/bin/bash
base_path=/cgroup
progname=$(basename "$0")
subsystem=${progname##*_}
rmcg=${base_path}/${subsystem}$@
uidcg=${rmcg%/job*}
if [[ -d ${base_path}/${subsystem} ]]
then
    flock -x ${uidcg} -c "rmdir ${rmcg}"
fi
[sulu] (slurm) etc>
COPYING¶
Copyright (C) 2010 Lawrence Livermore National Security. Produced at Lawrence
Livermore National Laboratory (cf. DISCLAIMER). CODE-OCEC-09-009. All rights
reserved.
This file is part of SLURM, a resource management program. For details, see
<http://www.schedmd.com/slurmdocs/>.
SLURM is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
SEE ALSO¶
slurm.conf(5)