NAME¶
gres.conf - Slurm configuration file for generic resource management.
DESCRIPTION¶
gres.conf is an ASCII file which describes the configuration of generic
resources on each compute node. Each node must contain a gres.conf file if
generic resources are to be scheduled by SLURM. The file location can be
modified at system build time using the DEFAULT_SLURM_CONF parameter or at
execution time by setting the SLURM_CONF environment variable. The file will
always be located in the same directory as the
slurm.conf file. If
generic resource counts are set by the gres plugin function
node_config_load(), this file may be optional.
Parameter names are case insensitive. Any text following a "#" in the
configuration file is treated as a comment through the end of that line.
Changes to the configuration file take effect upon restart of SLURM daemons,
daemon receipt of the SIGHUP signal, or execution of the command
"scontrol reconfigure" unless otherwise noted.
The overall configuration parameters available include:
- Count
- Number of resources of this type available on this node. The default value
is the number of File values specified (if any); otherwise
the default value is one. A suffix of "K", "M" or
"G" may be used to multiply the number by 1024, 1048576 or
1073741824 respectively. Note that Count is a 32-bit field and the maximum
value is 4,294,967,295.
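As a sketch, a countable resource with no device files might be declared with a
single line; the resource name "lic" and its count below are illustrative, not
taken from any real site:

```
# Hypothetical license pool: 12 units of a resource named "lic"
Name=lic Count=12
# Suffixes scale the value: 20M means 20 x 1048576 = 20971520
Name=bandwidth Count=20M
```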
- CPUs
- Specify the CPU index numbers for the specific CPUs which can use this
resource. For example, it may be strongly preferable to use specific CPUs
with specific devices (e.g. on a NUMA architecture). Multiple CPUs may be
specified using a comma delimited list or a range may be specified using a
"-" separator (e.g. "0,1,2,3" or "0-3"). If
specified, then only the identified CPUs can be allocated with each
generic resource; an attempt to use other CPUs will not be honored. If not
specified, any CPU can be used with the resources. If any CPU can be
used effectively with the resources, do not specify the CPUs
option; omitting it improves the speed of the SLURM scheduling logic.
Since SLURM must be able to perform resource management on heterogeneous
clusters having various CPU ID numbering schemes, use the SLURM CPU index
numbers here (CPU_ID = Board_ID x threads_per_board + Socket_ID x
threads_per_socket + Core_ID x threads_per_core + Thread_ID).
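The CPU index formula above can be checked with a short sketch; the topology
numbers used here (one board, 2 sockets x 4 cores x 2 threads) are illustrative
assumptions, not taken from any particular node:

```python
def slurm_cpu_id(board_id, socket_id, core_id, thread_id,
                 threads_per_board, threads_per_socket, threads_per_core):
    """SLURM CPU index: CPU_ID = Board_ID * threads_per_board
    + Socket_ID * threads_per_socket + Core_ID * threads_per_core + Thread_ID"""
    return (board_id * threads_per_board
            + socket_id * threads_per_socket
            + core_id * threads_per_core
            + thread_id)

# Hypothetical layout: 1 board, 2 sockets x 4 cores x 2 threads
# => 16 threads per board, 8 per socket, 2 per core
print(slurm_cpu_id(0, 1, 2, 1, 16, 8, 2))  # second socket, third core, second thread -> 13
```

Note that this SLURM index need not match the operating system's CPU numbering,
which is exactly why the man page asks for SLURM CPU indexes here.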
- File
- Fully qualified pathname of the device files associated with a resource.
The file name parsing logic includes support for simple regular
expressions as shown in the example below. This field
is generally required if enforcement of generic resource allocations is to
be supported (i.e. it prevents users from making use of resources allocated
to a different user). If File is specified then Count must
be either set to the number of file names specified or not set (the
default value is the number of files specified). If device file names are
specified, Slurm must track the utilization of each individual device,
which involves more overhead than just tracking the device counts. Use the
File parameter only if Count is not sufficient for
tracking purposes. NOTE: If you specify the File parameter for a
resource on some node, the option must be specified on all nodes and SLURM
will track the assignment of each specific resource on each node.
Otherwise SLURM will only track a count of allocated resources rather than
the state of each individual device file.
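For instance, the Count/File relationship described above can be written either
way; the device paths follow the GPU example used elsewhere in this page:

```
# Count defaults to the number of File values, so these are equivalent:
Name=gpu File=/dev/nvidia[0-1]
Name=gpu File=/dev/nvidia[0-1] Count=2
# Invalid: Count must match the number of device files when both are given
#Name=gpu File=/dev/nvidia[0-1] Count=4
```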
- Name
- Name of the generic resource. Any desired name may be used. Each generic
resource has an optional plugin which can provide resource-specific
options. Generic resources that currently include an optional plugin
are:
- gpu
- Graphics Processing Unit
- nic
- Network Interface Card
- mic
- Intel Many Integrated Core (MIC) processor
- NodeName
- An optional NodeName specification can be used to permit one gres.conf
file to be used for all compute nodes in a cluster by specifying the
node(s) that each line should apply to. The NodeName specification can use
a Slurm hostlist specification as shown in the example below.
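Entries in gres.conf only take effect for resource types that slurm.conf also
declares. As a hedged sketch (the node names and counts below are illustrative,
mirroring the examples in this page), the corresponding slurm.conf entries
might look like:

```
# slurm.conf (not gres.conf): enable the gres types and advertise
# the per-node resource counts
GresTypes=gpu,bandwidth
NodeName=tux[0-15] Gres=gpu:4,bandwidth:20M
```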
EXAMPLES¶
##################################################################
# SLURM's Generic Resource (GRES) configuration file
##################################################################
# Configure support for our four GPUs
Name=gpu File=/dev/nvidia0 CPUs=0,1
Name=gpu File=/dev/nvidia1 CPUs=0,1
Name=gpu File=/dev/nvidia2 CPUs=2,3
Name=gpu File=/dev/nvidia3 CPUs=2,3
Name=bandwidth Count=20M
##################################################################
# SLURM's Generic Resource (GRES) configuration file
# Use a single gres.conf file for all compute nodes
##################################################################
NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
NodeName=tux[16-31] Name=gpu File=/dev/nvidia[0-7]
COPYING¶
Copyright (C) 2010 The Regents of the University of California. Produced at
Lawrence Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2013 SchedMD LLC.
This file is part of SLURM, a resource management program. For details, see
<http://slurm.schedmd.com/>.
SLURM is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
SEE ALSO¶
slurm.conf(5)