NAME
slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file
DESCRIPTION
slurmdbd.conf is an ASCII file which describes Slurm Database Daemon
(SlurmDBD) configuration information. The file location can be modified at
system build time using the DEFAULT_SLURM_CONF parameter or at execution time
by setting the SLURM_CONF environment variable.
The contents of the file are case insensitive except for the names of nodes and
files. Any text following a "#" in the configuration file is treated
as a comment through the end of that line. The size of each line in the file
is limited to 1024 characters. Changes to the configuration file take effect
upon restart of SlurmDBD or daemon receipt of the SIGHUP signal unless
otherwise noted.
This file should exist only on the computer where SlurmDBD executes and should
be readable only by the user who executes SlurmDBD (e.g. "slurm"). This
file should be protected from unauthorized access since it contains a database
password. The overall configuration parameters available include:
- ArchiveDir
- If ArchiveScript is not set, slurmdbd will generate a
file that can be read back in at any time with sacctmgr load filename. This
directory is where the file will be placed after an archive has run. Default is
/tmp. The format of the file name is
$ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp
- ArchiveEvents
- When purging events, also archive them. Boolean: yes to
archive event data, no otherwise. Default is no.
- ArchiveJobs
- When purging jobs, also archive them. Boolean: yes to archive
job data, no otherwise. Default is no.
- ArchiveScript
- This script can be executed every time a rollup happens
(every hour, day and month), depending on the Purge*After options. This
script is used to transfer accounting records out of the database into an
archive. It is used in place of the internal process used to archive
objects. The script is executed with no arguments. The following
environment variables are set:
- SLURM_ARCHIVE_EVENTS
- 1 for archive events 0 otherwise.
- SLURM_ARCHIVE_LAST_EVENT
- Time of last event start to archive.
- SLURM_ARCHIVE_JOBS
- 1 for archive jobs 0 otherwise.
- SLURM_ARCHIVE_LAST_JOB
- Time of last job submit to archive.
- SLURM_ARCHIVE_STEPS
- 1 for archive steps 0 otherwise.
- SLURM_ARCHIVE_LAST_STEP
- Time of last step start to archive.
- SLURM_ARCHIVE_SUSPEND
- 1 for archive suspend data 0 otherwise.
- SLURM_ARCHIVE_LAST_SUSPEND
- Time of last suspend start to archive.
- ArchiveSteps
- When purging steps, also archive them. Boolean: yes to
archive step data, no otherwise. Default is no.
- ArchiveSuspend
- When purging suspend data, also archive it. Boolean: yes to
archive suspend data, no otherwise. Default is no.
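As an illustration of the archive options above, the following fragment is a minimal sketch rather than a recommended setup. The directory /var/spool/slurm/archive is a hypothetical path and the purge ages are arbitrary. Purge*After values are included because the Archive* options act when records are purged.

```conf
# Hypothetical archive location (the default is /tmp)
ArchiveDir=/var/spool/slurm/archive
# Archive job and step records when they are purged
ArchiveJobs=yes
ArchiveSteps=yes
# Leave event and suspend records unarchived (the default)
ArchiveEvents=no
ArchiveSuspend=no
# Purge ages in months, chosen for illustration only
PurgeJobAfter=12
PurgeStepAfter=1
```

With these settings, archive files would be written beneath /var/spool/slurm/archive using the naming pattern given under ArchiveDir.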
- AuthInfo
- Additional information to be used for authentication of
communications with the Slurm control daemon (slurmctld) on each cluster.
The interpretation of this option is specific to the configured
AuthType. In the case of auth/munge, this can be configured
to use a Munge daemon specifically configured to provide authentication
between clusters while the default Munge daemon provides authentication
within a cluster. In that case, this will specify the pathname of the
socket to use. By default this value is left unspecified, which results
in the default authentication mechanism being used.
- AuthType
- Define the authentication method for communications between
SLURM components. Acceptable values at present include
"auth/none", "auth/authd", and "auth/munge".
The default value is "auth/none", which means the UID included
in communication messages is not verified. This may be fine for testing
purposes, but do not use "auth/none" if you desire any
security. "auth/authd" indicates that Brent Chun's authd is
to be used (see "http://www.theether.org/authd/" for more
information). "auth/munge" indicates that LLNL's Munge system is
to be used (this is the best supported authentication mechanism for SLURM,
see "http://home.gna.org/munge/" for more information). SlurmDbd
must be terminated prior to changing the value of AuthType and
later restarted.
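A minimal sketch of Munge-based authentication, assuming a second Munge daemon dedicated to traffic between clusters. The socket path is the one used in the sample file in the EXAMPLE section and will differ between installations.

```conf
AuthType=auth/munge
# Socket of the Munge daemon used for communication with slurmctld on each cluster
AuthInfo=/var/run/munge/munge.socket.2
```

If AuthInfo is omitted, the default authentication mechanism is used.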
- DbdBackupHost
- The name of the machine where the backup Slurm Database
Daemon is executed. This host must have access to the same underlying
database specified by the 'Storage' options mentioned below. This should
be a node name without the full domain name. I.e., the hostname returned
by the gethostname() function cut at the first dot (e.g. use
"tux001" rather than "tux001.my.com").
- DbdHost
- The name of the machine where the Slurm Database Daemon is
executed. This should be a node name without the full domain name. I.e.,
the hostname returned by the gethostname() function cut at the
first dot (e.g. use "tux001" rather than
"tux001.my.com"). This value must be specified.
- DbdPort
- The port number that the Slurm Database Daemon (slurmdbd)
listens to for work. The default value is SLURMDBD_PORT as established at
system build time. If none is explicitly specified, it will be set to
6819. This value must be equal to the AccountingStoragePort
parameter in the slurm.conf file.
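The host and port options above might be combined as follows. The host names tux001 and tux002 are placeholders, and 6819 is the fallback port mentioned under DbdPort. The slurm.conf entry is shown only as a comment because it belongs in a different file.

```conf
# slurmdbd.conf
# Short host names, not tux001.my.com
DbdHost=tux001
# Hypothetical backup host with access to the same database
DbdBackupHost=tux002
DbdPort=6819
# The matching entry in slurm.conf would be:
#   AccountingStoragePort=6819
```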
- DebugLevel
- The level of detail to provide in the Slurm Database Daemon's
logs. Values from 0 to 9 are legal, with "0" being "quiet"
operation and "9" being extremely verbose. The default value is 3.
- DefaultQOS
- When a new cluster is added, this is used as the QOS for
the cluster unless the administrator explicitly sets one when creating
it.
- LogFile
- Fully qualified pathname of a file into which the Slurm
Database Daemon's logs are written. The default value is none (performs
logging via syslog).
See the section LOGGING in the slurm.conf man page if a pathname is
specified.
- MessageTimeout
- Time permitted for a round-trip communication to complete
in seconds. Default value is 10 seconds.
- PidFile
- Fully qualified pathname of a file into which the Slurm
Database Daemon may write its process ID. This may be used for automated
signal processing. The default value is "/var/run/slurmdbd.pid".
- PluginDir
- Identifies the places in which to look for SLURM plugins.
This is a colon-separated list of directories, like the PATH environment
variable. The default value is "/usr/local/lib/slurm".
- PrivateData
- This controls what type of information is hidden from
regular users. By default, all information is visible to all users. User
SlurmUser, root, and users with AdminLevel=Admin can always
view all information. Multiple values may be specified with a comma
separator. Acceptable values include:
- accounts
- prevents users from viewing any account definitions unless
they are coordinators of them.
- jobs
- prevents users from viewing job records belonging to other
users unless they are coordinators of the association running the job when
using sacct.
- reservations
- restricts getting reservation information to users with
operator status and above.
- usage
- prevents users from viewing usage of any other user. This
applies to sreport.
- users
- prevents users from viewing information of any user other
than themselves. This also restricts users to seeing only the associations
they deal with. Coordinators can see associations of all users they are
coordinator of, but can only see themselves when listing users.
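As a sketch of the comma-separated form, the line below combines several of the values described above. Which values are appropriate depends on site policy, and this particular combination is only an illustration.

```conf
# Hide account definitions, other users' jobs and usage, and other users' records
PrivateData=accounts,jobs,usage,users
```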
- PurgeEventAfter
- Events happening on the cluster over this age are purged
from the database. This includes node down times and such. The time is a
numeric value and is a number of months. To purge more often, append
"hours" or "days" to the numeric value (e.g. a value of '12hours' would
purge everything older than 12 hours). If not set (default), then event
records are never purged.
- PurgeJobAfter
- Individual job records over this age are purged from the
database. Aggregated information will be preserved indefinitely. The time
is a numeric value and is a number of months. To purge more often, append
"hours" or "days" to the numeric value (e.g. a value of '12hours' would
purge everything older than 12 hours). If not set (default), then job
records are never purged.
- PurgeStepAfter
- Individual job step records over this age are purged from
the database. Aggregated information will be preserved indefinitely. The
time is a numeric value and is a number of months. To purge more often,
append "hours" or "days" to the numeric value (e.g. a value of '12hours'
would purge everything older than 12 hours). If not set (default), then
job step records are never purged.
- PurgeSuspendAfter
- Records of individual suspend times for jobs over this age
are purged from the database. Aggregated information will be preserved
indefinitely. The time is a numeric value and is a number of months. To
purge more often, append "hours" or "days" to the numeric value (e.g. a
value of '12hours' would purge everything older than 12 hours). If not set
(default), then suspend records are never purged.
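The time formats accepted by the Purge*After options can be sketched as follows. The ages shown are arbitrary and only illustrate the syntax described above.

```conf
# A bare number is interpreted as a number of months
PurgeJobAfter=12
# A unit appended to the number gives finer granularity
PurgeStepAfter=30days
PurgeEventAfter=12hours
# PurgeSuspendAfter is left unset here, so no purge is performed for suspend data
```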
- SlurmUser
- The name of the user that the slurmctld daemon
executes as. This user must exist on the machine executing the Slurm
Database Daemon and have the same user ID as on the hosts on which
slurmctld executes. For security purposes, a user other than
"root" is recommended. The default value is "root".
- StorageHost
- Define the name of the host where the database that will
store the data is running. Ideally this should be the host on which
slurmdbd executes.
- StorageBackupHost
- Define the name of the backup host where the database that
will store the data is running. This can be viewed as a backup
solution when the StorageHost is not responding. It is up to the backup
solution to enforce the coherency of the accounting information between
the two hosts. With clustered database solutions (active/passive HA), you
would not need to use this feature. Default is none.
- StorageLoc
- Specify the name of the database as the location where
accounting records are written.
- StoragePass
- Define the password used to gain access to the database to
store the job accounting data.
- StoragePort
- The port number that the Slurm Database Daemon (slurmdbd)
uses to communicate with the database.
- StorageType
- Define the accounting storage mechanism type. Acceptable
values at present include "accounting_storage/gold",
"accounting_storage/mysql", and
"accounting_storage/pgsql". The value
"accounting_storage/gold" indicates that account records will be
written to Gold
(http://www.clusterresources.com/pages/products/gold-allocation-manager.php),
which maintains its own database. The value
"accounting_storage/mysql" indicates that accounting records
should be written to a MySQL database specified by the StorageLoc
parameter. The value "accounting_storage/pgsql" indicates that
accounting records should be written to a PostgreSQL database specified by
the StorageLoc parameter. The PostgreSQL plugin is not complete and should
not be used if associations are required. It will, however, work with
basic accounting of jobs and job steps. If interested in completing it, please
email slurm-dev@lists.llnl.gov. This value must be specified.
- StorageUser
- Define the name of the user used to connect to the
database to store the job accounting data.
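The Storage* options above fit together roughly as shown below. This is a sketch under stated assumptions: the database is MySQL running on the same host as slurmdbd and listening on MySQL's standard port 3306, the database name slurm_acct_db is hypothetical, and the user name and password are the placeholders used in the sample file in the EXAMPLE section.

```conf
StorageType=accounting_storage/mysql
# Database on the same host as slurmdbd (assumed)
StorageHost=localhost
# Standard MySQL port (assumed unchanged)
StoragePort=3306
# Hypothetical database name
StorageLoc=slurm_acct_db
# Placeholder credentials; substitute those of the local database
StorageUser=database_mgr
StoragePass=shazaam
```

Because StoragePass is stored in clear text, the file permissions recommended in the DESCRIPTION section matter for any configuration of this kind.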
- TrackWCKey
- Boolean yes or no. Controls whether the Workload
Characterization Key (wckey) is tracked and displayed. Must be set to track wckey usage.
- TrackSlurmctldDown
- Boolean yes or no. If set to yes, slurmdbd will mark all idle
resources on the cluster as down when a slurmctld disconnects or is no
longer reachable. The default is no.
EXAMPLE
#
# Sample /etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveSteps=no
ArchiveSuspend=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=db_host
DebugLevel=4
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
LogFile=/var/log/slurmdbd.log
PidFile=/var/tmp/jette/slurmdbd.pid
SlurmUser=slurm_mgr
StoragePass=shazaam
StorageType=accounting_storage/mysql
StorageUser=database_mgr
COPYING
Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced at
Lawrence Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All
rights reserved.
This file is part of SLURM, a resource management program. For details, see
<http://www.schedmd.com/slurmdocs/>.
SLURM is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
FILES
/etc/slurmdbd.conf
SEE ALSO
slurm.conf(5),
slurmctld(8),
slurmdbd(8),
syslog(2)