NAME
SPANK - SLURM Plug-in Architecture for Node and job (K)control
DESCRIPTION
This manual briefly describes the capabilities of the SLURM Plug-in architecture
for Node and job Kontrol (SPANK), as well as the SPANK configuration file
(by default, plugstack.conf).
SPANK provides a very generic interface for stackable plug-ins which may
be used to dynamically modify the job launch code in SLURM.
SPANK
plugins may be built without access to SLURM source code. They need only be
compiled against SLURM's
spank.h header file, added to the
SPANK
config file
plugstack.conf, and they will be loaded at runtime during
the next job launch. Thus, the
SPANK infrastructure provides
administrators and other developers a low cost, low effort ability to
dynamically modify the runtime behavior of SLURM job launch.
SPANK PLUGINS
SPANK plugins are loaded in up to three separate contexts during a
SLURM job. Briefly, the three contexts are:
- local
- In local context, the plugin is loaded by
srun. (i.e. the "local" part of a parallel job).
- remote
- In remote context, the plugin is loaded by
slurmd. (i.e. the "remote" part of a parallel job).
- allocator
- In allocator context, the plugin is loaded in one of
the job allocation utilities sbatch or salloc.
In local context, only the init, exit, init_post_opt, and local_user_init
functions are called. In allocator context, only the init, exit, and
init_post_opt functions are called.
Plugins may query the context in which they are running with the
spank_context and
spank_remote functions defined in
<slurm/spank.h>.
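The context rules can be summarised as a lookup table. The following is a minimal sketch and not part of the SPANK API: the enum ctx_t and the function hook_runs_in are hypothetical names introduced here for illustration (a real plugin would call spank_context from <slurm/spank.h> instead). The table only encodes what this manual states, including the "(remote context only)" notes in the callback descriptions that follow.

```c
#include <string.h>

/* Hypothetical stand-in for the three contexts; not the spank.h enum. */
typedef enum { CTX_LOCAL, CTX_REMOTE, CTX_ALLOCATOR } ctx_t;

/*
 * Return 1 if the named callback (given without the "slurm_spank_"
 * prefix) is invoked in the given context according to this manual,
 * and 0 otherwise.
 */
int hook_runs_in (ctx_t ctx, const char *hook)
{
    /* Callbacks invoked in every context. */
    static const char *common[] = { "init", "init_post_opt", "exit" };
    /* Callbacks documented as remote context only. */
    static const char *remote_only[] = {
        "user_init", "task_init_privileged", "task_init",
        "task_post_fork", "task_exit"
    };
    size_t i;

    for (i = 0; i < sizeof (common) / sizeof (common[0]); i++)
        if (strcmp (hook, common[i]) == 0)
            return (1);

    /* Documented as local (srun) context only. */
    if (strcmp (hook, "local_user_init") == 0)
        return (ctx == CTX_LOCAL);

    for (i = 0; i < sizeof (remote_only) / sizeof (remote_only[0]); i++)
        if (strcmp (hook, remote_only[i]) == 0)
            return (ctx == CTX_REMOTE);

    return (0);   /* unknown callback name */
}
```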
SPANK plugins may be called from multiple points during the SLURM job
launch. A plugin may define the following functions:
- slurm_spank_init
- Called just after plugins are loaded. In remote context,
this is just after job step is initialized. This function is called before
any plugin option processing.
- slurm_spank_init_post_opt
- Called at the same point as slurm_spank_init, but
after all user options to the plugin have been processed. The reason that
the init and init_post_opt callbacks are separated is so
that plugins can process system-wide options specified in plugstack.conf
in the init callback, then process user options, and finally take
some action in slurm_spank_init_post_opt if necessary.
- slurm_spank_local_user_init
- Called in local (srun) context only after all
options have been processed. This is called after the job ID and step IDs
are available. This happens in srun after the allocation is made,
but before tasks are launched.
- slurm_spank_user_init
- Called after privileges are temporarily dropped. (remote
context only)
- slurm_spank_task_init_privileged
- Called for each task just after fork, but before all
elevated privileges are dropped. (remote context only)
- slurm_spank_task_init
- Called for each task just before execve (2). (remote
context only)
- slurm_spank_task_post_fork
- Called for each task from parent process after fork (2) is
complete. Due to the fact that slurmd does not exec any tasks until
all tasks have completed fork (2), this call is guaranteed to run before
the user task is executed. (remote context only)
- slurm_spank_task_exit
- Called for each task as its exit status is collected by
SLURM. (remote context only)
- slurm_spank_exit
- Called once just before slurmstepd exits in remote
context. In local context, called before srun exits.
All of these functions have the same prototype, for example:
int slurm_spank_init (spank_t spank, int ac, char *argv[])
Where
spank is the
SPANK handle which must be passed back to SLURM
when the plugin calls functions like
spank_get_item and
spank_getenv. Configured arguments (See
CONFIGURATION below) are
passed in the argument vector
argv with argument count
ac.
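As an illustration of how the argument vector might be consumed, the sketch below scans ac/argv for a key=value pair. The helper name find_arg is hypothetical and not part of spank.h; it relies only on the C standard library.

```c
#include <stddef.h>
#include <string.h>

/*
 * Search the plugin argument vector for "key=value" and return a
 * pointer to the value, or NULL if the key is absent.
 * find_arg is a hypothetical helper, not a SPANK function.
 */
const char *find_arg (int ac, char *argv[], const char *key)
{
    size_t klen = strlen (key);
    int i;

    for (i = 0; i < ac; i++) {
        /* Match "key" followed immediately by '='. */
        if (strncmp (argv[i], key, klen) == 0 && argv[i][klen] == '=')
            return (argv[i] + klen + 1);
    }
    return (NULL);
}
```

Inside slurm_spank_init, a plugin could call find_arg (ac, argv, "min_prio") and convert the result with strtol (3).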
SPANK plugins can query the current list of supported slurm_spank symbols
to determine if the current version supports a given plugin hook. This may be
useful because the list of plugin symbols may grow in the future. The query is
done using the
spank_symbol_supported function, which has the following
prototype:
int spank_symbol_supported (const char *sym);
The return value is 1 if the symbol is supported, 0 if not.
SPANK plugins do not have direct access to internally defined SLURM data
structures. Instead, information about the currently executing job is obtained
via the
spank_get_item function call.
spank_err_t spank_get_item (spank_t spank, spank_item_t item, ...);
The
spank_get_item call must be passed the current
SPANK handle as
well as the item requested, which is defined by the passed
spank_item_t. A variable number of pointer arguments are also passed,
depending on which item was requested by the plugin. A list of the valid
values for
item is kept in the
spank.h header file. Some
examples are:
- S_JOB_UID
- User id for running job. (uid_t *) is third arg of
spank_get_item.
- S_JOB_STEPID
- Job step id for running job. (uint32_t *) is third arg of
spank_get_item.
- S_TASK_EXIT_STATUS
- Exit status for exited task. Only valid from
slurm_spank_task_exit. (int *) is third arg of
spank_get_item.
- S_JOB_ARGV
- Complete job command line. Third and fourth args to
spank_get_item are (int *, char ***).
See
spank.h for more details, and
EXAMPLES below for an example of
spank_get_item usage.
SPANK plugins may also use the
spank_getenv,
spank_setenv,
and
spank_unsetenv functions to view and modify the job's environment.
spank_getenv searches the job's environment for the environment
variable
var and copies the current value into a buffer
buf of
length
len.
spank_setenv allows a
SPANK plugin to set or
overwrite a variable in the job's environment, and
spank_unsetenv
unsets an environment variable in the job's environment. The prototypes are:
spank_err_t spank_getenv (spank_t spank, const char *var,
char *buf, int len);
spank_err_t spank_setenv (spank_t spank, const char *var,
const char *val, int overwrite);
spank_err_t spank_unsetenv (spank_t spank, const char *var);
These are only necessary in remote context since modifications of the standard
process environment using
setenv (3),
getenv (3), and
unsetenv (3) may be used in local context.
Functions are also available from within the
SPANK plugins to establish
environment variables to be exported to the SLURM
PrologSlurmctld,
Prolog,
Epilog and
EpilogSlurmctld programs (the
so-called
job control environment). The names of environment variables
established by these calls are prefixed with the string
SPANK_ in
order to avoid any security implications of arbitrary environment variable
control. (After all, the job control scripts do run as root or the SLURM
user.)
These functions are available from
local context only.
spank_err_t spank_job_control_getenv(spank_t spank, const char *var,
char *buf, int len);
spank_err_t spank_job_control_setenv(spank_t spank, const char *var,
const char *val, int overwrite);
spank_err_t spank_job_control_unsetenv(spank_t spank, const char *var);
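To illustrate the naming rule only, the sketch below builds the name under which a job control variable would be visible to the Prolog and Epilog programs. The helper job_control_name is hypothetical and not part of the SPANK API; it models the SPANK_ prefix described above and does not talk to SLURM.

```c
#include <stdio.h>

/*
 * Write the name under which a job control variable would appear in
 * the Prolog/Epilog environment ("SPANK_" followed by var) into buf.
 * Returns 0 on success, -1 if buf is too small.
 * Illustrative helper only; not part of the SPANK API.
 */
int job_control_name (const char *var, char *buf, size_t len)
{
    int n = snprintf (buf, len, "SPANK_%s", var);

    if (n < 0 || (size_t) n >= len)
        return (-1);
    return (0);
}
```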
See
spank.h for more information, and
EXAMPLES below for an
example for
spank_getenv usage.
Many of the described
SPANK functions available to plugins return errors
via the
spank_err_t error type. On success, the return value will be
set to
ESPANK_SUCCESS, while on failure, the return value will be set
to one of many error values defined in slurm/spank.h. The
SPANK
interface provides a simple function
const char * spank_strerror(spank_err_t err);
which may be used to translate a
spank_err_t value into its string
representation.
SPANK OPTIONS
SPANK plugins also have an interface through which they may define and implement
extra job options. These options are made available to the user through SLURM
commands such as
srun(1),
salloc(1), and
sbatch(1). If
the option is specified by the user, its value is forwarded and registered
with the plugin in slurmd when the job is run. In this way,
SPANK
plugins may dynamically provide new options and functionality to SLURM.
Each option registered by a plugin to SLURM takes the form of a
struct
spank_option which is declared in
<slurm/spank.h> as
struct spank_option {
char * name;
char * arginfo;
char * usage;
int has_arg;
int val;
spank_opt_cb_f cb;
};
Where
- name
- is the name of the option. Its length is limited to
SPANK_OPTION_MAXLEN defined in <slurm/spank.h>.
- arginfo
- is a description of the argument to the option, if the
option does take an argument.
- usage
- is a short description of the option suitable for --help
output.
- has_arg
- 0 if option takes no argument, 1 if option takes an
argument, and 2 if the option takes an optional argument. (See
getopt_long (3)).
- val
- A plugin-local value to return to the option callback
function.
- cb
- A callback function that is invoked when the plugin option
is registered with SLURM. spank_opt_cb_f is typedef'd in
<slurm/spank.h> as
typedef int (*spank_opt_cb_f) (int val, const char *optarg,
int remote);
Where val is the value of the val field in the
spank_option struct, optarg is the supplied argument if
applicable, and remote is 0 if the function is being called from
the "local" host (e.g. srun) or 1 from the
"remote" host (slurmd).
Plugin options may be registered with SLURM using the
spank_option_register function. This function is only valid when called
from the plugin's
slurm_spank_init handler, and registers one option at
a time. The prototype is
spank_err_t spank_option_register (spank_t sp,
struct spank_option *opt);
This function will return
ESPANK_SUCCESS on successful registration of an
option, or
ESPANK_BAD_ARG for errors including invalid spank_t handle,
or when the function is not called from the
slurm_spank_init function.
All options need to be registered from all contexts in which they will be
used. For instance, if an option is only used in local (srun) and remote
(slurmd) contexts, then
spank_option_register should only be called
from within those contexts. For example:
if (spank_context() != S_CTX_ALLOCATOR)
spank_option_register (sp, opt);
If, however, the option is used in all contexts, the
spank_option_register needs to be called everywhere.
In addition to
spank_option_register, plugins may also export options to
SLURM by defining a table of
struct spank_option with the symbol name
spank_options. This method, however, is not supported for use with
sbatch and
salloc (allocator context), thus the use of
spank_option_register is preferred. When using the
spank_options
table, the final element in the array must be filled with zeros. A
SPANK_OPTIONS_TABLE_END macro is provided in
<slurm/spank.h> for this purpose.
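A table of this shape might look like the following sketch. To keep it compilable without SLURM headers, it declares stand-ins that mirror the struct shown in this manual, and TABLE_END is an assumed equivalent of SPANK_OPTIONS_TABLE_END (an all-zero element). The option names, the callback demo_opt_cb, and the helper count_options are hypothetical.

```c
#include <stddef.h>

/*
 * Stand-in declarations mirroring the struct shown in this manual, so
 * the sketch compiles without SLURM headers. A real plugin would
 * include <slurm/spank.h> instead of declaring these itself.
 */
typedef int (*spank_opt_cb_f) (int val, const char *optarg, int remote);
struct spank_option {
    char *name;
    char *arginfo;
    char *usage;
    int has_arg;
    int val;
    spank_opt_cb_f cb;
};
/* Assumed equivalent of SPANK_OPTIONS_TABLE_END: an all-zero element. */
#define TABLE_END { NULL, NULL, NULL, 0, 0, NULL }

/* Hypothetical callback that accepts every value. */
int demo_opt_cb (int val, const char *optarg, int remote)
{
    (void) val; (void) optarg; (void) remote;
    return (0);
}

/* Hypothetical options; the last element terminates the table. */
struct spank_option spank_options[] = {
    { "demo-level", "[n]", "Set the demo level to [n].", 2, 0, demo_opt_cb },
    { "demo-quiet", NULL,  "Suppress demo output.",      0, 1, demo_opt_cb },
    TABLE_END
};

/* Count entries by walking until the zero-filled terminator. */
int count_options (const struct spank_option *tbl)
{
    int n = 0;

    while (tbl[n].name != NULL)
        n++;
    return (n);
}
```

Both entries share one callback here; the val field (0 or 1) is what would let the callback tell them apart.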
When an option is provided by the user on the local side, SLURM will
immediately invoke the option's callback with remote=0. This allows
the plugin to perform local sanity checking of the option before the value is
sent to the remote side during job launch. If the argument the user specified
is invalid, the plugin should report an error and return a non-zero value
from the callback.
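A callback performing this kind of sanity check could be sketched as follows. The names demo_level and demo_level_cb, and the accepted range of 0 to 9, are assumptions made for the example. Errors are written to stderr so that the sketch needs no SLURM headers; a real plugin would more likely report them with slurm_error.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical plugin state filled in by the callback. */
int demo_level = 0;

/*
 * Option callback with the spank_opt_cb_f signature. It accepts an
 * integer between 0 and 9 and rejects everything else by returning
 * a non-zero value, leaving demo_level untouched.
 */
int demo_level_cb (int val, const char *optarg, int remote)
{
    char *end;
    long l;

    (void) val;      /* unused: only one option uses this callback */
    (void) remote;   /* same check on the local and remote side */

    if (optarg == NULL || *optarg == '\0') {
        fprintf (stderr, "demo: missing level\n");
        return (-1);
    }
    errno = 0;
    l = strtol (optarg, &end, 10);
    if (errno != 0 || *end != '\0' || l < 0 || l > 9) {
        fprintf (stderr, "demo: bad level: %s\n", optarg);
        return (-1);
    }
    demo_level = (int) l;
    return (0);
}
```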
On the remote side, options and their arguments are registered just after
SPANK plugins are loaded and before the slurm_spank_init handler is
called. This allows plugins to modify behavior of all plugin functionality
based on the value of user-provided options. (See EXAMPLES below for a plugin
that registers an option with SLURM.)
CONFIGURATION
The default
SPANK plug-in stack configuration file is
plugstack.conf in the same directory as
slurm.conf(5), though
this may be changed via the SLURM config parameter
PlugStackConfig.
Normally the
plugstack.conf file should be identical on all nodes of
the cluster. The config file lists
SPANK plugins, one per line, along
with whether the plugin is
required or
optional, and any global
arguments that are to be passed to the plugin for runtime configuration.
Comments are preceded with '#' and extend to the end of the line. If the
configuration file is missing or empty, it will simply be ignored.
The format of each non-comment line in the configuration file is:
required/optional plugin arguments
For example:
optional /usr/lib/slurm/test.so
Tells
slurmd to load the plugin
test.so passing no arguments. If a
SPANK plugin is
required, then failure of any of the plugin's
functions will cause
slurmd to terminate the job, while
optional
plugins only cause a warning.
If a fully-qualified path is not specified for a plugin, then the currently
configured
PluginDir in
slurm.conf(5) is searched.
SPANK plugins are stackable, meaning that more than one plugin may be
placed into the config file. The plugins are called in order, one
after the other, and the appropriate action is taken on failure according to
whether the plugin is marked required or optional.
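The stacking behavior can be pictured with a toy model. The sketch below is not SLURM's implementation: struct stack_entry, run_stack, and the two hooks are hypothetical names, and each plugin is reduced to a single callback that returns a negative value on failure. It only mirrors the rule described above, where a failing required plugin aborts and a failing optional plugin produces a warning.

```c
#include <stdio.h>

/* Hypothetical description of one plugstack.conf entry. */
struct stack_entry {
    const char *path;     /* plugin path from the config line */
    int required;         /* 1 for "required", 0 for "optional" */
    int (*hook) (void);   /* stand-in for one plugin callback */
};

/*
 * Call each entry's hook in config-file order. A failing required
 * plugin stops the walk and returns -1; a failing optional plugin
 * only produces a warning. On success the number of warnings is
 * returned.
 */
int run_stack (const struct stack_entry *stack, int n)
{
    int i, warnings = 0;

    for (i = 0; i < n; i++) {
        if (stack[i].hook () >= 0)
            continue;
        if (stack[i].required) {
            fprintf (stderr, "required plugin %s failed\n", stack[i].path);
            return (-1);
        }
        fprintf (stderr, "warning: optional plugin %s failed\n",
                 stack[i].path);
        warnings++;
    }
    return (warnings);
}

/* Two trivial hooks used to exercise the model. */
int hook_ok (void)   { return (0); }
int hook_fail (void) { return (-1); }
```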
Additional config files or directories of config files may be included in
plugstack.conf with the
include keyword. The
include
keyword must appear on its own line, and takes a glob as its parameter, so
multiple files may be included from one
include line. For example, the
following syntax will load all config files in the /etc/slurm/plugstack.conf.d
directory, in local collation order:
include /etc/slurm/plugstack.conf.d/*
which might be considered a more flexible method for building up a spank plugin
stack.
The SPANK config file is re-read on each job launch, so editing the
config file will not affect running jobs. However, care should be taken so that
a partially edited config file is not read by a launching job.
EXAMPLES
Simple
SPANK config file:
#
# SPANK config file
#
# required? plugin args
#
optional renice.so min_prio=-10
required /usr/lib/slurm/test.so
The following is a simple
SPANK plugin to modify the nice value of job
tasks. This plugin adds a --renice=[prio] option to
srun which users
can use to set the priority of all remote tasks. Priority may also be
specified via a SLURM_RENICE environment variable. A minimum priority may be
established via a "min_prio" parameter in
plugstack.conf (See
above for example).
/*
 * To compile:
 *   gcc -shared -fPIC -o renice.so renice.c
 *
 */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/resource.h>
#include <slurm/spank.h>

/*
 * All spank plugins must define this macro for the SLURM plugin loader.
 */
SPANK_PLUGIN(renice, 1);

#define PRIO_ENV_VAR "SLURM_RENICE"
#define PRIO_NOT_SET 42

/*
 * Minimum allowable value for priority. May be set globally
 * via plugin option min_prio=<prio>
 */
static int min_prio = -20;

static int prio = PRIO_NOT_SET;

static int _renice_opt_process (int val, const char *optarg, int remote);
static int _str2prio (const char *str, int *p2int);

/*
 * Provide a --renice=[prio] option to srun:
 */
struct spank_option spank_options[] =
{
    { "renice", "[prio]", "Re-nice job tasks to priority [prio].", 2, 0,
      (spank_opt_cb_f) _renice_opt_process
    },
    SPANK_OPTIONS_TABLE_END
};

/*
 * Called from both srun and slurmd.
 */
int slurm_spank_init (spank_t sp, int ac, char **av)
{
    int i;

    /* Don't do anything in sbatch/salloc
     */
    if (spank_context () == S_CTX_ALLOCATOR)
        return (0);

    for (i = 0; i < ac; i++) {
        if (strncmp ("min_prio=", av[i], 9) == 0) {
            const char *optarg = av[i] + 9;
            if (_str2prio (optarg, &min_prio) < 0)
                slurm_error ("Ignoring invalid min_prio value: %s", av[i]);
        }
        else {
            slurm_error ("renice: Invalid option: %s", av[i]);
        }
    }

    if (!spank_remote (sp))
        slurm_verbose ("renice: min_prio = %d", min_prio);

    return (0);
}

int slurm_spank_task_post_fork (spank_t sp, int ac, char **av)
{
    pid_t pid;
    int taskid;

    if (prio == PRIO_NOT_SET) {
        /*
         * See if SLURM_RENICE env var is set by user
         */
        char val [1024];

        if (spank_getenv (sp, PRIO_ENV_VAR, val, 1024) != ESPANK_SUCCESS)
            return (0);

        if (_str2prio (val, &prio) < 0) {
            /* Report the rejected environment value itself */
            slurm_error ("Bad value for %s: %s", PRIO_ENV_VAR, val);
            return (-1);
        }

        if (prio < min_prio)
            slurm_error ("%s=%d not allowed, using min=%d",
                         PRIO_ENV_VAR, prio, min_prio);
    }

    if (prio < min_prio)
        prio = min_prio;

    spank_get_item (sp, S_TASK_GLOBAL_ID, &taskid);
    spank_get_item (sp, S_TASK_PID, &pid);

    /* pid_t is cast to long to match %ld; prio is a plain int */
    slurm_info ("re-nicing task%d pid %ld to %d", taskid, (long) pid, prio);

    if (setpriority (PRIO_PROCESS, (int) pid, (int) prio) < 0) {
        slurm_error ("setpriority: %m");
        return (-1);
    }

    return (0);
}

static int _str2prio (const char *str, int *p2int)
{
    long int l;
    char *p;

    l = strtol (str, &p, 10);
    /* Reject trailing characters and values outside the nice range */
    if ((*p != '\0') || (l < -20) || (l > 20))
        return (-1);

    *p2int = (int) l;

    return (0);
}

static int _renice_opt_process (int val, const char *optarg, int remote)
{
    if (optarg == NULL) {
        slurm_error ("renice: invalid argument!");
        return (-1);
    }

    if (_str2prio (optarg, &prio) < 0) {
        slurm_error ("Bad value for --renice: %s", optarg);
        return (-1);
    }

    if (prio < min_prio)
        slurm_error ("--renice=%d not allowed, will use min=%d",
                     prio, min_prio);

    return (0);
}
COPYING
Copyright (C) 2006 The Regents of the University of California. Produced at
Lawrence Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All
rights reserved.
This file is part of SLURM, a resource management program. For details, see
<http://www.schedmd.com/slurmdocs/>.
SLURM is free software; you can redistribute it and/or modify it under the terms
of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
A PARTICULAR PURPOSE. See the GNU General Public License for more details.
FILES
/etc/slurm/slurm.conf - SLURM configuration file.
/etc/slurm/plugstack.conf - SPANK configuration file.
/usr/include/slurm/spank.h - SPANK header file.
SEE ALSO
srun(1),
slurm.conf(5)