SACCT(1) | Slurm components | SACCT(1) |
NAME¶
sacct - displays accounting data for all jobs and job steps in the SLURM job accounting log or SLURM database
SYNOPSIS¶
sacct [OPTIONS...]
DESCRIPTION¶
Accounting information for jobs invoked with SLURM is either logged in the job accounting log file or saved to the SLURM database.
The sacct command displays job accounting data stored in the job accounting log file or SLURM database in a variety of forms for your analysis. By default, it displays information on jobs, job steps, status, and exit codes. You can tailor the output with the --format= option to specify the fields to be shown.
For the root user, the sacct command displays job accounting data for all users, although there are options to filter the output to report only the jobs from a specified user or group.
For a non-root user, the sacct command by default limits the display of job accounting data to jobs that were launched with that user's own user identifier (UID). Data for other users can be displayed with the --allusers, --user, or --uid options.
- Note:
- If the AccountingStorageType is set to "accounting_storage/filetxt", space characters embedded within account names, job names, and step names will be replaced by underscores. If account names with embedded spaces are needed, it is recommended that a database type of accounting storage be configured.
- Note:
- The contents of SLURM's database are maintained in lower case. This may result in some sacct output differing from that of other SLURM commands.
- Note:
- Much of the data reported by sacct has been generated by the wait3() and getrusage() system calls. Some systems gather and report incomplete information for these calls; sacct reports values of 0 for this missing data. See your system's getrusage (3) man page for information about which data are actually available on your system.
- If --dump is specified, the field selection options (--brief, --format, ...) have no effect.
- If --dump is specified, elapsed time fields are presented as two fields: integral seconds and integral microseconds.
- If --dump is not specified, elapsed time fields are presented as [[days-]hours:]minutes:seconds.hundredths.
- The default input file is the file named in the
AccountingStorageLoc parameter in slurm.conf.
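The formatted elapsed-time layout described above, [[days-]hours:]minutes:seconds.hundredths, can be converted to seconds with a short parser. The following is a minimal sketch, not part of sacct; the function name `elapsed_to_seconds` is hypothetical, and treating the fractional part as optional is an assumption made so that values such as 00:01:30 are also accepted.

```python
import re

# Pattern for [[days-]hours:]minutes:seconds[.hundredths].
# Assumption of this sketch: the fractional part may be absent.
_ELAPSED = re.compile(
    r"^(?:(?:(?P<days>\d+)-)?(?P<hours>\d+):)?"
    r"(?P<minutes>\d+):(?P<seconds>\d+(?:\.\d+)?)$"
)


def elapsed_to_seconds(text):
    """Convert a formatted elapsed-time string to seconds as a float."""
    match = _ELAPSED.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized elapsed time: {text!r}")
    days = int(match.group("days") or 0)
    hours = int(match.group("hours") or 0)
    minutes = int(match.group("minutes"))
    seconds = float(match.group("seconds"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(elapsed_to_seconds("1-02:03:04.50"))
```

A string that does not fit the layout raises ValueError rather than being silently read as zero.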
OPTIONS¶
- -a, --allusers
- Displays all users' jobs when run by user root or if PrivateData is not configured to jobs. Otherwise, displays the current user's jobs.
- -A account_list, --accounts=account_list
- Displays jobs belonging to the accounts in the given comma-separated list.
- -b, --brief
- Displays a brief listing, which includes the following data:
- jobid
- status
- exitcode
- This option has no effect when the --dump option is
also specified.
- -c, --completion
- Use job completion instead of job accounting. The JobCompType parameter in the slurm.conf file must be defined to a non-none option.
- -d, --dump
- Dumps the raw data records.
- --duplicates
- If SLURM job ids are reset, some job numbers will probably
appear more than once in the accounting log file but refer to different
jobs. Such jobs can be distinguished by the "submit" time stamp
in the data records.
- When data for specific jobs are requested with the --jobs
option, sacct returns the most recent job with that number. This
behavior can be overridden by specifying --duplicates, in which case all
records that match the selection criteria will be returned.
- -e, --helpformat
- Print a list of fields that can be specified with the --format option.
Fields available: AllocCPUS Account AssocID AveCPU AvePages AveRSS AveVMSize BlockID Cluster Comment CPUTime CPUTimeRAW DerivedExitCode Elapsed Eligible End ExitCode GID Group JobID JobName Layout MaxPages MaxPagesNode MaxPagesTask MaxRSS MaxRSSNode MaxRSSTask MaxVMSize MaxVMSizeNode MaxVMSizeTask MinCPU MinCPUNode MinCPUTask NCPUS NNodes NodeList NTasks Priority Partition QOS QOSRAW ReqCPUS Reserved ResvCPU ResvCPURAW Start State Submit Suspended SystemCPU Timelimit TotalCPU UID User UserCPU WCKey WCKeyID
- The section titled "Job Accounting Fields"
describes these fields.
- -E end_time, --endtime=end_time
- Select jobs eligible before the specified time. If states are given with
the -s option, return jobs that were in those states before this time.
- -f file, --file=file
- Causes the sacct command to read job accounting data
from the named file instead of the current SLURM job accounting log
file. Only applicable when running the filetxt plugin.
- -g gid_list, --gid=gid_list, --group=group_list
- Displays the statistics only for the jobs started with the
GID or the GROUP specified by the gid_list or the group_list
operand, which is a comma-separated list. Space characters are not
allowed. Default is no restrictions.
- -h, --help
- Displays a general help message.
- -j job(.step), --jobs=job(.step)
- Displays information about the specified job(.step) or list of job(.step)s.
- The job(.step) parameter is a comma-separated list of jobs. Space characters are not permitted in this list. NOTE: A step id of 'batch' will display the information about the batch step. The batch step information is only available after the batch job is complete, unlike regular steps, which are available when they start.
- The default is to display information on all jobs.
- -k, --timelimit-min
- Only report jobs with this timelimit. If used with
--timelimit-max, this is the minimum timelimit of the range. Default is
no restriction.
- -K, --timelimit-max
- Ignored by itself, but if --timelimit-min is set, this is
the maximum timelimit of the range. Default is no restriction.
- -l, --long
- Equivalent to specifying:
- --format=jobid,jobname,partition,maxvmsize,maxvmsizenode,maxvmsizetask,avevmsize,maxrss,maxrssnode,maxrsstask,averss,maxpages,maxpagesnode,maxpagestask,avepages,mincpu,mincpunode,mincputask,avecpu,ntasks,alloccpus,elapsed,state,exitcode
- -L, --allclusters
- Display jobs that ran on all clusters. By default, only jobs that ran
on the cluster from which sacct is called are displayed.
- -M cluster_list, --clusters=cluster_list
- Displays the statistics only for the jobs started on the
clusters specified by the cluster_list operand, which is a
comma-separated list of clusters. Space characters are not allowed in the
cluster_list. Use -1 for all clusters. The default is the
cluster on which you are executing the sacct command.
- -n, --noheader
- No heading will be added to the output. The default action is to display a header.
- This option has no effect when used with the --dump
option.
- -N node_list, --nodelist=node_list
- Display jobs that ran on any of these node(s). node_list can be a ranged string.
- -o, --format
- Comma-separated list of fields to display (use
"--helpformat" for a list of available fields).
- -O, --formatted_dump
- Dumps accounting records in an easy-to-read format.
- This option is provided for debugging.
- -p, --parsable
- Output will be '|' delimited, with a '|' at the end.
- -P, --parsable2
- Output will be '|' delimited, without a '|' at the end.
- -q, --qos
- Only send data about jobs using these qos. Default is all.
- -r, --partition
-
- -s state_list, --state=state_list
- Selects jobs based on their state during the time period given. When the --state option is specified without a start or end time, both default to the current time, so only currently running jobs can be displayed. A start and/or end time must be specified to view information about jobs not currently running. The following state designators are valid, and multiple state names may be specified using comma separators. Either the short or long form of the state name may be used (e.g. CA or CANCELLED), and the name is case insensitive (e.g. ca and CA both work).
- CA CANCELLED
- Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated.
- CD COMPLETED
- Job has terminated all processes on all nodes.
- CF CONFIGURING
- Job has been allocated resources, but is waiting for them to become ready for use (e.g. booting).
- CG COMPLETING
- Job is in the process of completing. Some processes on some nodes may still be active.
- F FAILED
- Job terminated with non-zero exit code or other failure condition.
- NF NODE_FAIL
- Job terminated due to failure of one or more allocated nodes.
- PD PENDING
- Job is awaiting resource allocation.
- PR PREEMPTED
- Job terminated due to preemption.
- R RUNNING
- Job currently has an allocation.
- RS RESIZING
- Job is about to change size.
- S SUSPENDED
- Job has an allocation, but execution has been suspended.
- TO TIMEOUT
- Job terminated upon reaching its time limit.
- The state_list operand is a comma-separated list of
these state designators. Space characters are not allowed in the
state_list.
- -S, --starttime
- Select jobs eligible after the specified time. Default is
midnight of the current day. If states are given with the -s option, then
jobs in those states at this time are returned, and 'now' is used as the
default time.
- -T, --truncate
- Truncate times to the selected window. If a job started before
--starttime, its start time is truncated to --starttime. The same applies
to the end time and --endtime.
- -u uid_list, --uid=uid_list, --user=user_list
- Use this comma-separated list of uids or user names to
select jobs to display. By default, the running user's uid is used.
- --usage
- Display a command usage summary.
- -v, --verbose
- Primarily for debugging purposes, report the state of
various variables during processing.
- -V, --version
- Print version.
- -W wckey_list, --wckeys=wckey_list
- Displays the statistics only for the jobs started on the
wckeys specified by the wckey_list operand, which is a
comma-separated list of wckey names. Space characters are not allowed in
the wckey_list. Default is all wckeys.
- -x assoc_list, --associations=assoc_list
- Displays the statistics only for the jobs running under the
association ids specified by the assoc_list operand, which is a
comma-separated list of association ids. Space characters are not allowed
in the assoc_list. Default is all associations.
- -X, --allocations
- Only show cumulative statistics for each job, not the
intermediate steps.
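When sacct is driven from a script, the options above can be assembled into an argument list. The sketch below is illustrative only: the helper names `build_sacct_args` and `_join_list` are hypothetical, it uses only options documented in this section (--parsable2, --format, --state, --user, --starttime, --endtime), and it passes time strings through unvalidated because the accepted time format is not described here. It also enforces the rule that the comma-separated lists may not contain space characters.

```python
def _join_list(name, values):
    """Join values with commas; the lists above may not contain spaces."""
    for value in values:
        if " " in value or "," in value:
            raise ValueError(f"{name} entry may not contain spaces or commas: {value!r}")
    return ",".join(values)


def build_sacct_args(fields, states=None, users=None, start=None, end=None):
    """Build an argument list for sacct from documented options only."""
    args = ["sacct", "--parsable2", "--format=" + _join_list("format", fields)]
    if states:
        args.append("--state=" + _join_list("state", states))
    if users:
        args.append("--user=" + _join_list("user", users))
    if start:
        # Passed through as-is; the expected time format is an open assumption.
        args.append("--starttime=" + start)
    if end:
        args.append("--endtime=" + end)
    return args


print(build_sacct_args(["jobid", "state", "exitcode"], states=["CA", "F"]))
```

The resulting list could be handed to `subprocess.run` on a system where sacct is installed; the sketch itself does not execute anything.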
Job Accounting Fields¶
The following describes each job accounting field:
- alloccpus
- Count of allocated processors.
- account
- Account the job ran under.
- associd
- Reference to the association of user, account and cluster.
- AveCPU
- Average (system + user) CPU time of all tasks in job.
- AvePages
- Average number of page faults of all tasks in job.
- AveRSS
- Average resident set size of all tasks in job.
- AveVMSize
- Average Virtual Memory size of all tasks in job.
- blockid
- Block ID, applicable to BlueGene computers only.
- cluster
- Cluster name.
- Comment
- The job's comment string when the AccountingStoreJobComment
parameter in the slurm.conf file is set (or defaults) to YES. The Comment
string can be modified by invoking sacctmgr modify job or the
specialized sjobexitmod command.
- cputime
- Formatted number of cpu seconds a process was allocated.
- cputimeraw
- CPU time allocated to the process, as a raw number of seconds
rather than in the formatted style of cputime.
- DerivedExitCode
- The highest exit code returned by the job's job steps (srun
invocations). Following the colon is the signal that caused the process to
terminate if it was terminated by a signal. The DerivedExitCode can be
modified by invoking sacctmgr modify job or the specialized
sjobexitmod command.
- elapsed
- The job's elapsed time.
- The format of this field's output is as follows:
[DD-[hh:]]mm:ss
- as defined by the following:
- DD
- days
- hh
- hours
- mm
- minutes
- ss
- seconds
- eligible
- When the job became eligible to run.
- end
- Termination time of the job. Format output is as follows:
MM/DD-hh:mm:ss
- as defined by the following:
- MM
- month
- DD
- day
- hh
- hours
- mm
- minutes
- ss
- seconds
- exitcode
- The exit code returned by the job script or salloc,
typically as set by the exit() function. Following the colon is the signal
that caused the process to terminate if it was terminated by a signal.
- gid
- The group identifier of the user who ran the job.
- group
- The group name of the user who ran the job.
- JobID
- The number of the job or job step. It is in the form:
job.jobstep.
- jobname
- The name of the job or job step. The
slurm_accounting.log file is space delimited, so any space in a
jobname is replaced with an underscore before the record is written to
the accounting file. As a result, a jobname that contained a space is
displayed by sacct with an underscore in place of the space.
- layout
- What the layout of a step was when it was running. This can
be used to give you an idea of which node ran which rank in your job.
- MaxPages
- Maximum number of page faults of all tasks in job.
- MaxPagesNode
- The node on which the maxpages occurred.
- MaxPagesTask
- The task ID where the maxpages occurred.
- MaxRSS
- Maximum resident set size of all tasks in job.
- MaxRSSNode
- The node on which the maxrss occurred.
- MaxRSSTask
- The task ID where the maxrss occurred.
- MaxVMSize
- Maximum Virtual Memory size of all tasks in job.
- MaxVMSizeNode
- The node on which the maxvmsize occurred.
- MaxVMSizeTask
- The task ID where the maxvmsize occurred.
- MinCPU
- Minimum (system + user) CPU time of all tasks in job.
- MinCPUNode
- The node on which the mincpu occurred.
- MinCPUTask
- The task ID where the mincpu occurred.
- ncpus
- Total number of CPUs allocated to the job.
- nodelist
- List of nodes in job/step.
- nnodes
- Number of nodes in a job or step.
- NTasks
- Total number of tasks in a job or step.
- priority
- Slurm priority.
- partition
- Identifies the partition on which the job ran.
- qos
- Name of Quality of Service.
- qosraw
- Id of Quality of Service.
- reqcpus
- Required CPUs.
- reserved
- How much wall clock time was used as reserved time for this
job. This is derived from how long a job was waiting from eligible time to
when it actually started.
- resvcpu
- Formatted time for how long (cpu secs) a job was reserved
for.
- resvcpuraw
- Reserved CPUs in second format, not formatted.
- start
- Initiation time of the job in the same format as
end.
- state
- Displays the job status, or state.
- submit
- The time and date stamp (in Universal Time Coordinated,
UTC) the job was submitted. The format of the output is identical to that
of the end field.
- suspended
- How long the job was suspended for.
- SystemCPU
- The amount of system CPU time used by the job or job step.
The format of the output is identical to that of the elapsed field.
- timelimit
- What the timelimit was/is for the job.
- TotalCPU
- The sum of the SystemCPU and UserCPU time used by the job
or job step. The total CPU time of the job may exceed the job's elapsed
time for jobs that include multiple job steps. The format of the output is
identical to that of the elapsed field.
- uid
- The user identifier of the user who ran the job.
- user
- The user name of the user who ran the job.
- UserCPU
- The amount of user CPU time used by the job or job step.
The format of the output is identical to that of the elapsed field.
- wckey
- Workload Characterization Key. Arbitrary string for
grouping orthogonal accounts together.
- wckeyid
- Reference to the wckey.
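Two of the fields above have a small internal structure: exitcode and DerivedExitCode carry an exit code followed by a colon and a signal, and JobID takes the form job.jobstep. The following sketch splits both; the function names are hypothetical, and defaulting the signal to 0 when no colon is present is an assumption made to cope with output that shows only the code.

```python
def parse_exit_code(text):
    """Split 'code:signal' into two integers; a missing signal becomes 0."""
    code, _, signal = text.strip().partition(":")
    return int(code), int(signal or 0)


def split_job_id(text):
    """Split 'job.jobstep' into (job, step); step is None for a whole job."""
    job, _, step = text.strip().partition(".")
    return job, (step or None)


print(parse_exit_code("0:9"), split_job_id("4.0"), split_job_id("12.batch"))
```

The step is kept as a string because a step id such as 'batch' is not numeric.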
INTERPRETING THE --DUMP OPTION OUTPUT¶
The sacct command's --dump option displays data in a horizontal list of fields depending on the record type. There are three record types: JOB_START, JOB_STEP, and JOB_TERMINATED. A subsection below describes the output for each record type. When the data output is a job accounting field, as described in the section titled "Job Accounting Fields", only the name of the job accounting field is listed. Otherwise, additional information is provided.
- Note:
- The output for the JOB_STEP and JOB_TERMINATED record types presents a pair of fields for each of the following data: Total CPU time, Total User CPU time, and Total System CPU time. The first field of each pair is the time in seconds expressed as an integer. The second field of each pair is the fractional number of seconds multiplied by one million. Thus, a pair of fields output as "1 024315" means that the time is 1.024315 seconds. The least significant digits in the second field are truncated in formatted displays.
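The pair arithmetic in the note can be written out directly: the second field is divided by one million and added to the first. This is a minimal sketch with a hypothetical function name; it uses `Decimal` so that the result is exact rather than a rounded float.

```python
from decimal import Decimal


def combine_time_pair(seconds_field, microseconds_field):
    """Join an integer-seconds field and a microseconds field into seconds."""
    whole = Decimal(int(seconds_field))
    fraction = Decimal(int(microseconds_field)) / Decimal(1_000_000)
    return whole + fraction


# The pair "1 024315" from the note above.
print(combine_time_pair("1", "024315"))
```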
Output for the JOB_START Record Type¶
The following describes the horizontal fields output by the sacct --dump option for the JOB_START record type.
- Field #
- Field
- 1
- job
- 2
- partition
- 3
- submitted
- 4
- The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
- 5
- uid.gid
- 6
- (Reserved)
- 7
- JOB_START (literal string)
- 8
- Job Record Version (1)
- 9
- The number of fields in the record (16)
- 10
- uid
- 11
- gid
- 12
- The job name
- 13
- Batch Flag (0=no batch)
- 14
- Relative SLURM priority
- 15
- ncpus
- 16
- nodes
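The sixteen positions listed above can be mapped onto names when reading a JOB_START record. The sketch below is illustrative only and rests on two assumptions that this page does not confirm: that the fields are separated by whitespace, and that every field, including the reserved one, is a non-empty token. The key names and the sample line are hypothetical.

```python
# Names for fields 1 through 16 of a JOB_START record, in table order.
JOB_START_FIELDS = [
    "job", "partition", "submitted", "start", "uid_gid", "reserved",
    "record_type", "version", "field_count", "uid", "gid", "jobname",
    "batch_flag", "priority", "ncpus", "nodes",
]


def parse_job_start(line):
    """Map a whitespace-delimited JOB_START record onto field names."""
    values = line.split()
    if len(values) != len(JOB_START_FIELDS):
        raise ValueError(f"expected {len(JOB_START_FIELDS)} fields, got {len(values)}")
    record = dict(zip(JOB_START_FIELDS, values))
    if record["record_type"] != "JOB_START":
        raise ValueError(f"not a JOB_START record: {record['record_type']!r}")
    return record


# Hypothetical record; the values are invented for illustration.
sample = "3 andy 1268000000 1268000010 500.500 0 JOB_START 1 16 500 500 sja_init 1 100 1 node01"
print(parse_job_start(sample)["jobname"])
```

Because spaces in job names are replaced by underscores before a record is written, splitting on whitespace does not break the jobname field.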
Output for the JOB_STEP Record Type¶
The following describes the horizontal fields output by the sacct --dump option for the JOB_STEP record type.
- Field #
- Field
- 1
- job
- 2
- partition
- 3
- submitted
- 4
- The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
- 5
- uid.gid
- 6
- (Reserved)
- 7
- JOB_STEP (literal string)
- 8
- Job Record Version (1)
- 9
- The number of fields in the record (38)
- 10
- jobid
- 11
- end
- 12
- Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
- CA
- Cancelled
- CD
- Completed successfully
- F
- Failed
- NF
- Job terminated from node failure
- R
- Running
- S
- Suspended
- TO
- Timed out
- 13
- exitcode
- 14
- ntasks
- 15
- ncpus
- 16
- elapsed time in seconds expressed as an integer
- 17
- Integer portion of the Total CPU time in seconds for all processes
- 18
- Fractional portion of the Total CPU time for all processes expressed in microseconds
- 19
- Integer portion of the Total User CPU time in seconds for all processes
- 20
- Fractional portion of the Total User CPU time for all processes expressed in microseconds
- 21
- Integer portion of the Total System CPU time in seconds for all processes
- 22
- Fractional portion of the Total System CPU time for all processes expressed in microseconds
- 23
- rss
- 24
- ixrss
- 25
- idrss
- 26
- isrss
- 27
- minflt
- 28
- majflt
- 29
- nswap
- 30
- inblocks
- 31
- outblocks
- 32
- msgsnd
- 33
- msgrcv
- 34
- nsignals
- 35
- nvcsw
- 36
- nivcsw
- 37
- vsize
Output for the JOB_TERMINATED Record Type¶
The following describes the horizontal fields output by the sacct --dump option for the JOB_TERMINATED (literal string) record type.
- Field #
- Field
- 1
- job
- 2
- partition
- 3
- submitted
- 4
- The job's start time; this value is the number of non-leap seconds since the Epoch (00:00:00 UTC, January 1, 1970)
- 5
- uid.gid
- 6
- (Reserved)
- 7
- JOB_TERMINATED (literal string)
- 8
- Job Record Version (1)
- 9
- The number of fields in the record (38)
- Although thirty-eight fields are displayed by the sacct command for the JOB_TERMINATED record, only fields 1 through 12 are recorded in the actual data file. The sacct command aggregates the remainder.
- 10
- The total elapsed time in seconds for the job.
- 11
- end
- 12
- Completion Status; the mnemonics, which may appear in uppercase or lowercase, are as follows:
- CA
- Cancelled
- CD
- Completed successfully
- F
- Failed
- NF
- Job terminated from node failure
- R
- Running
- TO
- Timed out
- 13
- exitcode
- 14
- ntasks
- 15
- ncpus
- 16
- elapsed time in seconds expressed as an integer
- 17
- Integer portion of the Total CPU time in seconds for all processes
- 18
- Fractional portion of the Total CPU time for all processes expressed in microseconds
- 19
- Integer portion of the Total User CPU time in seconds for all processes
- 20
- Fractional portion of the Total User CPU time for all processes expressed in microseconds
- 21
- Integer portion of the Total System CPU time in seconds for all processes
- 22
- Fractional portion of the Total System CPU time for all processes expressed in microseconds
- 23
- rss
- 24
- ixrss
- 25
- idrss
- 26
- isrss
- 27
- minflt
- 28
- majflt
- 29
- nswap
- 30
- inblocks
- 31
- outblocks
- 32
- msgsnd
- 33
- msgrcv
- 34
- nsignals
- 35
- nvcsw
- 36
- nivcsw
- 37
- vsize
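The completion status in field 12 of the JOB_STEP and JOB_TERMINATED records is a short mnemonic that may appear in either case. A lookup table such as the following sketch expands it; the names `COMPLETION_STATUS` and `describe_status` are hypothetical. Note that S (Suspended) is listed above only for JOB_STEP records.

```python
# Mnemonics and descriptions as listed under field 12 above.
COMPLETION_STATUS = {
    "CA": "Cancelled",
    "CD": "Completed successfully",
    "F": "Failed",
    "NF": "Job terminated from node failure",
    "R": "Running",
    "S": "Suspended",
    "TO": "Timed out",
}


def describe_status(mnemonic):
    """Expand a completion-status mnemonic, ignoring letter case."""
    key = mnemonic.strip().upper()
    if key not in COMPLETION_STATUS:
        raise ValueError(f"unknown completion status: {mnemonic!r}")
    return COMPLETION_STATUS[key]


print(describe_status("cd"))
```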
EXAMPLES¶
This example illustrates the default invocation of the sacct command:
# sacct
Jobid      Jobname    Partition  Account    AllocCPUS  State      ExitCode
---------- ---------- ---------- ---------- ---------- ---------- --------
2          script01   srun       acct1               1 RUNNING           0
3          script02   srun       acct1               1 RUNNING           0
4          endscript  srun       acct1               1 RUNNING           0
4.0                   srun       acct1               1 COMPLETED         0
# sacct --brief
Jobid      State      ExitCode
---------- ---------- --------
2          RUNNING           0
3          RUNNING           0
4          RUNNING           0
4.0        COMPLETED         0
# sacct --allocations
Jobid      Jobname    Partition  Account    AllocCPUS  State      ExitCode
---------- ---------- ---------- ---------- ---------- ---------- --------
3          sja_init   andy       acct1               1 COMPLETED         0
4          sjaload    andy       acct1               2 COMPLETED         0
5          sja_scr1   andy       acct1               1 COMPLETED         0
6          sja_scr2   andy       acct1              18 COMPLETED         2
7          sja_scr3   andy       acct1              18 COMPLETED         0
8          sja_scr5   andy       acct1               2 COMPLETED         0
9          sja_scr7   andy       acct1              90 COMPLETED         1
10         endscript  andy       acct1             186 COMPLETED         0
# sacct --format=jobid,elapsed,ncpus,ntasks,state
Jobid      Elapsed    Ncpus      Ntasks   State
---------- ---------- ---------- -------- ----------
3          00:01:30            2        1 COMPLETED
3.0        00:01:30            2        1 COMPLETED
4          00:00:00            2        2 COMPLETED
4.0        00:00:01            2        2 COMPLETED
5          00:01:23            2        1 COMPLETED
5.0        00:01:31            2        1 COMPLETED
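In the default layout shown in these examples, a field can be blank (the job step 4.0 has no Jobname), so splitting each row on whitespace would shift the remaining columns. One way around this, sketched below, is to take the column boundaries from the dashed line under the header. The function name `parse_fixed_width` is hypothetical, and the sketch assumes that the header and every row are aligned with that dashed line.

```python
import re


def parse_fixed_width(output):
    """Parse column-aligned sacct output using the dashed ruler line."""
    lines = output.splitlines()
    header, ruler, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes in the ruler marks one column's (start, end) span.
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", ruler)]
    names = [header[a:b].strip() for a, b in spans]
    return [
        {name: row[a:b].strip() for name, (a, b) in zip(names, spans)}
        for row in rows
        if row.strip()
    ]


# Hypothetical sample, built with a format string so the columns line up.
ROW = "{:<10} {:<10} {:<10} {:<10} {:>8}"
SAMPLE = "\n".join([
    ROW.format("Jobid", "Jobname", "Partition", "State", "ExitCode"),
    ROW.format(*["-" * 10] * 4, "-" * 8),
    ROW.format("4", "endscript", "srun", "RUNNING", "0"),
    ROW.format("4.0", "", "srun", "COMPLETED", "0"),
])

print(parse_fixed_width(SAMPLE)[1])
```

For scripted use, the --parsable or --parsable2 options avoid the alignment assumption altogether, since their output is '|' delimited.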
COPYING¶
Copyright (C) 2005-2007 Copyright Hewlett-Packard Development Company L.P.
Copyright (C) 2008-2009 Lawrence Livermore National Security. Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). CODE-OCEC-09-009. All rights reserved.
This file is part of SLURM, a resource management program. For details, see <http://www.schedmd.com/slurmdocs/>.
SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
FILES¶
- /etc/slurm.conf
- Entries to this file enable job accounting and designate the job accounting log file that collects system job accounting.
- /var/log/slurm_accounting.log
- The default job accounting log file. By default, this file is set to read and write permission for root only.
SEE ALSO¶
sstat(1), ps(1), srun(1), squeue(1), getrusage(2), time(2)
March 2010 | sacct 2.2 |