'\" t
.\" Title: perf-stat
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.79.1
.\" Date: 2018-04-19
.\" Manual: perf Manual
.\" Source: perf
.\" Language: English
.\"
.TH "PERF_4.15\-STAT" "1" "2018\-04\-19" "perf" "perf Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
perf-stat \- Run a command and gather performance counter statistics
.SH "SYNOPSIS"
.sp
.nf
\fIperf stat\fR [\-e | \-\-event=EVENT] [\-a]
\fIperf stat\fR [\-e | \-\-event=EVENT] [\-a] \-\- []
\fIperf stat\fR [\-e | \-\-event=EVENT] [\-a] record [\-o file] \-\- []
\fIperf stat\fR report [\-i file]
.fi
.SH "DESCRIPTION"
.sp
This command runs a command and gathers performance counter statistics from it\&.
.SH "OPTIONS"
.PP
\&...
.RS 4
Any command you can specify in a shell\&.
.RE
.PP
record
.RS 4
See STAT RECORD\&.
.RE
.PP
report
.RS 4
See STAT REPORT\&.
.RE
.PP
\-e, \-\-event=
.RS 4
Select the PMU event\&. Selection can be:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
a symbolic event name (use
\fIperf list\fR
to list all events)
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
a symbolically formed event like
\fIpmu/param1=0x3,param2/\fR
where param1 and param2 are defined as formats for the PMU in /sys/bus/event_source/devices//format/*
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
a symbolically formed event like
\fIpmu/config=M,config1=N,config2=K/\fR
where M, N, K are numbers (in decimal, hex, octal format)\&. Acceptable values for each of
\fIconfig\fR,
\fIconfig1\fR
and
\fIconfig2\fR
parameters are defined by corresponding entries in /sys/bus/event_source/devices//format/*
.RE
.RE
.PP
\-i, \-\-no\-inherit
.RS 4
child tasks do not inherit counters
.RE
.PP
\-p, \-\-pid=
.RS 4
stat events on existing process id (comma separated list)
.RE
.PP
\-t, \-\-tid=
.RS 4
stat events on existing thread id (comma separated list)
.RE
.PP
\-a, \-\-all\-cpus
.RS 4
system\-wide collection from all CPUs (default if no target is specified)
.RE
.PP
\-c, \-\-scale
.RS 4
scale/normalize counter values
.RE
.PP
\-d, \-\-detailed
.RS 4
print more detailed statistics, can be specified up to 3 times
.sp
.if n \{\
.RS 4
.\}
.nf
\-d: detailed events, L1 and LLC data cache
\-d \-d: more detailed events, dTLB and iTLB events
\-d \-d \-d: very detailed events, adding prefetch events
.fi
.if n \{\
.RE
.\}
.RE
.PP
\-r, \-\-repeat=
.RS 4
repeat command and print average + stddev (max: 100)\&. 0 means forever\&.
.RE
.PP
\-B, \-\-big\-num
.RS 4
print large numbers with thousands\*(Aq separators according to locale
.RE
.PP
\-C, \-\-cpu=
.RS 4
Count only on the list of CPUs provided\&. Multiple CPUs can be provided as a comma\-separated list with no space: 0,1\&. Ranges of CPUs are specified with \-: 0\-2\&. In per\-thread mode, this option is ignored\&. The \-a option is still necessary to activate system\-wide monitoring\&. Default is to count on all CPUs\&.
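.sp
The list syntax described above can be illustrated with a short Python sketch\&. The function name
\fIparse_cpu_list\fR
is hypothetical and this is not the parser perf itself uses; it assumes ranges are inclusive and the input is well formed\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    # Expand a list such as "0,1" or "0-2" into sorted CPU numbers.
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range: both ends are assumed to be inclusive.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))
print(parse_cpu_list("0-2"))
```
.fi
.if n \{\
.RE
.\}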
.RE
.PP
\-A, \-\-no\-aggr
.RS 4
Do not aggregate counts across all monitored CPUs\&.
.RE
.PP
\-n, \-\-null
.RS 4
null run \- don\(cqt start any counters
.RE
.PP
\-v, \-\-verbose
.RS 4
be more verbose (show counter open errors, etc)
.RE
.PP
\-x SEP, \-\-field\-separator SEP
.RS 4
print counts using a CSV\-style output to make it easy to import directly into spreadsheets\&. Columns are separated by the string specified in SEP\&.
.RE
.PP
\-G name, \-\-cgroup name
.RS 4
monitor only in the container (cgroup) called "name"\&. This option is available only in per\-cpu mode\&. The cgroup filesystem must be mounted\&. All threads belonging to container "name" are monitored when they run on the monitored CPUs\&. Multiple cgroups can be provided\&. Each cgroup is applied to the corresponding event, i\&.e\&., first cgroup to first event, second cgroup to second event and so on\&. It is possible to provide an empty cgroup (monitor all the time) using, e\&.g\&., \-G foo,,bar\&. Cgroups must have corresponding events, i\&.e\&., they always refer to events defined earlier on the command line\&.
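.sp
The positional pairing of cgroups and events can be sketched in Python as follows\&. The function name
\fIpair_events_with_cgroups\fR
is hypothetical, and the sketch assumes plain event names without embedded commas\&. Treating events beyond the end of the cgroup list as unrestricted is an assumption of the sketch, not documented perf behavior\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
from typing import List, Optional, Tuple


def pair_events_with_cgroups(
    events: str, cgroups: str
) -> List[Tuple[str, Optional[str]]]:
    # Assumes plain event names without embedded commas.
    event_names = events.split(",")
    cgroup_names = cgroups.split(",")
    if len(cgroup_names) > len(event_names):
        raise ValueError("each cgroup needs a corresponding event")
    pairs = []
    for index, event in enumerate(event_names):
        # Assumption of this sketch: events beyond the cgroup list, and
        # events paired with an empty name, are not restricted.
        name = cgroup_names[index] if index < len(cgroup_names) else ""
        pairs.append((event, name or None))
    return pairs


print(pair_events_with_cgroups("cycles,instructions,cache-misses", "foo,,bar"))
```
.fi
.if n \{\
.RE
.\}
.sp
With the arguments shown, the first event is paired with foo, the second with no cgroup, and the third with bar\&.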
.RE
.PP
\-o file, \-\-output file
.RS 4
Print the output into the designated file\&.
.RE
.PP
\-\-append
.RS 4
Append to the output file designated with the \-o option\&. Ignored if \-o is not specified\&.
.RE
.PP
\-\-log\-fd
.RS 4
Log output to fd, instead of stderr\&. Complementary to \-\-output, and mutually exclusive with it\&. \-\-append may be used here\&. Examples:
.sp
.if n \{\
.RS 4
.\}
.nf
3>results perf stat \-\-log\-fd 3 \-\- $cmd
3>>results perf stat \-\-log\-fd 3 \-\-append \-\- $cmd
.fi
.if n \{\
.RE
.\}
.RE
.PP
\-\-pre, \-\-post
.RS 4
Pre and post measurement hooks, e\&.g\&.:
.sp
.if n \{\
.RS 4
.\}
.nf
perf stat \-\-repeat 10 \-\-null \-\-sync \-\-pre \*(Aqmake \-s O=defconfig\-build/clean\*(Aq \-\- make \-s \-j64 O=defconfig\-build/ bzImage
.fi
.if n \{\
.RE
.\}
.RE
.PP
\-I msecs, \-\-interval\-print msecs
.RS 4
Print count deltas every msecs milliseconds (minimum: 10ms)\&. The overhead percentage could be high in some cases, for instance with small, sub\-100ms intervals\&. Use with caution\&. Example:
\fIperf stat \-I 1000 \-e cycles \-a sleep 5\fR
.RE
.PP
\-\-metric\-only
.RS 4
Only print computed metrics\&. Print them in a single line\&. Don\(cqt show any raw values\&. Not supported with \-\-per\-thread\&.
.RE
.PP
\-\-per\-socket
.RS 4
Aggregate counts per processor socket for system\-wide mode measurements\&. This is a useful mode to detect imbalance between sockets\&. To enable this mode, use \-\-per\-socket in addition to \-a (system\-wide)\&. The output includes the socket number and the number of online processors on that socket\&. This is useful to gauge the amount of aggregation\&.
.RE
.PP
\-\-per\-core
.RS 4
Aggregate counts per physical processor for system\-wide mode measurements\&. This is a useful mode to detect imbalance between physical cores\&. To enable this mode, use \-\-per\-core in addition to \-a (system\-wide)\&. The output includes the core number and the number of online logical processors on that physical processor\&.
.RE
.PP
\-\-per\-thread
.RS 4
Aggregate counts per monitored thread, when monitoring threads (\-t option) or processes (\-p option)\&.
.RE
.PP
\-D msecs, \-\-delay msecs
.RS 4
After starting the program, wait msecs before measuring\&. This is useful to filter out the startup phase of the program, which is often very different\&.
.RE
.PP
\-T, \-\-transaction
.RS 4
Print statistics of transactional execution if supported\&.
.RE
.SH "STAT RECORD"
.sp
Stores stat data into a perf data file\&.
.PP
\-o file, \-\-output file
.RS 4
Output file name\&.
.RE
.SH "STAT REPORT"
.sp
Reads and reports stat data from a perf data file\&.
.PP
\-i file, \-\-input file
.RS 4
Input file name\&.
.RE
.PP
\-\-per\-socket
.RS 4
Aggregate counts per processor socket for system\-wide mode measurements\&.
.RE
.PP
\-\-per\-core
.RS 4
Aggregate counts per physical processor for system\-wide mode measurements\&.
.RE
.PP
\-M, \-\-metrics
.RS 4
Print metrics or metricgroups specified in a comma\-separated list\&. For a group, all metrics from the group are added\&. The events from the metrics are automatically measured\&. See perf list output for the possible metrics and metricgroups\&.
.RE
.PP
\-A, \-\-no\-aggr
.RS 4
Do not aggregate counts across all monitored CPUs\&.
.RE
.PP
\-\-topdown
.RS 4
Print top down level 1 metrics if supported by the CPU\&. This helps determine bottlenecks in the CPU pipeline for CPU\-bound workloads, by breaking the cycles consumed down into frontend bound, backend bound, bad speculation and retiring\&.
.RE
.sp
Frontend bound means that the CPU cannot fetch and decode instructions fast enough\&. Backend bound means that computation or memory access is the bottleneck\&. Bad speculation means that the CPU wasted cycles due to branch mispredictions and similar issues\&. Retiring means that the CPU computed without an apparent bottleneck\&. The reported bottleneck is only the real bottleneck if the workload is actually bound by the CPU and not by something else\&.
.sp
For best results it is usually a good idea to use it with interval mode like \-I 1000, as the bottleneck of workloads can change often\&.
.sp
The top down metrics are collected per core instead of per CPU thread\&. Per core mode is automatically enabled and \-a (global monitoring) is needed, requiring root rights or kernel\&.perf_event_paranoid=\-1\&.
.sp
Topdown uses the full Performance Monitoring Unit, and for best results needs the NMI watchdog disabled (as root): echo 0 > /proc/sys/kernel/nmi_watchdog\&. Otherwise the bottlenecks may be inconsistent on workloads with changing phases\&.
.sp
This enables \-\-metric\-only, unless overridden with \-\-no\-metric\-only\&.
.sp
To interpret the results it is usually necessary to know which CPUs the workload runs on\&. If needed, the CPUs can be forced using taskset\&.
.PP
\-\-no\-merge
.RS 4
Do not merge results from the same PMUs\&.
.RE
.PP
\-\-smi\-cost
.RS 4
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported\&.
.RE
.sp
During the measurement, /sys/devices/cpu/freeze_on_smi will be set to freeze core counters on SMI\&. The aperf counter will not be affected by the setting\&. The cost of SMI can be measured by (aperf \- unhalted core cycles)\&.
.sp
In practice, the percentage of SMI cycles is very useful for performance\-oriented analysis\&. \-\-metric\-only will be applied by default\&. The output is SMI cycles%, which equals (aperf \- unhalted core cycles) / aperf\&.
.sp
Users who want the actual values can apply \-\-no\-metric\-only\&.
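.sp
As a worked illustration of the SMI cycles% formula, the following Python sketch computes the percentage from two counter values\&. The function name
\fIsmi_cycles_percent\fR
and the input numbers are made up for illustration; they are not measurement results\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def smi_cycles_percent(aperf: int, unhalted_core_cycles: int) -> float:
    # SMI cycles% = (aperf - unhalted core cycles) / aperf, as a percentage.
    if aperf == 0:
        raise ValueError("aperf must be non-zero")
    return 100.0 * (aperf - unhalted_core_cycles) / aperf


# Made-up counter values: 10000 of 1000000 aperf cycles were spent in SMI.
print(smi_cycles_percent(1000000, 990000))
```
.fi
.if n \{\
.RE
.\}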
.SH "EXAMPLES"
.sp
$ perf stat \-\- make \-j
.sp
.if n \{\
.RS 4
.\}
.nf
Performance counter stats for \*(Aqmake \-j\*(Aq:
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
8117\&.370256 task clock ticks # 11\&.281 CPU utilization factor
678 context switches # 0\&.000 M/sec
133 CPU migrations # 0\&.000 M/sec
235724 pagefaults # 0\&.029 M/sec
24821162526 CPU cycles # 3057\&.784 M/sec
18687303457 instructions # 2302\&.138 M/sec
172158895 cache references # 21\&.209 M/sec
27075259 cache misses # 3\&.335 M/sec
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
Wall\-clock time elapsed: 719\&.554352 msecs
.fi
.if n \{\
.RE
.\}
.SH "CSV FORMAT"
.sp
With \-x, perf stat can produce not\-quite\-CSV output\&. Commas in the output are not put into ""\&. To make the output easy to parse, it is recommended to use a different separator character, such as \-x \e;
.sp
The fields are in this order:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
optional usec time stamp in fractions of second (with \-I xxx)
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
optional CPU, core, or socket identifier
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
optional number of logical CPUs aggregated
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
counter value
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
unit of the counter value or empty
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
event name
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
run time of counter
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
percentage of measurement time the counter was running
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
optional variance if multiple values are collected with \-r
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
optional metric value
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
optional unit of metric
.RE
.sp
Additional metrics may be printed with all earlier fields being empty\&.
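.sp
A minimal Python sketch of reading one such line is shown below\&. It assumes the simplest case: output produced with \-x \e; and without \-I, \-A, \-r or the \-\-per\-* options, so that none of the optional leading fields or the variance field are present\&. The names
\fIStatLine\fR
and
\fIparse_stat_line\fR
are hypothetical, and the sample line is made up rather than taken from a real run\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
from typing import NamedTuple, Optional


class StatLine(NamedTuple):
    value: Optional[float]  # None when the value text is not a number
    unit: str
    event: str
    run_time: int
    percent_running: float


def parse_stat_line(line: str, sep: str = ";") -> StatLine:
    # Assumed layout: value, unit, event, run time, percentage, then
    # optional metric fields, which this sketch ignores.
    fields = line.strip().split(sep)
    value_text, unit, event, run_time, percent = fields[:5]
    try:
        value = float(value_text)
    except ValueError:
        value = None
    return StatLine(value, unit, event, int(run_time), float(percent))


# Made-up sample line, not real measurement output.
sample = "1234567;;cycles;1000000;100.00"
print(parse_stat_line(sample))
```
.fi
.if n \{\
.RE
.\}
.sp
Lines that carry the optional fields would need the field offsets adjusted accordingly\&.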
.SH "SEE ALSO"
.sp
\fBperf_4.15-top\fR(1), \fBperf_4.15-list\fR(1)