'\" t .\" Title: perf-stat .\" Author: [FIXME: author] [see http://docbook.sf.net/el/author] .\" Generator: DocBook XSL Stylesheets v1.79.1 .\" Date: 2017-09-19 .\" Manual: perf Manual .\" Source: perf .\" Language: English .\" .TH "PERF_4.13\-STAT" "1" "2017\-09\-19" "perf" "perf Manual" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" perf-stat \- Run a command and gather performance counter statistics .SH "SYNOPSIS" .sp .nf \fIperf stat\fR [\-e | \-\-event=EVENT] [\-a] \fIperf stat\fR [\-e | \-\-event=EVENT] [\-a] \(em [] \fIperf stat\fR [\-e | \-\-event=EVENT] [\-a] record [\-o file] \(em [] \fIperf stat\fR report [\-i file] .fi .SH "DESCRIPTION" .sp This command runs a command and gathers performance counter statistics from it\&. .SH "OPTIONS" .PP \&... .RS 4 Any command you can specify in a shell\&. .RE .PP record .RS 4 See STAT RECORD\&. .RE .PP report .RS 4 See STAT REPORT\&. .RE .PP \-e, \-\-event= .RS 4 Select the PMU event\&. Selection can be: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a symbolic event name (use \fIperf list\fR to list all events) .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a symbolically formed event like \fIpmu/param1=0x3,param2/\fR where param1 and param2 are defined as formats for the PMU in /sys/bus/event_sources/devices//format/* .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a symbolically formed event like \fIpmu/config=M,config1=N,config2=K/\fR where M, N, K are numbers (in decimal, hex, octal format)\&. Acceptable values for each of \fIconfig\fR, \fIconfig1\fR and \fIconfig2\fR parameters are defined by corresponding entries in /sys/bus/event_sources/devices//format/* .RE .RE .PP \-i, \-\-no\-inherit .RS 4 child tasks do not inherit counters .RE .PP \-p, \-\-pid= .RS 4 stat events on existing process id (comma separated list) .RE .PP \-t, \-\-tid= .RS 4 stat events on existing thread id (comma separated list) .RE .PP \-a, \-\-all\-cpus .RS 4 system\-wide collection from all CPUs (default if no target is specified) .RE .PP \-c, \-\-scale .RS 4 scale/normalize counter values .RE .PP \-d, \-\-detailed .RS 4 print more detailed statistics, can be specified up to 3 times .sp .if n \{\ .RS 4 .\} .nf \-d: detailed events, L1 and LLC data cache \-d \-d: more detailed events, dTLB and iTLB events \-d \-d \-d: very detailed events, adding prefetch events .fi .if n \{\ .RE .\} .RE .PP \-r, \-\-repeat= .RS 4 repeat command and print average + stddev (max: 100)\&. 0 means forever\&. .RE .PP \-B, \-\-big\-num .RS 4 print large numbers with thousands\*(Aq separators according to locale .RE .PP \-C, \-\-cpu= .RS 4 Count only on the list of CPUs provided\&. Multiple CPUs can be provided as a comma\-separated list with no space: 0,1\&. Ranges of CPUs are specified with \-: 0\-2\&. In per\-thread mode, this option is ignored\&. The \-a option is still necessary to activate system\-wide monitoring\&. Default is to count on all CPUs\&. .RE .PP \-A, \-\-no\-aggr .RS 4 Do not aggregate counts across all monitored CPUs\&. .RE .PP \-n, \-\-null .RS 4 null run \- don\(cqt start any counters .RE .PP \-v, \-\-verbose .RS 4 be more verbose (show counter open errors, etc) .RE .PP \-x SEP, \-\-field\-separator SEP .RS 4 print counts using a CSV\-style output to make it easy to import directly into spreadsheets\&. Columns are separated by the string specified in SEP\&. .RE .PP \-G name, \-\-cgroup name .RS 4 monitor only in the container (cgroup) called "name"\&. This option is available only in per\-cpu mode\&. The cgroup filesystem must be mounted\&. All threads belonging to container "name" are monitored when they run on the monitored CPUs\&. Multiple cgroups can be provided\&. Each cgroup is applied to the corresponding event, i\&.e\&., first cgroup to first event, second cgroup to second event and so on\&. It is possible to provide an empty cgroup (monitor all the time) using, e\&.g\&., \-G foo,,bar\&. Cgroups must have corresponding events, i\&.e\&., they always refer to events defined earlier on the command line\&. .RE .PP \-o file, \-\-output file .RS 4 Print the output into the designated file\&. .RE .PP \-\-append .RS 4 Append to the output file designated with the \-o option\&. Ignored if \-o is not specified\&. .RE .PP \-\-log\-fd .RS 4 Log output to fd, instead of stderr\&. Complementary to \-\-output, and mutually exclusive with it\&. \-\-append may be used here\&. Examples: 3>results perf stat \-\-log\-fd 3 \(em $cmd 3>>results perf stat \-\-log\-fd 3 \-\-append \(em $cmd .RE .PP \-\-pre, \-\-post .RS 4 Pre and post measurement hooks, e\&.g\&.: .RE .sp perf stat \-\-repeat 10 \-\-null \-\-sync \-\-pre \fImake \-s O=defconfig\-build/clean\fR \(em make \-s \-j64 O=defconfig\-build/ bzImage .PP \-I msecs, \-\-interval\-print msecs .RS 4 Print count deltas every N milliseconds (minimum: 10ms) The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals\&. Use with caution\&. example: \fIperf stat \-I 1000 \-e cycles \-a sleep 5\fR .RE .PP \-\-metric\-only .RS 4 Only print computed metrics\&. Print them in a single line\&. Don\(cqt show any raw values\&. Not supported with \-\-per\-thread\&. .RE .PP \-\-per\-socket .RS 4 Aggregate counts per processor socket for system\-wide mode measurements\&. This is a useful mode to detect imbalance between sockets\&. To enable this mode, use \-\-per\-socket in addition to \-a\&. (system\-wide)\&. The output includes the socket number and the number of online processors on that socket\&. This is useful to gauge the amount of aggregation\&. .RE .PP \-\-per\-core .RS 4 Aggregate counts per physical processor for system\-wide mode measurements\&. This is a useful mode to detect imbalance between physical cores\&. To enable this mode, use \-\-per\-core in addition to \-a\&. (system\-wide)\&. The output includes the core number and the number of online logical processors on that physical processor\&. .RE .PP \-\-per\-thread .RS 4 Aggregate counts per monitored threads, when monitoring threads (\-t option) or processes (\-p option)\&. .RE .PP \-D msecs, \-\-delay msecs .RS 4 After starting the program, wait msecs before measuring\&. This is useful to filter out the startup phase of the program, which is often very different\&. .RE .PP \-T, \-\-transaction .RS 4 Print statistics of transactional execution if supported\&. .RE .SH "STAT RECORD" .sp Stores stat data into perf data file\&. .PP \-o file, \-\-output file .RS 4 Output file name\&. .RE .SH "STAT REPORT" .sp Reads and reports stat data from perf data file\&. .PP \-i file, \-\-input file .RS 4 Input file name\&. .RE .PP \-\-per\-socket .RS 4 Aggregate counts per processor socket for system\-wide mode measurements\&. .RE .PP \-\-per\-core .RS 4 Aggregate counts per physical processor for system\-wide mode measurements\&. .RE .PP \-A, \-\-no\-aggr .RS 4 Do not aggregate counts across all monitored CPUs\&. .RE .PP \-\-topdown .RS 4 Print top down level 1 metrics if supported by the CPU\&. This allows to determine bottle necks in the CPU pipeline for CPU bound workloads, by breaking the cycles consumed down into frontend bound, backend bound, bad speculation and retiring\&. .RE .sp Frontend bound means that the CPU cannot fetch and decode instructions fast enough\&. Backend bound means that computation or memory access is the bottle neck\&. Bad Speculation means that the CPU wasted cycles due to branch mispredictions and similar issues\&. Retiring means that the CPU computed without an apparently bottleneck\&. The bottleneck is only the real bottleneck if the workload is actually bound by the CPU and not by something else\&. .sp For best results it is usually a good idea to use it with interval mode like \-I 1000, as the bottleneck of workloads can change often\&. .sp The top down metrics are collected per core instead of per CPU thread\&. Per core mode is automatically enabled and \-a (global monitoring) is needed, requiring root rights or perf\&.perf_event_paranoid=\-1\&. .sp Topdown uses the full Performance Monitoring Unit, and needs disabling of the NMI watchdog (as root): echo 0 > /proc/sys/kernel/nmi_watchdog for best results\&. Otherwise the bottlenecks may be inconsistent on workload with changing phases\&. .sp This enables \-\-metric\-only, unless overriden with \-\-no\-metric\-only\&. .sp To interpret the results it is usually needed to know on which CPUs the workload runs on\&. If needed the CPUs can be forced using taskset\&. .PP \-\-no\-merge .RS 4 Do not merge results from same PMUs\&. .RE .PP \-\-smi\-cost .RS 4 Measure SMI cost if msr/aperf/ and msr/smi/ events are supported\&. .RE .sp During the measurement, the /sys/device/cpu/freeze_on_smi will be set to freeze core counters on SMI\&. The aperf counter will not be effected by the setting\&. The cost of SMI can be measured by (aperf \- unhalted core cycles)\&. .sp In practice, the percentages of SMI cycles is very useful for performance oriented analysis\&. \-\-metric_only will be applied by default\&. The output is SMI cycles%, equals to (aperf \- unhalted core cycles) / aperf .sp Users who wants to get the actual value can apply \-\-no\-metric\-only\&. .SH "EXAMPLES" .sp $ perf stat \(em make \-j .sp .if n \{\ .RS 4 .\} .nf Performance counter stats for \*(Aqmake \-j\*(Aq: .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf 8117\&.370256 task clock ticks # 11\&.281 CPU utilization factor 678 context switches # 0\&.000 M/sec 133 CPU migrations # 0\&.000 M/sec 235724 pagefaults # 0\&.029 M/sec 24821162526 CPU cycles # 3057\&.784 M/sec 18687303457 instructions # 2302\&.138 M/sec 172158895 cache references # 21\&.209 M/sec 27075259 cache misses # 3\&.335 M/sec .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf Wall\-clock time elapsed: 719\&.554352 msecs .fi .if n \{\ .RE .\} .SH "CSV FORMAT" .sp With \-x, perf stat is able to output a not\-quite\-CSV format output Commas in the output are not put into ""\&. To make it easy to parse it is recommended to use a different character like \-x \e; .sp The fields are in this order: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} optional usec time stamp in fractions of second (with \-I xxx) .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} optional CPU, core, or socket identifier .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} optional number of logical CPUs aggregated .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} counter value .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} unit of the counter value or empty .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} event name .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} run time of counter .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} percentage of measurement time the counter was running .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} optional variance if multiple values are collected with \-r .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} optional metric value .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} optional unit of metric .RE .sp Additional metrics may be printed with all earlier fields being empty\&. .SH "SEE ALSO" .sp \fBperf_4.13-top\fR(1), \fBperf_4.13-list\fR(1)