'\" t .\" Title: perf-record .\" Author: [FIXME: author] [see http://docbook.sf.net/el/author] .\" Generator: DocBook XSL Stylesheets v1.78.1 .\" Date: 2016-03-20 .\" Manual: perf Manual .\" Source: perf .\" Language: English .\" .TH "PERF_4.4\-RECORD" "1" "2016\-03\-20" "perf" "perf Manual" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" perf-record \- Run a command and record its profile into perf\&.data .SH "SYNOPSIS" .sp .nf \fIperf record\fR [\-e | \-\-event=EVENT] [\-l] [\-a] \fIperf record\fR [\-e | \-\-event=EVENT] [\-l] [\-a] \(em [] .fi .SH "DESCRIPTION" .sp This command runs a command and gathers a performance counter profile from it, into perf\&.data \- without displaying anything\&. .sp This file can then be inspected later on, using \fIperf report\fR\&. .SH "OPTIONS" .PP \&... .RS 4 Any command you can specify in a shell\&. .RE .PP \-e, \-\-event= .RS 4 Select the PMU event\&. Selection can be: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a symbolic event name (use \fIperf list\fR to list all events) .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a symbolically formed PMU event like \fIpmu/param1=0x3,param2/\fR where \fIparam1\fR, \fIparam2\fR, etc are defined as formats for the PMU in /sys/bus/event_sources/devices//format/*\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a symbolically formed event like \fIpmu/config=M,config1=N,config3=K/\fR .sp .if n \{\ .RS 4 .\} .nf where M, N, K are numbers (in decimal, hex, octal format)\&. Acceptable values for each of \*(Aqconfig\*(Aq, \*(Aqconfig1\*(Aq and \*(Aqconfig2\*(Aq are defined by corresponding entries in /sys/bus/event_sources/devices//format/* param1 and param2 are defined as formats for the PMU in: /sys/bus/event_sources/devices//format/* .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf There are also some params which are not defined in \&.\&.\&.//format/*\&. These params can be used to overload default config values per event\&. Here is a list of the params\&. \- \*(Aqperiod\*(Aq: Set event sampling period \- \*(Aqfreq\*(Aq: Set event sampling frequency \- \*(Aqtime\*(Aq: Disable/enable time stamping\&. Acceptable values are 1 for enabling time stamping\&. 0 for disabling time stamping\&. The default is 1\&. \- \*(Aqcall\-graph\*(Aq: Disable/enable callgraph\&. Acceptable str are "fp" for FP mode, "dwarf" for DWARF mode, "lbr" for LBR mode and "no" for disable callgraph\&. \- \*(Aqstack\-size\*(Aq: user stack size for dwarf mode Note: If user explicitly sets options which conflict with the params, the value set by the params will be overridden\&. .fi .if n \{\ .RE .\} .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a hardware breakpoint event in the form of \fI\emem:addr[/len][:access]\fR where addr is the address in memory you want to break in\&. Access is the memory access type (read, write, execute) it can be passed as follows: \fI\emem:addr[:[r][w][x]]\fR\&. len is the range, number of bytes from specified addr, which the breakpoint will cover\&. If you want to profile read\-write accesses in 0x1000, just set \fImem:0x1000:rw\fR\&. If you want to profile write accesses in [0x1000~1008), just set \fImem:0x1000/8:w\fR\&. .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} a group of events surrounded by a pair of brace ("{event1,event2,\&...}")\&. Each event is separated by commas and the group should be quoted to prevent the shell interpretation\&. You also need to use \-\-group on "perf report" to view group events together\&. .RE .RE .PP \-\-filter= .RS 4 Event filter\&. This option should follow a event selector (\-e) which selects tracepoint event(s)\&. Multiple \fI\-\-filter\fR options are combined using \fI&&\fR\&. .RE .PP \-\-exclude\-perf .RS 4 Don\(cqt record events issued by perf itself\&. This option should follow a event selector (\-e) which selects tracepoint event(s)\&. It adds a filter expression \fIcommon_pid != $PERFPID\fR to filters\&. If other \fI\-\-filter\fR exists, the new filter expression will be combined with them by \fI&&\fR\&. .RE .PP \-a, \-\-all\-cpus .RS 4 System\-wide collection from all CPUs\&. .RE .PP \-p, \-\-pid= .RS 4 Record events on existing process ID (comma separated list)\&. .RE .PP \-t, \-\-tid= .RS 4 Record events on existing thread ID (comma separated list)\&. This option also disables inheritance by default\&. Enable it by adding \-\-inherit\&. .RE .PP \-u, \-\-uid= .RS 4 Record events in threads owned by uid\&. Name or number\&. .RE .PP \-r, \-\-realtime= .RS 4 Collect data with this RT SCHED_FIFO priority\&. .RE .PP \-\-no\-buffering .RS 4 Collect data without buffering\&. .RE .PP \-c, \-\-count= .RS 4 Event period to sample\&. .RE .PP \-o, \-\-output= .RS 4 Output file name\&. .RE .PP \-i, \-\-no\-inherit .RS 4 Child tasks do not inherit counters\&. .RE .PP \-F, \-\-freq= .RS 4 Profile at this frequency\&. .RE .PP \-m, \-\-mmap\-pages= .RS 4 Number of mmap data pages (must be a power of two) or size specification with appended unit character \- B/K/M/G\&. The size is rounded up to have nearest pages power of two value\&. Also, by adding a comma, the number of mmap pages for AUX area tracing can be specified\&. .RE .PP \-\-group .RS 4 Put all events in a single event group\&. This precedes the \-\-event option and remains only for backward compatibility\&. See \-\-event\&. .RE .PP \-g .RS 4 Enables call\-graph (stack chain/backtrace) recording\&. .RE .PP \-\-call\-graph .RS 4 Setup and enable call\-graph (stack chain/backtrace) recording, implies \-g\&. Default is "fp"\&. .sp .if n \{\ .RS 4 .\} .nf Allows specifying "fp" (frame pointer) or "dwarf" (DWARF\*(Aqs CFI \- Call Frame Information) or "lbr" (Hardware Last Branch Record facility) as the method to collect the information used to show the call graphs\&. .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf In some systems, where binaries are build with gcc \-\-fomit\-frame\-pointer, using the "fp" method will produce bogus call graphs, using "dwarf", if available (perf tools linked to the libunwind or libdw library) should be used instead\&. Using the "lbr" method doesn\*(Aqt require any compiler options\&. It will produce call graphs from the hardware LBR registers\&. The main limition is that it is only available on new Intel platforms, such as Haswell\&. It can only get user call chain\&. It doesn\*(Aqt work with branch stack sampling at the same time\&. .fi .if n \{\ .RE .\} .sp .if n \{\ .RS 4 .\} .nf When "dwarf" recording is used, perf also records (user) stack dump when sampled\&. Default size of the stack dump is 8192 (bytes)\&. User can change the size by passing the size after comma like "\-\-call\-graph dwarf,4096"\&. .fi .if n \{\ .RE .\} .RE .PP \-q, \-\-quiet .RS 4 Don\(cqt print any message, useful for scripting\&. .RE .PP \-v, \-\-verbose .RS 4 Be more verbose (show counter open errors, etc)\&. .RE .PP \-s, \-\-stat .RS 4 Record per\-thread event counts\&. Use it with \fIperf report \-T\fR to see the values\&. .RE .PP \-d, \-\-data .RS 4 Record the sample addresses\&. .RE .PP \-T, \-\-timestamp .RS 4 Record the sample timestamps\&. Use it with \fIperf report \-D\fR to see the timestamps, for instance\&. .RE .PP \-P, \-\-period .RS 4 Record the sample period\&. .RE .PP \-n, \-\-no\-samples .RS 4 Don\(cqt sample\&. .RE .PP \-R, \-\-raw\-samples .RS 4 Collect raw sample records from all opened counters (default for tracepoint counters)\&. .RE .PP \-C, \-\-cpu .RS 4 Collect samples only on the list of CPUs provided\&. Multiple CPUs can be provided as a comma\-separated list with no space: 0,1\&. Ranges of CPUs are specified with \-: 0\-2\&. In per\-thread mode with inheritance mode on (default), samples are captured only when the thread executes on the designated CPUs\&. Default is to monitor all CPUs\&. .RE .PP \-N, \-\-no\-buildid\-cache .RS 4 Do not update the buildid cache\&. This saves some overhead in situations where the information in the perf\&.data file (which includes buildids) is sufficient\&. .RE .PP \-G name,\&..., \-\-cgroup name,\&... .RS 4 monitor only in the container (cgroup) called "name"\&. This option is available only in per\-cpu mode\&. The cgroup filesystem must be mounted\&. All threads belonging to container "name" are monitored when they run on the monitored CPUs\&. Multiple cgroups can be provided\&. Each cgroup is applied to the corresponding event, i\&.e\&., first cgroup to first event, second cgroup to second event and so on\&. It is possible to provide an empty cgroup (monitor all the time) using, e\&.g\&., \-G foo,,bar\&. Cgroups must have corresponding events, i\&.e\&., they always refer to events defined earlier on the command line\&. .RE .PP \-b, \-\-branch\-any .RS 4 Enable taken branch stack sampling\&. Any type of taken branch may be sampled\&. This is a shortcut for \-\-branch\-filter any\&. See \-\-branch\-filter for more infos\&. .RE .PP \-j, \-\-branch\-filter .RS 4 Enable taken branch stack sampling\&. Each sample captures a series of consecutive taken branches\&. The number of branches captured with each sample depends on the underlying hardware, the type of branches of interest, and the executed code\&. It is possible to select the types of branches captured by enabling filters\&. The following filters are defined: .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} any: any type of branches .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} any_call: any function call or system call .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} any_ret: any function return or system call return .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} ind_call: any indirect branch .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} call: direct calls, including far (to/from kernel) calls .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} u: only when the branch target is at the user level .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} k: only when the branch target is in the kernel .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} hv: only when the target is at the hypervisor level .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} in_tx: only when the target is in a hardware transaction .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} no_tx: only when the target is not in a hardware transaction .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} abort_tx: only when the target is a hardware transaction abort .RE .sp .RS 4 .ie n \{\ \h'-04'\(bu\h'+03'\c .\} .el \{\ .sp -1 .IP \(bu 2.3 .\} cond: conditional branches .RE .sp The option requires at least one branch type among any, any_call, any_ret, ind_call, cond\&. The privilege levels may be omitted, in which case, the privilege levels of the associated event are applied to the branch filter\&. Both kernel (k) and hypervisor (hv) privilege levels are subject to permissions\&. When sampling on multiple events, branch stack sampling is enabled for all the sampling events\&. The sampled branch type is the same for all events\&. The various filters must be specified as a comma separated list: \-\-branch\-filter any_ret,u,k Note that this feature may not be available on all processors\&. .RE .PP \-\-weight .RS 4 Enable weightened sampling\&. An additional weight is recorded per sample and can be displayed with the weight and local_weight sort keys\&. This currently works for TSX abort events and some memory events in precise mode on modern Intel CPUs\&. .RE .PP \-\-transaction .RS 4 Record transaction flags for transaction related events\&. .RE .PP \-\-per\-thread .RS 4 Use per\-thread mmaps\&. By default per\-cpu mmaps are created\&. This option overrides that and uses per\-thread mmaps\&. A side\-effect of that is that inheritance is automatically disabled\&. \-\-per\-thread is ignored with a warning if combined with \-a or \-C options\&. .RE .PP \-D, \-\-delay= .RS 4 After starting the program, wait msecs before measuring\&. This is useful to filter out the startup phase of the program, which is often very different\&. .RE .PP \-I, \-\-intr\-regs .RS 4 Capture machine state (registers) at interrupt, i\&.e\&., on counter overflows for each sample\&. List of captured registers depends on the architecture\&. This option is off by default\&. It is possible to select the registers to sample using their symbolic names, e\&.g\&. on x86, ax, si\&. To list the available registers use \-\-intr\-regs=\e?\&. To name registers, pass a comma separated list such as \-\-intr\-regs=ax,bx\&. The list of register is architecture dependent\&. .RE .PP \-\-running\-time .RS 4 Record running and enabled time for read events (:S) .RE .PP \-k, \-\-clockid .RS 4 Sets the clock id to use for the various time fields in the perf_event_type records\&. See clock_gettime()\&. In particular CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW are supported, some events might also allow CLOCK_BOOTTIME, CLOCK_REALTIME and CLOCK_TAI\&. .RE .PP \-S, \-\-snapshot .RS 4 Select AUX area tracing Snapshot Mode\&. This option is valid only with an AUX area tracing event\&. Optionally the number of bytes to capture per snapshot can be specified\&. In Snapshot Mode, trace data is captured only when signal SIGUSR2 is received\&. .RE .PP \-\-proc\-map\-timeout .RS 4 When processing pre\-existing threads /proc/XXX/mmap, it may take a long time, because the file may be huge\&. A time out is needed in such cases\&. This option sets the time out limit\&. The default value is 500 ms\&. .RE .PP \-\-switch\-events .RS 4 Record context switch events i\&.e\&. events of type PERF_RECORD_SWITCH or PERF_RECORD_SWITCH_CPU_WIDE\&. .RE .PP \-\-clang\-path .RS 4 Path to clang binary to use for compiling BPF scriptlets\&. .RE .PP \-\-clang\-opt .RS 4 Options passed to clang when compiling BPF scriptlets\&. .RE .SH "SEE ALSO" .sp \fBperf_4.4-stat\fR(1), \fBperf_4.4-list\fR(1)