.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PT-STALK 1p"
.TH PT-STALK 1p "2012-06-15" "perl v5.14.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
pt\-stalk \- Gather forensic data about MySQL when a problem occurs.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
Usage: pt-stalk [\s-1OPTIONS\s0] [\-\- \s-1MYSQL\s0 \s-1OPTIONS\s0]
.PP
pt-stalk watches for a trigger condition to become true, and then collects data
to help in diagnosing problems.
It is designed to run as a daemon with root privileges, so that you can
diagnose intermittent problems that you cannot observe directly. You can also
use it to execute a custom command, or to gather the data on demand without
waiting for the trigger to happen.
.SH "RISKS"
.IX Header "RISKS"
The following section is included to inform users about the potential risks,
whether known or unknown, of using this tool. The two main categories of risks
are those created by the nature of the tool (e.g. read-only tools vs. read-write
tools) and those created by bugs.
.PP
pt-stalk is a read-write tool; it collects data from the system and writes it
into a series of files. It should be very low-risk. Some of the options can
cause intrusive data collection to be performed, however, so if you enable any
non-default options, you should read their documentation carefully.
.PP
At the time of this release, we know of no bugs that could cause serious harm
to users.
.PP
The authoritative source for updated information is always the online issue
tracking system. Issues that affect this tool will be marked as such. You can
see a list of such issues at the following \s-1URL:\s0
http://www.percona.com/bugs/pt\-stalk .
.PP
See also \*(L"\s-1BUGS\s0\*(R" for more information on filing bugs and getting help.
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Sometimes a problem happens infrequently and for a short time, giving you no
chance to see the system when it happens. How do you solve intermittent MySQL
problems when you can't observe them? That's why pt-stalk exists. In addition
to using it when there's a known problem on your servers, it is a good idea to
run pt-stalk all the time, even when you think nothing is wrong. You will
appreciate the data it gathers when a problem occurs, because problems such as
MySQL lockups or spikes of activity typically leave no evidence to use in root
cause analysis.
.PP
This tool does two things: it watches a server (typically MySQL) for a trigger
to occur, and it gathers diagnostic data. To use it effectively, you need to
define a good trigger condition. A good trigger is sensitive enough to fire
reliably when a problem occurs, so that you don't miss a chance to solve
problems. On the other hand, a good trigger isn't prone to false positives, so
you don't gather information when the server is functioning normally.
.PP
The most reliable triggers for MySQL tend to be the number of connections to
the server, and the number of queries running concurrently. These are available
in the \s-1SHOW\s0 \s-1GLOBAL\s0 \s-1STATUS\s0 command as Threads_connected and Threads_running.
Sometimes Threads_connected is not a reliable indicator of trouble, but
Threads_running usually is. Your job, as the tool's user, is to define an
appropriate trigger condition for the tool. Choose carefully, because the
quality of your results will depend on the trigger you choose.
.PP
You can define the trigger with the \*(L"\-\-function\*(R", \*(L"\-\-variable\*(R", and
\&\*(L"\-\-threshold\*(R" options, among others. Please read the documentation for
\&\-\-function to learn how to do this.
.PP
The pt-stalk tool, by default, simply watches MySQL repeatedly until the trigger
becomes true. It then gathers diagnostics for a while, and sleeps afterwards for
some time to prevent repeatedly gathering data if the condition remains true.
In crude pseudocode, omitting some subtleties,
.PP
.Vb 12
\& while true; do
\&    if \-\-variable from \-\-function is greater than \-\-threshold; then
\&       observations++
\&       if observations is greater than \-\-cycles; then
\&          capture diagnostics for \-\-run\-time seconds
\&          exit if \-\-iterations is exceeded
\&          sleep for \-\-sleep seconds
\&       fi
\&    fi
\&    clean up data that\*(Aqs older than \-\-retention\-time
\&    sleep for \-\-interval seconds
\& done
.Ve
.PP
The diagnostic data is written to files whose names begin with a timestamp, so
you can distinguish samples from each other in case the tool collects data
multiple times. The pt-sift tool is designed to help you browse and analyze the
resulting samples of data.
.PP
Although this sounds simple enough, in practice there are a number of
subtleties, such as detecting when the disk is beginning to fill up so that the
tool doesn't cause the server to run out of disk space. This tool handles these
types of potential problems, so it's a good idea to use this tool instead of
writing something from scratch and possibly experiencing some of the hazards
this tool is designed to prevent.
.SH "CONFIGURING"
.IX Header "CONFIGURING"
You can use standard Percona Toolkit configuration files to set command-line
options.
.PP
You will probably want to run the tool as a daemon and customize at least the
diagnostic threshold. Here's a sample configuration file for triggering when
there are more than 20 queries running at once:
.PP
.Vb 2
\& daemonize
\& threshold=20
.Ve
.PP
If you're not running the tool as it's designed (as a root user, daemonized)
then you'll need to set several options, such as \*(L"\-\-dest\*(R", to locations that
are writable by non-root users.
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\-\-collect" 4
.IX Item "--collect"
default: yes; negatable: yes
.Sp
Collect system information. You can negate this option to make the tool watch
the system but not actually gather any diagnostic data.
.Sp
See also \*(L"\-\-stalk\*(R".
.IP "\-\-collect\-gdb" 4
.IX Item "--collect-gdb"
Collect \s-1GDB\s0 stacktraces. This is achieved by attaching to MySQL and printing
stack traces from all threads. This will freeze the server for some period of
time, ranging from a second or so to much longer on very busy systems with a
lot of memory and many threads in the server. For this reason, it is disabled
by default. However, if you are trying to diagnose a server stall or lockup,
freezing the server causes no additional harm, and the stack traces can be
vital for diagnosis.
.Sp
In addition to freezing the server, there is also some risk of the server
crashing or performing badly after \s-1GDB\s0 detaches from it.
.IP "\-\-collect\-oprofile" 4
.IX Item "--collect-oprofile"
Collect oprofile data. This is achieved by starting an oprofile session,
letting it run for the collection time, and then stopping and saving the
resulting profile data in the system's default location. Please read your
system's oprofile documentation to learn more about this.
.IP "\-\-collect\-strace" 4
.IX Item "--collect-strace"
Collect strace data. This is achieved by attaching strace to the server, which
will make it run very slowly until strace detaches. The same cautions apply as
those listed in \-\-collect\-gdb. You should not enable this option together with
\&\-\-collect\-gdb, because \s-1GDB\s0 and strace can't attach to the server process
simultaneously.
.IP "\-\-collect\-tcpdump" 4
.IX Item "--collect-tcpdump"
Collect tcpdump data. This option causes tcpdump to capture all traffic on all
interfaces for the port on which MySQL is listening. You can later use
pt-query-digest to decode the MySQL protocol and extract a log of query traffic
from it.
.IP "\-\-config" 4
.IX Item "--config"
type: string
.Sp
Read this comma-separated list of config files. If specified, this must be the
first option on the command line.
.IP "\-\-cycles" 4
.IX Item "--cycles"
type: int; default: 5
.Sp
The number of times the trigger condition must be true before collecting data.
This helps prevent false positives, and makes the trigger condition less likely
to fire when the problem recovers quickly.
.IP "\-\-daemonize" 4
.IX Item "--daemonize"
Daemonize the tool. This causes the tool to fork into the background and log
its output as specified in \-\-log.
.IP "\-\-dest" 4
.IX Item "--dest"
type: string; default: /var/lib/pt\-stalk
.Sp
Where to store the diagnostic data. Each time the tool collects data, it writes
to a new set of files, which are named with the current system timestamp.
.IP "\-\-disk\-bytes\-free" 4
.IX Item "--disk-bytes-free"
type: size; default: 100M
.Sp
Don't collect data if the disk has less than this much free space.
This prevents the tool from filling up the disk with diagnostic data.
.Sp
If the \*(L"\-\-dest\*(R" directory contains a previously captured sample of data,
the tool will measure its size and use that as an estimate of how much data is
likely to be gathered this time, too. It will then be even more pessimistic,
and will refuse to collect data unless the disk has enough free space to hold
the sample and still have the desired amount of free space. For example, if
you'd like 100MB of free space and the previous diagnostic sample consumed
100MB, the tool won't collect any data unless the disk has 200MB free.
.Sp
Valid size value suffixes are k, M, G, and T.
.IP "\-\-disk\-pct\-free" 4
.IX Item "--disk-pct-free"
type: int; default: 5
.Sp
Don't collect data if the disk has less than this percent free space.
This prevents the tool from filling up the disk with diagnostic data.
.Sp
This option works similarly to \*(L"\-\-disk\-bytes\-free\*(R" but specifies a
percentage margin of safety instead of a bytes margin of safety.
The tool honors both options, and will not collect any data unless both
margins are satisfied.
.IP "\-\-function" 4
.IX Item "--function"
type: string; default: status
.Sp
Specifies what to watch for a diagnostic trigger. The default value watches
\&\s-1SHOW\s0 \s-1GLOBAL\s0 \s-1STATUS\s0, but you can also watch \s-1SHOW\s0 \s-1PROCESSLIST\s0 or supply a plugin
file with your own custom code. This function supplies the value of
\&\*(L"\-\-variable\*(R", which is then compared against \*(L"\-\-threshold\*(R" to see if the
trigger condition is met. Additional options may be required as well; see
below. Possible values:
.RS 4
.IP "\(bu" 4
status
.Sp
This value specifies that the source of data for the diagnostic trigger is \s-1SHOW\s0
\&\s-1GLOBAL\s0 \s-1STATUS\s0. The value of \*(L"\-\-variable\*(R" then defines which status counter
is the trigger.
.IP "\(bu" 4
processlist
.Sp
This value specifies that the data for the diagnostic trigger comes from \s-1SHOW\s0
\&\s-1FULL\s0 \s-1PROCESSLIST\s0. The trigger value is the count of processes whose
\&\*(L"\-\-variable\*(R" column matches the \*(L"\-\-match\*(R" option. For example, to trigger
when more than 10 processes are in the \*(L"statistics\*(R" state, use the following
options:
.Sp
.Vb 2
\& \-\-function processlist \-\-variable State \e
\&    \-\-match statistics \-\-threshold 10
.Ve
.RE
.RS 4
.Sp
In addition, you can specify a file that contains your custom trigger function,
written in Unix shell script. This can be a wrapper that executes anything you
wish. If the argument to \-\-function is a file, then it takes precedence over
builtin functions, so if there is a file in the working directory named
\*(L"status\*(R" or \*(L"processlist\*(R" then the tool will use that file as a plugin, even
though those are otherwise recognized as reserved words for this option.
.Sp
The plugin file works by providing a function called \f(CW\*(C`trg_plugin\*(C'\fR, and the
tool simply sources the file and executes the function.
For example, the function might look like the following:
.Sp
.Vb 4
\& trg_plugin() {
\&    mysql $EXT_ARGV \-e "SHOW ENGINE INNODB STATUS" \e
\&      | grep \-c "has waited at"
\& }
.Ve
.Sp
This snippet will count the number of mutex waits inside of InnoDB. It
illustrates the general principle: the function must output a number, which is
then compared to the threshold as usual. The \f(CW$EXT_ARGV\fR variable contains the
MySQL options mentioned in the \*(L"\s-1SYNOPSIS\s0\*(R" above.
.Sp
The plugin should not alter the tool's existing global variables. Prefix any
plugin-specific global variables with \*(L"\s-1PLUGIN_\s0\*(R" or make them local.
.RE
.IP "\-\-help" 4
.IX Item "--help"
Print help and exit.
.IP "\-\-interval" 4
.IX Item "--interval"
type: int; default: 1
.Sp
Interval between checks for the diagnostic trigger.
.IP "\-\-iterations" 4
.IX Item "--iterations"
type: int
.Sp
Exit after collecting diagnostics this many times. By default, the tool
will continue to watch the server forever, but this is useful for scenarios
where you want to capture once and then exit, for example.
.IP "\-\-log" 4
.IX Item "--log"
type: string; default: /var/log/pt\-stalk.log
.Sp
Print all output to this file when daemonized.
.IP "\-\-match" 4
.IX Item "--match"
type: string
.Sp
The pattern to use when watching \s-1SHOW\s0 \s-1PROCESSLIST\s0. See the documentation for
\&\*(L"\-\-function\*(R" for details.
.IP "\-\-notify\-by\-email" 4
.IX Item "--notify-by-email"
type: string
.Sp
Send mail to this list of addresses when data is collected.
.IP "\-\-pid" 4
.IX Item "--pid"
type: string; default: /var/run/pt\-stalk.pid
.Sp
Create a \s-1PID\s0 file when daemonized.
.IP "\-\-prefix" 4
.IX Item "--prefix"
type: string
.Sp
The filename prefix for diagnostic samples. By default, samples have a
timestamp prefix based on the current local time, such as 2011_12_06_14_02_02,
which is December 6, 2011 at 14:02:02.
.IP "\-\-retention\-time" 4
.IX Item "--retention-time"
type: int; default: 30
.Sp
Number of days to retain collected samples. Any samples that are older will be
purged.
.IP "\-\-run\-time" 4
.IX Item "--run-time"
type: int; default: 30
.Sp
How long the tool will collect data when it triggers. This should not be longer
than \*(L"\-\-sleep\*(R". It is usually not necessary to change this; if the default 30
seconds hasn't gathered enough diagnostic data, running longer is not likely to
do so. In fact, in many cases a shorter collection period is appropriate.
.IP "\-\-sleep" 4
.IX Item "--sleep"
type: int; default: 300
.Sp
How long to sleep after collecting data. This prevents the tool from triggering
continuously, which might be a problem if the collection process is intrusive.
It also prevents filling up the disk or gathering too much data to analyze
reasonably.
.IP "\-\-stalk" 4
.IX Item "--stalk"
default: yes; negatable: yes
.Sp
Watch the server and wait for the trigger to occur. You can negate this option
to make the tool immediately gather any diagnostic data once and exit. This is
useful if a problem is already happening, but pt-stalk is not running, so
you only want to collect diagnostic data.
.Sp
If this option is negated, \*(L"\-\-daemonize\*(R", \*(L"\-\-log\*(R", \*(L"\-\-pid\*(R", and other
stalking-related options have no effect; the tool simply collects diagnostic
data and exits. Safeguard options, like \*(L"\-\-disk\-bytes\-free\*(R" and
\&\*(L"\-\-disk\-pct\-free\*(R", are still respected.
.Sp
See also \*(L"\-\-collect\*(R".
.IP "\-\-threshold" 4
.IX Item "--threshold"
type: int; default: 25
.Sp
The threshold at which the diagnostic trigger should fire. See \*(L"\-\-function\*(R"
for details.
.IP "\-\-variable" 4
.IX Item "--variable"
type: string; default: Threads_running
.Sp
The variable to compare against the threshold. See \*(L"\-\-function\*(R" for details.
.IP "\-\-version" 4
.IX Item "--version"
Print tool's version and exit.
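.PP
As a rough illustration of how the options above combine, the following sketch
shows a configuration file that repeats the processlist example from
\&\*(L"\-\-function\*(R" and runs the tool as a daemon. It assumes the same
one-option-per-line syntax as the sample in \*(L"\s-1CONFIGURING\s0\*(R"; the values are
illustrative assumptions, not recommendations, so tune them for your own server.
.PP
.Vb 6
\& daemonize
\& function=processlist
\& variable=State
\& match=statistics
\& threshold=10
\& cycles=5
.Ve
.PP
The same settings could also be given on the command line:
.PP
.Vb 3
\& pt\-stalk \-\-daemonize \-\-function processlist \e
\&    \-\-variable State \-\-match statistics \e
\&    \-\-threshold 10 \-\-cycles 5
.Ve
.PP
With these settings, the trigger should fire only after more than 10 processes
have been observed in the \*(L"statistics\*(R" state on several consecutive checks, as
governed by \*(L"\-\-cycles\*(R" and \*(L"\-\-interval\*(R".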
.SH "ENVIRONMENT"
.IX Header "ENVIRONMENT"
This tool does not use any environment variables for configuration.
.SH "SYSTEM REQUIREMENTS"
.IX Header "SYSTEM REQUIREMENTS"
This tool requires Bash v3 or newer.
.SH "BUGS"
.IX Header "BUGS"
For a list of known bugs, see http://www.percona.com/bugs/pt\-stalk .
.PP
Please report bugs at https://bugs.launchpad.net/percona\-toolkit .
Include the following information in your bug report:
.IP "\(bu" 4
Complete command-line used to run the tool
.IP "\(bu" 4
Tool \*(L"\-\-version\*(R"
.IP "\(bu" 4
MySQL version of all servers involved
.IP "\(bu" 4
Output from the tool including \s-1STDERR\s0
.IP "\(bu" 4
Input files (log/dump/config files, etc.)
.PP
If possible, include debugging output by running the tool with \f(CW\*(C`PTDEBUG\*(C'\fR;
see \*(L"\s-1ENVIRONMENT\s0\*(R".
.SH "DOWNLOADING"
.IX Header "DOWNLOADING"
Visit http://www.percona.com/software/percona\-toolkit/ to download the
latest release of Percona Toolkit. Or, get the latest release from the
command line:
.PP
.Vb 1
\& wget percona.com/get/percona\-toolkit.tar.gz
\&
\& wget percona.com/get/percona\-toolkit.rpm
\&
\& wget percona.com/get/percona\-toolkit.deb
.Ve
.PP
You can also get individual tools from the latest release:
.PP
.Vb 1
\& wget percona.com/get/TOOL
.Ve
.PP
Replace \f(CW\*(C`TOOL\*(C'\fR with the name of any tool.
.SH "AUTHORS"
.IX Header "AUTHORS"
Baron Schwartz, Justin Swanhart, Fernando Ipar, and Daniel Nichter
.SH "ABOUT PERCONA TOOLKIT"
.IX Header "ABOUT PERCONA TOOLKIT"
This tool is part of Percona Toolkit, a collection of advanced command-line
tools developed by Percona for MySQL support and consulting. Percona Toolkit
was forked from two projects in June, 2011: Maatkit and Aspersa. Those
projects were created by Baron Schwartz and developed primarily by him and
Daniel Nichter, both of whom are employed by Percona. Visit
for more software developed by Percona.
.SH "COPYRIGHT, LICENSE, AND WARRANTY"
.IX Header "COPYRIGHT, LICENSE, AND WARRANTY"
This program is copyright 2010\-2011 Baron Schwartz, 2011\-2012 Percona Inc.
Feedback and improvements are welcome.
.PP
\&\s-1THIS\s0 \s-1PROGRAM\s0 \s-1IS\s0 \s-1PROVIDED\s0 \*(L"\s-1AS\s0 \s-1IS\s0\*(R" \s-1AND\s0 \s-1WITHOUT\s0 \s-1ANY\s0 \s-1EXPRESS\s0 \s-1OR\s0 \s-1IMPLIED\s0
\&\s-1WARRANTIES\s0, \s-1INCLUDING\s0, \s-1WITHOUT\s0 \s-1LIMITATION\s0, \s-1THE\s0 \s-1IMPLIED\s0 \s-1WARRANTIES\s0 \s-1OF\s0
\&\s-1MERCHANTABILITY\s0 \s-1AND\s0 \s-1FITNESS\s0 \s-1FOR\s0 A \s-1PARTICULAR\s0 \s-1PURPOSE\s0.
.PP
This program is free software; you can redistribute it and/or modify it under
the terms of the \s-1GNU\s0 General Public License as published by the Free Software
Foundation, version 2; \s-1OR\s0 the Perl Artistic License. On \s-1UNIX\s0 and similar
systems, you can issue `man perlgpl' or `man perlartistic' to read these
licenses.
.PP
You should have received a copy of the \s-1GNU\s0 General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, \s-1MA\s0 02111\-1307 \s-1USA\s0.
.SH "VERSION"
.IX Header "VERSION"
pt-stalk 2.1.2