NAME¶
pt-table-checksum - Verify MySQL replication integrity.
SYNOPSIS¶
Usage: pt-table-checksum [OPTION...] [DSN]
pt-table-checksum performs an online replication consistency check by executing
checksum queries on the master, which produces different results on replicas
that are inconsistent with the master. The optional DSN specifies the master
host. The tool's exit status is nonzero if any differences are found, or if
any warnings or errors occur.
The following command will connect to the replication master on localhost,
checksum every table, and report the results on every detected replica:
pt-table-checksum
This tool is focused on finding data differences efficiently. If any data is
different, you can resolve the problem with pt-table-sync.
RISKS¶
The following section is included to inform users about the potential risks,
whether known or unknown, of using this tool. The two main categories of risks
are those created by the nature of the tool (e.g. read-only tools vs.
read-write tools) and those created by bugs.
pt-table-checksum can add load to the MySQL server, although it has many
safeguards to prevent this. It inserts a small amount of data into a table
that contains checksum results. It has checks that, if disabled, can
potentially cause replication to fail when unsafe replication options are
used. In short, it is safe by default, but it permits you to turn off its
safety checks.
The tool presumes that schemas and tables are identical on the master and all
replicas. Replication will break if, for example, a replica does not have a
schema that exists on the master (and that schema is checksummed), or if the
structure of a table on a replica is different than on the master.
At the time of this release, we know of no bugs that could cause harm to users.
The authoritative source for updated information is always the online issue
tracking system. Issues that affect this tool will be marked as such. You can
see a list of such issues at the following URL:
http://www.percona.com/bugs/pt-table-checksum
<
http://www.percona.com/bugs/pt-table-checksum>.
See also "BUGS" for more information on filing bugs and getting help.
DESCRIPTION¶
pt-table-checksum is designed to do the right thing by default in almost every
case. When in doubt, use "--explain" to see how the tool will
checksum a table. The following is a high-level overview of how the tool
functions.
In contrast to older versions of pt-table-checksum, this tool is focused on a
single purpose, and does not have a lot of complexity or support many
different checksumming techniques. It executes checksum queries on only one
server, and these flow through replication to re-execute on replicas. If you
need the older behavior, you can use Percona Toolkit version 1.0.
pt-table-checksum connects to the server you specify, and finds databases and
tables that match the filters you specify (if any). It works one table at a
time, so it does not accumulate large amounts of memory or do a lot of work
before beginning to checksum. This makes it usable on very large servers. We
have used it on servers with hundreds of thousands of databases and tables,
and trillions of rows. No matter how large the server is, pt-table-checksum
works equally well.
One reason it can work on very large tables is that it divides each table into
chunks of rows, and checksums each chunk with a single REPLACE..SELECT query.
It varies the chunk size to make the checksum queries run in the desired
amount of time. The goal of chunking the tables, instead of doing each table
with a single big query, is to ensure that checksums are unintrusive and don't
cause too much replication lag or load on the server. That's why the target
time for each chunk is 0.5 seconds by default.
The tool keeps track of how quickly the server is able to execute the queries,
and adjusts the chunks as it learns more about the server's performance. It
uses an exponentially decaying weighted average to keep the chunk size stable,
yet remain responsive if the server's performance changes during checksumming
for any reason. This means that the tool will quickly throttle itself if your
server becomes heavily loaded during a traffic spike or a background task, for
example.
Chunking is accomplished by a technique that we used to call
"nibbling" in other tools in Percona Toolkit. It is the same
technique used for pt-archiver, for example. The legacy chunking algorithms
used in older versions of pt-table-checksum are removed, because they did not
result in predictably sized chunks, and didn't work well on many tables. All
that is required to divide a table into chunks is an index of some sort
(preferably a primary key or unique index). If there is no index, and the
table contains a suitably small number of rows, the tool will checksum the
table in a single chunk.
pt-table-checksum has many other safeguards to ensure that it does not interfere
with any server's operation, including replicas. To accomplish this,
pt-table-checksum detects replicas and connects to them automatically. (If
this fails, you can give it a hint with the "--recursion-method"
option.)
The tool monitors replicas continually. If any replica falls too far behind in
replication, pt-table-checksum pauses to allow it to catch up. If any replica
has an error, or replication stops, pt-table-checksum pauses and waits. In
addition, pt-table-checksum looks for common causes of problems, such as
replication filters, and refuses to operate unless you force it to.
Replication filters are dangerous, because the queries that pt-table-checksum
executes could potentially conflict with them and cause replication to fail.
pt-table-checksum verifies that chunks are not too large to checksum safely. It
performs an EXPLAIN query on each chunk, and skips chunks that might be larger
than the desired number of rows. You can configure the sensitivity of this
safeguard with the "--chunk-size-limit" option. If a table will be
checksummed in a single chunk because it has a small number of rows, then
pt-table-checksum additionally verifies that the table isn't oversized on
replicas. This avoids the following scenario: a table is empty on the master
but is very large on a replica, and is checksummed in a single large query,
which causes a very long delay in replication.
There are several other safeguards. For example, pt-table-checksum sets its
session-level innodb_lock_wait_timeout to 1 second, so that if there is a lock
wait, it will be the victim instead of causing other queries to time out.
Another safeguard checks the load on the database server, and pauses if the
load is too high. There is no single right answer for how to do this, but by
default pt-table-checksum will pause if there are more than 25 concurrently
executing queries. You should probably set a sane value for your server with
the "--max-load" option.
Checksumming usually is a low-priority task that should yield to other work on
the server. However, a tool that must be restarted constantly is difficult to
use. Thus, pt-table-checksum is very resilient to errors. For example, if the
database administrator needs to kill pt-table-checksum's queries for any
reason, that is not a fatal error. Users often run pt-kill to kill any
long-running checksum queries. The tool will retry a killed query once, and if
it fails again, it will move on to the next chunk of that table. The same
behavior applies if there is a lock wait timeout. The tool will print a
warning if such an error happens, but only once per table. If the connection
to any server fails, pt-table-checksum will attempt to reconnect and continue
working.
If pt-table-checksum encounters a condition that causes it to stop completely,
it is easy to resume it with the "--resume" option. It will begin
from the last chunk of the last table that it processed. You can also safely
stop the tool with CTRL-C. It will finish the chunk it is currently
processing, and then exit. You can resume it as usual afterwards.
After pt-table-checksum finishes checksumming all of the chunks in a table, it
pauses and waits for all detected replicas to finish executing the checksum
queries. Once that is finished, it checks all of the replicas to see if they
have the same data as the master, and then prints a line of output with the
results. You can see a sample of its output later in this documentation.
The tool prints progress indicators during time-consuming operations. It prints
a progress indicator as each table is checksummed. The progress is computed by
the estimated number of rows in the table. It will also print a progress
report when it pauses to wait for replication to catch up, and when it is
waiting to check replicas for differences from the master. You can make the
output less verbose with the "--quiet" option.
If you wish, you can query the checksum tables manually to get a report of which
tables and chunks have differences from the master. The following query will
report every database and table with differences, along with a summary of the
number of chunks and rows possibly affected:
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
The table referenced in that query is the checksum table, where the checksums
are stored. Each row in the table contains the checksum of one chunk of data
from some table in the server.
Version 2.0 of pt-table-checksum is not backwards compatible with pt-table-sync
version 1.0. In some cases this is not a serious problem. Adding a
"boundaries" column to the table, and then updating it with a
manually generated WHERE clause, may suffice to let pt-table-sync version 1.0
interoperate with pt-table-checksum version 2.0. Assuming an integer primary
key named 'id', You can try something like the following:
ALTER TABLE checksums ADD boundaries VARCHAR(500);
UPDATE checksums
SET boundaries = COALESCE(CONCAT('id BETWEEN ', lower_boundary,
' AND ', upper_boundary), '1=1');
OUTPUT¶
The tool prints tabular results, one line per table:
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
10-20T08:36:50 0 0 200 1 0 0.005 db1.tbl1
10-20T08:36:50 0 0 603 7 0 0.035 db1.tbl2
10-20T08:36:50 0 0 16 1 0 0.003 db2.tbl3
10-20T08:36:50 0 0 600 6 0 0.024 db2.tbl4
Errors, warnings, and progress reports are printed to standard error. See also
"--quiet".
Each table's results are printed when the tool finishes checksumming the table.
The columns are as follows:
- TS
- The timestamp (without the year) when the tool finished
checksumming the table.
- ERRORS
- The number of errors and warnings that occurred while
checksumming the table. Errors and warnings are printed to standard error
while the table is in progress.
- DIFFS
- The number of chunks that differ from the master on one or
more replicas. If "--no-replicate-check" is specified, this
column will always have zeros. If "--replicate-check-only" is
specified, then only tables with differences are printed.
- ROWS
- The number of rows selected and checksummed from the table.
It might be different from the number of rows in the table if you use the
--where option.
- CHUNKS
- The number of chunks into which the table was divided.
- SKIPPED
- The number of chunks that were skipped due to errors or
warnings, or because they were oversized.
- TIME
- The time elapsed while checksumming the table.
- TABLE
- The database and table that was checksummed.
If "--replicate-check-only" is specified, only checksum differences on
detected replicas are printed. The output is different: one paragraph per
replica, one checksum difference per line, and values are separted by spaces:
Differences on h=127.0.0.1,P=12346
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
db1.tbl1 1 0 1 PRIMARY 1 100
db1.tbl1 6 0 1 PRIMARY 501 600
Differences on h=127.0.0.1,P=12347
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
db1.tbl1 1 0 1 PRIMARY 1 100
db2.tbl2 9 5 0 PRIMARY 101 200
The first line of a paragraph indicates the replica with differences. In this
example there are two: h=127.0.0.1,P=12346 and h=127.0.0.1,P=12347. The
columns are as follows:
- TABLE
- The database and table that differs from the master.
- CHUNK
- The chunk number of the table that differs from the
master.
- CNT_DIFF
- The number of chunk rows on the replica minus the number of
chunk rows on the master.
- CRC_DIFF
- 1 if the CRC of the chunk on the replica is different than
the CRC of the chunk on the master, else 0.
- CHUNK_INDEX
- The index used to chunk the table.
- LOWER_BOUNDARY
- The index values that define the lower boundary of the
chunk.
- UPPER_BOUNDARY
- The index values that define the upper boundary of the
chunk.
EXIT STATUS¶
A non-zero exit status indicates errors, warnings, or checksum differences.
OPTIONS¶
This tool accepts additional command-line arguments. Refer to the
"SYNOPSIS" and usage information for details.
- --ask-pass
- group: Connection
Prompt for a password when connecting to MySQL.
- --check-interval
- type: time; default: 1; group: Throttle
Sleep time between checks for "--max-lag".
- --[no]check-plan
- default: yes
Check query execution plans for safety. By default, this option causes
pt-table-checksum to run EXPLAIN before running queries that are meant to
access a small amount of data, but which could access many rows if MySQL
chooses a bad execution plan. These include the queries to determine chunk
boundaries and the chunk queries themselves. If it appears that MySQL will
use a bad query execution plan, the tool will skip the chunk of the table.
The tool uses several heuristics to determine whether an execution plan is
bad. The first is whether EXPLAIN reports that MySQL intends to use the
desired index to access the rows. If MySQL chooses a different index, the
tool considers the query unsafe.
The tool also checks how much of the index MySQL reports that it will use
for the query. The EXPLAIN output shows this in the key_len column. The
tool remembers the largest key_len seen, and skips chunks where MySQL
reports that it will use a smaller prefix of the index. This heuristic can
be understood as skipping chunks that have a worse execution plan than
other chunks.
The tool prints a warning the first time a chunk is skipped due to a bad
execution plan in each table. Subsequent chunks are skipped silently,
although you can see the count of skipped chunks in the SKIPPED column in
the tool's output.
This option adds some setup work to each table and chunk. Although the work
is not intrusive for MySQL, it results in more round-trips to the server,
which consumes time. Making chunks too small will cause the overhead to
become relatively larger. It is therefore recommended that you not make
chunks too small, because the tool may take a very long time to complete
if you do.
- --[no]check-replication-filters
- default: yes; group: Safety
Do not checksum if any replication filters are set on any replicas. The tool
looks for server options that filter replication, such as binlog_ignore_db
and replicate_do_db. If it finds any such filters, it aborts with an
error.
If the replicas are configured with any filtering options, you should be
careful not to checksum any databases or tables that exist on the master
and not the replicas. Changes to such tables might normally be skipped on
the replicas because of the filtering options, but the checksum queries
modify the contents of the table that stores the checksums, not the tables
whose data you are checksumming. Therefore, these queries will be executed
on the replica, and if the table or database you're checksumming does not
exist, the queries will cause replication to fail. For more information on
replication rules, see http://dev.mysql.com/doc/en/replication-rules.html
<http://dev.mysql.com/doc/en/replication-rules.html>.
Replication filtering makes it impossible to be sure that the checksum
queries won't break replication (or simply fail to replicate). If you are
sure that it's OK to run the checksum queries, you can negate this option
to disable the checks. See also "--replicate-database".
- --check-slave-lag
- type: string; group: Throttle
Pause checksumming until this replica's lag is less than
"--max-lag". The value is a DSN that inherits properties from
the master host and the connection options ("--port",
"--user", etc.). This option overrides the normal behavior of
finding and continually monitoring replication lag on ALL connected
replicas. If you don't want to monitor ALL replicas, but you want more
than just one replica to be monitored, then use the DSN option to the
"--recursion-method" option instead of this option.
- --chunk-index
- type: string
Prefer this index for chunking tables. By default, pt-table-checksum chooses
the most appropriate index for chunking. This option lets you specify the
index that you prefer. If the index doesn't exist, then pt-table-checksum
will fall back to its default behavior of choosing an index.
pt-table-checksum adds the index to the checksum SQL statements in a
"FORCE INDEX" clause. Be careful when using this option; a poor
choice of index could cause bad performance. This is probably best to use
when you are checksumming only a single table, not an entire server.
- --chunk-index-columns
- type: int
Use only this many left-most columns of a "--chunk-index". This
works only for compound indexes, and is useful in cases where a bug in the
MySQL query optimizer (planner) causes it to scan a large range of rows
instead of using the index to locate starting and ending points precisely.
This problem sometimes occurs on indexes with many columns, such as 4 or
more. If this happens, the tool might print a warning related to the
"--[no]check-plan" option. Instructing the tool to use only the
first N columns of the index is a workaround for the bug in some
cases.
- --chunk-size
- type: size; default: 1000
Number of rows to select for each checksum query. Allowable suffixes are k,
M, G. You should not use this option in most cases; prefer
"--chunk-time" instead.
This option can override the default behavior, which is to adjust chunk size
dynamically to try to make chunks run in exactly "--chunk-time"
seconds. When this option isn't set explicitly, its default value is used
as a starting point, but after that, the tool ignores this option's value.
If you set this option explicitly, however, then it disables the dynamic
adjustment behavior and tries to make all chunks exactly the specified
number of rows.
There is a subtlety: if the chunk index is not unique, then it's possible
that chunks will be larger than desired. For example, if a table is
chunked by an index that contains 10,000 of a given value, there is no way
to write a WHERE clause that matches only 1,000 of the values, and that
chunk will be at least 10,000 rows large. Such a chunk will probably be
skipped because of "--chunk-size-limit".
Selecting a small chunk size will cause the tool to become much slower, in
part because of the setup work required for
"--[no]check-plan".
- --chunk-size-limit
- type: float; default: 2.0; group: Safety
Do not checksum chunks this much larger than the desired chunk size.
When a table has no unique indexes, chunk sizes can be inaccurate. This
option specifies a maximum tolerable limit to the inaccuracy. The tool
uses <EXPLAIN> to estimate how many rows are in the chunk. If that
estimate exceeds the desired chunk size times the limit (twice as large,
by default), then the tool skips the chunk.
The minimum value for this option is 1, which means that no chunk can be
larger than "--chunk-size". You probably don't want to specify
1, because rows reported by EXPLAIN are estimates, which can be different
from the real number of rows in the chunk. If the tool skips too many
chunks because they are oversized, you might want to specify a value
larger than the default of 2.
You can disable oversized chunk checking by specifying a value of 0.
- --chunk-time
- type: float; default: 0.5
Adjust the chunk size dynamically so each checksum query takes this long to
execute.
The tool tracks the checksum rate (rows per second) for all tables and each
table individually. It uses these rates to adjust the chunk size after
each checksum query, so that the next checksum query takes this amount of
time (in seconds) to execute.
The algorithm is as follows: at the beginning of each table, the chunk size
is initialized from the overall average rows per second since the tool
began working, or the value of "--chunk-size" if the tool hasn't
started working yet. For each subsequent chunk of a table, the tool
adjusts the chunk size to try to make queries run in the desired amount of
time. It keeps an exponentially decaying moving average of queries per
second, so that if the server's performance changes due to changes in
server load, the tool adapts quickly. This allows the tool to achieve
predictably timed queries for each table, and for the server overall.
If this option is set to zero, the chunk size doesn't auto-adjust, so query
checksum times will vary, but query checksum sizes will not. Another way
to do the same thing is to specify a value for "--chunk-size"
explicitly, instead of leaving it at the default.
- --columns
- short form: -c; type: array; group: Filter
Checksum only this comma-separated list of columns.
- --config
- type: Array; group: Config
Read this comma-separated list of config files; if specified, this must be
the first option on the command line.
- --[no]create-replicate-table
- default: yes
Create the "--replicate" database and table if they do not exist.
The structure of the replicate table is the same as the suggested table
mentioned in "--replicate".
- --databases
- short form: -d; type: hash; group: Filter
Only checksum this comma-separated list of databases.
- --databases-regex
- type: string; group: Filter
Only checksum databases whose names match this Perl regex.
- --defaults-file
- short form: -F; type: string; group: Connection
Only read mysql options from the given file. You must give an absolute
pathname.
- --[no]empty-replicate-table
- default: yes
Delete previous checksums for each table before checksumming the table. This
option does not truncate the entire table, it only deletes rows
(checksums) for each table just before checksumming the table. Therefore,
if checksumming stops prematurely and there was preexisting data, there
will still be rows for tables that were not checksummed before the tool
was stopped.
If you're resuming from a previous checksum run, then the checksum records
for the table from which the tool resumes won't be emptied.
- --engines
- short form: -e; type: hash; group: Filter
Only checksum tables which use these storage engines.
- --explain
- cumulative: yes; default: 0; group: Output
Show, but do not execute, checksum queries (disables
"--[no]empty-replicate-table"). If specified twice, the tool
actually iterates through the chunking algorithm, printing the upper and
lower boundary values for each chunk, but not executing the checksum
queries.
- --float-precision
- type: int
Precision for FLOAT and DOUBLE number-to-string conversion. Causes FLOAT and
DOUBLE values to be rounded to the specified number of digits after the
decimal point, with the ROUND() function in MySQL. This can help
avoid checksum mismatches due to different floating-point representations
of the same values on different MySQL versions and hardware. The default
is no rounding; the values are converted to strings by the CONCAT()
function, and MySQL chooses the string representation. If you specify a
value of 2, for example, then the values 1.008 and 1.009 will be rounded
to 1.01, and will checksum as equal.
- --function
- type: string
Hash function for checksums (FNV1A_64, MURMUR_HASH, SHA1, MD5, CRC32, etc).
The default is to use CRC32(), but MD5() and SHA1()
also work, and you can use your own function, such as a compiled UDF, if
you wish. The function you specify is run in SQL, not in Perl, so it must
be available to MySQL.
MySQL doesn't have good built-in hash functions that are fast.
CRC32() is too prone to hash collisions, and MD5() and
SHA1() are very CPU-intensive. The FNV1A_64() UDF that is
distributed with Percona Server is a faster alternative. It is very simple
to compile and install; look at the header in the source code for
instructions. If it is installed, it is preferred over MD5(). You
can also use the MURMUR_HASH() function if you compile and install
that as a UDF; the source is also distributed with Percona Server, and it
might be better than FNV1A_64().
- --help
- group: Help
Show help and exit.
- --host
- short form: -h; type: string; default: localhost; group:
Connection
Host to connect to.
- --ignore-columns
- type: Hash; group: Filter
Ignore this comma-separated list of columns when calculating the
checksum.
- --ignore-databases
- type: Hash; group: Filter
Ignore this comma-separated list of databases.
- --ignore-databases-regex
- type: string; group: Filter
Ignore databases whose names match this Perl regex.
- --ignore-engines
- type: Hash; default: FEDERATED,MRG_MyISAM; group: Filter
Ignore this comma-separated list of storage engines.
- --ignore-tables
- type: Hash; group: Filter
Ignore this comma-separated list of tables. Table names may be qualified
with the database name. The "--replicate" table is always
automatically ignored.
- --ignore-tables-regex
- type: string; group: Filter
Ignore tables whose names match the Perl regex.
- --lock-wait-timeout
- type: int; default: 1
Set the session value of "innodb_lock_wait_timeout" on the master
host. This option helps guard against long lock waits if the checksum
queries become slow for some reason. Setting this option dynamically
requires the InnoDB plugin, so this works only on newer InnoDB and MySQL
versions. If setting the value fails and the current server value is
greater than the specified value, then a warning is printed; else, if the
current server value is less than or equal to the specified value, no
warning is printed.
- --max-lag
- type: time; default: 1s; group: Throttle
Pause checksumming until all replicas' lag is less than this value. After
each checksum query (each chunk), pt-table-checksum looks at the
replication lag of all replicas to which it connects, using
Seconds_Behind_Master. If any replica is lagging more than the value of
this option, then pt-table-checksum will sleep for
"--check-interval" seconds, then check all replicas again. If
you specify "--check-slave-lag", then the tool only examines
that server for lag, not all servers. If you want to control exactly which
servers the tool monitors, use the DSN value to
"--recursion-method".
The tool waits forever for replicas to stop lagging. If any replica is
stopped, the tool waits forever until the replica is started. Checksumming
continues once all replicas are running and not lagging too much.
The tool prints progress reports while waiting. If a replica is stopped, it
prints a progress report immediately, then again at every progress report
interval.
- --max-load
- type: Array; default: Threads_running=25; group: Throttle
Examine SHOW GLOBAL STATUS after every chunk, and pause if any status
variables are higher than the threshold. The option accepts a
comma-separated list of MySQL status variables to check for a threshold.
An optional "=MAX_VALUE" (or ":MAX_VALUE") can follow
each variable. If not given, the tool determines a threshold by examining
the current value and increasing it by 20%.
For example, if you want the tool to pause when Threads_connected gets too
high, you can specify "Threads_connected", and the tool will
check the current value when it starts working and add 20% to that value.
If the current value is 100, then the tool will pause when
Threads_connected exceeds 120, and resume working when it is below 120
again. If you want to specify an explicit threshold, such as 110, you can
use either "Threads_connected:110" or
"Threads_connected=110".
The purpose of this option is to prevent the tool from adding too much load
to the server. If the checksum queries are intrusive, or if they cause
lock waits, then other queries on the server will tend to block and queue.
This will typically cause Threads_running to increase, and the tool can
detect that by running SHOW GLOBAL STATUS immediately after each checksum
query finishes. If you specify a threshold for this variable, then you can
instruct the tool to wait until queries are running normally again. This
will not prevent queueing, however; it will only give the server a chance
to recover from the queueing. If you notice queueing, it is best to
decrease the chunk time.
- --password
- short form: -p; type: string; group: Connection
Password to use when connecting.
- --pid
- type: string
Create the given PID file. The file contains the process ID of the script.
The PID file is removed when the script exits. Before starting, the script
checks if the PID file already exists. If it does not, then the script
creates and writes its own PID to it. If it does, then the script checks
the following: if the file contains a PID and a process is running with
that PID, then the script dies; or, if there is no process running with
that PID, then the script overwrites the file with its own PID and starts;
else, if the file contains no PID, then the script dies.
- --port
- short form: -P; type: int; group: Connection
Port number to use for connection.
- --progress
- type: array; default: time,30
Print progress reports to STDERR.
The value is a comma-separated list with two parts. The first part can be
percentage, time, or iterations; the second part specifies how often an
update should be printed, in percentage, seconds, or number of iterations.
The tool prints progress reports for a variety of time-consuming
operations, including waiting for replicas to catch up if they become
lagged.
- --quiet
- short form: -q; cumulative: yes; default: 0
Print only the most important information (disables "--progress").
Specifying this option once causes the tool to print only errors,
warnings, and tables that have checksum differences.
Specifying this option twice causes the tool to print only errors. In this
case, you can use the tool's exit status to determine if there were any
warnings or checksum differences.
- --recurse
- type: int
Number of levels to recurse in the hierarchy when discovering replicas.
Default is infinite. See also "--recursion-method".
- --recursion-method
- type: string
Preferred recursion method for discovering replicas. Possible methods are:
METHOD USES
=========== ==================
processlist SHOW PROCESSLIST
hosts SHOW SLAVE HOSTS
dsn=DSN DSNs from a table
none Do not find slaves
The processlist method is the default, because SHOW SLAVE HOSTS is not
reliable. However, the hosts method can work better if the server uses a
non-standard port (not 3306). The tool usually does the right thing and
finds all replicas, but you may give a preferred method and it will be
used first.
The hosts method requires replicas to be configured with report_host,
report_port, etc.
The dsn method is special: it specifies a table from which other DSN strings
are read. The specified DSN must specify a D and t, or a
database-qualified t. The DSN table should have the following structure:
CREATE TABLE `dsns` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`dsn` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
To make the tool monitor only the hosts 10.10.1.16 and 10.10.1.17 for
replication lag and checksum differences, insert the values
"h=10.10.1.16" and "h=10.10.1.17" into the table.
Currently, the DSNs are ordered by id, but id and parent_id are otherwise
ignored.
- --replicate
- type: string; default: percona.checksums
Write checksum results to this table. The replicate table must have this
structure (MAGIC_create_replicate):
CREATE TABLE checksums (
db char(64) NOT NULL,
tbl char(64) NOT NULL,
chunk int NOT NULL,
chunk_time float NULL,
chunk_index varchar(200) NULL,
lower_boundary text NULL,
upper_boundary text NULL,
this_crc char(40) NOT NULL,
this_cnt int NOT NULL,
master_crc char(40) NULL,
master_cnt int NULL,
ts timestamp NOT NULL,
PRIMARY KEY (db, tbl, chunk),
INDEX ts_db_tbl (ts, db, tbl)
) ENGINE=InnoDB;
By default, "--[no]create-replicate-table" is true, so the
database and the table specified by this option are created automatically
if they do not exist.
Be sure to choose an appropriate storage engine for the replicate table. If
you are checksumming InnoDB tables, and you use MyISAM for this table, a
deadlock will break replication, because the mixture of transactional and
non-transactional tables in the checksum statements will cause it to be
written to the binlog even though it had an error. It will then replay
without a deadlock on the replicas, and break replication with
"different error on master and slave." This is not a problem
with pt-table-checksum; it's a problem with MySQL replication, and you can
read more about it in the MySQL manual.
The replicate table is never checksummed (the tool automatically adds this
table to "--ignore-tables").
- --[no]replicate-check
- default: yes
Check replicas for data differences after finishing each table. The tool
finds differences by executing a simple SELECT statement on all detected
replicas. The query compares the replica's checksum results to the
master's checksum results. It reports differences in the DIFFS column of
the output.
- --replicate-check-only
- Check replicas for consistency without executing checksum
queries. This option is used only with "--[no]replicate-check".
If specified, pt-table-checksum doesn't checksum any tables. It checks
replicas for differences found by previous checksumming, and then exits.
It might be useful if you run pt-table-checksum quietly in a cron job, for
example, and later want a report on the results of the cron job, perhaps
to implement a Nagios check.
- --replicate-database
- type: string
USE only this database. By default, pt-table-checksum executes USE to select
the database that contains the table it's currently working on. This is is
a best effort to avoid problems with replication filters such as
binlog_ignore_db and replicate_ignore_db. However, replication filters can
create a situation where there simply is no one right way to do things.
Some statements might not be replicated, and others might cause
replication to fail. In such cases, you can use this option to specify a
default database that pt-table-checksum selects with USE, and never
changes. See also "--[no]check-replication-filters".
- --resume
- Resume checksumming from the last completed chunk (disables
"--[no]empty-replicate-table"). If the tool stops before it
checksums all tables, this option makes checksumming resume from the last
chunk of the last table that it finished.
- --retries
- type: int; default: 2
Retry a chunk this many times when there is a nonfatal error. Nonfatal
errors are problems such as a lock wait timeout or the query being
killed.
- --separator
- type: string; default: #
The separator character used for CONCAT_WS(). This character is used
to join the values of columns when checksumming.
- --set-vars
- type: string; default: wait_timeout=10000; group:
Connection
Set these MySQL variables. Immediately after connecting to MySQL, this
string will be appended to SET and executed.
- --socket
- short form: -S; type: string; group: Connection
Socket file to use for connection.
- --tables
- short form: -t; type: hash; group: Filter
Checksum only this comma-separated list of tables. Table names may be
qualified with the database name.
- --tables-regex
- type: string; group: Filter
Checksum only tables whose names match this Perl regex.
- --trim
- Add TRIM() to VARCHAR columns (helps when comparing
4.1 to >= 5.0). This is useful when you don't care about the trailing
space differences between MySQL versions that vary in their handling of
trailing spaces. MySQL 5.0 and later all retain trailing spaces in
VARCHAR, while previous versions would remove them. These differences will
cause false checksum differences.
- --user
- short form: -u; type: string; group: Connection
User for login if not current user.
- --version
- group: Help
Show version and exit.
- --where
- type: string
Do only rows matching this WHERE clause. You can use this option to limit
the checksum to only part of the table. This is particularly useful if you
have append-only tables and don't want to constantly re-check all rows;
you could run a daily job to just check yesterday's rows, for instance.
This option is much like the -w option to mysqldump. Do not specify the
WHERE keyword. You might need to quote the value. Here is an example:
pt-table-checksum --where "ts > CURRENT_DATE - INTERVAL 1 DAY"
DSN OPTIONS¶
These DSN options are used to create a DSN. Each option is given like
"option=value". The options are case-sensitive, so P and p are not
the same option. There cannot be whitespace before or after the "="
and if the value contains whitespace it must be quoted. DSN options are
comma-separated. See the percona-toolkit manpage for full details.
- •
- A
dsn: charset; copy: yes
Default character set.
- •
- D
copy: no
DSN table database.
- •
- F
dsn: mysql_read_default_file; copy: no
Only read default options from the given file
- •
- h
dsn: host; copy: yes
Connect to host.
- •
- p
dsn: password; copy: yes
Password to use when connecting.
- •
- P
dsn: port; copy: yes
Port number to use for connection.
- •
- S
dsn: mysql_socket; copy: no
Socket file to use for connection.
- •
- t
copy: no
DSN table table.
- •
- u
dsn: user; copy: yes
User for login if not current user.
ENVIRONMENT¶
The environment variable "PTDEBUG" enables verbose debugging output to
STDERR. To enable debugging and capture all output to a file, run the tool
like:
PTDEBUG=1 pt-table-checksum ... > FILE 2>&1
Be careful: debugging output is voluminous and can generate several megabytes of
output.
SYSTEM REQUIREMENTS¶
You need Perl, DBI, DBD::mysql, and some core packages that ought to be
installed in any reasonably new version of Perl.
BUGS¶
For a list of known bugs, see
http://www.percona.com/bugs/pt-table-checksum
<
http://www.percona.com/bugs/pt-table-checksum>.
Please report bugs at
https://bugs.launchpad.net/percona-toolkit
<
https://bugs.launchpad.net/percona-toolkit>. Include the following
information in your bug report:
- •
- Complete command-line used to run the tool
- •
- Tool "--version"
- •
- MySQL version of all servers involved
- •
- Output from the tool including STDERR
- •
- Input files (log/dump/config files, etc.)
If possible, include debugging output by running the tool with
"PTDEBUG"; see "ENVIRONMENT".
DOWNLOADING¶
Visit
http://www.percona.com/software/percona-toolkit/
<
http://www.percona.com/software/percona-toolkit/> to download the
latest release of Percona Toolkit. Or, get the latest release from the command
line:
wget percona.com/get/percona-toolkit.tar.gz
wget percona.com/get/percona-toolkit.rpm
wget percona.com/get/percona-toolkit.deb
You can also get individual tools from the latest release:
wget percona.com/get/TOOL
Replace "TOOL" with the name of any tool.
AUTHORS¶
Baron Schwartz and Daniel Nichter
ACKNOWLEDGMENTS¶
Claus Jeppesen, Francois Saint-Jacques, Giuseppe Maxia, Heikki Tuuri, James
Briggs, Martin Friebe, and Sergey Zhuravlev
This tool is part of Percona Toolkit, a collection of advanced command-line
tools developed by Percona for MySQL support and consulting. Percona Toolkit
was forked from two projects in June, 2011: Maatkit and Aspersa. Those
projects were created by Baron Schwartz and developed primarily by him and
Daniel Nichter, both of whom are employed by Percona. Visit
<
http://www.percona.com/software/> for more software developed by
Percona.
COPYRIGHT, LICENSE, AND WARRANTY¶
This program is copyright 2007-2011 Baron Schwartz, 2011-2012 Percona Inc.
Feedback and improvements are welcome.
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
systems, you can issue `man perlgpl' or `man perlartistic' to read these
licenses.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA.
VERSION¶
pt-table-checksum 2.1.2