.\" Man page generated from reStructuredText. . . .nr rst2man-indent-level 0 . .de1 rstReportMargin \\$1 \\n[an-margin] level \\n[rst2man-indent-level] level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] - \\n[rst2man-indent0] \\n[rst2man-indent1] \\n[rst2man-indent2] .. .de1 INDENT .\" .rstReportMargin pre: . RS \\$1 . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] . nr rst2man-indent-level +1 .\" .rstReportMargin post: .. .de UNINDENT . RE .\" indent \\n[an-margin] .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] .nr rst2man-indent-level -1 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] .in \\n[rst2man-indent\\n[rst2man-indent-level]]u .. .TH "PGCOPYDB COPY-DB" "1" "Jul 07, 2022" "0.7" "pgcopydb" .SH NAME pgcopydb copy-db \- pgcopydb copy-db .sp pgcopydb copy\-db \- copy an entire Postgres database from source to target .SH SYNOPSIS .sp The command \fBpgcopydb copy\-db\fP copies a database from the given source Postgres instance to the target Postgres instance. .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C pgcopydb copy\-db: Copy an entire database from source to target usage: pgcopydb copy\-db \-\-source \-\-target [ ... ] \-\-source Postgres URI to the source database \-\-target Postgres URI to the target database \-\-dir Work directory to use \-\-table\-jobs Number of concurrent COPY jobs to run \-\-index\-jobs Number of concurrent CREATE INDEX jobs to run \-\-drop\-if\-exists On the target database, clean\-up from a previous run first \-\-no\-owner Do not set ownership of objects to match the original database \-\-no\-acl Prevent restoration of access privileges (grant/revoke commands). \-\-no\-comments Do not output commands to restore comments \-\-skip\-large\-objects Skip copying large objects (blobs) \-\-filters Use the filters defined in \-\-restart Allow restarting when temp files exist already \-\-resume Allow resuming operations after a failure \-\-not\-consistent Allow taking a new snapshot on the source database \-\-snapshot Use snapshot obtained with pg_export_snapshot .ft P .fi .UNINDENT .UNINDENT .SH DESCRIPTION .sp The \fBpgcopydb copy\-db\fP command implements the following steps: .INDENT 0.0 .INDENT 3.5 .INDENT 0.0 .IP 1. 3 \fBpgcopydb\fP calls into \fBpg_dump\fP to produce the \fBpre\-data\fP section and the \fBpost\-data\fP sections of the dump using Postgres custom format. .IP 2. 3 The \fBpre\-data\fP section of the dump is restored on the target database using the \fBpg_restore\fP command, creating all the Postgres objects from the source database into the target database. .IP 3. 3 \fBpgcopydb\fP gets the list of ordinary and partitioned tables and for each of them runs COPY the data from the source to the target in a dedicated sub\-process, and starts and control the sub\-processes until all the data has been copied over. .sp A Postgres connection and a SQL query to the Postgres catalog table pg_class is used to get the list of tables with data to copy around, and the \fIreltuples\fP is used to start with the tables with the greatest number of rows first, as an attempt to minimize the copy time. .IP 4. 3 An auxiliary process is started concurrently to the main COPY workers. This auxiliary process loops through all the Large Objects found on the source database and copies its data parts over to the target database, much like pg_dump itself would. .sp This step is much like \fBpg_dump | pg_restore\fP for large objects data parts, except that there isn\(aqt a good way to do just that with the tooling. .IP 5. 3 In each copy table sub\-process, as soon as the data copying is done, then \fBpgcopydb\fP gets the list of index definitions attached to the current target table and creates them in parallel. .sp The primary indexes are created as UNIQUE indexes at this stage. .IP 6. 3 Then the PRIMARY KEY constraints are created USING the just built indexes. This two\-steps approach allows the primary key index itself to be created in parallel with other indexes on the same table, avoiding an EXCLUSIVE LOCK while creating the index. .IP 7. 3 Then \fBVACUUM ANALYZE\fP is run on each target table as soon as the data and indexes are all created. .IP 8. 3 Then pgcopydb gets the list of the sequences on the source database and for each of them runs a separate query on the source to fetch the \fBlast_value\fP and the \fBis_called\fP metadata the same way that pg_dump does. .sp For each sequence, pgcopydb then calls \fBpg_catalog.setval()\fP on the target database with the information obtained on the source database. .IP 9. 3 The final stage consists now of running the \fBpg_restore\fP command for the \fBpost\-data\fP section script for the whole database, and that\(aqs where the foreign key constraints and other elements are created. .sp The \fIpost\-data\fP script is filtered out using the \fBpg_restore \-\-use\-list\fP option so that indexes and primary key constraints already created in step 4. are properly skipped now. .UNINDENT .UNINDENT .UNINDENT .SH OPTIONS .sp The following options are available to \fBpgcopydb copy\-db\fP: .INDENT 0.0 .TP .B \-\-source Connection string to the source Postgres instance. See the Postgres documentation for \fI\%connection strings\fP for the details. In short both the quoted form \fB"host=... dbname=..."\fP and the URI form \fBpostgres://user@host:5432/dbname\fP are supported. .TP .B \-\-target Connection string to the target Postgres instance. .TP .B \-\-dir During its normal operations pgcopydb creates a lot of temporary files to track sub\-processes progress. Temporary files are created in the directory location given by this option, or defaults to \fB${TMPDIR}/pgcopydb\fP when the environment variable is set, or then to \fB/tmp/pgcopydb\fP\&. .TP .B \-\-table\-jobs How many tables can be processed in parallel. .sp This limit only applies to the COPY operations, more sub\-processes will be running at the same time that this limit while the CREATE INDEX operations are in progress, though then the processes are only waiting for the target Postgres instance to do all the work. .TP .B \-\-index\-jobs How many indexes can be built in parallel, globally. A good option is to set this option to the count of CPU cores that are available on the Postgres target system, minus some cores that are going to be used for handling the COPY operations. .TP .B \-\-drop\-if\-exists When restoring the schema on the target Postgres instance, \fBpgcopydb\fP actually uses \fBpg_restore\fP\&. When this options is specified, then the following pg_restore options are also used: \fB\-\-clean \-\-if\-exists\fP\&. .sp This option is useful when the same command is run several times in a row, either to fix a previous mistake or for instance when used in a continuous integration system. .sp This option causes \fBDROP TABLE\fP and \fBDROP INDEX\fP and other DROP commands to be used. Make sure you understand what you\(aqre doing here! .TP .B \-\-no\-owner Do not output commands to set ownership of objects to match the original database. By default, \fBpg_restore\fP issues \fBALTER OWNER\fP or \fBSET SESSION AUTHORIZATION\fP statements to set ownership of created schema elements. These statements will fail unless the initial connection to the database is made by a superuser (or the same user that owns all of the objects in the script). With \fB\-\-no\-owner\fP, any user name can be used for the initial connection, and this user will own all the created objects. .TP .B \-\-skip\-large\-objects Skip copying large objects, also known as blobs, when copying the data from the source database to the target database. .TP .BI \-\-filters \ This option allows to exclude table and indexes from the copy operations. See \fI\%Filtering\fP for details about the expected file format and the filtering options available. .TP .B \-\-restart When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed and delete information that might be used for diagnostics and forensics. .sp In that case, the \fB\-\-restart\fP option can be used to allow pgcopydb to delete traces from a previous run. .TP .B \-\-resume When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C\-c or SIGTERM) or because it crashed, it is possible to resume the database migration. .sp When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using \fB\-\-resume\fP: the COPY command in Postgres is transactional and was rolled back. .sp Same reasonning applies to the CREATE INDEX commands and ALTER TABLE commands that pgcopydb issues, those commands are skipped on a \fB\-\-resume\fP run only if known to have run through to completion on the previous one. .sp Finally, using \fB\-\-resume\fP requires the use of \fB\-\-not\-consistent\fP\&. .TP .B \-\-not\-consistent In order to be consistent, pgcopydb exports a Postgres snapshot by calling the \fI\%pg_export_snapshot()\fP function on the source database server. The snapshot is then re\-used in all the connections to the source database server by using the \fBSET TRANSACTION SNAPSHOT\fP command. .sp Per the Postgres documentation about \fBpg_export_snapshot\fP: .INDENT 7.0 .INDENT 3.5 Saves the transaction\(aqs current snapshot and returns a text string identifying the snapshot. This string must be passed (outside the database) to clients that want to import the snapshot. The snapshot is available for import only until the end of the transaction that exported it. .UNINDENT .UNINDENT .sp Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exists anymore. The pgcopydb command can only resume operations with a new snapshot, and thus can not ensure consistency of the whole data set, because each run is now using their own snapshot. .TP .B \-\-snapshot Instead of exporting its own snapshot by calling the PostgreSQL function \fBpg_export_snapshot()\fP it is possible for pgcopydb to re\-use an already exported snapshot. .UNINDENT .SH ENVIRONMENT .sp PGCOPYDB_SOURCE_PGURI .INDENT 0.0 .INDENT 3.5 Connection string to the source Postgres instance. When \fB\-\-source\fP is ommitted from the command line, then this environment variable is used. .UNINDENT .UNINDENT .sp PGCOPYDB_TARGET_PGURI .INDENT 0.0 .INDENT 3.5 Connection string to the target Postgres instance. When \fB\-\-target\fP is ommitted from the command line, then this environment variable is used. .UNINDENT .UNINDENT .sp PGCOPYDB_TARGET_TABLE_JOBS .INDENT 0.0 .INDENT 3.5 Number of concurrent jobs allowed to run COPY operations in parallel. When \fB\-\-table\-jobs\fP is ommitted from the command line, then this environment variable is used. .UNINDENT .UNINDENT .sp PGCOPYDB_TARGET_INDEX_JOBS .INDENT 0.0 .INDENT 3.5 Number of concurrent jobs allowed to run CREATE INDEX operations in parallel. When \fB\-\-index\-jobs\fP is ommitted from the command line, then this environment variable is used. .UNINDENT .UNINDENT .sp PGCOPYDB_DROP_IF_EXISTS .INDENT 0.0 .INDENT 3.5 When true (or \fIyes\fP, or \fIon\fP, or 1, same input as a Postgres boolean) then pgcopydb uses the pg_restore options \fB\-\-clean \-\-if\-exists\fP when creating the schema on the target Postgres instance. .UNINDENT .UNINDENT .sp PGCOPYDB_SNAPSHOT .INDENT 0.0 .INDENT 3.5 Postgres snapshot identifier to re\-use, see also \fB\-\-snapshot\fP\&. .UNINDENT .UNINDENT .sp TMPDIR .INDENT 0.0 .INDENT 3.5 The pgcopydb command creates all its work files and directories in \fB${TMPDIR}/pgcopydb\fP, and defaults to \fB/tmp/pgcopydb\fP\&. .UNINDENT .UNINDENT .SH EXAMPLES .INDENT 0.0 .INDENT 3.5 .sp .nf .ft C $ export PGCOPYDB_SOURCE_PGURI="port=54311 host=localhost dbname=pgloader" $ export PGCOPYDB_TARGET_PGURI="port=54311 dbname=plop" $ export PGCOPYDB_DROP_IF_EXISTS=on $ pgcopydb copy\-db \-\-table\-jobs 8 \-\-index\-jobs 12 10:04:49 29268 INFO [SOURCE] Copying database from "port=54311 host=localhost dbname=pgloader" 10:04:49 29268 INFO [TARGET] Copying database into "port=54311 dbname=plop" 10:04:49 29268 INFO Found a stale pidfile at "/tmp/pgcopydb/pgcopydb.pid" 10:04:49 29268 WARN Removing the stale pid file "/tmp/pgcopydb/pgcopydb.pid" 10:04:49 29268 WARN Directory "/tmp/pgcopydb" already exists: removing it entirely 10:04:49 29268 INFO STEP 1: dump the source database schema (pre/post data) \&... 10:04:52 29268 INFO STEP 3: copy data from source to target in sub\-processes 10:04:52 29268 INFO STEP 4: create indexes and constraints in parallel 10:04:52 29268 INFO STEP 5: vacuum analyze each table 10:04:52 29268 INFO Listing ordinary tables in "port=54311 host=localhost dbname=pgloader" 10:04:52 29268 INFO Fetched information for 56 tables \&... 10:04:53 29268 INFO STEP 6: restore the post\-data section to the target database \&... Step Connection Duration Concurrency \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\- Dump Schema source 1s275 1 Prepare Schema target 1s560 1 COPY, INDEX, CONSTRAINTS, VACUUM (wall clock) both 1s095 8 + 12 COPY (cumulative) both 2s645 8 CREATE INDEX (cumulative) target 333ms 12 Finalize Schema target 29ms 1 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\- Total Wall Clock Duration both 4s013 8 + 12 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\- \-\-\-\-\-\-\-\-\-\-\-\- .ft P .fi .UNINDENT .UNINDENT .SH AUTHOR Dimitri Fontaine .SH COPYRIGHT 2022, Dimitri Fontaine .\" Generated by docutils manpage writer. .